Troubleshooting Common Ruby on Rails Errors in Production

Posted by By telliott on March 22, 2019

Once a new Rails app or a new feature for an existing app is “ready”—meaning that everything works as expected locally and all tests pass—it is moved to production, which brings a new set of problems. In this article, we’re going to explore a number of common issues that new Rails developers might face when deploying and running their apps in production, that result in server errors, missing resources, and even timeouts.

One of the greatest advantages of Ruby on Rails (RoR) is its focus on convention over configuration. RoR convention allows programmers—willing to play by the rule book—to develop a Rails application in significantly less time than other frameworks, as well as with significantly less code. How? Well, by lowering the number of decisions a programmer must make when building out their application.

However, even when you follow the rules, you will still run into unforeseen issues. You will need to have a process in place for identifying, investigating, and resolving issues in production. You can’t rely on serving static 40X and 50X pages with a “contact@support.com if the problem persists” prompt, or grepping error logs. In order to be proactive about handling infrastructure and application issues to avoid negatively impacting your customers, you must be able to aggregate log data, tail and search logs, and receive alerts.

*Example of application uptime monitoring dashboard in Pingdom*

Troubleshooting Errors in a Heroku App

We’ll be deploying a simple Rails 5.2 app to Heroku and hook up SolarWinds^® Papertrail^™ for log management and Pingdom^® for uptime monitoring. This app is very simple; a system for tracking Events that belong to Producers. This producer/consumer design pattern is quite common for event-driven architectures. The front end is simple because this pattern is used for back-end processing, but it allows us to demonstrate problems running this app in a production environment. The app includes a SQL database and several routes to display the events.

Detected sqlite3 Gem Which is Not Supported

When developing our app we seed the database with several dozen model records, and it works great on our development machine. However, when we deploy it to Heroku our build fails almost immediately. “It worked on my machine” is not a good excuse! Let’s take a look at that error message.

Feb 19 10:33:33 common-problems-ror app/api: Build failed -- check your build output: https://dashboard.heroku.com/apps/8eg95d03-3129-487a-9b20-dae9a3c1b1a9/activity/builds/35441564-2645-4792-6473-50a21aafe2c7

remote:  !     Failed to install gems via Bundler.
remote:  !     Detected sqlite3 gem which is not supported on Heroku:
remote:  !     https://devcenter.heroku.com/articles/sqlite3
remote:  !
remote:  !     Push rejected, failed to compile Ruby app.

Sqlite3 is the default database engine that’s initialized in new Rails apps. It’s a super lightweight engine that supports a large set of the SQL standard—though not all—and needs no additional components or programs to run. That said, its compact nature comes at a cost of functionality that makes it somewhat unsuitable for most production applications.

In development, an app can often get away with using sqlite3. Once that app is moved to a production server it’s encouraged, and required by Heroku, to switch over to a more robust database like MySQL, PostgreSQL, or Mongoid for NoSQL.

We’ll switch over to PostgreSQL with just a few steps:

1) Update your Gemfile by removing the gem sqlite3 and inserting Postgres (PG).

+ gem 'pg'
- gem 'sqlite3'

2) Run $ bundle install in your shell.

3) Update the database.yml file to use PostgreSQL as the database adapter.

 default: &default
     adapter: postgresql
     pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
     timeout: 5000   
 development:
     <<: *default
     database: db/development_db
   
 test:
     <<: *default
     database: db/test_db
   
 production:
     <<: *default
     database: db/production_db

Missing Encryption Key to Decrypt File

Now with PostgreSQL properly configured and our app once again ready to be deployed, we try pushing it to Heroku. Again, our push is rejected during the build process. This time, the error states that some type of encryption key is missing.

Feb 21 10:51:47 common-problems-ror   app/api: Build failed -- check your build output:   https://dashboard.heroku.com/apps/5ec95d03-9999-487a-9b20-da3c1b1a9/activity/builds/93ebeb70-33fb-2222-bb48-ff5ac493feeb   
   
   remote:        Missing encryption key   to decrypt file with. Ask your team for your master key and write it to   /tmp/build_1321bb4816c7881987346c96f54f5ef1/config/master.key or put it in   the ENV['RAILS_MASTER_KEY'].
   remote:
   remote:  !
   remote:  !     Precompiling assets failed.
   remote:  !
   remote:  !     Push rejected, failed to compile Ruby app.

Credentials—like an app’s secret_key_base—are stored in a config file called credentials.yml.enc. This file is sent with the rest of our app to the production server, where Rails expects to find a secret set as an environment variable called RAILS_MASTER_KEY. Using that key, it can safely decrypt and read the credentials.yml.enc file. That master key can be found in a Rails app at config/master.key located in the config folder.

The important thing to note here, whether you’re using a newer or older version of Rails, is that deploying your application doesn’t send along or set up the necessary environment variables. There are often manual steps that need to be performed to make sure that your production environment contains needed credentials, API tokens, and other static variables. In this situation, we’re able to read and set the RAILS_MASTER_KEY variable in Heroku with the following command:

$ heroku config:set RAILS_MASTER_KEY="$(<   config/master.key)"

The error message seen in here was very detailed. It’s not uncommon though to run into the same situation—missing environment variables—where the warning is not so straight forward; for example, using an SDK like Amazon Web Services (AWS) and not being able to authenticate requests in production. Before investigating the service, make sure to confirm the SDK has access to the same or another set of valid credentials on the production server.

HTTP 500 Error: We’re Sorry, But Something Went Wrong.

Attempting yet again to deploy the app, and this time the build succeeds. Excitedly, we visit our live app and are immediately greeted with an application error message that reads, “We’re sorry, but something went wrong”. Everything was working in development, so we’re not sure what’s happening in production. To find out, we decide to open Papertrail and search our logs for any request with a 500 status.

*Uncovering cause of 500 error using Papertrail.*

Looking at the trace, we quickly uncover that PG is raising an error complaining that the table events is undefined. Why? We forgot to run our migrations.

If running migrations is not automatically handled by the deployment process, manually running them is easy to forget. Simply running rails db:migrate on your server will bring the database schema up to date; however, it’s better to automate this and not have to rely on memory.

Adding the popular buildpack Ruby Rake Deploy Tasks to our app, we set the environment variable DEPLOY_TASKS to db:migrate. Now our rake tasks will run automatically on subsequent builds.

$ heroku buildpacks:add   https://github.com/gunpowderlabs/buildpack-ruby-rake-deploy-tasks
   
Buildpack added. Next release on common-problems-ror will use:
   ...
   
$   heroku config:set DEPLOY_TASKS='db:migrate'
   
Setting DEPLOY_TASKS and restarting app:
...
   
$   git push heroku master

HTTP 404 Error: Page Not Found

The app has been live for a while now. Not long after the launch, a public route was created at domain.com/pages/story and shared across multiple social media platforms. However, it was later decided that domain.com/pages/about has a better ring to it. So, the route and menu navigate links were all updated.

While reviewing the logs in Papertrail, we notice a few page not found (404) responses being logged. Searching the logs for any request with a 404 status, it becomes clear that there are a lot more than a few and all of them at the domain.com/pages/story path—the old endpoint. Leaving these requests unhandled is adding a lot of noise to the logs, and we want to clean that up.

*Searching for Page Not Found (404) errors using Papertrail*

Using catch-all routes for dynamic navigation can help handle these scenarios in a clean and concise manner. What’s important to remember in this situation is that people are trying to access a page that doesn’t exist, not a resource that doesn’t exist. Therefore, at the very end of our routes.rb file, let’s add a catch-all route for any GET requests, sending them to the page controller’s not_found method.

Rails.application.routes.draw do
  root 'pages#index'
  ... 
  # Last entry in routes.rb file
  match '*_missing_page', to: 'pages#not_found', via: :get
end

This will remove these 404 errors from our logs, as well as give us the ability to customize the not_found page within our application; not the default static 404.html page in the /public directory. Meanwhile, we’re not affecting scenarios where a missing resource is requested since a route would still match. For example, a request to /events/not_a_valid_id would match the event resource route /events/:id.

HTTP 503: Service Not Available

The app’s running great. Users love it. A suggestion is made that it would be nice to have more information about each event on the homepage. This seems like a great idea that would be valuable on multiple pages, so we created an event details partial. In the homepage template, we add the new partial to render each event within our iterator like so:

   <h1>Events</h1>
   <div>
     <ul>
       <%   @events.each do |event| %>
           <%=   render 'events/event_details', event: event %>
       <%   end %>
     </ul>
   </div>

A few hours after deploying the update, Pingdom sends an alert that the app is crashing. Navigating through their dashboard, we look at the uptime report. Sure enough, all the tests calls are responding 503 and theservice is unavailable.

*Responding to downtime alert and accessing report using Pingdom.*

Immediately we jump over to Papertrail and search for requests with a 503 status. We discovered that all requests made to our homepage are timing out. Additionally, each request is trailed and followed by a never-ending list of log messages that read:

...Rendered   events/_event_details.html.erb (0.2ms)
...Rendered   events/_event_details.html.erb (0.3ms)
...Rendered   events/_event_details.html.erb (0.4ms)
...Rendered   events/_event_details.html.erb (0.1ms)
...Rendered   events/_event_details.html.erb (0.3ms)

Every time Rails templates a view or partial, it has to open and close the file. Remember the partial that we placed inside the iterator? It created a situation where our server is having to perform—because there are a lot of events—thousands of file reads on every request. As a result, the time it’s taking to render the template is creating a backlog of requests that time out.

There are multiple ways to handle this situation. However, we’re going to use fragment caching to save the details of each event after it’s been rendered once. This way, we ensure our app isn’t repeating itself for work it has already done, and our homepage can once again render quickly.

<h1>Events</h1>
   
   <div>
     <ul>
       <%   @events.each do |event| %>
         <%   cache event do %>
           <%=   render 'events/event_details', event: event %>
         <%   end %>
       <%   end %>
     </ul>
   </div>

Error – Request Timeout

Again, the app’s running great. Users love it. A suggestion is made that being able to filter events by the producers they belong to would be nice. It’s a great idea. To implement it, we add some code to the relevant controller method so that if a producer_title param is present, only events belonging to that producer are returned.

 class PagesController <   ApplicationController
     def index
       @events = Event.all
   
       if params[:producer_title]
         @events.keep_if do |event| 
           event.producer.title == params[:producer_title]
         end
       end
     end
     ...
   end

Everything works! Additionally, several comprehensive specs are written and all of them pass with flying colors—actually, just green. Once again, the app is deployed and, within just a few hours after shipping the update, Pingdom alerts that requests are timing out.

*Tests requests from global servers returning 503 Service Unavailable errors in Pingdom.*

Back in Papertrail to investigate what’s happening, we find ourselves scrolling through a very, very long list of database queries. Looking deeper into the trace, we uncover that these transactions are being triggered when receiving a request using the new filter feature—like so:

...desc="Request timeout"   method=GET path="/?producer_title=awesome_event_thrower"

Similar to the templating error we resolved using fragment caching, the underlying problem is an expensive operation being executed within an indefinite loop. In this case, that expensive operation is occurring inside the keep_if iterator block. How? Every time event.producer.title == params[:producer_title] runs, the event’s producer record is queried from our database.

*Reviewing repetitive database queries from server logs in Papertrail.*

Rarely, if ever, should you rely on iterators to filter query results; instead, push yourself to design queries that only return the records that are needed. Doing so can significantly improve the speed at which data gets returned to the user. Not doing so can result in this exact situation that we are handling.

In order to fix the issue, we rewrite our page controller index method. Ensuring whether or not there is a search parameter, the expected records are returned in a single query.

       def index
       if params[:producer_title]
         @events = Event.includes(:producer).where(producers: { title: params[:producer_title] })
       else
         @events = Event.all
       end
     end
     ...

Wrap Up

It’s very common to experience issues in production that are unlikely to come up during development. For the most part, it’s difficult to anticipate such problems, especially when you’re constantly shipping new features. The complexity and constant change in the production environment makes it essential to have the right tools, like Papertrail, for monitoring logs and identifying issues early. The right tools can both alert you to issues before they impact users, and streamline problem solving so you can get back to working on the next app.