Speed up ActiveRecord with a little tweaking

Development

When you’re building a new Rails app, ActiveRecord’s defaults will take you far. Querying with .where, inserting with .save — it’s all so easy, and it’s fast enough.

But after a while — when a page of mostly simple content takes a second or more to come back from the server, when you start seeing 504 Gateway Timeout errors coming back from nginx because it’s taking too long to process the CSV you uploaded — that’s when you know you’re going to have to spend some time on performance.

You could solve a lot of these problems with caching. But that adds a whole extra layer of complication. Between expiration, nesting partials, and bugs that only reproduce in production, it’s a headache you don’t need right now.

Instead, you can spend some time fixing the most common performance problem I’ve seen in Rails apps: hitting your database too much.

Even if you’re running the database on the same machine, there’s a lot of connection overhead that’ll slow you down. And if your database is on another machine, fetching data that often will just destroy you.

But you don’t have to go too far from the simplicity of Rails to see drastic improvements in your app’s response time.

Grab all the data at once in ActiveRecord

If you look at the logs in an unoptimized app, they’ll probably look like this:


Processing by RestaurantsController#index as HTML
  Restaurant Load (1.6ms)  SELECT `restaurants`.* FROM `restaurants`
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 1
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 2
  Review Load (1.1ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 3
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 4
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 5
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 6
  Review Load (1.2ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 7
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 8
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 9
  Review Load (1.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` = 10

You’re trying to find ten restaurants along with with their reviews, and you’re doing eleven SQL calls!

This is called the “N+1 query problem”: for every restaurant, you’re doing one query for the restaurant data, plus one query for each of their associated reviews. You can probably imagine how bad it becomes the deeper you go. Imagine if you also wanted to grab each restaurant’s address, as well as each address’ phone number.

You’ll run into this problem when you loop over a list of objects and try to query their associations:

app/views/restaurants/index.html.erb


<% @restaurants.each do |restaurant| %>
  <tr>
    <td><%= restaurant.name %></td>
    <td><%= restaurant.review_average %></td>
    ...

You don’t need to hit the database N+1 times. You want to hit it at most twice: once for the restaurants you’re trying to find, and once for all of the reviews associated with all of those restaurants.

This is called “eager loading,” and you can do it really easily with .includes:

app/controllers/restaurants_controller.rb


def index
  @restaurants = Restaurant.all.includes(:reviews)
end

Or, if you want to do something more complicated, like preload all the addresses and the reviews’ authors:

app/controllers/restaurants_controller.rb


def index
  @restaurants = Restaurant.all.includes([{:reviews => author}, :address])
end

You have to specify the associations you want to preload, using that array and hash syntax. Rails will do the best it can at consolidating down those calls:


Restaurant Load (1.2ms)  SELECT `restaurants`.* FROM `restaurants`
Review Load (3.0ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`restaurant_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Much better.

If you’re not the one fetching the parent records, it’s a little more complicated. You have to do the preloading on your own, using ActiveRecord::Associations::Preloader:


ActiveRecord::Associations::Preloader.new.preload(@restaurants, [:reviews])

But it mostly works the same way.

The N+1 problem is easy to fix, and it’s pretty easy to avoid once you know the pattern. But it’s also easy to miss, especially if your code is spread out across a number of partials.

The bullet gem will try to automatically detect N+1 queries as they happen, so you can chase them down and fix them. And if you’re using skylight.io to monitor your app’s performance in production, it will also detect and report them to you.

A little tip about preloading

Some apps have different ways to show the same data on the same page. Maybe you want to show addresses ordered by distance in one section, and you only want to show addresses that have associated phone numbers on the other.

But there’s no point in eager loading both of these associations. It’s the same data, just presented in a different way. So for these, I usually create an unsorted_unfiltered_addresses association and preload that:

app/models/restaurant.rb


has_many :unsorted_unfiltered_addresses, :class_name => "Address"

Restaurant.includes(:unsorted_unfiltered_addresses)

Then, I’ll do the rest of the sorting and filtering in Ruby code:


def addresses_sorted_by_distance(point)
  unsorted_unfiltered_addresses.sort_by do |address|
    address.distance_from(point)
  end
end

def addresses_with_phone_numbers
  unsorted_unfiltered_addresses.select do |address|
    address.phone_number.present?
  end
end

It saves a round trip but makes your Ruby code a little more complex. For us at Avvo, it usually helps, but it’s worth benchmarking to make sure it’s worth doing.

Crafting your own SQL in ActiveRecord

What about grabbing the number of reviews for a restaurant? Or the review average or the date of the earliest review? If you don’t need any of the review data, calling all of the review objects just to calculate that stuff is a waste, especially since the database is great at doing that work for you.

Instead, use the select method to do those aggregations for you as you query:

app/controllers/restaurants_controller.rb


@restaurants = Restaurant.all
  .select("restaurants.*, AVG(reviews.rating) AS review_average")
  .joins(:reviews)
  .group("restaurants.id") 

Using SQL aggregations, you can easily calculate that information before it gets to your app. When you use AS in your select, Rails will magically create the right attributes for you:


@restaurants.first.review_average # => 2.3

They just show up as one-off properties on your model.

Going even further, you can drop entirely into SQL to save yourself even more calls. For example, what if you wanted to find only the first ten restaurants that had at least ten reviews in the last six months, along with a count of all those reviews?

You could do this with Rails:


@restaurants = Restaurant.all
  .select("restaurants.*, COUNT(reviews.id) AS review_count")
  .joins(:reviews)
  .group("restaurants.id")
  .where("reviews.created_at > ?", 3.months.ago)
  .having("COUNT(reviews.id) > 10")
  .limit(10)

However, as a query gets more complicated, I tend to use find_by_sql, so I don’t have to remember the quirks of the querying methods:


@restaurants = Restaurant.find_by_sql(["
  SELECT  restaurants.*, COUNT(reviews.id) AS review_count
    FROM `restaurants`
      INNER JOIN `reviews` ON `reviews`.`restaurant_id` = `restaurants`.`id`
    WHERE (reviews.created_at > ?)
    GROUP BY restaurants.id
    HAVING COUNT(reviews.id) > 10
    LIMIT 10", 3.months.ago])

And just like before, Rails will create attributes for all of the columns you SELECT.

Using SQL like this is its own skill. It’s something you’ll have to study before you get great at it. But SQL is helpful for so many complicated data-processing situations. It’s worth spending some time to learn it.

find_by_sql isn’t used too often. In our pretty large codebase at Avvo, we use it fewer than five times. But in the right place, when you really care about how you find your data, it works extremely well.

Of course, having SQL strings littered throughout your codebase isn’t always the most maintainable way to do things. If you find yourself writing the same kind of aggregations and selects over and over again, you could generate a SQL view.

So, let’s revisit my earlier example about restaurants. If you frequently also want a restaurant’s review average and review count, you could create a view with those extra columns:

db/migrate/20150501213118_create_restaurant_with_stats_view.rb


class CreateEnhancedRestaurantsView < ActiveRecord::Migration
  def up
    execute "
      CREATE VIEW enhanced_restaurants AS
      SELECT restaurants.*, count(reviews.id) AS review_count, avg(reviews.rating) AS average_rating
      FROM `restaurants`
      INNER JOIN `reviews` ON `reviews`.`restaurant_id` = `restaurants`.`id`
      GROUP BY restaurants.id"
  end

  def down
    execute "DROP VIEW enhanced_restaurants"
  end
end

Wrap it in an ActiveRecord model:

app/models/enhanced_restaurant.rb


class EnhancedRestaurant < ActiveRecord::Base
  self.primary_key = :id  
end

and use it!


$ EnhancedRestaurant.first
EnhancedRestaurant Load (1.2ms)  SELECT  `enhanced_restaurants`.* FROM `enhanced_restaurants` LIMIT 1
=> #<EnhancedRestaurant id: nil, name: "Judd's Pub", created_at: "2015-04-26 03:33:21", updated_at: "2015-04-26 03:33:21", review_count: 14, average_rating: #<BigDecimal:7ff3180238d0,'0.29286E1',18(18)>>

The view works just like a SQL table, and it’s automatically kept up to date as you change the data it depends on. All of your complex SQL code is in a migration, and you can treat the object backed by a SQL view as if it’s any other ActiveRecord object. Well, almost:


$ EnhancedRestaurant.find(10).update(name: "New Name")
ActiveRecord::StatementInvalid: Mysql2::Error: The target table enhanced_restaurants of the UPDATE is not updatable: UPDATE `enhanced_restaurants` SET `name` = 'New Name' WHERE `enhanced_restaurants`.`id` = 10

Updating data through a view usually won’t work out of the box.

If that’s too much SQL for you, you’re in luck. Now that Arel is part of Rails, you don’t have to write any of your own SQL. If you want, you can write Ruby code to generate almost any SQL queries you need. So you could turn that find_by_sql query into this:


reviews = Review.arel_table 
restaurants = Restaurant.arel_table
    
sql = restaurants.project(Arel.star, reviews[:id].count.as("review_count"))
 .join(reviews, Arel::Nodes::InnerJoin)
 .on(restaurants[:id].eq(reviews[:restaurant_id]))
 .where(reviews[:created_at].gt(3.months.ago))
 .group(restaurants[:id]) .having(reviews[:id].count.gt(10))
 .take(10)
 .to_sql
    
@restaurants = Restaurant.find_by_sql(sql)

To me, that’s not as easy to read as the SQL. But because it’s Ruby code, you can manipulate it more easily than a string of raw SQL, which can be very handy.

If you want to learn more about Arel, Jiří Pospíšil has a great guide to using Arel to generate and organize some even crazier queries.

Try Codeship – The simplest Continuous Delivery service out there.

Create tons of records, all at the same time

Between includes, select, and find_by_sql, you should be able to improve the performance of even your most data-filled pages. But many apps, especially as they get bigger, don’t just select data — they also process it.

Maybe you get CSV files from a data-processing platform that you regularly import into your own database. Or maybe you want to create a table, based on data you already have in your database but structured differently to make it faster to look things up.

Whatever the problem, you have to get that data into your database. And calling create a half million times is SLOW. If you’re inserting objects with associations, you’ll also run into the N+1 problem.

At Avvo, we’ve seen this problem a few times. And when we do, we handle it with Zach Dennis’ activerecord-import gem.

The gem makes importing and updating data in bulk incredibly easy. For example, this simple import will do 10 SQL calls:


# Probably loaded from a CSV or API
pricing_data = [
 ["New York", 130],
 ["Los Angeles", 130],
 ["Chicago", 120],
 ["Miami", 110],
 ["Dallas", 110],
 ["Seattle", 100],
 ["San Francisco", 150],
 ["Denver", 80],
 ["Boston", 120],
 ["Washington, D.C.", 100]]
    
pricing_data.each do |location, price|
 Inventory.create(location: location, price: price) 
end 

With activerecord-import, you’d just add your records to an array and call import:


records_to_import = pricing_data.map do |location, price|
 Inventory.new(location: location, price: price) 
end
    
Inventory.import records_to_import 

activerecord-import will figure out the best way to get that data in the database. Usually, it only takes a single SQL call.

If you’re just parsing, modifying, and importing CSV data, you don’t even need an object. You can bulk insert arrays of columns and values:


Inventory.import [:location, :price], pricing_data

This can be a lot more convenient. Importing data using activerecord-import has led us to order-of-magnitude improvements in speed:


                                  user     system      total        real
with activerecord-import      2.850000   0.050000   2.900000 (  2.989576)
without activerecord-import  16.030000   6.330000  22.360000 ( 34.894255)

Processes that used to take hours to run now take minutes.

Vanilla Rails will take you a long way. But in every app, there are places where speed is a big deal. And when you find those places, reducing SQL calls is an easy place to start optimizing. Each SQL call might take a little longer. But in my experience, it’s always been a drastic improvement.

Ready for further software development speed improvements? Get started with Codeship!

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles.
Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.



We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.

  • dougc84

    Why use activerecord-import at all? Using #create, given an array of hashes, will create multiple records for you.

    • A few reasons:

      #create, with an array of hashes, will do a separate INSERT statement per hash you pass it. #import will do one INSERT statement which will insert all the records at once.

      Also, if you have invalid records, activerecord-import will tell you specifically which ones failed. With #create, you have to iterate through all of the possibly created instances and look for errors or nil ids.

      activerecord-import will handle nested data well — it will often do one INSERT per class, rather than one per record.

      Finally, and this is more of a matter of opinion than anything else, #import feels better when importing a bunch of data from a CSV file or data dump. It’s often easier to mangle data to be a list of columns + a bunch of arrays of data rather than a bunch of records, or even a bunch of hashes.

      • Oleg Keene

        In addition, it’s not creating an instances of inserted records

  • Awesome article! I’ve learned a couple of new tricks here, thanks for sharing @justinweiss83:disqus!

  • Philippe Van Eerdenbrugghe

    Eager loading in Rails is very often a solution which we have to put in a wrong place related to where to problem will be caused. Most of the time the N+1 problem will be caused in the rendering step and the eager loading must be setup in the setup step (I treat most of my request in a 3 steps workflow : setup, processing, rendering )

    I don’t like it but I must admit it’s a lesser evil than keeping N+1 requests.

    Most of the time though, I’ve found I could use the goldiloader gem ( https://github.com/salsify/goldiloader ) to lazily-auto-preload what I will need when I will browser my ActiveRecord::Relation without having to do it during the setup.

    As with any lazy strategy , it also allows me to defer the choice of what I want to preload until I know it which I might not know during the setup step : just imagine that I have to display additional info if current user is admin. This is clearly presentation which I do not want to know when I initially setup the resource collection.

    • Whoa. That gem looks amazing. How well has it worked for you? Anything someone using it should watch out for?

      I totally agree with you about eager loading seeming awkward — it’d be nice to have an API that fits in the right place.

      • Philippe Van Eerdenbrugghe

        First of all sorry for my very poor english, I realize how difficult it is to read my previous comment.

        That gem has worked very well for me in the vast majority ( I would say more than 90% ) of the cases.

        The two cases when it didn’t work well were :
        – when the first item of the collection has less/different associations to preload than the rest of the collection. This case is very rare and is actually well documented. Basically, just disable goldiloader when it happens and preload manually what you want.
        – when I have to preload a tree and associations of every node of the tree. This is by far the worse eager loading problem I have encountered with activerecord because relational data is meant to be represented as table and not tree. There are different gems using different techniques to solve this problem but at the end of the day none of them allow to easily grab the following collection in a way that will not require another hit to the DB when I display it : “I want all the product matching condition x, and for each of them I want to preload their full category hierarchy (I imagine a product is bound to a category node having indefinite levels of parent category) and each category node must also preload its tags association”
        Goldiloader does not resolve that problem neither so for those cases I often find myself disabling goldiloader and use the specific methods proposed by the gem used to handle activerecord trees

  • Eric Chernuka

    Great post. I’ll definitely be using some of this in my current project.

  • Fantastic write up @justinweiss83:disqus! Thanks for sharing.

  • zx1986

    nice tips!

  • David Runger

    Great article. :) FYI, I saw a couple typos. First, `Restaurant.all.includes([{:reviews => author}, :address])` is missing a colon at the beginning of `author`. Second, a couple of your code examples use `3.months.ago`, but the prose lead-up mentions a time period of 6 months.

    I have been subscribed to your newsletter for a few weeks now, and I’m loving the content. Thanks! Keep it coming! :)

  • Doel

    Loved the entire piece. Thanks a lot for sharing such great Rails articles, put simply and explained thougrolly.

  • GuskiS

    Hey, nice article. I have been strugling with these issues on my project too, one task at a time :)
    By the way, what I found usefull was to pre load all data and then do some variable checking in Ruby side. For example, using .where(type: some_type) when some_type can range from 1-10. Instead of querieng for each type, I query all and then use .select or something like that to get right type in needed situation.

  • Matt Campbell

    What do you mean by this in the first part of the article:

    If you’re not the one fetching the parent records…

    Can you give an example?

  • Bruno Paulino

    That is a great source. Thank you for sharing this amazing content with us Justin.

  • Lots of good stuff here, especially how to use views. Using them for refactoring and performance gains would be a great follow-up topic (or ebook).

  • Pingback: A Retrospective on Ruby in 2015()

  • Pingback: 2015 年 Ruby 大事件盘点-深圳新媒体门户-一个实用的网站|www.wxshenzhen.cn()