Folding Postgres Window Functions into Rails

Development

You’ve heard of window functions in PostgreSQL, but you aren’t quite sure what they are or how to use them. On the surface they seem esoteric, and their use cases are ambiguous. Something concrete would really help cement when window functions are the right tool for the job. We have our work cut out for us! Through this post we’ll see:

  1. How to recognize where a window function is helpful.
  2. How to implement a scope to introduce window functions.
  3. How to use tests to drive a switch from pure Ruby to a window function.

An Application Use Case for Postgres Window Functions

You’ve recently finished shipping a suite of features for an application that helps travelers book golf trips. Things are looking good, and a request comes in from your client:

Our application started by being the go-to place to find golf trips, and our users love it. Some of the resorts that list trips with us also offer some non-golf events, such as tennis, badminton, and pickleball. When we begin listing other trips, it would be great to highlight our user’s favorite trips for each category. Can you do that for us?

Why, of course you can do that! The application lets potential travelers flag trips they’re interested in as favorites, providing a reliable metric that we can use to rank trips. With the simple addition of a category for each trip, we can also filter our group trips together. This seems straightforward enough, so what are we waiting for?

Adding and testing application categories in Ruby

After a refresher look through the schema, you’re ready to get to work. A look at the trips table reveals that it currently has these relevant fields:

  • name
  • category
  • favorites

Where the application currently lists the top ranked trips, we will instead list the top two for each category. Some tests will help verify that we’re getting the expected results.

class TripTest < ActiveSupport::TestCase 
  setup do 
      Trip::CATEGORIES.each do |category| 
      5.times do |favorites| 
      Trip.create!( 
      name: "#{category}-#{favorites}", 
      category: category, favorites: favorites ) 
      end 
      end 
end

test 'find top trips by category' do 
      trips = Trip.popular_by_category(per: 2)

      assert_equal 8, trips.length
      assert_equal [2,2,2,2], trips.group_by(&:category).map(&:last).map(&:length)
      assert trips.all? { |trip| trip.favorites > 2 }
  end 
end

The test seeds the test database with twenty trips across four categories. The popular_by_category method should only return eight trips in total, the most popular two from each category.

Approaching this in pure Ruby is clear, readable, and concise. All of the trips are loaded into memory, grouped by category, ranked according to the number of favorites, and then the requested amount is sliced off of the top. Do note that sorting is comprised of both favorites and name, which is necessary to force deterministic sorting in the event of trips that are equally popular.

class Trip < ActiveRecord::Base 
  CATEGORIES = %w[golf tennis badminton pickleball]

  def self.popular_by_category_orig(per: 2) 
      all.group_by(&:category).flat_map do |category, subset| 
      subset.sort_by do |trip| 
      [-trip.favorites, trip.name] 
      end.take(per) 
      end 
  end 
end

As a wizened developer, you immediately recognize that loading all of the trips into memory simply to end up with only eight results is horribly inefficient. The pure Ruby code is concise and readable, but it isn’t suitable for production usage.

New Call-to-action

Moving the logic to PostgreSQL

Between various subselects, GROUP BY with aggregates and multiple queries, there is more than one way to do it in SQL. You may recall reading about one advanced feature of PostgreSQL that could be particularly adept at solving this category problem — window functions. Succinctly put, directly from the documentation:

A window function performs a calculation across a set of table rows that are somehow related to the current row. -Postgres Documentation

The important part of that phrase deals with the power of calculating across related rows. In our case, the rows are related by category, and the calculation being performed is ordering them within those categories. In the realm of window functions, this is handled with an OVER clause. There are additional expressions for fine tuning the window, but we can achieve all we need with PARTITION BY and ORDER BY expressions. Dropping into psql, partition the data set by category:

SELECT category, favorites, row_number() OVER (PARTITION BY category) FROM trips;
category  | favorites |  row_number
------------+-----------+-------------
 badminton  |       0 |         1
 badminton  |       1 |         2
 badminton  |       2 |         3
 badminton  |       3 |         4
 golf       |       0 |         1
 golf       |       1 |         2

The row_number is a window function that calculates the number of the current row within its partition. The row number becomes crucial when the partitioned data is then ordered:

SELECT category, favorites, row_number() OVER (
  PARTITION BY category ORDER BY favorites DESC
) FROM trips;
category  | favorites | row_number
------------+-----------+------------
 badminton  |       3 |         1
 badminton  |       2 |         2
 badminton  |       1 |         3
 badminton  |       0 |         4
 golf       |       3 |         1
 golf       |       2 |         2

All that remains is limiting the results to the top ranked rows, and that can be done in Ruby.

Moving Window Functions to Rails

There aren’t any constructs for OVER built into ActiveRecord. You must drop down and compose raw SQL to compose your queries. So long as you avoid find_by_sql and use relation-refining methods like from or select, there isn’t any loss in composability.

Only the Trip class itself needs to be modified. The tests can stay exactly as they are. A primary benefit of performing calculations in the database is that a scope can be used to return a relation for composability.

class Trip < ActiveRecord::Base 
  CATEGORIES = %w[golf tennis badminton pickleball]

  scope :popular_by_category, -> (per: 2) do 
      from_ranked.where('trips.row_number <= ?', per) 
  end

  private

  def self.from_ranked 
      from <<-SQL.strip_heredoc 
      (SELECT *, row_number() OVER ( 
      PARTITION BY category 
      ORDER BY favorites DESC, name ASC 
      ) FROM trips) AS trips 
      SQL 
  end 
end

The relation is defined as a scope using a custom from clause to build up partitioned trips. The where clauses filter out any row numbers below the desired threshold, and only the favorites for each category are returned. The test still passes, and inspecting the results yields:

category  | favorites | row_number
------------+-----------+------------
 badminton  |       3 |         1
 badminton  |       2 |         2
 golf       |       3 |         1
 golf       |       2 |         2
 pickleball |       3 |         1
 pickleball |       2 |         2
 tennis     |       3 |         1
 tennis     |       2 |         2

Those are precisely the results we’re looking for!

Application speed after implementing Postgres window function

Composability, offloading work to the database, and minimizing memory usage in Rails are all wonderful notions. But looking at numbers, how much faster is execution after the move to window functions? Clearly a test with twenty rows isn’t going to show much. Let’s look at a more realistic production-like database size of 5,000 trips.

Benchmark.ms { Trip.popular_by_category_original } #=> 149.14
Benchmark.ms { Trip.popular_by_category }       #=> 0.277

Not only is the window function version 539.3 times faster, it doesn’t trigger a garbage collection cycle after each run. Your client gets exactly what they want, quickly, and with ample room for their site to grow without bottlenecks.

Conclusion: Postgres Window Functions Simplify Queries

There was SQL before window functions and SQL after window functions: That’s how powerful this tool is. Being that [big of a game changer] unfortunately means that it can be quite hard to grasp the feature.

Dimitri Fontaine

Window functions are an advanced feature of PostgreSQL. Heck, it’s even listed under “advanced features” in the documentation. Having a relatively simple use case for them helps a lot when you’re trying to wrap your head around the concept. For some tasks, window functions can radically simplify queries; for other tasks, they are completely irreplaceable.

Subscribe via Email

Over 60,000 people from companies like Netflix, Apple, Spotify and O'Reilly are reading our articles.
Subscribe to receive a weekly newsletter with articles around Continuous Integration, Docker, and software development best practices.



We promise that we won't spam you. You can unsubscribe any time.

Join the Discussion

Leave us some comments on what you think about this topic or if you like to add something.

  • Sean Griffin

    I’m unclear as to why using `PARTITION BY` here is any better or more useful than adding it to the beginning of your order clause.

    • Jim Nasby

      Because PARTITION BY also does grouping. If you didn’t do that you would just end up with the first 8 rows of the sorted list, and they would all be in a single category (assuming the first category out of the sort had more than 8 rows…)

  • Pierre Y.

    I don’t understand what the “from” is in self.from_ranked. Can you explain it ? Or provide a pointer that describe what are “relation refining methods” ?

  • @pierrey:disqus The `from` here is one of the query methods provided by ActiveRecord. You can see the documentation here. It allows you to specify an alternate table, or build up a subquery to pull the records from.

  • Jim Nasby

    “Approaching this in pure Ruby is clear, readable, and concise.”

    I think it’s all a matter of perspective. The SQL solution is certainly more readable and concise to me. ;)

  • Pingback: Folding Postgres Window Functions into Ecto-IT大道()

  • Brad

    Alternatively, if you’re not a fan of SQL heredocs: build the ranked subquery in ActiveRecord. Don’t forget to alias in from‘s subquery_name argument!


    def self.from_ranked
    trips = self.select([
    "*",
    "row_number() OVER (PARTITION BY category)"
    ])
    .order(favorites: :desc, name: :asc)

    from(trips, :trips)
    end