Building Cross Model Search with Ember, Rails, and Elasticsearch, Part I


With its accurate algorithms and autocomplete UI, Google has set the bar extremely high for web apps implementing custom search features. When a user searches your website, they don’t expect their query to only get results for one type of entity. For example, if you had a grocery store web app and someone searched for “banana,” the query results could include products, brands, and ingredients. The point is, search needs to be contextually aware in order to be intuitive.

In this two-part series, we’ll build cross model search for an ecommerce store called Tember. In Part One, we’ll use Rails and Elasticsearch to build the search API. Then in Part Two, we’ll look at the front-end consumption of our new cross model search API in Ember.js.

Booting Up Rails 5

As you may have heard, Rails 5 is introducing API-only web apps. Among other things, it slims down the middleware stack, which means less code per request. Leigh Halliday wrote an excellent post outlining how to start using Rails 5, so I won’t cover that material.

Let’s start by creating our app:

rails new tember --api --database=postgresql

Nothing surprising here; we create a new Rails app in API mode, and we want the DB to be Postgres. Next up, let’s start creating some models. We’re building an ecommerce web app named Tember which sells different types of wood. Tember will have vendors, products, product reviews, and review authors, so let’s fill up the model directory.

rails g model vendor name description
rails g model product name description vendor:references
rails g model review_author name bio
rails g model review body review_author:references product:references

With a rake db:create db:migrate command, you will see all our models get their underlying tables.

Seed Your Database

It’s time to seed the database with our new models. I’ve included seed data for you to review because it’s valuable to see what we’ll be searching against.

/db/seeds.rb
# make seeding easy by resetting the primary key sequence
ActiveRecord::Base.connection.reset_pk_sequence!('vendors')

Vendor.create(name: 'Oakmont Oaks', description: 'Oakmont provides choice oak from Orlando, Louisville, and Seattle') 
Vendor.create(name: 'Windy Reserve', description: 'Windy Reserve provides an array of timber from all across the southeast.')
Vendor.create(name: 'Big Hill', description: 'Big Hill provides oak, northern ash, and pine from various regions of Canada.')

ActiveRecord::Base.connection.reset_pk_sequence!('products')

Product.create(id: 1, vendor_id: 1, name: 'Sturdy Oak', description: 'This oak grain is of the highest quality, leaving it extra sturdy.') 
Product.create(id: 2, vendor_id: 1, name: 'Flimsy Oak', description: 'This oak plank is super flimsy and make for easy chairs.') 
Product.create(id: 3, vendor_id: 1, name: 'Flexible Oak', description: 'This oak plank is flexible, it can be bent four times before snapping.') 
Product.create(id: 4, vendor_id: 2, name: 'Northern Ash Planks', description: 'These northern ash planks are good for building or burning.') 
Product.create(id: 5, vendor_id: 2, name: 'Aged Mahogany', description: 'This aged Mahogany makes for an excellent coffee table wood.') 
Product.create(id: 6, vendor_id: 2, name: 'Aged Cherry', description: 'No chair or table is made correctly without Cherry.') 
Product.create(id: 7, vendor_id: 3, name: 'Walnut Planks', description: 'These walnut planks make for a perfect deck.') 
Product.create(id: 8, vendor_id: 3, name: 'Rosewood Pick', description: 'This special pick of Rosewood is perfect for carving.') 
Product.create(id: 9, vendor_id: 3, name: 'Waterproof Teak', description: 'These teak planks allow for waterproof and sun proof outdoor wood.')

ActiveRecord::Base.connection.reset_pk_sequence!('review_authors')

ReviewAuthor.create(id: 1, name: 'George Washington', bio: 'George is the co-founder of the United States') 
ReviewAuthor.create(id: 2, name: 'Abe Lincoln', bio: 'Abe never told a lie.') 
ReviewAuthor.create(id: 3, name: 'Thomas Jefferson', bio: 'TJ was a furniture maker.')

ActiveRecord::Base.connection.reset_pk_sequence!('reviews')

Review.create(id: 1, product_id: 1, review_author_id: 1, body: 'The sturdy oak is a solid building material that will last.') 
Review.create(id: 2, product_id: 2, review_author_id: 1, body: 'The flimsy oak is a super flimsy and light material.') 
Review.create(id: 3, product_id: 3, review_author_id: 1, body: 'The flexible oak, not to be confused with flimsy, is flexible.') 
Review.create(id: 4, product_id: 4, review_author_id: 2, body: 'Northern ash planks from Windy Reserve are awesome!') 
Review.create(id: 5, product_id: 5, review_author_id: 2, body: 'The aged mahogany is high grain old stuff.') 
Review.create(id: 6, product_id: 6, review_author_id: 2, body: 'The cherry we bought from Windy Reserve was awesome!') 
Review.create(id: 7, product_id: 7, review_author_id: 3, body: 'Have you ever had walnut this solid? We built our deck out of it.') 
Review.create(id: 8, product_id: 8, review_author_id: 3, body: 'We found these rosewoods to be really light and tough.') 
Review.create(id: 9, product_id: 9, review_author_id: 3, body: 'If there ever was a better wood for the outdoors I have not seen it.')

We’ve created three vendors, nine products, nine reviews, and three authors. Now that we have data in the Postgres store, let’s start working with Elasticsearch, our search store.

Setting Up Elasticsearch

The engine and storage behind our search will be handled by a technology called Elasticsearch. If you’ve never used it, you’re in for a real treat. Here is Wikipedia’s overview of it:

“Elasticsearch can be used to search all kinds of documents. It provides scalable search, has near real-time search, and supports multi-tenancy. Elasticsearch is distributed, which means that indices can be divided into shards, and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically.”

Elasticsearch is not your only option when it comes to search. In fact, PostgreSQL can be used to create an effective search, but we expect our search feature to be heavily consumed, so we want to move the load away from the database layer.

The easiest way to install Elasticsearch is via Homebrew (sorry, Windows and Linux).

$ brew install elasticsearch

This will install Elasticsearch, which can then be started with this command (which I alias):

$ elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml

Add this to your ~/.bash_profile to make it as easy as es:

alias es='elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml'

Newer Elasticsearch releases no longer support the --config flag; there, running plain elasticsearch picks up the elasticsearch.yml in the default config directory.

After this command runs, Elasticsearch will be booted up and listening for HTTP requests on port 9200 (port 9300 is reserved for node-to-node transport). So far we have installed and booted up the Elasticsearch server. Let’s stop here for now and pick back up on Elasticsearch when we integrate Rails.
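If you want to confirm the server is actually up before moving on, here is a minimal sketch that requests the root endpoint of the HTTP API using only Ruby's standard library. It assumes a default local install answering on localhost:9200; the helper name elasticsearch_info is mine, not part of any gem.

```ruby
require "net/http"
require "json"
require "uri"

# Returns the parsed JSON banner from the Elasticsearch root endpoint,
# or nil when nothing usable answers at the given host and port.
# (elasticsearch_info is a hypothetical helper name for this sketch.)
def elasticsearch_info(host: "localhost", port: 9200)
  uri = URI("http://#{host}:#{port}/")
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get(uri.path)
  end
  return nil unless response.is_a?(Net::HTTPSuccess)

  parsed = JSON.parse(response.body)
  parsed.is_a?(Hash) ? parsed : nil
rescue StandardError
  # Connection refused, timeouts, and malformed JSON all count as "not reachable".
  nil
end

info = elasticsearch_info
if info
  puts "Elasticsearch #{info.dig('version', 'number')} is running"
else
  puts "Elasticsearch is not reachable on localhost:9200"
end
```

Running the same check with curl localhost:9200 works just as well; the Ruby version is only handy if you want to script it.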

Using Toptal’s Chewy Gem to Integrate Rails and Elasticsearch

Elasticsearch has a few gems to make the integration with Rails easier; however, those gems leave quite a bit to be desired in terms of functionality. Toptal has created a gem called Chewy to fill those voids. Elasticsearch’s fundamental concept is building up indexes in order to query data. Chewy makes it easier to create those indexes, and it actually feels like a ‘Rails way’ solution.

Let’s install the gem. Add it to your Gemfile:

gem 'chewy'

Then install it and generate the config file:

bundle

rails g chewy:install

The result of this will be a chewy.yml file, which you won’t need to edit for local development. The only thing left now is to start building the index. Our Tember store will have one autocomplete search for the entire store; therefore, we will create one index named store. If your web app has two search contexts, such as an invoice search and a user search, then you would create two indexes.

app/chewy/store_index.rb

class StoreIndex < Chewy::Index 
end

Next up we need to add the types of entities making up our index. In other words, what models will our index be responsible for searching against? These indexes inherit from the Chewy::Index class, which has a DSL for defining the search structure:

class StoreIndex < Chewy::Index 
  define_type Product.includes(:vendor) do 
    field :name, :description # multiple fields without additional options 
    field :vendor do 
      field :name 
      field :description # default data type is `string` 
    end 
  end 
end

As you can see, the first type is mapping to the Product model. You might also notice we’re including the vendor model here. This allows the index to dive into the actual relationships of the entity, giving our index a “contextual awareness.” Now we need to add the rest of our index types:

class StoreIndex < Chewy::Index 
  define_type Product.includes(:vendor) do 
    field :name, :description # multiple fields without additional options 
    field :vendor do 
      field :name 
      field :description # default data type is `string` 
    end 
  end

  define_type Vendor.includes(:products) do 
    field :name, :description 
    field :products do 
      field :name, :description 
    end 
  end

  define_type Review.includes(:review_author) do 
    field :body 
    field :review_author do 
      field :name, :bio 
    end 
  end

  define_type ReviewAuthor.includes(:reviews) do 
    field :name, :bio 
    field :reviews do 
      field :body 
    end 
  end 
end
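To make the nested fields more concrete, here is a rough plain-Ruby sketch of the kind of document the product type above describes: the product's own fields plus a nested vendor object. The Struct stand-ins and the product_document helper are hypothetical and only approximate what Chewy builds from the DSL; they are not Chewy's implementation.

```ruby
require "json"

# Plain-Ruby stand-ins for the ActiveRecord models (hypothetical, for illustration only).
Vendor  = Struct.new(:name, :description)
Product = Struct.new(:name, :description, :vendor)

# Approximates the document described by the product type definition:
# top-level fields plus a nested object for the associated vendor.
def product_document(product)
  {
    name: product.name,
    description: product.description,
    vendor: {
      name: product.vendor.name,
      description: product.vendor.description
    }
  }
end

vendor  = Vendor.new("Oakmont Oaks", "Oakmont provides choice oak from Orlando, Louisville, and Seattle")
product = Product.new("Sturdy Oak", "This oak grain is of the highest quality, leaving it extra sturdy.", vendor)

puts JSON.pretty_generate(product_document(product))
```

Because the vendor's name and description travel with each product document, a search can match a product through its vendor's text as well as its own.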

Now that we have defined all of our searchable content in the Chewy index, we can insert the code responsible for updating the indexes in Elasticsearch. Each model will include one or many update_index method calls. These callbacks run to push data from Postgres to Elasticsearch when the record saves:

class Vendor < ActiveRecord::Base
  # Elasticsearch index: update the vendor type of the store index
  # after a vendor is saved or destroyed
  update_index('store#vendor') { self }

  # associations
  has_many :products
end

The string passed to update_index represents <index_name>#<index_type>. This will be added to each of our models in order to update the appropriate indexes when a record is saved. Let’s take a look at this in the console:

2.2.2 :005 > Vendor.last.save!
  Vendor Load (2.4ms)  SELECT  "vendors".* FROM "vendors" ORDER BY "vendors"."id" DESC LIMIT 1
   (5.4ms)  BEGIN
   (0.2ms)  COMMIT
   (0.9ms)  SELECT COUNT(*) FROM "vendors" WHERE "vendors"."id" IN (3)
  Vendor Load (0.4ms)  SELECT "vendors".* FROM "vendors" WHERE "vendors"."id" IN (3)
  Product Load (0.6ms)  SELECT "products".* FROM "products" WHERE "products"."vendor_id" = 3
  StoreIndex::Vendor Import (255.8ms) {:index=>1}
 => true

You will notice quite a few queries are being executed here. Normally only one SQL command would be run if you are updating one record. However, we have Chewy callbacks updating the vendor and its owned products, so you’re seeing more records pulled in. This activity can and should be pushed to a background worker like Sidekiq so as not to delay your application on each save.
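To illustrate the idea of moving index updates off the save path, here is a generic plain-Ruby sketch using a queue and a worker thread. It is not how Chewy or Sidekiq are implemented, and IndexQueue, enqueue, and reindex are hypothetical names; in a real app you would lean on Chewy's update strategies and your background job library instead.

```ruby
# Generic illustration only: IndexQueue and its methods are hypothetical,
# not part of Chewy or Sidekiq.
class IndexQueue
  attr_reader :indexed

  def initialize
    @jobs = Queue.new # thread-safe FIFO from Ruby's core library
    @indexed = []
    @worker = Thread.new do
      # Queue#pop blocks until a job arrives; the :stop marker ends the loop.
      while (job = @jobs.pop) != :stop
        reindex(job)
      end
    end
  end

  # Called from the "save" path: returns as soon as the job is queued.
  def enqueue(type, id)
    @jobs << [type, id]
  end

  # Lets the worker finish the queued jobs, then waits for it to exit.
  def shutdown
    @jobs << :stop
    @worker.join
  end

  private

  # Stand-in for the slow call that would push a document to the search store.
  def reindex(job)
    sleep 0.01
    @indexed << job
  end
end

queue = IndexQueue.new
queue.enqueue("vendor", 3)
queue.enqueue("product", 7)
queue.shutdown
p queue.indexed
```

The point of the design is that enqueue costs almost nothing for the request that triggered the save, while the slow indexing work happens elsewhere.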

At this point, we have data being pushed to Elasticsearch each time our searchable models are saved. Let’s look at how to interact with Elasticsearch indexes to run queries.


Querying Elasticsearch with the Chewy Gem

Last up is for us to jump in the console and see how new records can be searched against using our new StoreIndex class. Earlier in this post, we updated the index for one vendor record. We need all the data to be synced with Elasticsearch; luckily, there is a rake task included with Chewy to import all data or “reset” the indexes:

$ rake chewy:reset 
Resetting StoreIndex 
  Imported StoreIndex::Product for 0.21s, documents total: 9 
  Imported StoreIndex::Vendor for 0.04s, documents total: 3 
  Imported StoreIndex::Review for 0.04s, documents total: 9 
  Imported StoreIndex::ReviewAuthor for 0.04s, documents total: 3

In terms of record count, you can see we have a one-to-one match from the database to Elasticsearch. Each row in our database has been imported to Elasticsearch. Let’s open up the console and see how we can query this new Elasticsearch index:

rails c

2.2.2 :003 > StoreIndex.query(term: {name: 'oak'})
 => #<StoreIndex::Query:0x007fdf0cf1ced8 @options={}, @_types=[], @_indexes=[StoreIndex], @criteria=#<Chewy::Query::Criteria:0x007fdf0cf1ceb0 @options={:query_mode=>:must, :filter_mode=>:and, :post_filter_mode=>:and}, @queries=[{:term=>{:name=>"oak"}}], @filters=[], @post_filters=[], @sort=[], @fields=[], @types=[], @scores=[], @request_options={}, @facets={}, @aggregations={}, @suggest={}, @script_fields={}>, @_request=nil, @_response=nil, @_results=nil, @_collection=nil>

We executed the query method on our index class with a term of {name: ‘oak’}. The return value is an instance of the class StoreIndex::Query. It can be used in many ways, including aggregations, filtering, merging to other queries, and so forth. More information can be found in the gem’s readme on those various functions and their use cases.

Now that we have a search for the term ‘oak’, let’s load the records associated with this query.

2.2.2 :016 > StoreIndex.query(term: {name: 'oak'}).load.to_a
  StoreIndex Search (13.5ms) {:body=>{:query=>{:term=>{:name=>"oak"}}}, :index=>["store"], :type=>[]}
  Product Load (0.5ms)  SELECT "products".* FROM "products" WHERE "products"."id" IN (1, 2, 3)
 => [#<Product id: 1, name: "Sturdy Oak", description: "This oak grain is of the highest quality, leaving ...", vendor_id: 1, created_at: "2015-11-21 15:28:26", updated_at: "2015-11-21 15:28:26">, #<Product id: 2, name: "Flimsy Oak", description: "This oak plank is super flimsy and make for easy c...", vendor_id: 1, created_at: "2015-11-21 15:28:27", updated_at: "2015-11-21 15:28:27">, #<Product id: 3, name: "Flexible Oak", description: "This oak plank is flexible, it can be bent four ti...", vendor_id: 1, created_at: "2015-11-21 15:28:27", updated_at: "2015-11-21 15:28:27">] 
2.2.2 :017 > 

We use the .load method to fetch the actual database records associated with the Elasticsearch results, then we use .to_a to push them into an array. There you have it. A search for the term ‘oak’ on the name field yields three records, all of them products: Sturdy Oak, Flimsy Oak, and Flexible Oak.
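It may be surprising that the vendor ‘Oakmont Oaks’ is not in those results. A term query looks for an exact token, and with default analysis the name is stored as lowercase word tokens. The sketch below imitates that behavior in plain Ruby; the analyze and term_match? helpers are hypothetical, and the real standard analyzer is more sophisticated than a downcase and a regex.

```ruby
# Rough approximation of default analysis: lowercase the text and split it
# into word tokens. Elasticsearch's real standard analyzer does more than this.
def analyze(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# A term query matches only when the exact term is one of the stored tokens.
# The term itself is not analyzed, so "Oak" with a capital would not match.
def term_match?(text, term)
  analyze(text).include?(term)
end

names = ["Sturdy Oak", "Flimsy Oak", "Flexible Oak", "Oakmont Oaks", "Northern Ash Planks"]

names.each do |name|
  puts format("%-20s tokens=%-28s match=%s", name, analyze(name).inspect, term_match?(name, "oak"))
end
```

‘Oakmont Oaks’ produces the tokens oakmont and oaks, neither of which is exactly oak. Partial matching of that kind is where analyzers come in.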

Conclusion

By now, you should be able to integrate Elasticsearch with a Rails 5 application: sync data between the two datastores, query the search index, and load the matching records across multiple models.

Elasticsearch is an extremely powerful search technology, and while I showed you how to set it up and pass data to it, there is much more to learn in order to optimize your Elasticsearch implementation. If you are making use of Elasticsearch, I highly recommend you take a dive into the analyzer and analysis APIs to dial in indexes.

This wraps up part one of our two-part series. Next up, we’ll build the APIs needed for our Ember.js front end to build a blazing fast autocomplete search feature.


Join the Discussion

  • Jeremiah

    You mention “This activity can and should be pushed to a background worker like Sidekiq so as not to delay your application on each save.” Can you share your preferred method for doing that? Thank you!

    • Rob Guilfoyle

      Hi @jeremiah, when using the Chewy gem you are given the ability to wrap code in what’s called an “update strategy”. If you take a look at the Chewy documentation on update strategies (https://github.com/toptal/chewy#index-update-strategies), it shows you how to integrate with your particular background workers. At edukate.com we use Sidekiq as our background worker, which is supported by Chewy among others (resque, active_job). Make sense?

      • Jeremiah

        Hi Rob, thanks for the quick answer!

        I was hoping there was a means of setting the default strategy to sidekiq or similar – something like Chewy.root_strategy = :sidekiq in an initializer. Based on what I’m seeing in the docs, the way to update via sidekiq/other delayed jobs is per-operation rather than globally/default. That would be adding Chewy.strategy(:sidekiq) before any CRUD operation in the controllers or wrapping those operations in a similar block. Does that sound right or am I (likely) missing something? Thanks again!

        • Rob Guilfoyle

          The owner of the gem answers this in an issue: https://github.com/toptal/chewy/issues/285. It is as easy as Chewy.request_strategy = :sidekiq in the chewy.yml file. The documentation could be a little clearer, but it’s open source so you can make a pull request ;)

          • Jeremiah

            I knew I was missing something simple, thank you again!

  • Oleg Keene

    Hardcode fk? Dah..

    instance = Model.create!(…)
    instance.relation.build(…)

  • Rob Guilfoyle

    Simply for display purposes. Obviously, in a production application, hardcoding IDs is not something I would advocate.

  • Just a small hint: You forgot to reference the product model when creating the reviews. The command to generate the review model should be as follows:

    rails g model review body review_author:references product:references

  • Brennan Holtzclaw

    Do you know if something in particular has changed with the elasticsearch set up recently?
    When I input `$ elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml`
    I get this error `[2016-06-01 09:32:54,641][INFO ][bootstrap ] es.config is no longer supported. elasticsearch.yml must be placed in the config directory and cannot be renamed.`

    Not sure if this is expected or if there’s a new step I’m missing.