In most information systems, chances are you will need some kind of data search and filtering feature, since these are commonly demanded by business processes; the right tool (or tools) depends on how complex the search itself is. In the Rails domain we have access to several tools: the database through ODM/ORM layers, SearchCop, Ransack, Textacular (PostgreSQL only). However, most of these search tools depend directly on the database and perform strict filters, which may be limiting for more complex scenarios.
The main idea of this post series (this is the first one) is to show how to integrate a stable search engine with Rails and how to build advanced filters and searches over the information available in your system.
Leaving aside the simple, self-integrated solutions for searching, our team was looking for a complete and smooth approach. After analyzing several external options and engines (most of them old acquaintances), among which were Sphinx, Solr and Elasticsearch (ES), we decided to use Elasticsearch: we were keen on the interesting features available in its newest versions (such as mappings, scoring and boosting functions, scripting support, a distributed model, an elegant API, and so on). You can find a good comparison between Solr and ES in this post (although it is about a year old).
Once we had chosen our engine, we needed to pick a Ruby client for it, and the answer was not difficult to find. We had worked with Tire before, but it is now deprecated in favor of the elasticsearch-rails gem, which is supported by Elasticsearch and by Tire's creator himself. Our first impression of this gem was that it was pretty complete and stable, and in the end we were right; the initial integration is also pretty straightforward, and there is a good set of posts explaining it (for example here). We are going to show the basics of implementing a simple search with Elasticsearch and Rails using the elasticsearch-rails gem.
In your development environment you can install the Elasticsearch server using the brew package manager; however, if you want a more portable solution (with edge support), you can download the specific version you want from here. We will be using version 1.6.0.
using brew
$ brew install elasticsearch
$ elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
or downloading directly
$ wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.6.0.tar.gz
$ tar xopf elasticsearch-1.6.0.tar.gz
$ sh elasticsearch-1.6.0/bin/elasticsearch
To check that everything is OK, you can run the following command (it assumes the server has the default configuration):
$ curl -X GET http://localhost:9200/
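If the server is running you should get back a JSON payload roughly like the one below (the node name and build details will vary; this is just the general shape of the response for a 1.x server):

{
  "status" : 200,
  "name" : "Node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0"
  },
  "tagline" : "You Know, for Search"
}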
The gem installation process is pretty simple: you only need to add the correct dependencies to your Gemfile and include the correct modules in your model. We are also going to deal with some issues and tricks related to pagination, deployment on Heroku, code inspection, and a known caveat when using STI.
# Gemfile
# ...
gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
You can find more information about these gems here.
Once you have added the gems, you can configure the Elasticsearch URL from Rails (this configures how Rails will connect to the Elasticsearch server); by default the server listens on http://localhost:9200.
# config/initializers/elasticsearch.rb
Elasticsearch::Model.client = Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'] || "http://localhost:9200/"
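As a quick sanity check that the client can actually reach the server, you can ask for the cluster health from a Rails console (cluster.health is part of the elasticsearch-ruby client API; the exact output will depend on your setup):

Elasticsearch::Model.client.cluster.health
#=> {"cluster_name"=>"elasticsearch", "status"=>"yellow", ...}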
You can find good articles about the first steps with Elasticsearch and Rails here. The example in that post uses ActiveRecord models; however, we want to build an example using a NoSQL database (MongoDB in this case). The reason is nothing special: we want to save some time by skipping migrations and other chores that are simpler than in SQL databases (we understand that for simple, learning-purpose scenarios the database type does not make a big difference). Suppose that we have an existing User model with first_name, last_name and email fields.
# app/models/user.rb
# Code extracted from a Devise model; it uses Mongoid 4.0.2 as its ODM
class User
  include Mongoid::Document
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  # index name for keeping consistency among existing environments
  index_name "users-#{Rails.env}"

  field :email, type: String, default: ""
  field :first_name, type: String, default: ""
  field :last_name, type: String, default: ""

  def as_indexed_json(options = {})
    as_json(except: [:id, :_id])
  end
end
You only need to include the Elasticsearch::Model and Elasticsearch::Model::Callbacks modules to get the default behavior. However, if you are going to work with more than one environment, you should configure an index name per environment, which you can easily do by calling the index_name method. Finally, you need to define a method named as_indexed_json in your searchable model to tell Elasticsearch how the indexed document will be structured (we will analyze this method in later posts).
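For instance, if you later need to index only some fields, or a computed attribute, you can shape the document yourself. A hypothetical sketch (full_name is not part of our model; it is just an illustration of a computed field):

def as_indexed_json(options = {})
  {
    first_name: first_name,
    last_name:  last_name,
    email:      email,
    # computed attribute, assuming you want to search on the full name
    full_name:  "#{first_name} #{last_name}"
  }.as_json
end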
You also need to create the Elasticsearch index and perform the first import. You can do that inside an initializer file; we are going to reuse our previous initializer:
# config/initializers/elasticsearch.rb
Elasticsearch::Model.client = Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'] || "http://localhost:9200/"

unless User.__elasticsearch__.index_exists?
  User.__elasticsearch__.create_index! force: true
  User.import
end
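As an alternative to importing from an initializer, the elasticsearch-rails gem ships with Rake tasks for importing; if we read its README correctly, you enable them by requiring the tasks in your Rakefile and then run the import per model:

# Rakefile
require 'elasticsearch/rails/tasks/import'

$ bundle exec rake environment elasticsearch:import:model CLASS='User' FORCE=y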
With that in place, you can start using the search engine magic. We are going to create some seeds for the User model:
user_names = ['John Doe', 'Allam Britto', 'Bob Dylan', 'Alice Cooper', 'Alice Pasquini']

user_names.each do |user_name|
  user_params = {
    first_name: user_name.split(' ').first,
    last_name: user_name.split(' ').last,
    email: "#{user_name.downcase.gsub(' ', '.')}@example.com",
    password: '12345678'
  }
  User.find_or_create_by user_params
end
And now we can execute some queries, like these:
User.search('allam').count
#=> 1
User.search('alice').count
#=> 2
User.search('allam').first
#=> #<Elasticsearch::Model::Response::Result:0x007fc25c52e8a8 @result=#<Hashie::Mash _id="558885474a7561a99a010000" _index="users-development" _score=0.15342641 _source=#<Hashie::Mash email="allam.britto@example.com" first_name="Allam" last_name="Britto"> _type="user">>
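By the way, the search method is not limited to plain strings: you can also pass a hash following the Elasticsearch query DSL. A minimal sketch, which should match the same two users as the string query above:

User.search(query: { match: { first_name: 'alice' } }).count
#=> 2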
You also have two choices for retrieving your results:

#records returns the real instances of your model (it hits the database): a genuine collection of model instances from your database, i.e. an ActiveRecord::Relation for ActiveRecord, or a Mongoid::Criteria in the case of MongoDB (text adapted from the elasticsearch-rails gem).

#results returns a collection of Elasticsearch result objects (it gives you the information directly, without hitting the database).
User.search('alice').records.records.class
#=> Mongoid::Criteria
User.search('alice').results.results.class
#=> Array
We call records.records and results.results instead of just records and results because the first call gives you Elasticsearch wrapper objects representing the records or results, and the second one actually fetches them.
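If you need both things at once (the database record plus its Elasticsearch metadata, such as the score), the gem also exposes each_with_hit, for example:

User.search('alice').records.each_with_hit do |record, hit|
  puts "#{record.first_name}: #{hit._score}"
end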
We love debugging and inspecting our applications, and readability is also important to us. If you are using a gem similar to jazz_hands or jazz_fingers (both of which rely on awesome_print) and you are using Mongoid as in this example, you will probably hit a problem similar to this:
User.search('allam').first
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
User.search('allam').results.first
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
User.search('allam').results.to_a
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
In this case, when we try to print an Array-like output from our Elasticsearch::Model::Response::Result model while our inspection system is backed by awesome_print, our application breaks. You can find a full explanation of this problem here.
As you can see in the referenced issue, the solution is pretty simple (although it took us nearly 5 hours to debug this obscure thing), and you can write a little monkey patch to solve the problem. First, we are going to guarantee that the lib folder is autoloaded by Rails:
# config/application.rb
class Application < Rails::Application
  # ...
  # Auto-loading lib files
  config.autoload_paths << Rails.root.join('lib')
end
Then, we are going to write our monkey patch using a module strategy (the module will be named Hashable, but that is just the name we chose for it):
# lib/extensions/elasticsearch/model/response/hashable.rb
module Extensions
  module Elasticsearch
    module Model
      module Response
        module Hashable
          #
          # Returns the result object as a plain Ruby hash to support awesome_print,
          # allowing Elasticsearch::Model::Response::Result#method(:to_hash) to be invoked
          #
          # @return [Hash]
          #
          def to_hash
            @result.to_h
          end
        end
      end
    end
  end
end
Finally, we are going to reuse our Elasticsearch initializer to include the monkey patch:
# config/initializers/elasticsearch.rb
# ...
if defined? Mongoid
  Elasticsearch::Model::Response::Result.include Extensions::Elasticsearch::Model::Response::Hashable
end
Cool, now we get a better printer for our command output (using Mongoid and awesome_print):
User.search('allam').results.first
# => {
# "_index" => "users-development",
# "_type" => "user",
# "_id" => "558885474a7561a99a010000",
# "_score" => 0.15342641,
# "_source" => {
# "email" => "allam.britto@example.com",
# "first_name" => "Allam",
# "last_name" => "Britto"
# }
#}
OK, we have just begun and we have already made some simple searches using Elasticsearch's indexing approach. But what is an index? You can find a great explanation here, but essentially "index" has two meanings: one as a noun and one as a verb.
In the Elasticsearch context, an index can contain many types. You could map this relation as:
one Index => many Types
one Relational Database => many Tables
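You can verify this relation directly against the server: once the User model has been imported, asking for the index mapping should list each type the index contains (assuming the default server location):

$ curl -X GET http://localhost:9200/users-development/_mapping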
With this in mind, suppose that we have a new user type and, for a business domain reason, you need to apply an STI approach, something like this:
# app/models/adviser.rb
class Adviser < User
# ...
end
Let's create some dummy data for our new model:
adviser_names = ['Alex Morgan', 'Nadine Angerer', 'Michelle Akers', 'Mia Hamm', 'Lady Andrade']

adviser_names.each do |adviser_name|
  adviser_params = {
    first_name: adviser_name.split(' ').first,
    last_name: adviser_name.split(' ').last,
    email: "#{adviser_name.downcase.gsub(' ', '.')}@example.com",
    password: '12345678'
  }
  Adviser.find_or_create_by adviser_params
end
But what happens when we try to index this model?
Adviser.__elasticsearch__.create_index! force: true
#=> {
# "acknowledged" => true
#}
Adviser.import
#=> SystemStackError: stack level too deep
Unfortunately, it does not work. You can find more information about this problem here; as you can see in the issue, the simplest solution is to include the Elasticsearch::Model module once again in the child class. We will also have to rename the document type, taking into account the Elasticsearch indexing structure mentioned before.
class Adviser < User
  include Elasticsearch::Model

  index_name "users-#{Rails.env}"
  document_type "adviser"
end
To import the new data from the Adviser model into the Elasticsearch engine, we need to re-edit our Elasticsearch initializer; but before that, we are going to delete our previous user index (you can avoid this step by executing Adviser.import in a rails console):
User.__elasticsearch__.client.indices.delete index: User.index_name rescue nil
# config/initializers/elasticsearch.rb
# ...
unless User.__elasticsearch__.index_exists?
  User.__elasticsearch__.create_index! force: true
  User.import
  Adviser.import
end
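To double-check that both models landed in the same index, you can ask the server for the total document count (client.count is part of the elasticsearch-ruby API; the figure below assumes our ten seed records):

User.__elasticsearch__.client.count(index: User.index_name)
#=> {"count"=>10, ...}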
With the previous implementation you can already execute searches using the new model (Adviser), yay!
Adviser.search('Alex').first
#=> {
# "_index" => "users-development",
# "_type" => "adviser",
# "_id" => "558990784a7561240d000000",
# "_score" => 0.7554128,
# "_source" => {
# "email" => "alex.morgan@example.com",
# "first_name" => "Alex",
# "last_name" => "Morgan"
# }
#}
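A nice side effect of sharing the index is that elasticsearch-model lets you search across several models at once and get mixed results back. A sketch based on the gem's multi-model search (the output is illustrative):

response = Elasticsearch::Model.search('alice OR alex', [User, Adviser])
response.results.map { |result| [result._type, result._source.first_name] }
#=> [["user", "Alice"], ["user", "Alice"], ["adviser", "Alex"]]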
Well, now we have a common structure for handling both simple searches and the indexing process. What is the next feature we need to deal with? We are going to leave the full search capabilities for another post in this series; for now, we are going to deal with pagination.
Pagination comes integrated with the elasticsearch-rails gem and is pretty simple to use; however, the gem has a little problem: it always selects the Kaminari integration in favor of will_paginate. This is not really critical (why would you have both implementations in the same Rails project?), but what happens if you like will_paginate and you are also using rails_admin? (This is a common scenario that could repeat itself with other gems as well.)
Rails Admin uses Kaminari by default, so if you want the will_paginate integration in your project you will have to override the pagination methods, because elasticsearch-rails will have integrated with Kaminari; you can find more information about this behavior here. The solution is described in the previous link and is also pretty simple: we only need to include the will_paginate Elasticsearch module, and we can use an initializer for that (although this solution is a little dirty, it works fine):
# config/initializers/elasticsearch.rb
# ...
# NOTE: you only need to do that if your rails project has both the kaminari and will_paginate implementations and you want to use will_paginate
Elasticsearch::Model::Response::Response.__send__ :include, Elasticsearch::Model::Response::Pagination::WillPaginate
In the future the gem should perhaps offer an external configuration switch for this specific feature; for now, we have to resort to these tricks.
Once you have configured your pagination integration (will_paginate in our case), you can execute something like this:
Adviser.search({}).paginate(page: 3, per_page: 2).results.first
#=> {
# "_index" => "users-development",
# "_type" => "adviser",
# "_id" => "558990784a7561240d010000",
# "_score" => 1.0,
# "_source" => {
# "email" => "nadine.angerer@example.com",
# "first_name" => "Nadine",
# "last_name" => "Angerer"
# }
#}
If you skip this step you will not get any errors when trying to paginate, but it will not actually paginate.
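Once the module is included, the response should also expose the usual will_paginate accessors, so you can build pagination UI on top of it. A sketch, assuming we read the gem's pagination module correctly:

response = Adviser.search({}).paginate(page: 3, per_page: 2)
response.current_page  #=> 3
response.total_entries #=> 5, the advisers in our seed data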
You have a few options for deploying Elasticsearch on Heroku; there is even a free micro plan through an addon named SearchBox Elasticsearch. We have also built an example project that develops all the topics covered in this post.
First, create your application on Heroku; then you will have to provision the necessary addons using the Heroku Toolbelt (preferably) from inside the project root folder, and push your code:
$ heroku addons:create searchbox:starter
$ heroku addons:create mongolab:sandbox
$ git push heroku master
Second, you should take into account that with these addons you can only have two indices and the storage is pretty limited, but that should be enough for testing.
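One detail worth verifying against the addon's documentation (we are assuming its documented behavior here): SearchBox exposes its endpoint through the SEARCHBOX_URL environment variable, so our initializer needs to pick that up on Heroku:

# config/initializers/elasticsearch.rb
url = ENV['SEARCHBOX_URL'] || ENV['ELASTICSEARCH_URL'] || 'http://localhost:9200/'
Elasticsearch::Model.client = Elasticsearch::Client.new url: url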
Beyond these steps, the integration between Rails and Elasticsearch is simple. There are some tricks for odd cases (some of them explained here), but the integration itself brings a lot of benefits and tools for searching and filtering. In the following posts we are going to deal with searches of medium and high complexity. We will return soon!