In most information systems, chances are you will need some kind of data search and filtering feature, since these are commonly demanded by business processes; the right tool (or tools) depends on how complex the search itself is. In the Rails domain we have access to several tools: the database through ODM/ORM layers, SearchCop, Ransack, Textacular (PostgreSQL only). However, most of these search tools depend directly on the database and perform strict filters, which may be limiting for more complex scenarios.
The main idea of this post series (this is the first one) is to show how to integrate a stable search engine with Rails and how to build advanced filters and searches over the information available in your system.
Leaving aside the simple, self-integrated solutions for searching, our team was looking for a complete and smooth approach. After analyzing several external options and engines (most of them old acquaintances), among which were Sphinx, Solr and Elasticsearch (ES), we decided to use Elasticsearch: we were keen on the interesting features available in its newest versions (such as mappings, scoring and boosting functions, scripting support, a distributed model, an elegant API, and so on). You can find a good comparison between Solr and ES in this post (although it is about a year old).
Once we had chosen our engine, we needed to pick a Ruby client for it, and the answer was not difficult to find. We had worked with Tire before, but it is now deprecated in favor of the elasticsearch-rails gem, which is supported by Elasticsearch and by Tire's creator himself. Our first impression of this gem was that it was pretty complete and stable, and in the end we were right; the initial integration is also pretty straightforward, and there is a good set of posts explaining it (for example here). We are going to show the basics of implementing a simple search with Elasticsearch and Rails using the elasticsearch-rails gem.
In your development environment you can install the Elasticsearch server using the brew package manager; however, if you want a more portable solution (with edge support), you can download the specific version you want from here. We will be using version 1.6.0.
using brew
$ brew install elasticsearch
$ elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
or downloading directly
$ wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.6.0.tar.gz
$ tar xopf elasticsearch-1.6.0.tar.gz
$ sh elasticsearch-1.6.0/bin/elasticsearch
To check that everything is OK, you can run the following command (it assumes the server has the default configuration):
$ curl -X GET http://localhost:9200/
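If the server is running you should get back a JSON payload roughly like the one below (the node name and build details will vary; this is just the general shape of the response for a 1.x server):

{
  "status" : 200,
  "name" : "Node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0"
  },
  "tagline" : "You Know, for Search"
}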
The gem installation process is pretty simple: you only need to add the correct dependencies to your Gemfile and include the correct modules in your model. We are also going to deal with some issues and tricks related to pagination, deployment on Heroku, code inspection, and a known caveat when using STI.
# Gemfile
# ...
gem 'elasticsearch-model', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
gem 'elasticsearch-rails', git: 'git://github.com/elasticsearch/elasticsearch-rails.git'
You can find more information about these gems here.
Once you have added the gems, you can configure the Elasticsearch URL from Rails (this configures how Rails will connect to the Elasticsearch server); by default the server listens on http://localhost:9200.
# config/initializers/elasticsearch.rb
Elasticsearch::Model.client = Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'] || "http://localhost:9200/"
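As a quick sanity check that the client can actually reach the server, you can ask for the cluster health from a Rails console (cluster.health is part of the elasticsearch-ruby client API; the exact output will depend on your setup):

Elasticsearch::Model.client.cluster.health
#=> {"cluster_name"=>"elasticsearch", "status"=>"yellow", ...}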
You can find good articles about the first steps with Elasticsearch and Rails here. The example in that post uses ActiveRecord models; however, we want to build an example using a NoSQL database (MongoDB in this case). The reason is nothing special: we want to save some time by skipping migrations and other chores that are simpler than in SQL databases (we understand that for simple, learning-purpose scenarios the database type does not make a big difference). Suppose that we have an existing User model with first_name, last_name and email fields.
# app/models/user.rb
# Code extracted from a Devise model; it uses Mongoid 4.0.2 as its ODM
class User
  include Mongoid::Document
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  # index name for keeping consistency among existing environments
  index_name "users-#{Rails.env}"

  field :email, type: String, default: ""
  field :first_name, type: String, default: ""
  field :last_name, type: String, default: ""

  def as_indexed_json(options = {})
    as_json(except: [:id, :_id])
  end
end
You only need to include the Elasticsearch::Model and Elasticsearch::Model::Callbacks modules to get the default behavior. However, if you are going to work with more than one environment, you should configure an index name per environment, which you can easily do by calling the index_name method. Finally, you need to define a method named as_indexed_json in your searchable model to tell Elasticsearch how the indexed document will be structured (we will analyze this method in later posts).
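For instance, if you later need to index only some fields, or a computed attribute, you can shape the document yourself. A hypothetical sketch (full_name is not part of our model; it is just an illustration of a computed field):

def as_indexed_json(options = {})
  {
    first_name: first_name,
    last_name:  last_name,
    email:      email,
    # computed attribute, assuming you want to search on the full name
    full_name:  "#{first_name} #{last_name}"
  }.as_json
end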
You also need to create the Elasticsearch index and perform the first import. You can do that inside an initializer file; we are going to reuse our previous initializer:
# config/initializers/elasticsearch.rb
Elasticsearch::Model.client = Elasticsearch::Client.new url: ENV['ELASTICSEARCH_URL'] || "http://localhost:9200/"

unless User.__elasticsearch__.index_exists?
  User.__elasticsearch__.create_index! force: true
  User.import
end
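As an alternative to importing from an initializer, the elasticsearch-rails gem ships with Rake tasks for importing; if we read its README correctly, you enable them by requiring the tasks in your Rakefile and then run the import per model:

# Rakefile
require 'elasticsearch/rails/tasks/import'

$ bundle exec rake environment elasticsearch:import:model CLASS='User' FORCE=y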
With that in place, you can start using the search engine magic. We are going to create some seeds for the User model:
user_names = ['John Doe', 'Allam Britto', 'Bob Dylan', 'Alice Cooper', 'Alice Pasquini']

user_names.each do |user_name|
  user_params = {
    first_name: user_name.split(' ').first,
    last_name: user_name.split(' ').last,
    email: "#{user_name.downcase.gsub(' ', '.')}@example.com",
    password: '12345678'
  }
  User.find_or_create_by user_params
end
And now we can execute some queries, like these:
User.search('allam').count
#=> 1
User.search('alice').count
#=> 2
User.search('allam').first
#=> #<Elasticsearch::Model::Response::Result:0x007fc25c52e8a8 @result=#<Hashie::Mash _id="558885474a7561a99a010000" _index="users-development" _score=0.15342641 _source=#<Hashie::Mash email="allam.britto@example.com" first_name="Allam" last_name="Britto"> _type="user">>
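By the way, the search method is not limited to plain strings: you can also pass a hash following the Elasticsearch query DSL. A minimal sketch, which should match the same two users as the string query above:

User.search(query: { match: { first_name: 'alice' } }).count
#=> 2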
You also have two choices for retrieving your results:

#records returns the real instances of your model (it hits the database): a genuine collection of model instances from your database, i.e. an ActiveRecord::Relation for ActiveRecord, or a Mongoid::Criteria in the case of MongoDB (text adapted from the elasticsearch-rails gem).

#results returns a collection of Elasticsearch result objects (it gives you the information directly, without hitting the database).
User.search('alice').records.records.class
#=> Mongoid::Criteria
User.search('alice').results.results.class
#=> Array
We call records.records and results.results instead of just records and results because the first call gives you Elasticsearch wrapper objects representing the records or results, and the second one actually fetches them.
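If you need both things at once (the database record plus its Elasticsearch metadata, such as the score), the gem also exposes each_with_hit, for example:

User.search('alice').records.each_with_hit do |record, hit|
  puts "#{record.first_name}: #{hit._score}"
end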
We love debugging and inspecting our applications, and readability is also important to us. If you are using a gem similar to jazz_hands or jazz_fingers (both of which rely on awesome_print) and you are using Mongoid as in this example, you will probably hit a problem similar to this:
User.search('allam').first
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
User.search('allam').results.first
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
User.search('allam').results.to_a
#=> #<NameError: undefined method `to_hash' for class `Elasticsearch::Model::Response::Result'>
In this case, when we try to print an Array-like output from our Elasticsearch::Model::Response::Result model while our inspection system is backed by awesome_print, our application breaks. You can find a full explanation of this problem here.
As you can see in the referenced issue, the solution is pretty simple (although it took us nearly 5 hours to debug this obscure thing), and you can write a little monkey patch to solve the problem. First, we are going to guarantee that the lib folder is autoloaded by Rails:
# config/application.rb
class Application < Rails::Application
  # ...
  # Auto-loading lib files
  config.autoload_paths << Rails.root.join('lib')
end
Then, we are going to write our monkey patch using a module strategy (the module will be named Hashable, but that is just the name we chose for it):
# lib/extensions/elasticsearch/model/response/hashable.rb
module Extensions
  module Elasticsearch
    module Model
      module Response
        module Hashable
          #
          # Returns the result object as a plain Ruby hash to support awesome_print,
          # allowing Elasticsearch::Model::Response::Result#method(:to_hash) to be invoked
          #
          # @return [Hash]
          #
          def to_hash
            @result.to_h
          end
        end
      end
    end
  end
end
Finally, we are going to reuse our Elasticsearch initializer to include the monkey patch:
# config/initializers/elasticsearch.rb
# ...
if defined? Mongoid
  Elasticsearch::Model::Response::Result.include Extensions::Elasticsearch::Model::Response::Hashable
end
Cool, now we get a better printer for our command output (using Mongoid and awesome_print):
User.search('allam').results.first
# => {
# "_index" => "users-development",
# "_type" => "user",
# "_id" => "558885474a7561a99a010000",
# "_score" => 0.15342641,
# "_source" => {
# "email" => "allam.britto@example.com",
# "first_name" => "Allam",
# "last_name" => "Britto"
# }
#}
OK, we have just begun and we have already made some simple searches using Elasticsearch's indexing approach. But what is an index? You can find a great explanation here, but essentially "index" has two meanings: one as a noun and one as a verb.
In the Elasticsearch context, an index can contain many types. You could map this relation as:
one Index => many Types
one Relational Database => many Tables
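You can verify this relation directly against the server: once the User model has been imported, asking for the index mapping should list each type the index contains (assuming the default server location):

$ curl -X GET http://localhost:9200/users-development/_mapping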
With this in mind, suppose that we have a new user type and, for a business domain reason, you need to apply an STI approach, something like this:
# app/models/adviser.rb
class Adviser < User
# ...
end
Let's create some dummy data for our new model:
adviser_names = ['Alex Morgan', 'Nadine Angerer', 'Michelle Akers', 'Mia Hamm', 'Lady Andrade']

adviser_names.each do |adviser_name|
  adviser_params = {
    first_name: adviser_name.split(' ').first,
    last_name: adviser_name.split(' ').last,
    email: "#{adviser_name.downcase.gsub(' ', '.')}@example.com",
    password: '12345678'
  }
  Adviser.find_or_create_by adviser_params
end
But what happens when we try to index this model?
Adviser.__elasticsearch__.create_index! force: true
#=> {
# "acknowledged" => true
#}
Adviser.import
#=> SystemStackError: stack level too deep
Unfortunately, it does not work. You can find more information about this problem here; as you can see in the issue, the simplest solution is to include the Elasticsearch::Model module once again in the child class. We will also have to rename the document type, taking into account the Elasticsearch indexing structure mentioned before.
class Adviser < User
  include Elasticsearch::Model

  index_name "users-#{Rails.env}"
  document_type "adviser"
end
To import the new data from the Adviser model into the Elasticsearch engine, we need to re-edit our Elasticsearch initializer; but before that, we are going to delete our previous user index (you can avoid this step by executing Adviser.import in a rails console):
User.__elasticsearch__.client.indices.delete index: User.index_name rescue nil
# config/initializers/elasticsearch.rb
# ...
unless User.__elasticsearch__.index_exists?
  User.__elasticsearch__.create_index! force: true
  User.import
  Adviser.import
end
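To double-check that both models landed in the same index, you can ask the server for the total document count (client.count is part of the elasticsearch-ruby API; the figure below assumes our ten seed records):

User.__elasticsearch__.client.count(index: User.index_name)
#=> {"count"=>10, ...}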
With the previous implementation you can already execute searches using the new model (Adviser), yay!
Adviser.search('Alex').first
#=> {
# "_index" => "users-development",
# "_type" => "adviser",
# "_id" => "558990784a7561240d000000",
# "_score" => 0.7554128,
# "_source" => {
# "email" => "alex.morgan@example.com",
# "first_name" => "Alex",
# "last_name" => "Morgan"
# }
#}
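A nice side effect of sharing the index is that elasticsearch-model lets you search across several models at once and get mixed results back. A sketch based on the gem's multi-model search (the output is illustrative):

response = Elasticsearch::Model.search('alice OR alex', [User, Adviser])
response.results.map { |result| [result._type, result._source.first_name] }
#=> [["user", "Alice"], ["user", "Alice"], ["adviser", "Alex"]]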
Well, now we have a common structure for handling both simple searches and the indexing process. What is the next feature we need to deal with? We are going to leave the full search capabilities for another post in this series; for now, we are going to deal with pagination.
Pagination comes integrated with the elasticsearch-rails gem and is pretty simple to use; however, the gem has a little problem: it always selects the Kaminari integration in favor of will_paginate. This is not really critical (why would you have both implementations in the same Rails project?), but what happens if you like will_paginate and you are also using rails_admin? (This is a common scenario that could repeat itself with other gems as well.)
Rails Admin uses Kaminari by default, so if you want the will_paginate integration in your project you will have to override the pagination methods, because elasticsearch-rails will have integrated with Kaminari; you can find more information about this behavior here. The solution is described in the previous link and is also pretty simple: we only need to include the will_paginate Elasticsearch module, and we can use an initializer for that (although this solution is a little dirty, it works fine):
# config/initializers/elasticsearch.rb
# ...
# NOTE: you only need to do that if your rails project has both the kaminari and will_paginate implementations and you want to use will_paginate
Elasticsearch::Model::Response::Response.__send__ :include, Elasticsearch::Model::Response::Pagination::WillPaginate
In the future the gem should perhaps offer an external configuration switch for this specific feature; for now, we have to resort to these tricks.
Once you have configured your pagination integration (will_paginate in our case), you can execute something like this:
Adviser.search({}).paginate(page: 3, per_page: 2).results.first
#=> {
# "_index" => "users-development",
# "_type" => "adviser",
# "_id" => "558990784a7561240d010000",
# "_score" => 1.0,
# "_source" => {
# "email" => "nadine.angerer@example.com",
# "first_name" => "Nadine",
# "last_name" => "Angerer"
# }
#}
If you skip this step you will not get any errors when trying to paginate, but it will not actually paginate.
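Once the module is included, the response should also expose the usual will_paginate accessors, so you can build pagination UI on top of it. A sketch, assuming we read the gem's pagination module correctly:

response = Adviser.search({}).paginate(page: 3, per_page: 2)
response.current_page  #=> 3
response.total_entries #=> 5, the advisers in our seed data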
You have a few options for deploying Elasticsearch on Heroku; there is even a free micro plan through an addon named SearchBox Elasticsearch. We have also built an example project that develops all the topics covered in this post.
First, create your application on Heroku; then you will have to provision the necessary addons using the Heroku Toolbelt (preferably) from inside the project root folder, and push your code:
$ heroku addons:create searchbox:starter
$ heroku addons:create mongolab:sandbox
$ git push heroku master
Second, you should take into account that with these addons you can only have two indices and the storage is pretty limited, but that should be enough for testing.
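One detail worth verifying against the addon's documentation (we are assuming its documented behavior here): SearchBox exposes its endpoint through the SEARCHBOX_URL environment variable, so our initializer needs to pick that up on Heroku:

# config/initializers/elasticsearch.rb
url = ENV['SEARCHBOX_URL'] || ENV['ELASTICSEARCH_URL'] || 'http://localhost:9200/'
Elasticsearch::Model.client = Elasticsearch::Client.new url: url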
Beyond these steps, the integration between Rails and Elasticsearch is simple. There are some tricks for odd cases (some of them explained here), but the integration itself brings a lot of benefits and tools for searching and filtering. In the following posts we are going to deal with searches of medium and high complexity. We will return soon!