ProTips: Elastic Search & Rails

In this blog post I am not going to tell you how to set up Elasticsearch with Rails. There are plenty of blog posts available for that, and I followed one of them to get started with Elasticsearch and Rails.

In this blog post, I will show how to set the “ignore malformed” option. This may sound trivial, but unfortunately the usage is not in the ES-rails documentation, nor could I find anything about it on the web.

I spent a few hours digging in the ES-rails code and finally deciphered it. Hope this saves you some time.

Problem

Recently we started using Elasticsearch in our project. One of the tables had around 9 million records. After the configuration and settings were in place, we started indexing. We monitored a few thousand records and the documents were being indexed correctly. However, at the end we found out that indexing had failed for 5k records. I was puzzled about why only some had failed. On digging into the ES logs, we found the issue: we had the value ‘21111-1-1’ in a date field. That is not a sensible date, but these values had ended up in our Postgres DB (from a 3rd party source) without us ever noticing. Elasticsearch found them invalid for the date type and gave the following error:

failed to parse [date_modified]. Invalid format: "21111-01-01" is malformed at "1-01-01"
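To make the failure concrete, here is a minimal Ruby sketch of the kind of check that rejects this value. It is only a rough approximation of a strict "four-digit year" date format for full dates; the `strict_date?` helper and the `STRICT_DATE` pattern are hypothetical names of mine, not part of Elasticsearch or any gem.

```ruby
require 'date'

# Rough stand-in for a strict date format: exactly a four-digit year,
# a two-digit month and a two-digit day.
STRICT_DATE = /\A(\d{4})-(\d{2})-(\d{2})\z/

# Hypothetical helper, not part of any gem.
def strict_date?(value)
  match = STRICT_DATE.match(value.to_s)
  return false unless match

  # Also require a real calendar date (rejects e.g. February 30th).
  Date.valid_date?(match[1].to_i, match[2].to_i, match[3].to_i)
end

puts strict_date?('2017-03-14')   # true
puts strict_date?('21111-01-01')  # false, because the year has five digits
```

A check like this could also be run over the table beforehand to count how many rows would be rejected.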

Solution

Now, before we get to the solution, let us first see the requirements (project specific) –

  • Don’t change the type of the field in Elasticsearch from date to text.
  • Don’t change the data in the actual tables, as it came from a 3rd party service. Even if we had fixed it, the next sync would have brought the corrupt data back.
  • Retain the values even if some were invalid. Rationale: 5k out of 9M is just 0.056%, so for such a low percentage we did not want to lose the correct values of the other records. Essentially, DO NOT ignore the field.

I started searching and found a blog post on the subject. The fix looked pretty straightforward: just pass the ignore_malformed parameter in the field's properties.

But before putting in the solution, we wanted to get to the root cause. We were not specifying the type of date_modified in our mapping at all. So how did Elasticsearch identify it as type DATE? Was it because of the type in the DB? Probably not.

By running some queries we figured out that if we don’t specify the type of a field in the mapping, the field does not appear in the mapping when the index is created. It gets added when the first document containing it is indexed, and its type is set based on the value of that field in that FIRST document. The first record in my table had a CORRECT date value, so the field was set to DATE type. Had the first value itself been INVALID, the type would have been set to TEXT (the value would not have been recognised as a date, so it would have been treated as a plain string).

Hence, when it came across a record with an invalid date, it gave the error above.
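The "first document decides the type" behaviour can be illustrated with a small toy model. This is not how Elasticsearch is implemented; `ToyIndex` and its date pattern are assumptions made purely to show why the order of the records mattered.

```ruby
# Toy model of dynamic mapping, NOT Elasticsearch's real implementation:
# the first document fixes a field's type, later documents must conform.
class ToyIndex
  DATE = /\A\d{4}-\d{2}-\d{2}\z/

  attr_reader :mapping, :rejected

  def initialize
    @mapping = {}   # field name => 'date' or 'text'
    @rejected = []  # documents that failed to parse
  end

  def index(doc)
    doc.each do |field, value|
      # The first sighting of a field decides its type.
      @mapping[field] ||= value.to_s.match?(DATE) ? 'date' : 'text'
      if @mapping[field] == 'date' && !value.to_s.match?(DATE)
        @rejected << doc
        return false
      end
    end
    true
  end
end

# Valid date first: the field becomes 'date' and the bad record is rejected.
good_first = ToyIndex.new
good_first.index({ date_modified: '2016-05-01' })
good_first.index({ date_modified: '21111-01-01' })
puts good_first.mapping[:date_modified]  # date
puts good_first.rejected.size            # 1

# Invalid date first: the field becomes 'text' and nothing is rejected.
bad_first = ToyIndex.new
bad_first.index({ date_modified: '21111-01-01' })
bad_first.index({ date_modified: '2016-05-01' })
puts bad_first.mapping[:date_modified]   # text
puts bad_first.rejected.size             # 0
```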

Code

Specify the “ignore_malformed” property on the field where the index settings and mapping are declared in the model (using the elasticsearch-model gem).

settings index: index_settings do
  # indexes is a method of the mapping DSL, so it goes inside a mappings block
  mappings do
    indexes :date_modified, type: 'date', ignore_malformed: true
  end
end
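What the option buys us can be sketched with a toy function: the document is still accepted and its original value is retained, only the malformed field is left out of the index. This is an illustration of the semantics as I understand them, not Elasticsearch code; `index_document` and `DATE_PATTERN` are hypothetical names.

```ruby
# Toy illustration of ignore_malformed semantics; not Elasticsearch code.
DATE_PATTERN = /\A\d{4}-\d{2}-\d{2}\z/

# Returns what would be kept for one document whose `field` is mapped as a date.
def index_document(doc, field:, ignore_malformed:)
  value = doc[field].to_s
  if value.match?(DATE_PATTERN)
    { source: doc, indexed_fields: doc.keys, ignored: [] }
  elsif ignore_malformed
    # The document is accepted; only the bad field is left out of the index.
    { source: doc, indexed_fields: doc.keys - [field], ignored: [field] }
  else
    raise ArgumentError, "failed to parse [#{field}]"
  end
end

doc = { id: 42, date_modified: '21111-01-01' }
result = index_document(doc, field: :date_modified, ignore_malformed: true)
puts result[:indexed_fields].inspect  # [:id]
puts result[:source][:date_modified]  # 21111-01-01
```

With `ignore_malformed: false` the same call raises, which corresponds to the 5k failures we saw.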

Easier said than done!! To get here we had to read through the elasticsearch-model code for a couple of hours, as this was not mentioned in the README. We shall send a PR to update the README. Also, if you are curious, here is a code snippet from the gem that shows how anything passed to indexes ends up in properties.

def indexes(name, options={}, &block)
  @mapping[name] = options
  ....
end

Here we are passing

{type: "date", ignore_malformed: true}

as the options, and elasticsearch-model in turn converts it to

{date_modified: {type: "date", ignore_malformed: true}}

and calls the appropriate REST API.
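The pass-through can be sketched with a stripped-down stand-in for the gem's mapping builder. `TinyMappings` is a hypothetical class written for this post; the real gem does more (nested fields, default types), so treat this only as an illustration of why any option given to indexes reaches the mapping.

```ruby
require 'json'

# Simplified stand-in for the gem's mapping builder, written for illustration.
class TinyMappings
  def initialize
    @mapping = {}
  end

  # Like the gem snippet: the options are stored as-is under the field name.
  def indexes(name, options = {})
    @mapping[name] = options
    self
  end

  # The body that would be sent as the index mapping.
  def to_hash
    { properties: @mapping }
  end
end

mappings = TinyMappings.new
mappings.indexes :date_modified, type: 'date', ignore_malformed: true
puts JSON.pretty_generate(mappings.to_hash)
```

Because the options hash is not filtered, ignore_malformed needs no special support in the gem.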

Note: It is always good to specify the type of the field in the mapping. When I indexed these documents for the first time (without a mapping for the field), Elasticsearch set the type to DATE only because my first record held a valid date. Otherwise it would have been text, and all searches on this field would have gone haywire.

Another important point: I had to index 9M records, which itself takes time, so I came to know about the problem at a much later stage. By then about 99.94% of the records were already indexed. After I changed the mapping, the ONLY way to go about it was to DELETE the existing index and recreate it. Needless to say, it took the same amount of time again!!
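For reference, the delete-and-recreate step can be sketched as below. It assumes the model includes Elasticsearch::Model and uses the gem's `create_index!` and `import` methods; the `rebuild_index` wrapper, the batch size and the `Article` model in the usage comment are hypothetical.

```ruby
# Sketch: drop and recreate the index, then re-import every record.
# `model` is assumed to be a class that includes Elasticsearch::Model.
def rebuild_index(model, batch_size: 1000)
  # force: true deletes an existing index before creating it with the new mapping.
  model.__elasticsearch__.create_index!(force: true)
  # By default, import returns the number of documents that failed to index.
  model.__elasticsearch__.import(batch_size: batch_size)
end

# Usage, assuming a hypothetical Article model:
#   failed = rebuild_index(Article)
#   puts "#{failed} documents failed"
```

Checking the returned error count right after the import would have told us about the failures much earlier.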
