Use the power of the database to speed up your application!

In this blog post, I am going to share some queries and tricks to speed up your application by using the power of the database. I am using Postgres and ActiveRecord (Rails), so all the queries are specific to these, but you should be able to relate them to your own database.

Recently we hit some performance issues when our user base increased many fold. We did some optimization and I would like to share the tips:

Note: As I am using ActiveRecord, most of the examples show ActiveRecord syntax along with the corresponding SQL statement.

Select only required fields from database – Not all

Developers often forget to specify the required columns and instead select all fields from the database, which leads to performance degradation.

Let us say my User Table has 100 users

users = User.all
User Load (15.4ms) SELECT "users".* FROM "users"

The above query took 15.4ms as it selects all columns from the table. The query below took only 3.3ms.

users = User.all.pluck(:name)
(3.3ms) SELECT "users"."name" FROM "users"

Here we not only improve the query time but also reduce the memory required to hold the results.
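One thing to keep in mind: pluck returns plain values, not ActiveRecord objects. If you still need model instances but only a few columns, select is the middle ground (a quick sketch against the same users table):

    # still returns User objects, but loads only the listed columns
    users = User.select(:id, :name)
    # SELECT "users"."id", "users"."name" FROM "users"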

Add compound index when you want to use limit + order OR Avoid using limit + order

When we use limit together with order, the query can become very slow, and the time can be substantial if your table is large! I was using limit 1 with an order on some column and it was taking a few seconds to respond. After some searching I found the cause quickly. The ideal solution is to use a compound index. However, if you are sure that your result set is going to be very small (say tens of records), then instead of adding a compound index you can load the results into an array and return the first record.

In my case, I knew I wouldn't get more than 10 records if I collected them in an array, so I used this additional trick.

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id).limit(1)

The above query was taking 2928.1ms, whereas the following query took 2.3ms:

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id)

NOTE: My Companies table has about 1 Million records.
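For reference, here is the array trick spelled out as a sketch (the same query as above, only viable because the result set is known to be tiny):

    first_match = Company.unscoped
                         .where(subsidiary: false)
                         .where("name like 'josh%'")
                         .order(:id)
                         .to_a    # load the small result set into Ruby
                         .first   # take the first record here instead of LIMIT 1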

Use JOIN instead of NOT IN query

NOT IN and IN statements are not well optimized. They are easy to write, but they can usually be replaced by a simple JOIN, making the query far more efficient.

Now say I have a player from England and I want to count how many matches he played where the match score was 70 but the match was not against India.

I can simply write it as

player.matches.where(score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(166789.8ms) SELECT COUNT(*) FROM "matches" WHERE "matches"."player_id" = $1 AND "matches"."score" = 70 AND ("matches"."opponent_team_id" NOT IN (SELECT "teams"."id" FROM "teams" WHERE "teams"."country" = 'India')) [["player_id", 135]]

The above query is simple and would return results in a few ms on a small dataset, but when I ran it on millions of records it took 166789.8ms.

Now let's see how the JOIN version of the above query behaves:

Match.joins(:team).where(player_id: 135, score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(14.5ms) SELECT COUNT(*) FROM "matches" INNER JOIN "teams" ON "teams"."id" != "matches"."team_id" WHERE "matches"."player_id" = 135 AND "matches"."score" = 70

It took only 14.5ms, which is minuscule compared to the NOT IN version.

Use md5 digest if you want to search on text column

Note: We can use this trick only if we want to do an exact match on a string.

In my case I had one field on which I knew I was only ever going to do exact matches, and that field was of text type. When we matched on that column, the query was taking around 3000ms.

User.where(search_column: 'https://abcdef')

(3139.9ms) SELECT "users".* FROM "users" WHERE "users"."search_column" = 'https://abcdef'

The obvious solution was to index search_column, but instead we added another column, md5_search_column, and searched on that, which reduced our search time to a few ms.

User.where(md5_search_column: '7c602c2c776fb33c188957bb6bdd8928')

(1.9ms) SELECT "users".* FROM "users" WHERE "users"."md5_search_column" = '7c602c2c776fb33c188957bb6bdd8928'

Now, why didn't we simply index search_column instead of adding a new column and storing a value in it? For two reasons:

  • An md5 digest is always 32 characters long, so indexing 32 characters is more efficient than indexing an unbounded text column
  • Searching on 32 characters is likewise more efficient than searching on a text column
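To keep the digest column in sync, we can compute it in a callback. A minimal sketch, assuming the column names above (Digest::MD5 is from the Ruby standard library):

    require 'digest'

    class User < ApplicationRecord
      before_save :set_md5_search_column, if: :search_column_changed?

      private

      # store the 32-character hex digest of the text column
      def set_md5_search_column
        self.md5_search_column = Digest::MD5.hexdigest(search_column.to_s)
      end
    end

    # lookups hash the input the same way before querying
    User.where(md5_search_column: Digest::MD5.hexdigest('https://abcdef'))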

Use raw SQL queries instead of going the ActiveRecord way

The above optimizations were easy and integrated with ActiveRecord cleanly. But there are cases where we need to use raw SQL queries and bypass ActiveRecord. Wait, it's not that I don't like ActiveRecord. I love it, simply because I can switch databases any time; for instance, I could switch from SQLite to Postgres with minimal changes. However, when your data size increases and you know for sure that you are not going to change the database, you should use the database's power. IMO, this is the case with all production systems: you do not change the DB!

That said, we used sum, least, and window functions to get results in milliseconds.

I had to use raw SQL for the following example. I wanted to calculate the number of students in a school matching some criteria, with the condition that at most X students (20 in the example below) are counted per class.

Now the simplest thing I could do was

  • Take distinct classes from database
  • Loop on every class to calculate count
  • Calculate sum of count calculated in step 2

But think how inefficient that would be. Instead, the database provides least and sum aggregate functions which can be used to achieve the same thing in a single query.

For example:

(0.7ms) SELECT sum(A.total_count) AS total, sum(A.total_by_max_count) AS total_by_max_students
        FROM (SELECT count(*) AS total_count, least(count(*), 20) AS total_by_max_count
              FROM students
              WHERE students.hobby = 'Reading'
              GROUP BY students.class_id) AS A
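To run such a raw query from Rails, one option is ActiveRecord's connection helpers. A hedged sketch (table and column names follow the query above; select_one returns the row as a hash):

    sql = <<-SQL
      SELECT sum(A.total_count) AS total, sum(A.total_by_max_count) AS total_by_max_students
      FROM (SELECT count(*) AS total_count, least(count(*), 20) AS total_by_max_count
            FROM students
            WHERE students.hobby = 'Reading'
            GROUP BY students.class_id) AS A
    SQL

    row = ActiveRecord::Base.connection.select_one(sql)
    row["total"]                  # total matching students
    row["total_by_max_students"]  # same total, capped at 20 per class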

We also had another requirement: perform some operation on students, where the criteria was to select only X students (again 20 in the example below) with the hobby 'Reading' from each class. Here too we could have gone with the above approach of iterating over records, but instead we used a window function and achieved the same thing in less time.

The query we used:

query = "SELECT * FROM ( SELECT students.id,students.class_id,hobby,row_number()
OVER ( PARTITION BY students.class_id ORDER BY
'students.id' DESC) as rownum FROM students where students.hobby = "'eading') tmp
WHERE tmp.rownum <= 20"
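If you want ActiveRecord objects back from a hand-written query like this, find_by_sql is a convenient bridge (a small sketch; rownum comes back as an extra attribute on each Student):

    students = Student.find_by_sql(query)   # at most 20 students per class
    students.group_by(&:class_id)           # e.g. group them for further processing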

So, to conclude: if you use the power of the database efficiently, you can speed up your application to a great extent with the stack you are already using.


ProTips: Elastic Search & Rails

In this blog post I am not going to tell you how to set up Elasticsearch with Rails; there are plenty of blog posts available for that. I referred to this blog post to begin with Elasticsearch and Rails.

Instead, I will show how to set the “ignore_malformed” option. This may sound trivial, but unfortunately the usage is neither in the elasticsearch-rails documentation nor could I find any content on the web.

I spent a few hours digging into the elasticsearch-rails code and finally deciphered it. Hope this saves you that time.

Problem

Recently we started using Elasticsearch in our project. One of the tables had around 9 million records. After configuration we started the indexing, monitored a few thousand records, and the indexes were being generated correctly. However, at the end we found out that the indexing of 5k records had failed. I was puzzled: why did only some fail? On digging into the ES logs, we found the issue – we had a value of ‘21111-1-1’ in a date field. It is obviously garbage data, but Postgres happily stored it, and since it came from a 3rd party source we had never realized it was there. Elasticsearch, however, found it invalid for the Date type and gave the following error –

failed to parse [date_modified]. Invalid format: \"21111-01-01\" is malformed at \"1-01-01\

Solution

Now, before we get to the solution, let us first see the requirements (project specific) –

  • Don’t change the type of the field in Elasticsearch from date to text.
  • Don’t change the data in the actual tables, as it comes from a 3rd party service; even if we fixed it, on the next sync we would get the corrupt data back.
  • Retain the values even if some are invalid. Rationale: 5k out of 9M is just 0.056%, so for such a low percentage we did not want to lose the correct values for the other records. Essentially, DO NOT ignore the field.

I started searching and found this blog. It was pretty straightforward: just pass the ignore_malformed parameter in the properties.

But before putting in the solution, we wanted to get to the root cause. We were not specifying the type of date_modified in our mapping at all, so how did Elasticsearch identify it as type DATE? Was it because of the type in the DB? Probably not.

By running some queries we figured out that if we don’t specify the type of a field in the mapping, then initially, while creating the index, that field will not appear in the mapping at all. It gets added to the mapping when the first document is indexed, and its type is set based on the type of that field in the FIRST document. Now, the first record in my table had a CORRECT date value, so it got set to the DATE type. However, if the first value itself had been INVALID, the type would have been set to TEXT (as ES would have seen a value in an invalid date format).

Hence, when it came across a record with invalid date it gave the error above.

Code

Specify the “ignore_malformed” property in the settings (of elasticsearch-model / elasticsearch-rails).

settings index: index_settings do
  indexes :date_modified, type: 'date', ignore_malformed: true
end

Easier said than done!! For this we had to go through the code of elasticsearch-model for a couple of hours, as it was not mentioned in the README. We shall send a PR to update the README. Also, if you are curious, here is the code snippet from the gem which shows how anything passed to indexes is sent in the properties.

def indexes(name, options={}, &block)
  @mapping[name] = options
  ....
end

Here we are passing

{ type: "date", ignore_malformed: true }

in the options, and elasticsearch-model in turn converts it to

{ date_modified: { type: "date", ignore_malformed: true } }

and then calls the correct REST API.

Note: It is always good to specify the type of the field in the mapping. When I indexed documents for the first time without a mapping, Elasticsearch set the type to DATE only because my first record had a valid date; had it not, the type would have been TEXT and all searches on this field would have gone haywire.

Another important point: I had to index 9M records, which itself takes time, so I came to know about the failures at a much later stage, by which point 99.95% of the records were already indexed. After changing the mapping, the ONLY way forward was to DELETE the existing index and recreate it. Needless to say, it took the same amount of time again!!
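For completeness, here is roughly what the recreate-and-reimport looked like, assuming the indexing/importing helpers that elasticsearch-model provides (a sketch, not our exact task):

    # drop the old index and recreate it with the updated mapping (ignore_malformed: true)
    Model.__elasticsearch__.create_index!(force: true)

    # re-index everything – for 9M records this takes as long as the first run
    Model.import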

Gmail: Get notified when recipient opens message

Think how awesome it would be if Gmail showed me whether the recipient has read my mail, just like WhatsApp! Unfortunately, Gmail does not do this. There are some apps which add this feature, and I was curious how it could be done if Gmail doesn't allow it. I actually got a chance to implement it on my project.

A few days back I had this requirement. We were sending emails from our iOS app and wanted to generate a feed for the user when the recipient opens the email. After doing some research I came across this document.

It’s very easy to track email opens: you just need to add one image to your email, with the image’s link pointing to your server. When the mail is opened, the client makes a call to your server and you can record the open 🙂

I simply appended the following to the mail body, as the mails were sent from the app on behalf of the user.

<div style= ''><img src='<url_to_your_api>?track_id=#{track_id}' alt='' width='0.1' height='0.1' border='0'></div>

I added the following as the style:

display: none !important;

Notice that I am sending a track_id along with the link so I can track each email separately. When a call hits this API, I look up the email in my database by track_id and then generate the feed for the user.
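For illustration, here is a hypothetical sketch of that endpoint (controller, model and helper names are made up; only the overall flow matches what we did):

    class EmailTracksController < ApplicationController
      # GET <url_to_your_api>?track_id=...
      def open
        email = Email.find_by(track_id: params[:track_id])
        email.generate_open_feed if email   # create the "recipient opened your mail" feed

        # respond with a tiny transparent image so nothing visible renders in the mail
        send_file Rails.root.join('public', 'pixel.gif'),
                  type: 'image/gif', disposition: 'inline'
      end
    end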

But here are some known issues

  • When you send a message to multiple recipients you can’t track them separately, because you can’t compose one mail with a different track_id per recipient.

  • If you open the mail from your own Sent folder, that will also be tracked as an open.

  • If the user has disabled auto-download of images, the open won’t be tracked as the image will not be downloaded.

  • If the user has the Inbox app installed, the mail may be tracked as opened prematurely, as Inbox shows previews of images along with the mail.

Now tracking email opens is no longer a black box for me 🙂 Hope this helps someone who needs to track emails.

ProTips: Google calendar one-way sync (from google to application)

In this blog post, I will walk you through my experience of Google Calendar one-way sync (from Google to my application) and share some key points I learned about Google Calendar recurring and normal events.

Google's documentation for this is very good, but you need to experiment a lot to really understand it, so I thought of writing it down.

Recently, I was working on a Rails API-only backend with iOS and Angular frontends. One of the features was syncing the user's Google/Salesforce/Exchange calendar and sending a push notification whenever a meeting is created or updated, where that meeting can be single or recurring. Of course, the user has given permission to do so 🙂

The obvious question – why not do it in iOS app?

Yes, but we cannot, because the user can log in using the web app too and the data on the two needs to be in sync. Moreover, it's not a plain sync but an intelligent one: we have a SMART calendar that needs some processing.

The first one we picked up was Google Calendar sync. The requirement was to store six months of meetings, but this window moves every day as we need +3 months after today and -3 months before today.
NOTE: In our case it was only a one-way sync, Google to our database.

In any calendar there are two types of events, and we had to sync both:
1. Single
2. Recurring. Recurring events are a bit complex; if you don't already understand them, it would be good to read the documentation first.

Also, syncing means:
1. always getting only the delta since the last sync (for performance, obviously)
2. getting created/updated as well as deleted events.

My initial thought was that syncing single events would be easy, but recurring ones would be a bit difficult to deal with.

I went through the API documentation. The Google API accepts timeMin and timeMax parameters.

We already had Google auth integrated in our app, so it was just a matter of calling the API through the REST client and I thought my job was done. Here was my initial API call:

     url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?singleEvents=true&amp;alwaysIncludeEmail=true&amp;timeMax=#{CGI.escape(time_max.to_s)}&amp;timeMin=#{CGI.escape(time_min.to_s)}"

singleEvents=true was used to get each instance of recurring events (see the documentation).

When I called this API with RestClient I got the expected results for created and updated events, so I thought my job was done since I didn't have to deal with recurring events specially (I was getting each instance separately). I wrote code to create/update meetings in my database keyed on user id and event id. It was very simple, but when our client asked for notifications for deleted events we hit an issue. Why? On reading the documentation again: Google Calendar gives you deleted events if you pass the showDeleted parameter to the API, but not always – if showDeleted and singleEvents are both true, only single instances of deleted events (but not the underlying recurring events) are returned. And we wanted all occurrences of deleted events.

Then we read the documentation again and came across the syncToken parameter. The response of a sync call contains a nextSyncToken which we can store in the database; if we pass it on the next query we get only the events created/updated/deleted since the last sync. This was exactly what we wanted. Earlier I was fetching six months of events every time and comparing against the database for existence and update times, i.e. I was wasting API calls and server processing.

So we changed our approach and did the following:

  • If nextSyncToken is not present in the database, fetch six months of meetings and store the nextSyncToken obtained from the last call in the database
    url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?alwaysIncludeEmail=true&timeMax=#{CGI.escape(time_max.to_s)}&timeMin=#{CGI.escape(time_min.to_s)}"
  • If nextSyncToken is present in the database, fetch only the events changed since the last sync and store the new token in the database
    url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?alwaysIncludeEmail=true&syncToken=#{self.calendar_sync_token}"

Notice that we have not passed the singleEvents option to the API. Why? Because when I sent singleEvents=true I was getting each instance of every recurring meeting, and the API returns at most 250 events in a single call. So, if there are 2000 occurrences of a recurring event, you have to paginate through all of them (using nextPageToken) to reach the nextSyncToken on the last page. It is a no-brainer that processing time increased.

We needed a smarter way to handle it. After some research and reading, I decided not to pass the singleEvents parameter at all, and whenever a recurringEventId comes back in the response, fetch that recurring event's instances separately using this API to maintain a proper sync.
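Here is a rough, hedged sketch of the sync loop we ended up with (model attributes like calendar_sync_token and google_access_token, and the process_event helper, are illustrative, not our exact code):

    require 'rest-client'
    require 'json'
    require 'cgi'

    def sync_google_events(user, time_min, time_max)
      base = "https://www.googleapis.com/calendar/v3/calendars/primary/events"
      # incremental sync if we already have a token, otherwise the full 6-month window
      params = if user.calendar_sync_token.present?
                 "syncToken=#{CGI.escape(user.calendar_sync_token)}"
               else
                 "timeMin=#{CGI.escape(time_min.to_s)}&timeMax=#{CGI.escape(time_max.to_s)}"
               end
      page_token = nil

      loop do
        url = "#{base}?alwaysIncludeEmail=true&#{params}"
        url += "&pageToken=#{page_token}" if page_token
        response = JSON.parse(RestClient.get(url, Authorization: "Bearer #{user.google_access_token}"))

        # create/update/cancel local meetings; recurring events are handled separately
        response["items"].each { |event| process_event(user, event) }

        page_token = response["nextPageToken"]
        next if page_token

        # the last page carries the token for the next incremental sync
        user.update(calendar_sync_token: response["nextSyncToken"])
        break
      end
    end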

So here are some key points I learned about Google Calendar one-way sync:

  • If you don’t want to deal with recurring events at all, your job is very simple: just call the API and skip recurring events while parsing the response.

  • If you have to deal with recurring events but don’t care about deletions, just set the singleEvents parameter to true. This will increase the number of calls and server processing, but will make your life easy 🙂

  • BUT, when you want to handle deletion of recurring events, don’t set singleEvents to true. Google will then give you the recurring meeting as a single event with the following data in the response, from which you can figure out whether it is recurring or not.

{
  ...
  "recurrence": [
    "RRULE:FREQ=WEEKLY;BYDAY=TU"
  ],
  "id": "f8j5134grukmqufapvn0q0p2bq0",
  ...
}

You can also parse the “recurrence” field and then calculate and set the recurring schedule in your database. But there are multiple formats of “recurrence” and I did not want to deal with them. Hence, I separately captured the occurrences of recurring events by passing the recurring event id to this API, which kept my life simple.
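A minimal sketch of fetching those occurrences via the events.instances endpoint (access-token handling as in the earlier calls):

    def fetch_recurring_instances(user, recurring_event_id)
      url = "https://www.googleapis.com/calendar/v3/calendars/primary/events/#{recurring_event_id}/instances"
      JSON.parse(RestClient.get(url, Authorization: "Bearer #{user.google_access_token}"))["items"]
    end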

How to handle modifications to recurring events (a.k.a. “exceptions”)

CASE-1: Modification to Single Event
A] When you update only a single occurrence of a recurring event, Google creates a new event from the updated occurrence and you will get it in the next sync call. The new event carries the parent recurring event id, which you can use for any processing on your server.

{ 
  ...
  "id": "pmu0u4psc1ebq2vun468vpn9ms_201611123103000Z",
  "recurringEventId": "pmu0u4psc1ebq2vun468asdn9ms",
  ...
}

B] When you delete a single occurrence of a recurring event, you get the following response:

 {
  "kind": "calendar#event",
  "etag": ""296070510840"",
  "id": "pmu0u4psc1ebq2vun468vpns_20161201T103000Z",
  "status": "cancelled",
  "recurringEventId": "pmu0u4psc1ebq2vun468vps",
  "originalStartTime": {
    "dateTime": "2016-12-01T16:00:00+05:30",
    "timeZone": "Asia/Calcutta"
  }
}

CASE-2: Modification to Multiple Events
A] When you update multiple occurrences using “Change all following meetings”: for example, say you have one recurring event with 10 occurrences and you update the 6th meeting and all following events. In this case Google creates one more recurring event. It keeps the first 5 meetings in the original recurring event and creates a new recurring event with the updated 5 (6 to 10). The sync API returns BOTH, and from then on you can treat them as two separate recurring events.

B] When you delete all following occurrences from a particular event onwards, the response gives you the recurring event id and you can resync that recurring event again.

I spent a couple of days figuring all this out. Hopefully this will save someone's time.

Lesson learnt: always read the documentation carefully, think of future cases, and design accordingly, even though the requirements may not have asked for a feature yet.

Pro-tip: Scaling properly with Nginx worker_process and max_pool_size

This post talks about the worker_processes and passenger_max_pool_size options for Nginx/Passenger and how these configurations play an important role in scalability.

Recently our client reported that API performance was bad and requests were taking a few minutes to respond! But when we tested this API ourselves it was quite efficient and responded in a few milliseconds. What was wrong? As the number of users increased, the number of concurrent requests to the server had also increased: initially only hundreds of requests per second were coming to the server, but now it was about 4k requests per second, and we realised we had not scaled the server to handle this load.

When this problem came up, I broke it down into steps:

  • First I checked the RDS instance. It was only around 20% utilized.
  • I revisited all the queries; they were optimized enough, so the database was not the bottleneck.
  • I sent an unauthorized request to the server, so that with minimal processing I would see the raw response time – and it took 20 seconds.
    20 seconds for an unauthorized request? NOT at all acceptable.

Then I thought of checking the web server configuration.

I was using Nginx as the web server and Passenger as the application server. The first thing I noticed was the number of worker_processes: it was set to 5.

worker_processes  5;

The worker_processes directive tells the server how many worker processes to spawn once it is bound to the proper IP and port(s).

And I was using 1024 worker_connections. worker_connections sets the maximum number of simultaneous connections that can be opened by a worker process.

events {
    worker_connections  1024;
}

So mathematically my server could handle 1024 * 5 connections – why was there still a problem?

After reading this I realised that the value of worker_processes should depend on the number of cores you have. I was using a single-core machine with 5 worker processes, so how would that work?

Just imagine a shop handled by one person, with billing done on one computer. As sales increase, the shopkeeper hires new people but doesn't have space for these new employees to operate! So only one person can work at a time and the rest just sit idle. Exactly the same thing was happening in our case!

It is common practice to run 1 worker process per core. Anything above this won’t hurt your system, but it will leave idle processes usually just lying about.

So, the shopkeeper needs to shift to a larger space for people to operate!

Similarly, we decided to increase our hardware capacity, opted for a quad-core server and redirected our requests to it. This improved the API performance significantly, but we still were not happy: we had a server four times more powerful than the earlier one and the request time was still a few seconds.

After researching a little more, we realized our Passenger settings were not optimized: we were running process-based Passenger with a pool size of only 6.

Passenger's max_pool_size defines the maximum number of application processes that may exist simultaneously. A larger number results in higher memory usage but improves the ability to handle concurrent HTTP requests. When I checked our CPU usage it was around 25 to 30%, so we were not utilizing the server fully. Here are more details about max_pool_size.

Back to the shopkeeper. He hired more employees but kept only one billing computer! Hence efficiency stayed low even with more room to operate. So he also invests in more billing computers!

Similarly, we increased passenger_max_pool_size to 15 and it boosted performance to a great extent.

How did we decide on 15? It was derived from the following formula:

passenger_max_pool_size  = (TOTAL_RAM * 0.75) / RAM_PER_PROCESS

Our total RAM was 15GB and RAM_PER_PROCESS was around 300 to 400 MB.

To get RAM_PER_PROCESS we recorded the output of passenger-status at different intervals of time.

[screenshot: passenger-status output]

So by the above formula we could have set the number to more than 30, but we decided to keep it at 15 as we wanted to leave room for other processes.
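As a quick sanity check, here is the arithmetic with our numbers (RAM figures in MB):

    total_ram_mb = 15 * 1024            # 15 GB
    (total_ram_mb * 0.75) / 300         # => 38.4 processes at 300 MB per process
    (total_ram_mb * 0.75) / 400         # => 28.8 processes at 400 MB per process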

Most of us work in a team: we know the key skills of each team member and use them to deliver the best product. Similarly, the web server, app server and hardware work as a team. We always try to write good code, but knowing these things as well helps us deliver the best product.

I hope my experience will help others :).

Confessions of a first time speaker

This was the second RubyConfIndia I attended, and this time I was excited, happy and scared. Why? Because I was one of the speakers. I never thought I would be able to stand in front of 500 people and deliver a talk. But I did it!

The journey started with Josh Software. At Josh, when you join the company, you are supposed to deliver a 20-minute talk internally, and this is compulsory for everyone. Two years ago, when it was my turn to present, I realised that my greatest fear is public speaking! At that time I delivered a talk on a topic that was not very interesting and I was very scared. It was then that I decided to overcome this fear, which I had primarily because English is not my first language. Last year Anuja and I decided to give a lightning talk at RubyConfIndia to gain some confidence speaking on stage. We prepared for a month, but since I had simply memorized my part verbatim, that talk didn't give me much confidence.

A few months ago, when the CFP was open, we were again thinking of giving a lightning talk. Josh had recently won the CodeCuriosity competition, so I thought we could speak about open source contributions. I asked Anuja if we could give a lightning talk on this, and finally, after discussions, we decided to submit a CFP; and it got selected! When I saw the tweet that our talk was selected, I was happy, excited and suddenly tense. We wanted to talk about open source, but we did not want to just share statistics and preach to the audience to do open source. Since we had only just begun contributing to open source ourselves, we had lots of fears, and we realised we should share our experiences, so that the experts in the audience would relate to it and newbies would get some motivation.

Talking about your fears is really not easy!! However, the Josh team was with us all the way 🙂 One day we had a discussion about everyone's fears and I realized I am not the only one who faces them. Everyone has faced this at some time in their life, and it's really important to talk about it. So we decided to talk about our experience of our first comment on an issue, filing our first issue, submitting our first pull request, writing our first blog post and even giving our first lightning talk. I still have the fear of making mistakes, which I didn't speak about at the conference! As Anuja has 4+ years of experience and her English is better than mine, I was haunted by the thought of making some mistake. People would judge the employees of Josh, and until now all the speakers from Josh had set high expectations – I feared what would happen if I messed up our talk, and I thought of quitting many times but realised that was not an option!

Finally we started our preparation!! Practicing in front of a mirror and listening to my recordings again and again really helped me a lot. Now when I compare my first and last recordings, I can make out a huge difference.

We had around 20 days to prepare, and thankfully my project was on track, so I could find time to practice my talk 🙂 We decided that our presentation would have only images, and searching for images under a Creative Commons license is really difficult, but we did it!! We had some confidence while preparing slides because of our previous lightning talk, and as we were sharing our own experiences, we realised we could manage!


Finally, it was the day to leave for Kerala, and I was really excited because I was flying for the first time. I enjoyed the flight. We reached a day before the conference, at 4 PM. The view from our room at Le Meridien was really awesome; after seeing it I forgot everything and just enjoyed it for a few minutes.

When we did the audio/video testing that evening, I was worried because the font was blurred on the display. Luckily, the technicians fixed that issue, but I was still scared that it wouldn't work on the day of the talk! After seeing the presentation on the large LED display, we thought of some more changes to the slides and made them, hoping they would show up properly on screen. Because of all these thoughts I couldn't sleep that night. Our talk was on the second day, and I decided to attend all the talks. This time I was more willing to attend than last year, because I thought: as a speaker, if no one attended my talk, how would I feel? So I attended all the talks, especially the lightning talks. I was worrying about the content so much that just before the conference party started, I thought about sitting in my room and making more changes. Luckily, good sense prevailed and I enjoyed all the talks of the first day, and the party too 🙂

D-day arrived! I was still scared to face the public, but Gautam told me to forget everything, go on stage and enjoy. This time I was a bit more scared because I was sharing one of Gautam's own speaking experiences, and I had put in a photo of him looking scared – and he didn't even know this. Some of my colleagues said he wouldn't like being trolled in our presentation. I also asked Sethu many times: should I show it to Gautam? Would he get angry? Finally I let the thought go and decided to see what happened.

The talk went off very well. Sorry Gautam, but sharing your experience was really inspiring for me, even though we showed that scared photo of yours! We got a standing ovation after the talk, which was a really big thing for me.

We saw some tweets after our talk.


Tejas was feeling guilty about every pull request and issue he had ignored. If even 5% of the attendees overcome their fears after our talk, I will consider it a great success.

After our talk, Katrina said that this talk was really needed and we should keep doing talks like it. Sarah said it was a really nice talk, and we also got comments from other speakers and the audience. The icing on the cake was that Matz liked our talk – he asked Gautam when the slides and video would be uploaded!

Matz told us that it was a really nice talk. What more can a Rubyist expect!!

This blog was an attempt to overcome one of the fears I talked about during my talk 🙂

Slides of our talk are available here. Now I have a dream of giving solo talk 🙂

[Gautam helped edit this and helped out with the grammar. Written English is also not my strength, but I intend to overcome that too!]

Gmail: Send email API in Ruby on Rails

A few days ago, I had a requirement to send email on behalf of users. Of course, the user has given permission to do so 🙂 We were not using any particular gem for calling the Google API in this project; instead we were using RestClient.

Initially I went through the API documentation, and since the basic architecture was already there, it seemed like just a matter of calling the API through the REST client and my job would be done. However, as usual, it was not as easy as it looked.

After reading the API docs, I figured out that I needed to pass the basic parameters (to, cc, bcc, subject, body) as raw text in RFC 2822 format, as a base64url-encoded string. But the doc didn't say whether to, cc, etc. should be separated by ',' or '\n'. I started complaining about the poor documentation to my colleagues, until someone correctly pointed out that it should be in RFC 2822 format. What indeed is RFC 2822? When I actually read RFC 2822, I realized my mistake. The documentation was correct, but who cares to read the docs carefully?

According to RFC 2822, each header field should be separated by '\n', and the body should follow the headers, separated by an extra empty line. The following is the code snippet I used to create the raw string.

    def convert_to_base_64(to, cc, bcc, body, subject)
      message = "to: #{to}"
      message = message + "\n" + "subject: #{subject}" unless subject.blank?
      message = message + "\n" + "cc: #{cc}" unless cc.blank?
      message = message + "\n" + "bcc: #{bcc}" unless bcc.blank?
      message = message + "\n" + "content: multipart"
      message = message + "\n" + "Content-Type: text/html"
      message = message + "\n\n" + "#{body}"
      Base64.urlsafe_encode64(message)
    end

Note that I have added 2 ‘\n’ before body.
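And for completeness, a hedged sketch of the actual send call via the users.messages.send endpoint (access-token handling is whatever your app already does):

    require 'rest-client'
    require 'json'

    raw = convert_to_base_64(to, cc, bcc, body, subject)
    RestClient.post(
      "https://www.googleapis.com/gmail/v1/users/me/messages/send",
      { raw: raw }.to_json,
      { Authorization: "Bearer #{access_token}", content_type: :json }
    )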

I wasted a day figuring this out; maybe this will save someone's time.

Lesson Learnt: Always read documentation carefully.