Learnings from my first solo talk at RubyConf TH

I spoke at an international RubyConf alone for the first time, and it was one of the best experiences I have ever had at a conference.

It all started a few years back when I joined Josh. We had to deliver an internal talk, and I realised my biggest fear was public speaking. I decided to overcome this fear, so I delivered a lightning talk and then a talk at RubyConf India with a colleague. Now I wanted to move ahead and give a solo talk!

I submitted CFPs to a few conferences, but none got selected. I thought it was not my cup of tea, but every time I attended a conference the question I got from my dad was: when are you going to speak at a conference? At Josh Software we had a great session on goal setting early this year, and we were all asked to publish our goals on an internal channel. One of my goals was to speak at a conference and take my parents along with me. Publishing your goals to your team really helps: once you publish a goal, you work towards achieving it! I have already achieved 50% of the goals that were supposed to take the next 5 years.

In this blog post I will highlight a few of the learnings from my first solo talk!

Practise your talk in front of mirror

When you practise your talk in front of a mirror, you realise what needs to be improved. So always practise in front of a mirror!

Record your talk while practising

Recording helps you analyse your talk on your own! I recorded my talk 10 times, and when I compare the first recording with the last one, it had improved a LOT.

Time your talk

Time your talk so you can add or remove a few points if required.

Have presenter notes ready

All presentation tools have presenter notes, and when you present your talk you can see those notes on your screen. Those notes are there to give you confidence. I was always worried: what if I go blank on stage? After adding notes I was a bit more relaxed, as I felt there would be something to remind me if I went blank. Though I didn't use these notes at the conference, they surely boosted my confidence!

Check license of the image before you use it

We tend to add images to our slides because they make it easy to keep the audience in sync. But some images are not licensed for reuse, so always check the licence of an image before you use it. Your talk gets recorded, so never forget this.

Talk slowly

The talk you are going to deliver is new for most of the people in the audience. The audience takes some time to figure out what you are talking about, so talk slowly and repeat your important points twice.

Never read the slides

If you read the slides, you lose the audience's attention, so never read your slides. Your slides should have highlighted points only. If your slides have more content, the audience will start reading it, which will distract them.

Look at the audience when you talk

This helps keep the audience engaged in your talk rather than in their phones 🙂

Make friends before the conference starts so you have a few known faces in the audience

At RubyConf TH, a cruise dinner was arranged on the Chao Phraya River. It was an excellent idea, as all the speakers got a chance to interact with each other. I knew a few of them from DeccanRubyConf and RubyConf India, and many were new faces to me, but throughout the cruise I never felt like a newcomer. Thanks to the awesome Ruby community, I made some #rubyfriends during this party 🙂

Because of this I had a few known faces in the audience, which gave me moral support.

Have a punch in the first 3 minutes of your talk

It helps you get comfortable on stage. My talk didn't have many punches, but when I started the demo there was laughter and applause from the audience at one point, which gave me some time to cool down and made me comfortable on stage.

Do a dry run of your talk in front of a group of people you are comfortable with, for example at a meetup

I did a dry run of my talk for the Josh team. I got really good feedback and tips from the team, and it also boosted my confidence.

Have a backup plan in case your laptop crashes

I was all set for my talk, and just before I travelled my laptop crashed. The Apple store said it would take 3-5 days just to diagnose the issue, so I had to recreate my presentation. I managed to do it in time. So always keep a backup plan ready! Mail the slides and a PDF to yourself, and keep a copy on a pen drive.

If you have a fear of public speaking, start with a lightning talk

Many conferences have lightning talks. This is a great stage to start on. You get a chance to speak in front of hundreds of people, and these talks get recorded, so you can watch the video later and enjoy your talk. You also get a chance to learn from the mistakes you made! I delivered my first lightning talk at DeccanRubyConf. These talks run from 3 to 5 minutes.

Don't give up if your talk is not selected

Yes, it happens. 7-8 of my CFPs (on different topics) got rejected before this selection. But every conference receives hundreds of CFPs, and the selection committee has to choose around 10% of them. Maybe a similar talk was delivered in an earlier edition of the conference, or there were a few topics stronger than yours. So never get disappointed if your talk is not selected. Keep trying; another conference might be waiting for you!

Submit the same talk to multiple conferences

If your talk does not get selected at one conference, it might get selected at another. I met a person at DeccanRubyConf whose CFP was not selected at RubyConf TH but was selected at another conference where mine was not. So it totally depends on what kind of topics a conference is looking for.

My first solo talk went well. A lot of people came up to me to say that they liked the talk. It was all very overwhelming. Here are photos with all the speakers and attendees.

It was an amazing conference. This was the first edition of RubyConf TH: awesome venue, amazing people, a cruise party. Kudos to the organisers and volunteers for pulling this off.

You can find slides of the talk here.

Big thanks to Josh team for valuable feedback and conference budget 🙂


GoLang with Rails

GoLang with Rails? Wondering why one would use GoLang with Rails?

Read on to find out!!

This is purely based on our requirement, but it can surely benefit others with a similar use case. We had a web app written in Rails that was facing a performance bottleneck while processing large chunks of data. The natural choice seemed to be to use the power of GoLang's concurrency.

To use GoLang with our Rails app, a few approaches came to mind, but each had a flaw:

  • Write the APIs in the GoLang app and route requests from nginx based on the request URL. Simple, but with this approach we would also need to add authentication to the GoLang app. Authentication would then live in Rails as well as in GoLang, which doesn't seem right: if I had to change the authentication mechanism, I would need to make changes in two apps.

  • Use RestClient and call the GoLang APIs from the Rails application. The request would be routed to the Rails app, which would call the GoLang API and serve the response. This would gain some performance, but the Rails app would still have to serve requests the GoLang app could serve directly, and each response would have to wait on the GoLang app.

  • Use FFI. With FFI we can call a GoLang binary directly; you can watch this video to see how it can be done. This seems fine at first, but what if I had to move the GoLang app to another server for load balancing?

So which approach did I follow?

We went with NONE of the above, but a 4th idea using the rack-proxy gem.
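To try this approach, the gem (published on RubyGems as rack-proxy) just needs to be in the Gemfile:

# Gemfile
gem 'rack-proxy'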

Here is sample code for the middleware we wrote:

class EventServiceProxy < Rack::Proxy
  def initialize(app)
    @app = app
  end

  def call(env)
    original_host = env["HTTP_HOST"]
    rewrite_env(env)
    # If rewrite_env changed the host, proxy the request to the GoLang app;
    # otherwise let the Rails app handle it as usual.
    if env["HTTP_HOST"] != original_host
      perform_request(env)
    else
      @app.call(env)
    end
  end

  def rewrite_env(env)
    request = Rack::Request.new(env)

    # Only /events requests from authenticated users are proxied
    if request.path.match('/events')
      if env['warden'].authenticated?
        env["HTTP_HOST"] = "localhost:8000"
        env['HTTP_AUTHORIZATION'] = env['rack.session']['warden.user.user.key'][0]
      end

      env
    end
  end
end


And we inserted our middleware just after Warden (Devise uses Warden internally for authentication):

config.middleware.insert_after(Warden::Manager, EventServiceProxy)

In the above code snippet we are simply proxying the request to localhost:8000, where the GoLang app is running, and setting the user_id in a header. Warden stores the authenticated user_id in env['rack.session']['warden.user.user.key'][0], so the GoLang app now knows who is logged in from the header.

We added middleware in GoLang which extracts the user_id from the header and sets the currentUser details in the context.

Important Note
Our GoLang application is exposed only to the Rails application and not to the whole world, which is why sending the user_id in a header is safe here.

The main advantages we saw using this approach are:

  • We could reuse the existing authentication mechanism of the Rails application.
  • If needed, we can add a load balancer in front of the Rails and/or GoLang application, which is a microservice.
  • Had we used FFI, the binary would have to live on the same machine; here the Rails application and the GoLang service can run on different machines.
  • As the request is rewritten at the Rack level, we avoid a redirect and skip the rest of the Rails stack.

This could be used with any framework similar to Rails.

Using the above approach, we get the power of GoLang when needed and the development speed of Rails 🙂

Use the power of your database to speed up your application!

In this blog post, I am going to share some query tricks to speed up your application by using the power of your database. I am using Postgres and ActiveRecord (Rails), so all queries are written for these, but you can translate them to your own database.

Recently we hit performance issues when our user base increased manyfold. We did some optimization and would like to share the tips.

Note: As I am using ActiveRecord, most examples show ActiveRecord syntax with the corresponding SQL statement.

Select only the required fields from the database – not all

Developers often forget to specify the required columns and instead select all fields from the database, which degrades performance.

Let us say my users table has 100 users:

users = User.all
User Load (15.4ms) SELECT "users".* FROM "users"

The above query took 15.4ms as it selects all columns from the table, but the query below took only 3.3ms.

users = User.all.pluck(:name)
(3.3ms) SELECT "users"."name" FROM "users"

Here we not only improve query time but also reduce the memory required to hold the objects.
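One thing to keep in mind: pluck returns plain values, not model objects. If you still need ActiveRecord objects, selecting just the needed columns gives a similar benefit:

users = User.select(:id, :name)
# SELECT "users"."id", "users"."name" FROM "users"
users.first.name # still a User object, with only id and name loaded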

Add a compound index when you want to use limit + order, OR avoid using limit + order

Using limit together with order can make a query very slow, and the time can be substantial if your table is large! I was using limit 1 with an order on a column, and it was taking a few seconds to respond. After some searching I found the cause quickly. The ideal solution is a compound index (see the migration sketch below). However, if you are sure that your result set is going to be very small (say tens of records), then instead of adding a compound index you can load the results into an array and return the first record.

In my case, I knew I wouldn't get more than 10 records when collected into an array. Hence, I used this additional trick.

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id).limit(1)

The above query was taking 2928.1ms, whereas the following query (collected into an array, from which we take the first record) took 2.3ms:

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id)

NOTE: My companies table has about 1 million records.
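For completeness, here is a sketch of the compound-index alternative mentioned above (Rails 5 migration syntax; the migration and index names are illustrative):

class AddCompanyLookupIndex < ActiveRecord::Migration[5.0]
  def change
    # Covers the filter columns plus the ordering column
    add_index :companies, [:subsidiary, :name, :id], name: 'index_companies_lookup'
  end
end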

Use JOIN instead of NOT IN query

NOT IN and IN statements are not well optimized. They are easy to write, but they can usually be replaced by a simple JOIN, making the query far more efficient.

Say I have a player from England, and I want to count how many matches he played where the match score is 70 but the match was not against India.

I can simply write it as

player.matches.where(score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(166789.8ms) SELECT COUNT(*) FROM "matches" WHERE "matches"."player" = $1 AND "matches"."score" = 70 AND ("matches"."opponent_team_id" NOT IN (SELECT "teams"."id" FROM "teams" WHERE "teams"."country" = 'India')) [["player_id", 135]]

The above query is simple, but when I ran it on millions of records it took 166789.8ms.

Now let's see how the equivalent joins query behaves:

Match.joins(:team).where(player_id: 135, score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(14.5ms) SELECT COUNT(*) FROM "matches" INNER JOIN "teams" ON "teams"."id" != "matches"."team_id" WHERE "matches"."player_id" = 135 AND "matches"."score" = 70

It took only 14.5ms, which is minuscule compared to the NOT IN version.

Use an md5 digest if you want to search on a text column

Note: We can use this trick only if we want to do an exact match on a string.

In my case I had a field on which I knew I would only ever do exact matches, and that field was of type text. When we matched on that column, the query took around 3000ms.

User.where(search_column: 'https://abcdef')

(3139.9ms) SELECT "users".* FROM "users" WHERE "users"."search_column" = 'https://abcdef'

The obvious solution was to index search_column, but instead we stored an extra column, md5_search_column, and searched on that, which reduced our search time to a few ms.

User.where(md5_search_column: '7c602c2c776fb33c188957bb6bdd8928')

(1.9ms) SELECT "users".* FROM "users" WHERE "users"."md5_search_column" = '7c602c2c776fb33c188957bb6bdd8928'

Now, why didn't we just index search_column instead of adding a new column and storing a value in it? For two reasons:

  • An md5 digest is always 32 characters long, so an index on a fixed 32-character value is smaller and more efficient than one on an unbounded text column.
  • Searching on a fixed 32-character value is likewise more efficient than searching on a long text value.
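A minimal sketch of how the extra column can be kept in sync, assuming a before_save callback on the model (ApplicationRecord is Rails 5 syntax; use ActiveRecord::Base on older versions):

require 'digest'

class User < ApplicationRecord
  before_save :set_md5_search_column

  private

  # Store the 32-character digest alongside the raw text value
  def set_md5_search_column
    self.md5_search_column = Digest::MD5.hexdigest(search_column.to_s)
  end
end

# Lookups then hash the needle the same way:
User.find_by(md5_search_column: Digest::MD5.hexdigest('https://abcdef'))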

Use raw SQL queries instead of going the ActiveRecord way

The above optimizations were easy and integrate cleanly with ActiveRecord. But there are cases where we need to bypass ActiveRecord and use raw SQL queries. Wait, it's not that I don't like ActiveRecord. I love it, simply because I can switch databases at any time; for instance, I could switch from SQLite to Postgres with minimal changes. However, when your data size increases and you know for sure that you are not going to change the database, you should use the database's power. IMO this is the case with all production systems: you do not change the DB!

That said, we used sum, least, and window functions to get results in milliseconds.

I had to use raw SQL for the following example. I wanted to count the students in a school matching some criteria, with the condition of counting at most X students per class (20 in the example below).

Now the simplest thing I could do was

  • Take the distinct classes from the database.
  • Loop over every class to calculate its count.
  • Sum the counts calculated in step 2.

But think how inefficient that would be. Instead, the database provides least and sum functions which achieve the same thing. For example:

(0.7ms) select sum(A.total_count) as total, sum(A.total_by_max_count) as total_by_max_students from
(SELECT count(*) AS total_count, least(count(*), 20) AS total_by_max_count FROM
students WHERE students.hobby = 'Reading' GROUP BY students.class_id) as A

We also had another requirement: perform some operation on students, where the criterion was to select only X students (20 in the example below) with reading as a hobby from each class. Here too we could have iterated over records, but instead we used a window function and achieved the same thing in less time.

The query we used:

query = "SELECT * FROM (SELECT students.id, students.class_id, hobby, row_number()
        OVER (PARTITION BY students.class_id ORDER BY
        students.id DESC) AS rownum FROM students WHERE students.hobby = 'Reading') tmp
        WHERE tmp.rownum <= 20"
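To run such a raw query from Rails, something like this works (select_all returns rows as hashes):

rows = ActiveRecord::Base.connection.select_all(query).to_a
# => [{"id" => 1, "class_id" => 3, "hobby" => "Reading", "rownum" => 1}, ...]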

So, to conclude: if you use the power of your database efficiently, you can speed up your application to a great extent with your current stack.

ProTips: Elastic Search & Rails

In this blog post I am not going to tell you how to set up Elasticsearch with Rails; there are plenty of blog posts available for that. I referred to this blog post to begin with Elasticsearch and Rails.

Instead, I will show how to set the "ignore_malformed" option. This may sound trivial, but unfortunately the usage is neither in the ES-rails documentation nor could I find any content about it on the web.

I spent a few hours digging into the ES-rails code and finally deciphered it. I hope this saves you that time.

Problem

Recently we started using Elasticsearch in our project. One of the tables had around 9 million records. After configuration we started the indexing. We monitored a few thousand records and the indexes were being generated correctly. However, at the end we found that the indexing of 5k records had failed. I was puzzled: why did only some fail? On digging into the ES logs, we found the issue: we had the value '21111-1-1' in a date field. We somehow had these values in our Postgres DB (from a 3rd party source) and never realized it. Elasticsearch found them invalid for the Date type and gave the following error:

failed to parse [date_modified]. Invalid format: \"21111-01-01\" is malformed at \"1-01-01\"

Solution

Now, before we get to the solution, let us first see the requirements (project specific) –

  • Don't change the type of the field in Elasticsearch from date to text.
  • Don't change the data in the actual tables, as it came from a 3rd party service; even if we had fixed it, the next sync would have brought back the corrupt data.
  • Retain the values even if some were invalid. Rationale: 5k out of 9M is just 0.056%, and for such a low percentage we did not want to lose the correct values of the other records. Essentially, DO NOT ignore the field.

I started searching and found this blog. It was pretty straightforward: just pass the ignore_malformed parameter in the properties.

But before putting in the solution, we wanted to get to the root cause. We were not specifying the type of date_modified in our mapping at all, so how did Elasticsearch identify it as type DATE? Was it because of the type in the DB? Probably not.

By running some queries, we figured out that if we don't specify the type of a field in the mapping, then the field does not initially appear in the mapping when the index is created. It gets added to the mapping when the first document is indexed, and its type is set based on the type of that field in the FIRST document. The first record in my table had a CORRECT date value, so the field was set to the DATE type. However, if the first value itself had been INVALID, the type would have been set to TEXT (as it would have received a value in an invalid date format).

Hence, when it came across a record with invalid date it gave the error above.

Code

Specify the "ignore_malformed" property in the settings (of elasticsearch-model/elasticsearch-rails).

settings index: index_settings do
  indexes :date_modified, type: 'date', ignore_malformed: true
end

Easier said than done!! For this we had to go through the code of elasticsearch-model for a couple of hours, as it was not mentioned in the README. We shall send a PR to update the README. If you are curious, here is the code snippet from the gem which shows how anything passed to indexes is sent in properties.

def indexes(name, options={}, &block)
  @mapping[name] = options
  ....
end

Here we are passing

{type: "date", ignore_malformed: true}

in the options, and elasticsearch-model in turn converts it to

{date_modified: {type: "date", ignore_malformed: true}}

and calls the appropriate REST API.

Note: It is always good to specify the type of a field in the mapping. When I first indexed documents (without a mapping for the field), Elasticsearch set the type to DATE only because my first record had a valid date. Otherwise it would have been text, and all searches on this field would have gone haywire.

Another important point: I had to index 9M records, which itself takes time, so I only learned about the failures at a much later stage, by which point 99.95% of the records were already indexed. After changing the mapping, the ONLY way forward is to DELETE the existing index and recreate it. Needless to say, it took the same amount of time again!!
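With elasticsearch-model, dropping and rebuilding the index looks roughly like this (a sketch; MyModel stands in for your indexed model):

# Recreate the index with the new mapping, then reimport everything
MyModel.__elasticsearch__.create_index! force: true
MyModel.import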

Gmail: Get notified when recipient opens message

Think how awesome it would be if Gmail showed me whether the recipient has read my mail, just like WhatsApp! Unfortunately, Gmail does not do this, though there are some apps which add this feature. I was curious how this could be done if Gmail doesn't allow it, and I actually got a chance to implement it on my project.

A few days back I had this requirement. We were sending emails from our iOS app, and we wanted to generate a feed entry for the user when the recipient opens the email. After some research I came across this document.

It's very easy to track email opens: you just need to add one image to your email, with its link pointing to your server. When the mail is opened, the client makes a call to your server, and you can track the open 🙂

I just appended the following to the mail body, as mails were sent from the app on behalf of the user:

<div style= ''><img src='<url_to_your_api>?track_id=#{track_id}' alt='' width='0.1' height='0.1' border='0'></div>

I added the following to the style attribute:

display: none !important;

Notice that I am sending a track_id along with the link, to track each email separately. When a call reaches the specified API, I look up the email in my database by track_id and then generate the feed entry for the user.
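For illustration, here is a minimal sketch of the endpoint behind <url_to_your_api>; the model, method, and route names are hypothetical:

require 'base64'

class EmailTracksController < ApplicationController
  # GET /email_tracks/open?track_id=...
  def open
    email = Email.find_by(track_id: params[:track_id])
    email.generate_feed_for_user if email  # hypothetical feed generation

    # Return a 1x1 transparent GIF so nothing visible renders in the mail
    pixel = Base64.decode64("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
    send_data pixel, type: 'image/gif', disposition: 'inline'
  end
end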

But here are some known issues:

  • When you send a message to multiple recipients, you can't track them separately, because you can't compose one mail with different user_ids or track_ids per recipient.

  • If you open the mail from your own sent box, that will also be tracked.

  • If the user has disabled auto-download of images, the mail won't be tracked, as the image will not be downloaded.

  • If the user has the Inbox app installed, the mail will be tracked even without a real open, as the Inbox app shows previews of images with the mail.

Now tracking email opens is no longer a black box for me 🙂 I hope this helps someone tracking emails.

ProTips: Google Calendar one-way sync (from Google to application)

In this blog post, I will walk you through my experience of Google Calendar one-way sync (from Google to my application) and share some key points I learned about Google Calendar recurring and normal events.

Google's documentation for this is very good, but you need to experiment a lot to understand it, so I thought of writing it down.

Recently, I was working on a Rails API-only backend app with iOS and Angular frontends. One of the features was syncing the user's Google/Salesforce/Exchange calendar and sending a push notification whenever a meeting is created or updated, where a meeting can be single or recurring. Of course, the user has given permission to do so 🙂

The obvious question: why not do it in the iOS app?

Yes, but we cannot, because the user can log in using the web app too, and the data on these two needs to be in sync. Moreover, it's not a plain sync but an intelligent one: we have a SMART calendar that needs some processing.

The first one we picked up was Google Calendar sync. The requirement was to store six months of meetings, but this window changes every day, as we need +3 months from today and -3 months before today.
NOTE: In our case it was a one-way sync only: Google to our database.

In any calendar there are two types of events, and we had to sync both these types.
1. Single
2. Recurring. Recurring events are a bit complex; if you don't already understand them, it would be good to read the documentation.

Also, syncing means:
1. always getting a delta after the last sync (for performance obviously)
2. getting created/updated and deleted events.

My initial thought was that syncing single events would be easy, but recurring ones would be a bit difficult to deal with.

I went through the API documentation. The Google API accepts the parameters timeMax and timeMin.

We already had Google auth integrated in our app, so it was just a matter of calling the API through the REST client, and I thought my job was done. Here was my initial API call:

     url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?singleEvents=true&alwaysIncludeEmail=true&timeMax=#{CGI.escape(time_max.to_s)}&timeMin=#{CGI.escape(time_min.to_s)}"

singleEvents=true was used to get each instance of recurring events (see the documentation).

When I called this API with RestClient, I got the expected results for created and updated events, so I thought my job was done, as I didn't have to deal with recurring events specially (I was getting each instance separately). So I wrote code to create/update meetings in my database based on the unique user id and event id. It was very simple, but when our client asked for notifications for deleted events, we hit an issue. Why? Because, on reading the documentation again: Google Calendar gives you deleted events if you pass the showDeleted parameter to the API, but not always. If showDeleted and singleEvents are both true, only single instances of deleted events (but not the underlying recurring events) are returned. And we wanted all occurrences of deleted events.

Then we read the documentation again and came across the syncToken parameter. Each response includes a nextSyncToken, which can be stored in the database; if we pass it on the next query, we get only the events created/updated/deleted after the last sync. This was exactly what we wanted. Earlier I was fetching 6 months of events each time and comparing them with the database for existence, i.e. I was wasting API calls and server processing.

So we changed our approach and did the following:

  • If nextSyncToken is not present in the database, fetch 6 months of meetings and store the nextSyncToken obtained from the last call in the database:
    url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?alwaysIncludeEmail=true&timeMax=#{CGI.escape(time_max.to_s)}&timeMin=#{CGI.escape(time_min.to_s)}"
  • If nextSyncToken is present in the database, fetch only the events changed since the last sync and store the new token:
    url = "https://www.googleapis.com/calendar/v3/calendars/primary/events?alwaysIncludeEmail=true&syncToken=#{self.calendar_sync_token}"

But notice that we have not passed the singleEvents option to the API. Why? Because when I sent singleEvents=true, I got each instance of every recurring meeting. The API returns at most 250 events in a single call, so if there are 2000 occurrences of a recurring event, you have to paginate through all of them (using nextPageToken) to get the nextSyncToken on the last page. It is a no-brainer that processing time increased.

We needed a smarter way to handle it. After some research and reading, I figured out that we should not pass the singleEvents parameter, and if we get a recurringEventId back in a response, fetch the instances of that recurring event separately using this API to maintain a proper sync. A sketch of the whole loop follows.
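Putting it together, here is a minimal sketch of the sync flow inside a Rails app (RestClient-based; helper names like process_event and the token/column names are illustrative):

require 'rest-client'
require 'json'

def sync_google_calendar(user)
  base_url = "https://www.googleapis.com/calendar/v3/calendars/primary/events"
  params = { alwaysIncludeEmail: true }
  if user.calendar_sync_token.present?
    params[:syncToken] = user.calendar_sync_token   # delta sync
  else
    params[:timeMin] = 3.months.ago.iso8601         # initial 6-month window
    params[:timeMax] = 3.months.from_now.iso8601
  end

  response = nil
  loop do
    raw = RestClient.get(base_url, params: params,
                         Authorization: "Bearer #{user.google_access_token}")
    response = JSON.parse(raw.body)
    response['items'].each do |event|
      # Recurring series arrive as one event; fetch their instances separately
      process_event(user, event)
    end
    break unless response['nextPageToken']
    params[:pageToken] = response['nextPageToken']  # paginate to the last page
  end

  # nextSyncToken is only present on the last page
  user.update(calendar_sync_token: response['nextSyncToken'])
end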

So here are some key points I learned about Google one-way calendar sync:

  • If you don't want to deal with recurring events, your job is very simple: just call the API and skip recurring events while parsing the response.

  • If you have to deal with recurring events but not with deletions, just set the singleEvents parameter to true. This will add some calls and server processing, but it will make your life easy 🙂

  • BUT, when you want to deal with deletion of recurring events, don't set singleEvents to true. Google will give you the recurring meeting as a single event with the following data in the response; using this, you can figure out whether it is a recurring event or not.

{ 
  ...
 "recurrence": [
  "RRULE:FREQ=WEEKLY;BYDAY=TU"
  ],
  "id": "f8j5134grukmqufapvn0q0p2bq0"
  ...
}

You could also parse the "recurrence" tag and then calculate and store the recurring schedule in your database. But "recurrence" comes in multiple formats, and I did not want to deal with them. Hence, I separately fetched recurring event instances by passing the recurring event id to this API and made my life simple.

How to handle modifications to recurring events (a.k.a. “exceptions”)

CASE-1: Modification to Single Event
A] When you update only a single event of a recurring event, Google creates a new event from the updated one, and you get it in the next sync call. The new event carries the parent recurring event id, which you can use for any processing on your server.

{ 
  ...
  "id": "pmu0u4psc1ebq2vun468vpn9ms_20161123T103000Z",
  "recurringEventId": "pmu0u4psc1ebq2vun468asdn9ms",
  ...
}

B] When you delete a single event of a recurring event, you get the following response:

 {
  "kind": "calendar#event",
  "etag": "\"296070510840\"",
  "id": "pmu0u4psc1ebq2vun468vpns_20161201T103000Z",
  "status": "cancelled",
  "recurringEventId": "pmu0u4psc1ebq2vun468vps",
  "originalStartTime": {
    "dateTime": "2016-12-01T16:00:00+05:30",
    "timeZone": "Asia/Calcutta"
  }
}

CASE-2: Modification to Multiple Events
A] When you update multiple events using "Change all following meetings". For example, say you have one recurring event with 10 occurrences and you update the 6th meeting and all its following events. In this case Google creates one more recurring event: it keeps the first 5 meetings in the original recurring event and creates a new recurring event with the updated 5 (6 to 10). The sync API returns BOTH, and henceforth you can treat these as two separate recurring events.

B] When you delete all following occurrences from a particular event onwards, the response gives you the recurring event id, and you can resync that series.

I spent a couple of days figuring all this out. Hopefully this will save someone's time.

Lesson learnt: always read the documentation carefully, think through future cases, and design accordingly, even if the requirements haven't asked for a feature yet.

Pro-tip: Scaling properly with Nginx worker_process and max_pool_size

This post talks about the worker_processes and passenger_max_pool_size options for Nginx and how these configurations play an important role in scalability.

Recently our client reported that API performance was bad: it was taking a few minutes to respond! But when we tested this API, it was quite efficient and responded in a few milliseconds. What was wrong? As the number of users increased, the number of concurrent requests to the server had also increased. Initially only hundreds of requests per second came to the server, but now it was about 4k requests per second, and we realised that we had not scaled our server to handle this load.

When this problem arose, I broke it down into steps:

  • First I checked the RDS instance. It was using only 20% of its capacity.
  • I revisited all the queries; they were optimized enough, so the database was not the bottleneck.
  • I sent an unauthorized request to the server, so that with less processing I could measure the raw response time. It took 20 seconds.
    20 seconds for an unauthorized request? NOT at all acceptable.

Then thought of checking web server configurations.

I was using Nginx as the web server and Passenger as the application server. The first thing I noticed was the number of worker processes: it was 5.

worker_processes  5;

The worker_processes directive tells the server how many worker processes to spawn once it has bound to the proper IP and port(s).

And I was using 1024 worker_connections. worker_connections sets the maximum number of simultaneous connections that can be opened by a worker process.

events {
    worker_connections  1024;
}

So mathematically my server could handle 1024 * 5 connections; why was there still a problem?

After reading this, I realised that the value of worker_processes depends on the number of cores you have. I was using a single-core machine with 5 worker processes, so how would that work?

Imagine a shop run by one person, with billing done on one computer. As sales increase, the shopkeeper hires new people, but he doesn't have space for these new employees to operate! So only one person can work at a time and the rest just sit idle. Exactly the same thing was happening in our case!

It is common practice to run 1 worker process per core. Anything above this won’t hurt your system, but it will leave idle processes usually just lying about.

So, the shopkeeper needs to shift to a larger space for people to operate!

Similarly, we decided to increase our hardware capacity, opted for a quad-core server, and redirected our requests to it. This helped API performance significantly, but we were still not happy with the boost: we had a server 4 times more powerful than the earlier one, and the request time was still a few seconds.

After researching a little more, we realized our Passenger settings were not optimized. We were using process-based Passenger but with a pool of only 6 processes.

Passenger's max_pool_size defines the maximum number of application processes that may exist simultaneously. A larger number results in higher memory usage but improves the ability to handle concurrent HTTP requests. When I checked, our CPU usage was around 25 to 30%, so we were not utilizing our server fully. Here are more details about max_pool_size.

Back to the shopkeeper. He hired more employees but kept only one billing computer! Hence efficiency stayed low even with more room to operate. So he also invests and increases the number of billing computers!

Similarly, we increased passenger_max_pool_size to 15, and it boosted performance to a great extent.

How did we decide on the number 15? It was calculated with the following formula:

passenger_max_pool_size  = (TOTAL_RAM * 0.75) / RAM_PER_PROCESS

Our total RAM was 15GB and RAM_PER_PROCESS was around 300 to 400MB.

To get RAM_PER_PROCESS, we recorded stats from passenger-status at different intervals of time.


By the above formula we could set the number to more than 30, but we decided to keep it at 15, as we wanted to leave room for other processes.
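As a quick sanity check of the formula with our numbers:

total_ram_mb       = 15 * 1024   # 15 GB
ram_per_process_mb = 350         # observed ~300-400 MB via passenger-status
(total_ram_mb * 0.75 / ram_per_process_mb).floor
# => 32, so 15 leaves comfortable headroom for other processes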

Most of us work in a team: we know each team member's key skill and use it to deliver the best product. Similarly, the web server, app server, and hardware work as a team. We may always write good code, but knowing such things helps us deliver the best product.

I hope my experience will help others :).

Confessions of a first time speaker

This was the second RubyConfIndia I attended, and this time I was excited, happy and scared. Why? Because I was one of the speakers. I never thought I would be able to stand in front of 500 people and deliver a talk. But I did it!

The journey started with Josh Software. At Josh, when you join the company, you are supposed to deliver a 20-minute talk internally; this is compulsory for everyone. Two years ago, when it was my turn to present, I realised that my greatest fear is public speaking! At that time I delivered a talk on a topic that was not very interesting, and I was very scared. It was then that I decided to overcome this fear. I had this fear primarily because English is not my first language. Last year Anuja and I decided to give a lightning talk at RubyConfIndia to gain some confidence speaking on stage. We gave the lightning talk after a month of preparation, but since I knew my part verbatim, it didn't give me much confidence.

A few months ago, when the CFP was open, we were again thinking of giving a lightning talk. Josh had recently won the CodeCuriosity competition, so I thought we could speak about open source contributions. I asked Anuja if we could give a lightning talk on this, and after discussions we decided to submit a CFP; and it got selected! When I saw the tweet that our talk was selected, I was happy, excited and suddenly tense. We wanted to talk about open source, but we did not want to just share statistics and preach to the audience about doing open source. Since we had only just begun contributing to open source ourselves, we had lots of fears, and we realised that we should share our experiences, so that even the experts in the audience would relate to it and newbies would get some motivation.

Talking about your fears is really not easy!! However, the Josh team was with us all the way 🙂 One day we had a discussion about everyone's fears, and I realized I am not the only one who faces these. Everyone has faced them sometime in their life, and it's really important to talk about it. So we decided to talk about the experience of our first comment on an issue, filing our first issue, submitting our first pull request, writing our first blog post, and even giving our first lightning talk. I still have the fear of making mistakes, which I didn't speak about at the conference! As Anuja has 4+ years of experience and her English is better than mine, I was haunted by the thought of making some mistake. People would judge the employees of Josh, and until now all the speakers from Josh had set high expectations. I feared what would happen if I messed up our talk, and I thought of quitting many times, but realised that was not an option!

Finally we started our preparation!! Practicing in front of a mirror and listening to my recordings again and again really helped me a lot. Now when I listen to my first recording and my last, I can make out a huge difference.

We had around 20 days to prepare, and thankfully my project was on track, so I could give time to practicing my talk 🙂 We decided that our presentation would have only images, and searching for images under a Creative Commons license is really difficult, but we did it!! We had some confidence while preparing the slides because of our previous lightning talk, and as we were sharing our own experiences, we realised that we could manage!


Finally, it was the day to leave for Kerala, and I was really excited because I was flying for the first time. I enjoyed the flight. We reached a day before the conference, at 4 PM. The view from our room at Le Meridien was really awesome, and after seeing it I forgot everything and just enjoyed it for a few minutes.

When we did the audio/video testing that evening, I was worried because the font was blurred on the display. Luckily the technicians fixed that issue, but I was still scared it wouldn't work on the day of the talk! After seeing the presentation on the large LED display, we thought of some more changes to the slides, and made them hoping they would show up properly on screen. Because of these thoughts I couldn't sleep that night. Our talk was on the second day, and I decided to attend all the talks. This time I was more willing to attend talks than last year, because I thought: as a speaker, if no one attended my talk, how would I feel? So I attended all the talks, especially the lightning talks. I was worrying about the content so much that just before the conference party started, I thought about sitting in my room and making some changes. Luckily, good sense prevailed, and I enjoyed all the talks of the first day and the party too 🙂

D-day arrived! I was still scared to face the public, but Gautam told me to forget everything, go on stage, and enjoy. This time I was a bit more scared because I was sharing one of Gautam's speaking experiences and had put in a photo of him looking scared, and he didn't even know this. Some of my colleagues said he wouldn't like being trolled in our presentation. I also asked Sethu many times: should I show it to Gautam? Would he get angry? Finally I let the thought go and decided to see what would happen.

The talk went off very well. Sorry Gautam, but sharing your experience was really inspiring for me, even though we showed a scared photo of yours! We got a standing ovation after the talk, which was a really big thing for me.

We saw some tweets after our talk.


Tejas was feeling guilty about every pull request and issue he had ignored. After our talk, if even 5% of the attendees overcome their fears, I will consider our talk a great success.

After our talk, Katrina said that this talk was really needed and we should keep doing talks like this. Sarah said it was a really nice talk, and we also got comments from other speakers and the audience. The icing on the cake was that Matz liked our talk: he asked Gautam when the slides and video would be uploaded!

Matz told us it was a really nice talk. What more can a Rubyist expect!!

This blog was an attempt to overcome one of the fears I talked about during my talk 🙂

Slides of our talk are available here. Now I dream of giving a solo talk 🙂

[Gautam helped edit this post and helped out with the grammar. Written English is also not my strength, but I intend to overcome that too!]

Gmail: Send email API in Ruby on Rails

A few days ago, I had a requirement to send email on behalf of users. Of course, the user has given permission to do so 🙂 We were not using any particular gem for calling the Google API; instead we were using RestClient.

Initially I went through the API documentation, and since the basic architecture was already there, it seemed just a matter of calling the API through the REST client and my job would be done. However, as usual, it's not as easy as it looks.

After reading the API docs, I figured out that I needed to pass the basic parameters (to, cc, bcc, subject, body) as raw text in RFC 2822 format, as a base64url-encoded string. But the doc didn't mention whether to, cc, etc. should be separated by ',' or '\n'. When I started complaining about the poor documentation to my colleagues, someone correctly pointed out that it should be in RFC 2822 format. What indeed is RFC 2822? When I actually read RFC 2822, I realized my mistake. The documentation was correct, but who cares to read the doc carefully?

According to RFC 2822, each header field should be separated by '\n', and the body should follow the headers, separated by an extra empty line. The following is the code snippet I used to create the raw string:

    def convert_to_base_64(to, cc, bcc, body, subject)
      message = "to: #{to}"
      message = message + "\n" + "subject: #{subject}" unless subject.blank?
      message = message + "\n" + "cc: #{cc}" unless cc.blank?
      message = message + "\n" + "bcc: #{bcc}" unless bcc.blank?
      message = message + "\n" + "content: multipart"
      message = message + "\n" + "Content-Type: text/html"
      message = message + "\n\n" + "#{body}"
      Base64.urlsafe_encode64(message)
    end

Note that I added two '\n' before the body.
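For completeness, here is a sketch of sending the encoded string via the Gmail API's users.messages.send endpoint with RestClient (access_token is assumed to be a valid OAuth token for the user):

raw = convert_to_base_64('someone@example.com', nil, nil, '<p>Hello!</p>', 'Hi there')

# POST the base64url-encoded RFC 2822 message as the "raw" field
RestClient.post(
  'https://www.googleapis.com/gmail/v1/users/me/messages/send',
  { raw: raw }.to_json,
  { Authorization: "Bearer #{access_token}", content_type: :json }
)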

I wasted a day figuring this out; maybe this will save someone's time.

Lesson Learnt: Always read documentation carefully.

Nginx: Allow requests only from your domain!

It's very common these days to use Nginx as your web server. Recently, while setting up Nginx for one of my projects, I forgot to add some configuration, which led to serious problems later.

Here is a code snippet similar to the server block which I used in my Nginx configuration.

server {
    listen 80;
    root /usr/share/nginx/html/sampledomain/public;
    access_log logs/sampledomain.access.log main;
    server_name sampledomain.com www.sampledomain.com;

    ...
}

In Nginx “server_name” directive is used to define server names, which are used to determine which server block will be used to process an incoming request.

Let's take an example: if a request comes for sampledomain.com or www.sampledomain.com, it will be handled by the above server block.

In an Nginx configuration you are allowed to have multiple server blocks. An incoming request is matched against the "server_name" directives top to bottom. However, with multiple server blocks, if the request is for a domain that none of them name, it will be handled by the first server block!

So, what if a request for some other domain reaches our server?

If you want to block requests coming from all other domains (other than the one specified in ‘server_name’ directive), add the following server block:

server {
    listen 80 default_server;
    server_name _;
    return 444;
}

Setting server_name to '_' makes it a catch-all name.
The return value 444 is a special Nginx code meaning "close the connection without sending a response".

Now, any request for a domain other than "sampledomain.com" will match this default server block and hence won't be served.