Use power of database to speedup your application!

In this blog post, I am going to tell you some queries/ tricks to speed up your application by using the power of database. I am using postgres and activerecord rails so all queries are related to these, but you can relate it with your database.

Recently I got some performance issue when our user base increased many folds. We did some optmization and would like to share the tips:

Note: As I am using ActiveRecord most of the examples have ActiveRecord syntax with corresponding SQL statement

Select only required fields from database – Not all

Mostly developers miss to specify the required columns and instead select all fields from database which leads to performance degradation.

Let us say my User Table has 100 users

users = User.all
User Load (15.4ms) SELECT "users".* FROM "users"

Above query took 15.4 ms as it is selecting all columns from the table. But, the query below took only 3.3ms.

users = User.all.pluck(:name)
(3.3ms) SELECT "users"."name" FROM "users"

Here we are not only achieving query time improvement but also memory required to hold objects will be lesser.

Add compound index when you want to use limit + order OR Avoid using limit + order

When we use limit with an order it makes the query too slow. The time can be substantial if your table size is large! I was using limit 1 and order some column it was taking few seconds to respond. After searching I found this quickly. The ideal solution is to use a compound index. However, if you are sure that your result set is going to be very small (say 10s of records) then instead of adding compound index you can take the result in an array and returned the first record.

In my case, I knew I won’t get more than 10 records if I collect it in an array. Hence, I used this additional trick.

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id).limit(1)

Above query was taking 2928.1ms whereas the following query took 2.3ms

Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id)

NOTE: My Companies table has about 1 Million records.

Use JOIN instead of NOT IN query

NOT IN and IN statements are not well optimized. They are easy, but usually, they can be replaced by a simple JOIN statement making they the query way more efficient.

Now say I have a player from England and I want to count how many matches he played where match score is 70 but the match was not against India.

I can simply write it as

player.matches.where(score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(166789.8ms) SELECT COUNT(*) FROM "matches" WHERE "matches"."player" = $1 AND "matches"."score" = 70 AND ("matches"."opponent_team_id" NOT IN (SELECT "teams"."id" FROM "teams" WHERE "teams"."country" = 'India')) [["player_id", 135]]

Above query is simple and will return results in few ms but I have ran it on millions of record and it took 166789.8ms

Now let’s see how joins query for above behaves

Match.joins(:team).where(player_id: 135, score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count

(14.5ms) SELECT COUNT(*) FROM "matches" INNER JOIN "teams" ON "teams"."id" != "matches"."team_id" WHERE "matches"."player_id" = 135 AND "matches"."score" = 70

It took only 14.5ms which is minuscule as compared to the “Not In”

Use md5 digest if you want to search on text column

Note: We can use this trick only if we want to do an exact match on a string.

In my case I had one field on which I knew I am going to do only exact match and that field was of text type. When we were doing match on that column query was taking around 3000ms.

User.where(search_column: 'https://abcdef')

(3139.9ms) SELECT "users".* FROM "users" WHERE "users"."search_column" = 'https://abcdef'

The obvious solution was to index search_column but instead, we stored one more column md5_search_column and applied search on that which reduced our search time to few ms.

User.where(md5_search_column: '7c602c2c776fb33c188957bb6bdd8928')

(1.9ms) SELECT "*".* FROM "users" WHERE "users"."md5_search_column" = '7c602c2c776fb33c188957bb6bdd8928'

Now, why didn’t we index search_column instead of adding the new column and storing a value in that column? For two reasons

md5 digest will be always 32 characters long so indexing on 32 characters will be more efficient than text column
Searching on 32 characters will be more efficient than text column

Use raw sql query instead of going ActiveRecod way

Above optimizations were easy and could be integrated with active record easily. But there are some cases where we need to use raw sql queries and bypass ActiveRecord. Wait it’s not like I don’t like ActiveRecord. I love it simply because I can switch databases any time. For instance, I could switch to Postgres from Sqlite with minimal changes. However, when your data size increases and you know for sure that you are not going to change database you should use database power. IMO, this is the case with all production systems, you do not change the DB!

Said that, for eg. we used sum, least, window functions to get results in milliseconds.

I had to use raw sql for the following example. I wanted to calculate the number of students present in school based on some criteria but a condition was to take only X (In example below its 20) number of maximum students per class.

Now the simplest thing I could do was

Take distinct classes from database
Loop on every class to calculate count
Calculate sum of count calculated in step 2

But think how inefficient that would be. Instead, database provides least and sum functions which can be used to achieve above

(0.7ms) select sum(A.total_count) as total, sum(A.total_by_max_count) as total_by_max_students from
(SELECT count(*) AS total_count, least(count(*), 20) AS total_by_max_count FROM
students, class WHERE students.hobby = 'Reading' Group BY students.class_id) as A

We also had a requirement: Perform some operation on students and criteria was – select only X (its 20 in an example below) number of students having a hobby as reading from each class. Here also we could have gone with above approach of iterating over records but instead, we used window function and achieved the same thing in less time

Query we used

query = "SELECT * FROM ( SELECT students.id,students.class_id,hobby,row_number()
OVER ( PARTITION BY students.class_id ORDER BY
'students.id' DESC) as rownum FROM students where students.hobby = "'eading') tmp
WHERE tmp.rownum <= 20"

So to conclude if you use power of database efficiently you can speed up your application to the great extent with current stack you are using.

5 thoughts on “Use power of database to speedup your application!”

Sachin Shintre says:

October 31, 2017 at 8:15 am

Reblogged this on Josh Software – Where Programming is an Art!.

LikeLike

Prathamesh⚡Sonpatki (@_cha1tanya) says:

October 31, 2017 at 8:31 am

Good blog post.

The compound index may not be desirable all the time, it also depends on the order of columns in the query clause.

In general, I have found Postgres explain clause pretty helpful in understanding where the query takes time. https://www.postgresql.org/docs/9.3/static/sql-explain.html

LikeLiked by 1 person

- Shweta Kale says:
  
  November 1, 2017 at 4:34 pm
  
  Thank you Prathamesh. Yes, explain cause is pretty useful.
  
  LikeLike
  
Aboobacker M K says:

November 1, 2017 at 4:25 pm

Good post

I always try to use active record until it creates performance bottleneck, using plain SQL will make it difficult to read after some time (at least for me), and it misses active record goodies like further scoping
For instance, you can re write last example in active record by producing same resultant sql

subquery = Student.select(:id, :class_id, :hobby)
.select(“row_number() OVER (PARTITION BY students.class_id ORDER BY students.id DESC) as rownum”)
.where(hobby: ‘reading’)

students = Student.from(subquery).where(“subquery.rownum <= ?", 20)

now students is an active relation object, you can use further oprations like pagination and other scopes on the top of this

LikeLike

Shweta Kale says:

November 1, 2017 at 4:33 pm

Yes that’s true I also always prefer ActiveRecord but at the same time keeping in mind whether I am not compromising on performance.

LikeLike