In this blog post, I am going to tell you some queries/ tricks to speed up your application by using the power of database. I am using postgres and activerecord rails so all queries are related to these, but you can relate it with your database.
Recently I got some performance issue when our user base increased many folds. We did some optmization and would like to share the tips:
Note: As I am using ActiveRecord most of the examples have ActiveRecord syntax with corresponding SQL statement
Select only required fields from database – Not all
Mostly developers miss to specify the required columns and instead select all fields from database which leads to performance degradation.
Let us say my User Table has 100 users
users = User.all User Load (15.4ms) SELECT "users".* FROM "users"
Above query took 15.4 ms as it is selecting all columns from the table. But, the query below took only 3.3ms.
users = User.all.pluck(:name) (3.3ms) SELECT "users"."name" FROM "users"
Here we are not only achieving query time improvement but also memory required to hold objects will be lesser.
Add compound index when you want to use limit + order OR Avoid using limit + order
When we use limit with an order it makes the query too slow. The time can be substantial if your table size is large! I was using limit 1 and order some column it was taking few seconds to respond. After searching I found this quickly. The ideal solution is to use a compound index. However, if you are sure that your result set is going to be very small (say 10s of records) then instead of adding compound index you can take the result in an array and returned the first record.
In my case, I knew I won’t get more than 10 records if I collect it in an array. Hence, I used this additional trick.
Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id).limit(1)
Above query was taking 2928.1ms whereas the following query took 2.3ms
Company.unscoped.where(subsidiary: false).where("name like 'josh%'").order(:id)
NOTE: My Companies table has about 1 Million records.
Use JOIN instead of NOT IN query
NOT IN and IN statements are not well optimized. They are easy, but usually, they can be replaced by a simple JOIN statement making they the query way more efficient.
Now say I have a player from England and I want to count how many matches he played where match score is 70 but the match was not against India.
I can simply write it as
player.matches.where(score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count (166789.8ms) SELECT COUNT(*) FROM "matches" WHERE "matches"."player" = $1 AND "matches"."score" = 70 AND ("matches"."opponent_team_id" NOT IN (SELECT "teams"."id" FROM "teams" WHERE "teams"."country" = 'India')) [["player_id", 135]]
Above query is simple and will return results in few ms but I have ran it on millions of record and it took 166789.8ms
Now let’s see how joins query for above behaves
Match.joins(:team).where(player_id: 135, score: 70).where.not(opponent_team_id: Team.find_by(country: 'India').id).count (14.5ms) SELECT COUNT(*) FROM "matches" INNER JOIN "teams" ON "teams"."id" != "matches"."team_id" WHERE "matches"."player_id" = 135 AND "matches"."score" = 70
It took only 14.5ms which is minuscule as compared to the “Not In”
Use md5 digest if you want to search on text column
Note: We can use this trick only if we want to do an exact match on a string.
In my case I had one field on which I knew I am going to do only exact match and that field was of text type. When we were doing match on that column query was taking around 3000ms.
User.where(search_column: 'https://abcdef') (3139.9ms) SELECT "users".* FROM "users" WHERE "users"."search_column" = 'https://abcdef'
The obvious solution was to index search_column but instead, we stored one more column md5_search_column and applied search on that which reduced our search time to few ms.
User.where(md5_search_column: '7c602c2c776fb33c188957bb6bdd8928') (1.9ms) SELECT "*".* FROM "users" WHERE "users"."md5_search_column" = '7c602c2c776fb33c188957bb6bdd8928'
Now, why didn’t we index search_column instead of adding the new column and storing a value in that column? For two reasons
- md5 digest will be always 32 characters long so indexing on 32 characters will be more efficient than text column
- Searching on 32 characters will be more efficient than text column
Use raw sql query instead of going ActiveRecod way
Above optimizations were easy and could be integrated with active record easily. But there are some cases where we need to use raw sql queries and bypass ActiveRecord. Wait it’s not like I don’t like ActiveRecord. I love it simply because I can switch databases any time. For instance, I could switch to Postgres from Sqlite with minimal changes. However, when your data size increases and you know for sure that you are not going to change database you should use database power. IMO, this is the case with all production systems, you do not change the DB!
Said that, for eg. we used sum, least, window functions to get results in milliseconds.
I had to use raw sql for the following example. I wanted to calculate the number of students present in school based on some criteria but a condition was to take only X (In example below its 20) number of maximum students per class.
Now the simplest thing I could do was
- Take distinct classes from database
- Loop on every class to calculate count
- Calculate sum of count calculated in step 2
But think how inefficient that would be. Instead, database provides least and sum functions which can be used to achieve above
(0.7ms) select sum(A.total_count) as total, sum(A.total_by_max_count) as total_by_max_students from (SELECT count(*) AS total_count, least(count(*), 20) AS total_by_max_count FROM students, class WHERE students.hobby = 'Reading' Group BY students.class_id) as A
We also had a requirement: Perform some operation on students and criteria was – select only X (its 20 in an example below) number of students having a hobby as reading from each class. Here also we could have gone with above approach of iterating over records but instead, we used window function and achieved the same thing in less time
Query we used
query = "SELECT * FROM ( SELECT students.id,students.class_id,hobby,row_number() OVER ( PARTITION BY students.class_id ORDER BY 'students.id' DESC) as rownum FROM students where students.hobby = "'eading') tmp WHERE tmp.rownum <= 20"
So to conclude if you use power of database efficiently you can speed up your application to the great extent with current stack you are using.