Quick and Dirty Rails Optimization Guide

on December 11, 2007

These are quick notes I spontaneously ranted down about my experience with Rails and making it perform.

One of the reasons I am working on this current project here in Tokyo is that I get to experience the hardships that come with user growth. Apart from learning how to actually get a project to take off, it is also interesting to learn what to do when it actually does!

When we had our first growth spikes, a lot of people were using the system at the same time. Because it is a learning system, we have the disadvantage of a lot of data-intensive processing. This article is about the code optimization side of a Rails project rather than the systems side (which is another chapter, properly described in other articles).

Finding the slowest requests

There are several tools to sift through the production logs to find out which pages are the slowest. Those are the pages you will have to tackle first.

Depending on where your system load is (high memory / CPU / DB), you might want to prioritize render-heavy over DB-heavy pages, but from what I’ve seen, most of the first optimization steps are in DB-heavy pages.
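The tools themselves aren’t named here, so this is a minimal Ruby sketch of the idea instead. It assumes each request in the log ends with a Rails 2.0 style line such as `Completed in 0.51234 (1 reqs/sec) | Rendering: 0.1 (20%) | DB: 0.3 (58%) | 200 OK [http://example.com/users]`; the regex and the `slowest_requests` name are assumptions, so adjust them to what your log actually contains.

```ruby
# ASSUMPTION: completed requests are logged like
#   Completed in 0.51234 (1 reqs/sec) | Rendering: 0.1 (20%) | DB: 0.3 (58%) | 200 OK [http://example.com/users]
COMPLETED = /Completed in ([\d.]+).*\[(\S+)\]/

# Returns [url, seconds] pairs for the slowest requests, slowest first.
def slowest_requests(lines, top = 10)
  hits = lines.map do |line|
    m = COMPLETED.match(line)
    m && [m[2], m[1].to_f]   # nil for lines that are not completion lines
  end
  hits.compact.sort_by { |_url, secs| -secs }.first(top)
end

# Usage (hypothetical path):
#   slowest_requests(File.foreach('log/production.log'), 20).each { |url, s| puts "#{s}s #{url}" }
```

Grouping the results by controller action instead of by full URL is a natural next step once you know which pages show up.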

Optimizing a request / page

Render heavy, database heavy? What are you talking about? Basically, when you look at the mongrel development log, the load time of each request is split into two categories: render time and DB time. Render time is simply the time it takes excluding calls to the database.

When a page has a high render time but a low percentage of database time, a lot of time is being spent on calculations or moving data around. With these pages, make sure that:

  • No code blocks the request (HTTP/network calls, external commands, disk I/O). Such code should be moved to its own background worker.
  • You don’t have too much ActiveRecord code that loads big chunks of data. These calls show up as a low DB load, but in fact use up a lot of CPU and memory. Only load the data you display.
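Moving blocking work out of the request can be sketched in plain Ruby with a thread and a queue. This is only an in-process stand-in for the idea (for real background workers the post points at backgroundrb): the handler enqueues the slow job and answers at once, and a worker does the slow part. `JOBS`, `DONE` and `handle_request` are hypothetical names.

```ruby
require 'thread'

JOBS = Queue.new   # work handed off by requests
DONE = Queue.new   # results, so we can observe the worker in this sketch

# The worker: in real life a separate process, here a thread doing the slow part.
worker = Thread.new do
  while (job = JOBS.pop) != :stop
    DONE << "fetched #{job}"   # the slow HTTP call / external command would go here
  end
end

# The request handler only enqueues and answers immediately.
def handle_request(url)
  JOBS << url
  'queued'
end

puts handle_request('http://example.com/feed')   # prints "queued"
JOBS << :stop   # shut the sketch down cleanly
worker.join
```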

When optimizing individual pages, this is the way I go about it:

  • run mongrel_rails on your powerful development machine
  • this might be controversial, but… LOAD THE ENTIRE PRODUCTION DATABASE. I’m not kidding. This will give you benefits in terms of optimization, but also for the usability side of developing (it might be an extreme, literal example of a Getting Real chapter). However, I do recommend that you make a ‘rake db:make_developer_friendly’ task that obfuscates the private user data.
  • open up a terminal and run ‘tail -f log/development.log’. That way you can take a good realtime look at all the stuff happening when a request is done. Hit enter a couple of times to create a visual separation between requests :)
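The core of such an obfuscation task can be sketched as a pure function over one user row. The column names (`email`, `name`, `phone`) are assumptions about your users table, and `obfuscate_user` is a hypothetical helper; in the real rake task you would apply the same replacement to every User record and save it.

```ruby
# Hypothetical core of a 'rake db:make_developer_friendly' task. The column
# names (email, name, phone) are assumptions about your users table.
def obfuscate_user(row)
  id = row.fetch(:id)
  row.merge(
    email: "user#{id}@example.com",   # still unique, but no longer a real address
    name:  "User #{id}",
    phone: nil                        # simply drop data nobody needs locally
  )
end

p obfuscate_user({ id: 7, email: 'taro@example.jp', name: 'Taro Yamada', phone: '090-0000-0000', level: 3 })
```

Deriving the fake values from the id keeps them stable between runs, so bug reports that mention "User 7" still point at the same record on every developer machine.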

ActiveRecord is killing you with a thousand cuts

ActiveRecord is a great thing (even in production I think it still adds great value!), but when it comes to performance you have to keep it in check. Something like blog_entry.user.username looks quite innocent, but when you have a listing of 100 blog_entries, you’re screwed: every time it is called in the listing, it does a query to load the belongs_to :user relationship, so your 1 HTTP request fires another 100 queries.

You can combat this by preloading and customizing ActiveRecord loads. In the case of blog_entry.user.username, you could do BlogEntry.find(:all, :include => :user), which preloads the belongs_to user. However, this might be inappropriate:

  • :include doesn’t preload polymorphic associations
  • :include doesn’t play well with :joins/:select yet
  • if you only need the username, don’t load the whole user
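The N+1 problem and the preloading fix can be shown without Rails. In this sketch `FakeDB` is a hypothetical stand-in that logs every query; `usernames_lazy` behaves like calling blog_entry.user.username in a loop, and `usernames_preloaded` fetches all users with one IN query and looks them up in memory. That is the idea behind preloading, not a description of how :include is implemented.

```ruby
# Hypothetical in-memory "database" that records every query it answers.
class FakeDB
  attr_reader :queries

  def initialize(users)
    @users = users   # id => username
    @queries = []
  end

  def find_user(id)
    @queries << "SELECT * FROM users WHERE id = #{id}"
    @users.fetch(id)
  end

  def find_users(ids)
    @queries << "SELECT * FROM users WHERE id IN (#{ids.join(',')})"
    @users.select { |id, _name| ids.include?(id) }
  end
end

BlogEntry = Struct.new(:id, :user_id)

# Lazy loading: one users query per entry, so 100 entries means 100 queries.
def usernames_lazy(db, entries)
  entries.map { |entry| db.find_user(entry.user_id) }
end

# Preloading: one query for all distinct user ids, then in-memory lookups.
def usernames_preloaded(db, entries)
  users = db.find_users(entries.map(&:user_id).uniq)
  entries.map { |entry| users.fetch(entry.user_id) }
end

entries = (1..100).map { |i| BlogEntry.new(i, i % 5 + 1) }
users = { 1 => 'ann', 2 => 'bob', 3 => 'cho', 4 => 'dai', 5 => 'eri' }

lazy_db = FakeDB.new(users)
usernames_lazy(lazy_db, entries)
preload_db = FakeDB.new(users)
usernames_preloaded(preload_db, entries)
puts lazy_db.queries.size      # 100
puts preload_db.queries.size   # 1
```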

I’m not sure I remember correctly, but sometimes it is actually faster not to preload at all and just let ActiveRecord load the associations lazily.

Explain these slooow queries

When you have one of those queries that take more than 0.03 secs, you might want to analyze it a bit.

In my current project we have hired-gun MySQL pros who go very far with this stuff, but it’s always good to know a few of their tricks yourself:

Open up your MySQL client and run the query against your copy of the production data:

  • Always put SQL_NO_CACHE right after the SELECT keyword; this makes sure you aren’t just looking at MySQL query cache load times.
  • Put EXPLAIN in front of your query and look for big integers (in the rows column, for instance), which might mean that you’re missing an index.
  • Geez, this output looks fucked! Yes: end the query with \G instead of ; and the pipe characters are gone, replaced by one field per line.
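The three tricks can be bundled into a tiny helper that turns a query copied from development.log into statements to paste into the mysql client: one uncached run to time it and one EXPLAIN with the vertical \G output. `diagnostic_statements` is a hypothetical helper, and it only handles queries that start with SELECT.

```ruby
# Hypothetical helper: from a SELECT copied out of development.log, build
#  1. an uncached run (SQL_NO_CACHE right after SELECT) for honest timings
#  2. an EXPLAIN ending in \G for the vertical, pipe-free output
def diagnostic_statements(sql)
  sql = sql.strip.chomp(';')
  uncached = sql.sub(/\ASELECT\s+/i, 'SELECT SQL_NO_CACHE ')
  ["#{uncached};", "EXPLAIN #{sql}\\G"]
end

puts diagnostic_statements('SELECT * FROM users WHERE group_id = 3;')
```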

Appropriate indexes should be set up at an early stage. Adding or removing an index can take hours once you’ve accumulated a lot of data!

The MySQL query cache is great, but totally useless for data sources that change by the second; SQL_NO_CACHE can be faster in those cases. For the places that ARE suitable for the query cache, make sure you don’t work against it: a cached result is only reused when the query text is exactly the same.

BlogEntry.find(:all, :conditions => ['created_at > ?', 5.days.ago.utc])                   # Not SQL cached
BlogEntry.find(:all, :conditions => ['created_at > ?', 5.days.ago.beginning_of_day.utc])  # SQL cached!
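The difference between the two conditions is easiest to see in the SQL text that reaches MySQL. This plain-Ruby sketch uses hypothetical stand-ins for 5.days.ago.utc and 5.days.ago.beginning_of_day.utc (it rounds to the start of the UTC day, while ActiveSupport’s beginning_of_day uses the local day) and compares the queries produced one second apart.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Stand-in for 5.days.ago.utc
def cutoff_exact(now)
  (now - 5 * SECONDS_PER_DAY).utc
end

# Stand-in for 5.days.ago.beginning_of_day.utc (rounded to the UTC day here)
def cutoff_day(now)
  t = cutoff_exact(now)
  Time.utc(t.year, t.month, t.day)
end

# The query text MySQL receives once the ? placeholder is filled in.
def recent_sql(cutoff)
  "SELECT * FROM blog_entries WHERE created_at > '#{cutoff.strftime('%Y-%m-%d %H:%M:%S')}'"
end

now = Time.utc(2007, 12, 11, 10, 0, 0)
later = now + 1   # the next request, one second later

puts recent_sql(cutoff_exact(now)) == recent_sql(cutoff_exact(later))   # false: new text, cache miss
puts recent_sql(cutoff_day(now)) == recent_sql(cutoff_day(later))       # true: same text all day
```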

In some cases you might be pulling in data for a restricted subset of parents. For example: you want all the messages posted by the users that belong to a certain group with x conditions. In those cases, it might be faster to first retrieve the ids of all those users in one query, and then do a second SELECT with a giant “user_id IN (?)” condition.
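For the two-step lookup, Rails already expands an array bound to `IN (?)` into a comma-separated list. The hypothetical `in_condition` helper below just shows what that second query ends up looking like, and guards the empty case, since `IN ()` is a syntax error in MySQL. The Message model in the comments follows the example above.

```ruby
# Step 1 (Rails 2.0, roughly):
#   user_ids = User.find(:all, :select => 'users.id', :conditions => ...).map(&:id)
# Step 2:
#   Message.find(:all, :conditions => ['user_id IN (?)', user_ids])
#
# Hypothetical helper showing the condition step 2 boils down to.
def in_condition(column, ids)
  ids = ids.map { |id| Integer(id) }.uniq   # only integers reach the SQL
  return '1 = 0' if ids.empty?              # "IN ()" is invalid SQL: match nothing instead
  "#{column} IN (#{ids.join(', ')})"
end

puts in_condition('user_id', [3, '5', 3, 8])   # user_id IN (3, 5, 8)
```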

Fragment cache the hell out of it!

You can lower the load of your pages by fragment caching certain areas in your views. A fragment cache works like this:

<% cache(identifier) do %>
  your code here
<% end %>

The code in that block runs once; after that, the cached HTML is served until the fragment is expired with expire_fragment(identifier).

There are several ways of clearing these fragments:
  • Clearing it at specific places in the code that alter the data. This requires specific knowledge of the behavior.
  • Clearing it whenever a change is made to an entity/model. You can use cache sweepers (observers) for that.
  • Clearing it periodically with a cronjob. This is useful when the behavior is very complicated.
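In Rails 2.0 these pieces are the `cache` view helper, `expire_fragment`, and sweepers that subclass ActionController::Caching::Sweeper. The sketch below is a toy in-memory version (the names `FragmentStore` and `UserSweeper` are hypothetical) that shows the mechanics: the block only runs on a miss, and a sweeper-style hook expires the fragment when a model is saved.

```ruby
# Toy in-memory fragment store; Rails keeps fragments in its own cache store.
class FragmentStore
  def initialize
    @fragments = {}
  end

  # Like the view helper: run the block only on a miss, then reuse its HTML.
  def cache(key)
    @fragments.fetch(key) { @fragments[key] = yield }
  end

  # Like expire_fragment(key).
  def expire(key)
    @fragments.delete(key)
  end
end

# Sweeper-style observer: expire the fragment whenever a user is saved.
class UserSweeper
  def initialize(store)
    @store = store
  end

  def after_save(_user)
    @store.expire('newest_users')
  end
end

store = FragmentStore.new
renders = 0
render_list = -> { renders += 1; '<ul>...</ul>' }

2.times { store.cache('newest_users', &render_list) }
renders_before_sweep = renders   # 1: the second call was a cache hit
UserSweeper.new(store).after_save(:some_user)
store.cache('newest_users', &render_list)
renders_after_sweep = renders    # 2: the sweeper forced a re-render
```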

There is one Rails good-practice guideline that plays very well with fragment caching: fat models, skinny controllers.

<% cache(identifier) do %>
  <% @newest_users.each do |user| %>
    ... <%# no DB calls saved here: the query already ran in the controller!! %>
  <% end %>
<% end %>

The instance variable @newest_users is populated in the controller, which makes the cache nothing more than an HTML cache: the query still runs on every request. What you should do is MAKE SURE THAT THE DATABASE CALL IS DONE IN THE VIEW, inside the cached block.

PHP users will go insane now. What? Database queries in the view? Are you an amateur? Well, it’s actually quite elegantly tucked away in the model:

<% cache(identifier) do %>
  <% User.newest.each do |user| %>
    ...
  <% end %>
<% end %>

No code in the controller, all execution in the fragment. Yeah!
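Why the query has to live inside the cached block can be checked with a plain-Ruby sketch: a hypothetical `User.newest` that counts simulated queries, and a tiny hash-backed `cache` helper standing in for the Rails one. With the query in the controller it runs on every request; inside the block it only runs on a cache miss.

```ruby
# Hypothetical User model; @queries counts simulated database hits.
class User
  @queries = 0

  class << self
    attr_reader :queries
  end

  # Fat-model finder. In Rails 2.0 the body would be something like
  #   find(:all, :order => 'created_at DESC', :limit => limit)
  def self.newest(limit = 3)
    @queries += 1
    %w[ann bob cho dai].first(limit)
  end
end

FRAGMENTS = {}

# Minimal stand-in for the view's cache helper.
def cache(key)
  FRAGMENTS.fetch(key) { FRAGMENTS[key] = yield }
end

# Controller style: the query runs before the cache is even consulted.
def controller_style_request
  newest_users = User.newest
  cache('controller_newest') { newest_users.join(', ') }
end

# View style: the query sits inside the cached block.
def view_style_request
  cache('view_newest') { User.newest.join(', ') }
end

3.times { controller_style_request }
controller_queries = User.queries                    # 3
3.times { view_style_request }
view_queries = User.queries - controller_queries     # 1
```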

Join the summaries!

When you have a system with very complicated datasets, you will need big queries with a lot of joins. To improve performance you can denormalize the database, making the structure simpler. But sometimes you can’t. What you CAN do, and probably have to do, is summarize that data so it can be accessed quickly.

Finding the right architecture for a summary table took some trial and error. At the moment, this is how I roll:
  • add an AR model MyEntitySummary
  • add a class method MyEntitySummary.full_regenerate (truncate table and insert all my_entity_summaries)
  • add a class method MyEntitySummary.update_for(my_entity) (update one row of my_entity_summaries)
  • make sure that both are using the same pieces of SQL (DRY)
  • the first time you migrate, call MyEntitySummary.full_regenerate

And now comes the tricky part. From now on you preferably only call update_for; full_regenerate is only for the first time or for emergencies. You can call update_for from an after_save callback or an observer (preferably through backgroundrb).
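The summary-table recipe can be sketched with plain Ruby collections standing in for the tables. `Entity`, its `scores` field and the total/count columns are hypothetical; the point is the shape: one shared `summary_for` used by both `full_regenerate` and `update_for` (the DRY bullet), with `update_for` as the cheap per-row path you would hook into after_save.

```ruby
# Hypothetical entity with the raw data we want summarized.
Entity = Struct.new(:id, :scores)

class MyEntitySummary
  ROWS = {}   # stands in for the my_entity_summaries table, keyed by entity id

  # The shared summarizing logic (in the real thing: the shared SQL fragments).
  def self.summary_for(entity)
    { entity_id: entity.id, total: entity.scores.inject(0, :+), count: entity.scores.size }
  end

  # Truncate and rebuild every row: first migration or emergencies only.
  def self.full_regenerate(entities)
    ROWS.clear
    entities.each { |entity| ROWS[entity.id] = summary_for(entity) }
  end

  # Rebuild one row: the cheap path to call from after_save or an observer.
  def self.update_for(entity)
    ROWS[entity.id] = summary_for(entity)
  end
end

entities = [Entity.new(1, [3, 4]), Entity.new(2, [10])]
MyEntitySummary.full_regenerate(entities)
entities[0].scores << 5                   # the entity changes ...
MyEntitySummary.update_for(entities[0])   # ... and only its row is refreshed
p MyEntitySummary::ROWS[1]
```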

You might be tempted to put full_regenerate in a cronjob and run it every hour. Only do that when it’s really necessary, since it will cause big load spikes on your servers. We have also had some trouble with table locks etc.

Size, count, length?? Cache that count

As you might know, these methods behave differently when you call them on an association. For example, user.blog_entries.length will pull in the full data set and return the size of that data set. user.blog_entries.count, on the other hand, will just do a COUNT query without pulling in any data.

I could show you a nice table of when to use what, but I’m not going to. Actually, I’m not so sure anymore, since I’ve seen some weird stuff lately. Basically, don’t use length unless you know what you’re doing. I like size, but to be sure I just use count.
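A rough model of the three calls, as far as I know Rails behaves here: length loads the collection, count always issues a COUNT, and size uses the loaded array if there is one and counts otherwise (real Rails also reads a counter cache column when one exists). `FakeAssociation` is a hypothetical mimic, not Rails code.

```ruby
# Simplified, hypothetical mimic of a has_many association, only to show
# which call loads rows and which just counts. Not Rails' real implementation.
class FakeAssociation
  attr_reader :log

  def initialize(rows)
    @rows = rows      # what the database would return
    @loaded = nil     # rows pulled into memory, once loaded
    @log = []         # the queries we would have sent
  end

  # length: always loads every row, then measures the array.
  def length
    load_target.size
  end

  # count: always a COUNT query, never loads rows.
  def count
    @log << 'SELECT COUNT(*) ...'
    @rows.size
  end

  # size: reuses loaded rows if present, otherwise falls back to count.
  def size
    @loaded ? @loaded.size : count
  end

  private

  def load_target
    @loaded ||= begin
      @log << 'SELECT * ...'
      @rows.dup
    end
  end
end

entries = FakeAssociation.new(%w[a b c])
entries.size     # COUNT query: nothing loaded yet
entries.length   # loads all rows
entries.size     # no query: uses the loaded rows
entries.count    # COUNT query again
p entries.log
```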

If you have a lot of count queries, or you want to join a count into a big query, you might want to take a look at counter_cache. Documentation for counter_cache is hard to find, so I will tell you briefly:

  • the counter cache stores the count in the parent (the model with the has_many)
  • this count is stored in a database column that should ALWAYS BE AN INTEGER (I wasted some time on that; Rails will not warn you)
  • In the example of user.blog_entries, you have to open the BlogEntry model and add: belongs_to :user, :counter_cache => true
  • All you need to do now is add a migration saying: add_column :users, :blog_entries_count, :integer, :default => 0
  • If you are testing properly, you WILL get failing tests now :-)
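Roughly what :counter_cache => true does, as a toy plain-Ruby model: creating a BlogEntry bumps the user’s blog_entries_count and destroying one lowers it (Rails does this with callbacks on the belongs_to side), so reading the count needs no COUNT query. The classes are hypothetical stand-ins. Note that the migration’s :default => 0 also applies to existing users, so their counts start at 0 and need a one-off backfill.

```ruby
# Toy stand-in for the users row with its blog_entries_count column.
class User
  attr_accessor :blog_entries_count

  def initialize(blog_entries_count = 0)   # :default => 0 in the migration
    @blog_entries_count = blog_entries_count
  end

  # Reading the cached count: no query needed.
  def blog_entries_size
    blog_entries_count
  end
end

class BlogEntry
  attr_reader :user

  def initialize(user)
    @user = user
  end

  # What belongs_to :user, :counter_cache => true adds on create ...
  def self.create(user)
    entry = new(user)
    user.blog_entries_count += 1
    entry
  end

  # ... and on destroy.
  def destroy
    user.blog_entries_count -= 1
    self
  end
end

user = User.new
entries = 3.times.map { BlogEntry.create(user) }
entries.first.destroy
puts user.blog_entries_size   # 2
```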

C’est tout (that’s all)

This is just a small set of things you can do to get better performance. Some of it might be wrong or idiotic, since it is based on my own trial-and-error experience (the only way I learn). I hope you can use this to solve your performance luxury problem soon ;-)
