Use pagination to save memory when iterating over a lot of ActiveRecord objects.
Posted by Luke Ludwig Sun, 17 Feb 2008 04:57:00 GMT
It is very easy to consume a ton of memory when using ActiveRecord's find(:all) method. When transitioning the Team Sport Tech Rails app from file_column to attachment_fu, I wrote a migration to convert all of our photos to the new database format and the new location on disk. Without thinking, I wrote code like this:
Photo.find(:all).each do |photo|
  # conversion commands
  .
  .
end
I didn't notice any issues executing this on my laptop, which has 2 GB of RAM, but when I went to run this migration on our staging server over at Engine Yard I had problems. Our staging server only has 640 MB of RAM, and our database has over 100,000 photos. That is an array of 100,000 ActiveRecord objects in memory all at once. As the migration executed on the staging server I could see that the rake task was using all available RAM and paging like crazy, utilizing 600 MB of virtual memory from disk. CPU utilization fluctuated between 0 and 1 percent because of the paging. With adequate memory this migration would take around 10 minutes and the CPU would be working like crazy. I did a few other things for a while and came back to the migration 2 hours later. It was still working and probably had a long way to go.
While the memory limitation on the staging server isn't that important, the real question was how the production server with 1024 MB of RAM would handle the migration, since the live app has to be down for maintenance when we roll this out. Instead of wondering, I decided to fix the real problem, which was the stupid code I wrote to load all of the photos into memory for iteration. I quickly realized that the problem was already solved for me in the form of pagination. Pagination is usually thought of as a way to display N items to the user on a web page, with links to move to the "Next" or "Previous" pages. When doing this, the pagination code smartly loads only the ActiveRecord objects needed for the current page into memory, which is exactly the behavior I needed.
We recently switched from the classic Rails pagination to the newer will_paginate plugin. Since I knew I would need to use will_paginate in this manner in other migrations, and possibly in actual application code, I wrote this iterator for reuse:
def apply_to_all(klass, per_page)
  # The first call runs the count query, so we learn the totals up front.
  objects = klass.paginate :page => 1, :per_page => per_page
  num_pages = objects.page_count
  total_entries = objects.total_entries
  for page in 1 .. num_pages
    # Passing :total_entries skips the count query on every later call.
    objects = klass.paginate(:page => page, :per_page => per_page, :total_entries => total_entries)
    objects.each do |object|
      yield object
    end
  end
end
So I changed my migration code to look like this:
apply_to_all(Photo, 500) do |photo|
  # conversion commands
  .
  .
end
A key thing to note is the use of total_entries. Each time the paginate method is called, 2 SQL queries are performed: one to get the records, and one to count how many records there are in total. This counting query can be very expensive when there are a lot of records, and if you provide the total_entries option to the paginate method the counting query will be skipped.
While I chose to reuse the will_paginate plugin here, it wouldn't be very difficult to rewrite the above code without it by using LIMIT with an offset, which is how the will_paginate plugin works. Additionally, it may be useful to add a third parameter to the apply_to_all method called options, which is simply passed on to the paginate method, allowing such things as conditions or includes to be applied in the paginate SQL query.
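For what it's worth, here is a rough sketch of the LIMIT-with-offset idea without will_paginate. It is not code from our app: the each_in_batches name and the fetch callable are made up for illustration, and an in-memory array stands in for the database so the snippet runs on its own. With ActiveRecord, the callable would presumably wrap something like find(:all, :limit => limit, :offset => offset, :order => "id"); I am assuming an explicit order is wanted so that pages stay stable between queries.

```ruby
# Hypothetical helper (not part of will_paginate or ActiveRecord).
# `fetch` is any callable that takes (limit, offset) and returns an array.
# Each record is yielded in turn, with only one batch held at a time.
def each_in_batches(batch_size, fetch)
  offset = 0
  loop do
    batch = fetch.call(batch_size, offset)
    break if batch.empty?            # past the last row: stop
    batch.each { |record| yield record }
    offset += batch_size             # advance to the next window
  end
end

# Stand-in data source so the sketch runs without a database.
# Assumption: a real version would issue a LIMIT/OFFSET query here.
ROWS = (1..12).to_a
fetch_rows = lambda { |limit, offset| ROWS[offset, limit] || [] }

seen = []
each_in_batches(5, fetch_rows) { |row| seen << row }
```

Unlike apply_to_all, this version never needs a count query at all, since it simply stops when a batch comes back empty. The trade-off is one extra (empty) query at the end whenever the row count is an exact multiple of the batch size.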
So now the migration code loads only 500 Photo objects into memory at a time. I ran this migration on the staging server and it barely used any RAM and executed in the expected 10 minutes. This short piece of code is a prime example of why I enjoy programming in Ruby.