Why does saving data using MongoHQ take so long?
Recently, in one of the projects we've taken over, which uses MongoDB and is hosted on Heroku, we were asked to speed up a file import process. It seemed like a simple task, but without a few tricks we wouldn't have made it.
I need to mention a few things first: we're using a standard Heroku app configuration with the MongoHQ addon, and it has been enough for our needs so far. The question is why importing a 10 MB file takes so long. It took around an hour. You might say the problem is in the code, but that's only half the truth.
Below you can see an example of a service that imports files with BIM (Building Information Model) objects. Every BIM object has properties, and the properties may be duplicated. The important thing is that we have two loops here. A 10 MB file may include 10k objects and 5k properties, so the service has to save 15k records in the DB.
class ImportBimObjectsService
  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)

    bim_object_parser.bim_properties.each do |name|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      unless bim_property.model_ids.include?(model.id)
        bim_property.model_ids << model.id
      end
      bim_property.save
    end

    bim_object_parser.bim_objects.each do |bim_object_attrs|
      model.bim_objects.create(bim_object_attrs)
    end
  end
end
We can guess where the problem is. 15k requests to the DB isn't a small number, especially when we're using MongoHQ and Heroku. The DB server usually lives in a different location than the web server, so the latency is much higher than on a local environment. In our case the difference between Heroku and the local environment was quite big: in development we were able to import the file in 7 minutes, on Heroku it took an hour.
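To get a feel for how much the round trips alone cost, here is a rough back-of-the-envelope sketch. The latency figures are assumptions for illustration, not measurements from our setup:

# Rough estimate of how much the round trips alone cost.
# The latencies below are assumptions, not measurements.
requests       = 15_000
local_latency  = 0.001 # ~1 ms to a MongoDB on the same machine
remote_latency = 0.020 # ~20 ms between a Heroku dyno and a MongoHQ server

puts "local:  ~#{(requests * local_latency / 60).round(1)} min"  # ~0.3 min
puts "remote: ~#{(requests * remote_latency / 60).round(1)} min" # ~5.0 min

Even before counting query and serialization time, the network round trips alone dominate the import.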
How to minimize the number of requests?
We can use MongoDB's insert method. However, insert doesn't run validations, so it's up to us to make sure the data is correct; it's essentially storing raw documents in the DB. One last thing to remember: before we store the data, we have to add fields like updated_at and created_at to the attributes ourselves.
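Before the full service, here is a minimal sketch of the difference, assuming Mongoid with the Moped-era driver, where collection.insert accepts an array of raw hashes (the name attribute is made up for illustration):

# Goes through Mongoid: validations, callbacks and timestamps, one query per call.
model.bim_objects.create(name: "Wall")

# Talks to the driver directly: one query for many documents, but no validations,
# no callbacks, and no automatic created_at / updated_at -- we add them ourselves.
model.bim_objects.collection.insert([
  { name: "Wall", created_at: Time.now, updated_at: Time.now }
])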
class ImportBimObjectsService
  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)

    bim_object_parser.bim_properties.each do |name|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      unless bim_property.model_ids.include?(model.id)
        bim_property.model_ids << model.id
      end
      bim_property.save
    end

    now = Time.now
    valid_bim_objects = bim_object_parser.bim_objects.map do |bim_object_attrs|
      bim_object = model.bim_objects.build(bim_object_attrs)
      if bim_object.valid?
        bim_object.as_document.merge(created_at: now, updated_at: now)
      end
    end.compact # drop nil entries produced by invalid objects

    model.bim_objects.collection.insert(valid_bim_objects)
  end
end
Thanks to the solution presented above we were able to reduce the number of requests from 15k to 5k, but we can do even better. Note that the part of the code responsible for saving properties isn't optimal: we can reduce the number of find_or_initialize_by calls. To do that, we can use a simple cache which stores only unique properties.
class ImportBimObjectsService
  class BimPropertyUniqCache
    attr_accessor :objects

    def initialize
      @objects = {} # property name => array of model ids
    end

    def add(name, model_id)
      if objects[name].present?
        unless objects[name].include?(model_id)
          @objects[name] << model_id
        end
      else
        @objects[name] = [model_id]
      end
    end
  end

  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)
    bim_property_cache = BimPropertyUniqCache.new

    bim_object_parser.bim_properties.each do |name|
      bim_property_cache.add(name, model.id)
    end

    bim_property_cache.objects.each do |(name, model_ids)|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      bim_property.model_ids += model_ids
      bim_property.model_ids.uniq!
      bim_property.save
    end

    now = Time.now
    valid_bim_objects = bim_object_parser.bim_objects.map do |bim_object_attrs|
      bim_object = model.bim_objects.build(bim_object_attrs)
      if bim_object.valid?
        bim_object.as_document.merge(created_at: now, updated_at: now)
      end
    end.compact # drop nil entries produced by invalid objects

    model.bim_objects.collection.insert(valid_bim_objects)
  end
end
Thanks to the BimPropertyUniqCache class we were able to avoid the unnecessary requests caused by duplicated property names.
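To see what the cache buys us, here is how it behaves for duplicated property names (the values are made up for illustration):

cache = ImportBimObjectsService::BimPropertyUniqCache.new
cache.add("area",   42)
cache.add("area",   42) # duplicate name and model id -- ignored
cache.add("volume", 42)

cache.objects
# => { "area" => [42], "volume" => [42] }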
Conclusion
Remember that access time to MongoDB is faster locally than on Heroku. You can easily work around it by using a mass insert. Unfortunately, insert skips validations, so we need to validate the records beforehand. We're forced to write a bit more code, but the processing time decreases significantly. In the end, importing a 10 MB file takes around 1 minute.
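If you want to measure the gain on your own data, a rough benchmark along these lines can be adapted. The BimObject model and the collection.insert call are assumptions based on the code above; newer Mongoid and driver versions use insert_many instead:

require "benchmark"

docs = 10_000.times.map do |i|
  { name: "object-#{i}", created_at: Time.now, updated_at: Time.now }
end

Benchmark.bm(12) do |x|
  x.report("one by one:")  { docs.each { |attrs| BimObject.create(attrs) } }
  x.report("bulk insert:") { BimObject.collection.insert(docs) }
end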