# Why does saving data using MongoHQ take so long?

Recently, in one of the projects we've taken over, which uses MongoDB and is hosted on Heroku, we were asked to speed up a file import process. It seemed like a simple task, but we wouldn't have managed it without a few tricks.

First, some context: we're using a standard Heroku app configuration and the MongoHQ addon. It has been enough for our needs so far. The question is: **why does importing a 10MB file take so long? It took around 1h.**

You might say that the problem is in the code, but that's only half the truth. Below you can see an example of a service that imports files with BIM (Building Information Model) objects. Every BIM object has properties, and the properties may be duplicated. The important thing is that we have 2 loops here. A 10MB file may include 10k objects and 5k properties, so the service has to save 15k records in the DB.

```ruby
class ImportBimObjectsService
  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)

    # First loop: one lookup and one save per property name.
    bim_object_parser.bim_properties.each do |name|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      unless bim_property.model_ids.include?(model.id)
        bim_property.model_ids << model.id
      end
      bim_property.save
    end

    # Second loop: one request per BIM object.
    bim_object_parser.bim_objects.each do |bim_object_attrs|
      model.bim_objects.create(bim_object_attrs)
    end
  end
end
```

You can imagine where the problem is. 15k requests to the DB is not a small number, especially when we're using MongoHQ and Heroku. **Usually the DB server is in a different location than the web server, so the latency is much higher than in a local environment.** In our case the difference between Heroku and the local environment was quite big: in development we were able to import the file in 7 min, while on Heroku it took 1h.

## How to minimize the number of requests?

We can use MongoDB's _insert_ method. However, _insert_ doesn't run validations, so it's up to us to make sure that our model is correct. We can compare _insert_ with storing raw data in the DB.
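
To make the difference in round trips concrete, here is a minimal sketch in plain Ruby. `FakeCollection` is a hypothetical in-memory stand-in that only counts requests; it is not part of Mongoid or any MongoDB driver, and the record count is simply the 10k objects mentioned earlier.

```ruby
# Hypothetical in-memory stand-in for a remote collection (an assumption
# for illustration only). It stores documents and counts round trips.
class FakeCollection
  attr_reader :requests, :documents

  def initialize
    @requests = 0
    @documents = []
  end

  # One round trip per call, no matter how many documents are passed.
  def insert(docs)
    docs = [docs] unless docs.is_a?(Array)
    @requests += 1
    @documents.concat(docs)
  end
end

records = Array.new(10_000) { |i| { name: "object-#{i}" } }

# One request per record, like calling `create` inside a loop.
per_record = FakeCollection.new
records.each { |record| per_record.insert(record) }

# One request for the whole batch, like a mass insert.
bulk = FakeCollection.new
bulk.insert(records)

puts "per-record: #{per_record.requests} requests"
puts "bulk:       #{bulk.requests} request(s)"
```

Both collections end up holding the same documents. Only the number of round trips differs, and every round trip pays the network latency between the web server and the DB server.
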
One last thing to remember: before we store the data, we have to add fields like _updated\_at_ and _created\_at_ to the attributes.

```ruby
class ImportBimObjectsService
  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)

    bim_object_parser.bim_properties.each do |name|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      unless bim_property.model_ids.include?(model.id)
        bim_property.model_ids << model.id
      end
      bim_property.save
    end

    now = Time.now
    # Validate every object ourselves, because insert skips validations.
    # Invalid objects produce nil in the block, so compact removes them.
    valid_bim_objects = bim_object_parser.bim_objects.map do |bim_object_attrs|
      bim_object = model.bim_objects.build(bim_object_attrs)
      if bim_object.valid?
        bim_object.as_document.merge({ created_at: now, updated_at: now })
      end
    end.compact

    # A single request for all objects. Skip it when nothing is valid.
    # With Mongoid 5+ (official Ruby driver) the equivalent call is insert_many.
    model.bim_objects.collection.insert(valid_bim_objects) unless valid_bim_objects.empty?
  end
end
```

Thanks to the solution above we were able to reduce the number of requests from 15k to 5k, but we can do even better. Note that the part of the code responsible for saving properties isn't optimal. We could reduce the number of _find\_or\_initialize\_by_ calls. To do that, we can use a simple cache which stores only unique properties.

```ruby
class ImportBimObjectsService
  class BimPropertyUniqCache
    attr_reader :objects

    def initialize
      # Maps a property name to the list of model ids that use it.
      @objects = {}
    end

    def add(name, model_id)
      if objects[name].present?
        objects[name] << model_id unless objects[name].include?(model_id)
      else
        objects[name] = [model_id]
      end
    end
  end

  def call(model, data)
    bim_object_parser = BimObjectsParser.new(data)

    # Collect unique property names first, without touching the DB.
    bim_property_cache = BimPropertyUniqCache.new
    bim_object_parser.bim_properties.each do |name|
      bim_property_cache.add(name, model.id)
    end

    # One lookup and one save per unique property name.
    bim_property_cache.objects.each do |(name, model_ids)|
      bim_property = model.bim_properties.find_or_initialize_by(name: name)
      # uniq returns a new array, so the result has to be assigned back.
      bim_property.model_ids = (bim_property.model_ids + model_ids).uniq
      bim_property.save
    end

    now = Time.now
    valid_bim_objects = bim_object_parser.bim_objects.map do |bim_object_attrs|
      bim_object = model.bim_objects.build(bim_object_attrs)
      if bim_object.valid?
        bim_object.as_document.merge({ created_at: now, updated_at: now })
      end
    end.compact

    model.bim_objects.collection.insert(valid_bim_objects) unless valid_bim_objects.empty?
  end
end
```

Thanks to the _BimPropertyUniqCache_ class we were able to avoid unnecessary requests.

## Conclusion

Remember that access to MongoDB is faster locally than on Heroku. You can easily work around this by using a mass insert. Unfortunately, _insert_ skips validations, so we need to validate the records beforehand. We're forced to write more code, but the processing time decreases significantly. In the end, importing a 10MB file takes around 1 min.

## References

- http://docs.mongodb.org/manual/reference/method/db.collection.insert/