How to store large files on MongoDB?
A common problem we deal with is importing files that contain a large number of records. In my previous article I showed how to speed up saving data in MongoDB. In this article I will focus on how to store these files.
Sometimes we want to store the file first and parse it later. This is the case when you use async workers like Sidekiq. For that to work, you need to store the file somewhere.
First solution
MongoDB allows us to store a file smaller than 16 MB directly in a document as a string, because 16 MB is the maximum size of a single BSON document. We can do this by putting all the data in the file_data attribute.
class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :file_name, type: String
  field :file_format, type: String
  field :file_data, type: String
end
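# Use the block form of File.open so the file handle is closed afterwards.
File.open(xls_file_path) do |file|
  FileContainer.new.tap do |file_container|
    file_container.file_format = File.extname(file.path)
    file_container.file_name = File.basename(file.path, file_container.file_format)
    # The whole file content is stored inside the document.
    file_container.file_data = file.read
    file_container.save
  end
end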
The code above works well for files smaller than 16 MB, but sometimes users want to import (or store) larger files. Another drawback of this code is that we lose the original file, because only its content, read as a string, is kept. The original can be very helpful when you need to open the file in a different encoding. It’s always good to have the original file.
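One way to guard against the size limit is to check the file before building the document. The following is only a minimal sketch: fits_in_document? is a hypothetical helper (it is not part of Mongoid), and the default limit of 15 MB is an assumed value that leaves roughly 1 MB of headroom below the 16 MB BSON document limit for the other fields.

```ruby
require 'tempfile'

# Hypothetical helper: returns true when the file is small enough to be
# embedded in a single document. The default limit (15 MB) is an assumption:
# the 16 MB BSON document limit minus about 1 MB of headroom for other fields.
def fits_in_document?(path, limit: 15 * 1024 * 1024)
  File.size(path) <= limit
end

# Usage: create a small temporary file and check it.
Tempfile.create('import') do |tmp|
  tmp.write('a,b,c')
  tmp.flush
  puts fits_in_document?(tmp.path) # a 5-byte file, so this prints true
end
```

A file that fails this check is a candidate for the second solution.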
Second solution
In this case we’ll use GridFS, MongoDB's specification for storing files. It splits a file into chunks, so it is not bound by the 16 MB document limit. To use it in Rails we need to add a gem called mongoid-grid_fs. The library gives us access to methods such as:
- grid_fs.put(file_path) - to put a file in GridFS
- grid_fs.get(id) - to load a file by id
- grid_fs.delete(id) - to delete a file
require 'mongoid/grid_fs'

class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :grid_fs_id, type: String
end
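grid_fs = Mongoid::GridFs
# put accepts a file path, so there is no need to open the file ourselves.
grid_file = grid_fs.put(xls_file_path)
FileContainer.new.tap do |file_container|
  # Keep a reference to the GridFS file so we can load it later.
  file_container.grid_fs_id = grid_file.id.to_s
  file_container.save
end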
In the second solution we store the original file, so we can do anything we want with it. GridFS is useful not only for storing files that exceed 16 MB but also for any file you want to access without loading it entirely into memory.
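To illustrate why chunked storage avoids loading a whole file into memory, here is a conceptual sketch of reading a stream in fixed-size, numbered pieces, which is roughly how GridFS lays a file out. It is plain Ruby and not the mongoid-grid_fs implementation. each_chunk is a hypothetical helper, and the default of 255 KB is only an assumed chunk size taken from the MongoDB GridFS documentation; the gem may use a different value.

```ruby
require 'stringio'

# Hypothetical helper: yields [index, data] pairs, reading at most
# chunk_size bytes at a time, so only one chunk is held in memory.
# The 255 KB default is an assumption for illustration.
def each_chunk(io, chunk_size: 255 * 1024)
  index = 0
  # IO#read with a positive length returns nil at end of stream.
  while (data = io.read(chunk_size))
    yield index, data
    index += 1
  end
end

# Usage: a 10-byte stream split into 4-byte chunks.
io = StringIO.new('0123456789')
each_chunk(io, chunk_size: 4) do |n, data|
  puts "#{n}: #{data}"
end
# prints:
# 0: 0123
# 1: 4567
# 2: 89
```

Because each piece is processed and then discarded, memory use depends on the chunk size rather than on the size of the file.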