How to store large files in MongoDB?
A common problem we deal with is importing files that contain a large number of records. In my previous article I showed how to speed up saving data in MongoDB. In this article I will focus on how we can store these files.
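Sometimes we want to store a file first and parse it later. This is the case when the parsing runs in an asynchronous worker such as Sidekiq: the worker may run in another process or on another machine, and the temporary upload is usually gone once the request finishes. To work around this problem, you need to store the file somewhere the worker can reach it.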
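The "store first, parse later" flow can be sketched in plain Ruby. This is a minimal illustration under stated assumptions: a `Queue` and a thread stand in for Sidekiq, and the hash `FILE_STORE` stands in for MongoDB. `FILE_STORE`, `store_upload` and `run_worker` are hypothetical names, not part of Sidekiq or Mongoid. The point it shows is that the job carries only the id of the stored file, which matches Sidekiq's general advice to keep job arguments small and simple.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for the database: id => file contents.
FILE_STORE = {}

# Request side: persist the upload first, then enqueue only its id.
def store_upload(contents, queue)
  id = SecureRandom.uuid
  FILE_STORE[id] = contents
  queue << id # the job argument is a small, serializable id
  id
end

# Worker side (stand-in for a Sidekiq job): load the file by id, parse it later.
def run_worker(queue, results)
  Thread.new do
    while (id = queue.pop) != :stop
      rows = FILE_STORE.fetch(id).lines.map { |line| line.chomp.split(',') }
      results[id] = rows
    end
  end
end

queue = Queue.new
results = {}
worker = run_worker(queue, results)

id = store_upload("a,1\nb,2\n", queue)
queue << :stop # tell the worker to finish after processing queued ids
worker.join

p results[id] # prints [["a", "1"], ["b", "2"]]
```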
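MongoDB limits a single document to 16MB (the BSON document size limit), so a file that fits within that limit, together with the document's other fields, can be stored directly in the document as a string. We can do it simply by putting all the data in the file_data attribute: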
```ruby
class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :file_name, type: String
  field :file_format, type: String # file extension, e.g. ".xls"
  field :file_data, type: String   # the whole file content
end

# The block form closes the file handle once we are done with it
File.open(xls_file_path) do |file|
  FileContainer.new.tap do |file_container|
    file_container.file_format = File.extname(file.path)
    file_container.file_name = File.basename(file.path, file_container.file_format)
    file_container.file_data = file.read
    file_container.save
  end
end
```
The code above works for files smaller than 16MB, but users sometimes want to import (or store) even larger files. Another drawback of this code is that we lose the original file: only its contents are kept, as a string. The original may be very helpful, for example when you need to open the file in a different encoding, so it’s always good to keep it.
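Both points can be made concrete with a small sketch. The first part is a hypothetical guard that decides where a file should go, assuming a made-up 64 KB margin (`HEADROOM`) for the document's other fields, since the 16MB limit covers the whole document. The second part shows why keeping the original bytes matters: the same bytes are not valid UTF-8, but decode correctly once you know they are ISO-8859-2. `storage_for` and `HEADROOM` are illustrative names, not part of Mongoid.

```ruby
# BSON documents are limited to 16MB, and the limit covers the whole document.
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024
HEADROOM = 64 * 1024 # hypothetical margin for file_name, timestamps, etc.

# Hypothetical helper: decide where a file of +size+ bytes should be stored.
def storage_for(size)
  size <= MAX_BSON_DOCUMENT_SIZE - HEADROOM ? :document : :grid_fs
end

# Original bytes of a Latin-2 (ISO-8859-2) text file.
raw = "Za\xBF\xF3\xB3\xE6".b

# Interpreted as UTF-8, these bytes are invalid...
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # false

# ...but with the original bytes kept, we can decode them with the right encoding.
as_latin2 = raw.dup.force_encoding(Encoding::ISO_8859_2).encode(Encoding::UTF_8)
puts as_latin2 # Zażółć

puts storage_for(2 * 1024 * 1024)  # document
puts storage_for(20 * 1024 * 1024) # grid_fs
```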
For such cases we’ll use GridFS, MongoDB’s specification for storing files by splitting them into chunks (255 kB each by default) kept in separate documents. To use it with Mongoid in Rails, we need to add the mongoid-grid_fs gem. The gem gives us access to methods such as:
- grid_fs.put(file_path) - to store a file in GridFS
- grid_fs.get(id) - to load a file by its id
- grid_fs.delete(id) - to delete a file
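```ruby
require 'mongoid/grid_fs'

class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  # Id of the file stored in GridFS, kept as an ObjectId so it can be
  # passed straight back to Mongoid::GridFs.get / Mongoid::GridFs.delete
  field :grid_fs_id, type: BSON::ObjectId
end

grid_fs = Mongoid::GridFs
# put accepts a file path (or an IO) and writes the file to GridFS in chunks
grid_file = grid_fs.put(xls_file_path)

FileContainer.new.tap do |file_container|
  file_container.grid_fs_id = grid_file.id
  file_container.save
end
```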
In the second solution we store the original file unchanged, so we can do whatever we want with it later. GridFS is useful not only for storing files that exceed 16MB but also for storing any file you want to access without loading the entire file into memory, because it can be read back chunk by chunk.
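To see why chunking lets us read a file without loading it whole, here is a plain-Ruby sketch that only mimics the GridFS layout; it is not the driver's or the gem's implementation. The field names follow the fs.files / fs.chunks collections (`files_id`, `n`, `data`, `length`, `chunkSize`), but storage is two in-memory arrays, the id is a random hex string standing in for an ObjectId, and `put_in_chunks` / `each_chunk` are hypothetical names.

```ruby
require 'stringio'
require 'securerandom'

# GridFS's default chunk size is 255 kB.
CHUNK_SIZE = 255 * 1024

# Split +io+ into GridFS-like chunk documents, reading one chunk at a time.
def put_in_chunks(io, files, chunks)
  files_id = SecureRandom.hex(12) # stand-in for an ObjectId
  n = 0
  length = 0
  while (data = io.read(CHUNK_SIZE)) # nil at end of input
    chunks << { files_id: files_id, n: n, data: data }
    length += data.bytesize
    n += 1
  end
  files << { _id: files_id, length: length, chunkSize: CHUNK_SIZE }
  files_id
end

# Yield a stored file's data chunk by chunk, ordered by n.
def each_chunk(files_id, chunks)
  return enum_for(:each_chunk, files_id, chunks) unless block_given?

  chunks.select { |c| c[:files_id] == files_id }
        .sort_by { |c| c[:n] }
        .each { |c| yield c[:data] }
end

files = []
chunks = []
data = 'x' * (600 * 1024) # 600 kB of sample data
id = put_in_chunks(StringIO.new(data), files, chunks)

puts chunks.size                                   # 3
p each_chunk(id, chunks).map(&:bytesize)           # [261120, 261120, 92160]
```

A consumer of `each_chunk` only ever holds one chunk at a time; a real GridFS reader does the same by querying the chunks of one file sorted by `n`.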