How to store large files on MongoDB?

A common problem we deal with is importing files that contain a large number of records. In my previous article I showed how to speed up saving data in MongoDB. In this article I will focus on how we can store the files themselves.

Sometimes we want to store a file first and parse it later. This is the case when you use async workers like Sidekiq: the worker runs after the upload request has finished, so the file has to be stored somewhere in the meantime.

First solution

MongoDB limits a single document to 16MB, so files smaller than that can be stored directly in the DB as a string. We can simply do it by putting all the data in a file_data attribute.

class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :file_name,   type: String
  field :file_format, type: String
  field :file_data,   type: String
end

file_format = File.extname(xls_file_path)

FileContainer.new.tap do |file_container|
  file_container.file_format = file_format
  file_container.file_name   = File.basename(xls_file_path, file_format)
  # File.read closes the file handle as soon as the content is loaded.
  file_container.file_data   = File.read(xls_file_path)
  file_container.save
end

The code above works as long as uploaded files stay below 16MB, but users sometimes want to import (or store) larger ones. Another drawback is that we lose the original file: only its content, read as a string, is kept. The original can be very helpful when, for example, you need to open the file again in a different encoding. It’s always good to have the original file.
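To make the 16MB limit concrete, here is a minimal sketch of a size check that could run before choosing where to store an upload. The method name storage_strategy and the overhead margin are hypothetical, picked only for illustration; the one hard fact is the 16MB document limit.

```ruby
# MongoDB caps a single document at 16MB.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024

# Assumption: a small margin reserved for the other fields of the
# document (file_name, file_format, timestamps). The value is arbitrary.
DOCUMENT_OVERHEAD_BYTES = 1024

# Returns :document when a file of the given size should fit in a single
# document, and :grid_fs when it needs another kind of storage.
def storage_strategy(file_size)
  if file_size < MAX_DOCUMENT_BYTES - DOCUMENT_OVERHEAD_BYTES
    :document
  else
    :grid_fs
  end
end
```

It could be called as storage_strategy(File.size(xls_file_path)) before deciding which of the two solutions to use.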

Second solution

In this case we’ll use GridFS, MongoDB’s mechanism for storing files. To use it in Rails we need to add a gem called mongoid-grid_fs. The library gives us access to methods such as:

  • grid_fs.put(file_path) - to put file in GridFS
  • grid_fs.get(id) - to load file by id
  • grid_fs.delete(id) - to delete file

require 'mongoid/grid_fs'

class FileContainer
  include Mongoid::Document
  include Mongoid::Timestamps

  field :grid_fs_id, type: String
end

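# Store the original file in GridFS; no file handle is left open.
grid_fs   = Mongoid::GridFs
grid_file = grid_fs.put(xls_file_path)

FileContainer.new.tap do |file_container|
  # Keep the GridFS id so the file can be loaded later with grid_fs.get.
  file_container.grid_fs_id = grid_file.id
  file_container.save
end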

In the second solution we store the original file, so we can do anything we want with it later. GridFS is useful not only for storing files that exceed 16MB, but also for any file you want to access without having to load the whole thing into memory.
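GridFS achieves this by splitting a file into chunks (255 kB each by default, according to the MongoDB documentation). The sketch below shows the same idea in plain Ruby, using only the standard library. It is not the mongoid-grid_fs API; each_chunk and streaming_md5 are hypothetical helpers written for this example.

```ruby
require 'digest'

# Assumption: mirrors the default GridFS chunk size of 255 kB.
CHUNK_SIZE = 255 * 1024

# Yields the content of io piece by piece, so only one chunk is held
# in memory at a time. Returns an Enumerator when no block is given.
def each_chunk(io, chunk_size = CHUNK_SIZE)
  return enum_for(:each_chunk, io, chunk_size) unless block_given?

  # IO#read with a length returns nil once the end of the data is reached.
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end

# Computes a checksum without ever loading the whole file into memory.
def streaming_md5(io)
  digest = Digest::MD5.new
  each_chunk(io) { |chunk| digest << chunk }
  digest.hexdigest
end
```

With a real file this could be used as File.open(xls_file_path, 'rb') { |f| streaming_md5(f) }, which keeps memory usage flat no matter how large the file is.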
