What I learned from reading spreadsheet_architect code

… and check why 5600+ Rails engineers read also this

What I learned from reading spreadsheet_architect code

Recently I heard about spreadsheet_architect gem and I wondered a few things after reading its README. It all lead me to look into its code and wonder…

What started my curiosity was the API:

class Post < ActiveRecord::Base
  include SpreadsheetArchitect
end

Post.order(name: :asc).where(published: true).to_xlsx

I was like wow, why would anyone add such a thing to a model? Why is it a good idea to extend my ActiveRecord class (which most likely already has tons of responsibilities) with methods responsible for generating reporting files. Dunno. Doesn’t sound like the best use of Single responsibility principle.

But I was like… There must be a way to avoid it. After all, this gem can work with normal classes as well as documented:

Post.to_xlsx(instances: posts_array)

or another way:

headers = ['Col 1','Col 2','Col 3']
data = [[1,2,3], [4,5,6], [7,8,9]]
SpreadsheetArchitect.to_xlsx(data: data, headers: headers)

So probably including SpreadsheetArchitect is not mandatory.

I was thinking about checking the code to see how it gets the list of records and implementing a compatible interface inside a different class. I was pretty sure the method was #to_a because why not? What else could it be? So I hoped for this kind of workaround:

class PostsReport
  include SpreadsheetArchitect

  def to_a
    Post.order(name: :asc).where(published: true).to_a
  end
end

or

class PostsReport
  include SpreadsheetArchitect

  def initialize(collection)
    @collection = collection
  end

  def to_a
    @collection.to_a
  end
end

if we want to dynamically build and pass the collection for generating a report.

Then I could define methods such as def spreadsheet_columns which configure what is displayed in PostsReport instead of Posts because that sounded better to me. Less coupling, and bigger separation. Probably easier if we need to support multiple reports.

So I started looking into the code, how it works internally, to confirm my guesses.

module SpreadsheetArchitect
  module ClassMethods
    def to_csv(opts={})
      opts = SpreadsheetArchitect::Utils.get_cell_data(opts, self)
      # ...

It turned out the logic was implemented in get_cell_data. Let’s see what inside. It had 80 lines of code including this:

def self.get_cell_data(options={}, klass)
  # ...
  if !options[:instances] && defined?(ActiveRecord) && klass.ancestors.include?(ActiveRecord::Base)
    # triggers the relation call,
    # not sure how this works but it does
    options[:instances] = klass.where(nil).to_a
  end
  # ...

Ok, so it turned out it is not just to_a but rather where(nil).to_a. I am not exactly sure why would where(nil) be necessary. At this point, I decided that working around the API is probably not worth it and too hard.

But I got curious how can it work? Because you see…

And now something completely different

  • to_csv, to_xlsx, methods are defined on Post (if you include the module)
Post.method(:to_csv)
=> #<Method: Class(SpreadsheetArchitect::ClassMethods)#to_csv>
  • The example shows
Post.order(name: :asc).to_xlsx
  • but that is a relation
Post.order(name: :asc).class
# => Post::ActiveRecord_Relation
  • so this check should fail
klass.ancestors.include?(ActiveRecord::Base)
  • because
Post::ActiveRecord_Relation.ancestors.include?(ActiveRecord::Base)
# => NameError: private constant #<Class:0x0000000478a240>::ActiveRecord_Relation referenced
# doh...

# workaround
Post.const_get(:ActiveRecord_Relation).ancestors.include?(ActiveRecord::Base)
# => false

Yep, Ruby has private classes and they are used here.

So how can this gem work with relations? Let’s try something…

class Post < ApplicationRecord
  include SpreadsheetArchitect

  def self.bump
    puts self.inspect
  end
end

Nothing out of ordinary here:

Post.bump
# Post(id: integer, ...)
Post.where(id: 1).inspect
# Post Load (0.6ms)  SELECT  "posts".* FROM "posts" WHERE "posts"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 11]]
# => "#<ActiveRecord::Relation [#<Post id: 1, ...">]>"

But check this out:

Post.where(id: 1).bump
# Post(id: integer, ...)

Even though this is ActiveRecord::Relation, methods which are dynamically executed report that self is the Post class.

Let’s investigate further:

Post.where(id: 1).method(:method_missing)
# => #<Method: Post::ActiveRecord_Relation(ActiveRecord::Delegation::ClassSpecificRelation)#method_missing>

Post.where(id: 1).method(:method_missing).source_location
# => [".rvm/gems/ruby-2.4.1/gems/activerecord-5.1.2/lib/active_record/relation/delegation.rb", 87]

Let’s see how this Rails magic works:

def method_missing(method, *args, &block)
  if @klass.respond_to?(method)
    self.class.delegate_to_scoped_klass(method)
    scoping { @klass.public_send(method, *args, &block) }
  elsif arel.respond_to?(method)
    self.class.delegate method, to: :arel
    arel.public_send(method, *args, &block)
  else
    super
  end
end
  • as you can see the method bump is delegated to the class directly. So that’s why self is Post.
  • But why is only a subset of records used for generating the file, and not all records, when the method is delegated directly to a class? Because scoping is used, which works like a global (probably a thread-safe global).
Post.where(id: 1).method(:scoping).source_location
# => [".rvm/gems/ruby-2.4.1/gems/activerecord-5.1.2/lib/active_record/relation.rb", 334]

# Scope all queries to the current scope.
#
#   Comment.where(post_id: 1).scoping do
#     Comment.first
#   end
#   # => SELECT "comments".* FROM "comments"
#        WHERE "comments"."post_id" = 1
#        ORDER BY "comments"."id" ASC
#        LIMIT 1
#
# Please check unscoped if you want to remove all previous scopes (including
# the default_scope) during the execution of a block.
def scoping
  previous, klass.current_scope = klass.current_scope, self
  yield
ensure
  klass.current_scope = previous
end

In other words before Post#to_xlsx is called, Post.current_scope is set temporarily and as a result Post.where(nil).to_a called by the SpreadsheetArchitect is limited in scope and does not include all records. That’s how it works.

via GIPHY

Wanna learn more?

Subscribe to our newsletter to receive weekly free Ruby and Rails lessons.

You might also like