Six ways to prevent a monkey-patch drift from the original code
… and check why 5600+ Rails engineers read also this
Six ways to prevent a monkey-patch drift from the original code
Monkey-patching in short is modifying external code, whose source we don’t directly control, to fit our specific purpose in the project. When modernising framework stack in “legacy” projects this is often a necessity when an upgrade of a dependency is not yet possible or would involve moving too many blocks at once.
It’s a short-term solution to move things forward. The reward we get from monkey-patching is instant. The code is changed without asking for anyone’s permission and without much extra work that a dependency fork would involve.
But it comes with a hefty price of being very brittle. We absolutely cannot expect that a monkey-patch would work with any future versions of the external dependency. Thus communicating this short-term loan is crucial when we’re not soloing.
Guarding patched dependency changes with version check
One way to communicate a monkey-patched dependency is to document it with a test.
Why test?
It is close to the changed code — in the project source, as opposed to any external documentation medium.
It is executable, unlike code comment and greatly reduces the risk of someone not noticing an announcement.
In a project I’ve recently worked on there was already an unannounced AcitveRecord::Persistence#reload
method patch inside User
model. I consider myself very lucky spotting within over 910000 lines of code having only 10% test coverage.
A code comment would definitely not help me notice it — I came to this project only recently and the authors were already working on it before were gone.
A test I’ve added to document it looked like this:
# spec/models/user_spec.rb
require "rails_helper"
RSpec.describe "User" do
specify "#reload method is overridden based on framework implementation" do
expect(Rails.version).to eq("5.1.7"), failure_message
end
def failure_message
<<~WARN
It looks like you upgraded Rails.
Check if User#reload method body corresponds to the current Rails version implementation of rails/activerecord/lib/active_record/persistence.rb#reload.
When it's ready bump the version in this condition.
WARN
end
end
Now whenever Rails version changes, this check is supposed to fail. The failure has a descriptive message instructing what needs to be checked in order to prolong the patch.
Within an organisation relying on continuous integration and aspiring to the testing culture this should be enough to prevent failure from such patching.
But is it enough developer-friendly?
Improving version check
One drawback of strict version checks is …being too strict. Some dependencies are best allowed within a range of possible versions. To not fail the check on version changes mitigating security issues for example:
--- spec/app/models/user_spec.rb
+++ spec/app/models/user_spec.rb
@@ -4,7 +4,7 @@ require 'rails_helper'
- expect(Rails.version).to eq('7.0.7'), failure_message
+ expect(Rails.version).to eq('7.0.7.2'), failure_message
+ # Mitigate CVE-2023-38037
+ # https://discuss.rubyonrails.org/t/cve-2023-38037-possible-file-disclosure-of-locally-encrypted-files/83544
When we’re certain that a dependency follows a meaningful version numbering scheme, we can change the check to verify more relaxed version constraints.
An example using RubyGems API:
--- spec/app/models/user_spec.rb
+++ spec/app/models/user_spec.rb
@@ -4,7 +4,7 @@ require 'rails_helper'
- expect(Rails.version).to eq('7.0.7.2'), failure_message
+ expect(Gem::Requirement.create('~> 7.0.0').satisfied_by?(Gem::Version.create(Rails.version))).to eq(true), failure_message
But do we know for sure that a security-patch-release does not change the code we’ve patched?
Checking if the source did not change
It would be ideal if we could peek into the source of the method we’re patching and tell if it has changed since the original one.
Reading a tweet from my former arkency fellow shed new light on the issue.
A complete test utilising this technique may look like this:
# spec/models/user_spec.rb
require "rails_helper"
RSpec.describe "User" do
specify "#reload method is overridden based on framework implementation" do
expect(checksum_of_actual_reload_implementation).to eq(
checksum_of_expected_reload_implementation,
),
failure_message
private
def checksum_of_actual_reload_implementation
Digest::SHA256.hexdigest(
ActiveRecord::Persistence.instance_method(:reload).source,
)
end
def checksum_of_expected_reload_implementation
"3bf4f24fb7f24f75492b979f7643c78d6ddf8b9f5fbd182f82f3dd3d4c9f1600"
end
def failure_message
#...
end
end
end
The little disappointment came shortly after I’ve realised Method#source is not a standard Ruby. It is from a method_source
dependency that came to the project I’ve worked on indirectly via pry
. Nevertheless it worked within the scope of existing project dependencies and was better than a plain version check.
Can we do any better?
Checking Abstract Syntax Tree of the implementation
I admit that computing the hash of the source code is neat. However, it falls short of “formatting” changes. Source code is a textual representation. Introducing whitespace characters — spaces or line breaks does not change the implementation. It behaves the same. The hash will be different though, raising a false negative.
So can we do it better? Yes, we can. With little help of AST. In theory AST representation should free us from how the patched code is formatted.
In Ruby, we have a few options to render AST of the source code. The popular parser
and syntax_tree
gems. The Ripper
in the standard library. Or the native RubyVM::AbstractSyntaxTree
.
A pessimist may notice their limitations first:
RubyVM::AbstractSyntaxTree
andRipper
still include formatting in the output, defeating the purposeparser
andsyntax_tree
are external dependencies, so not universally applicable — chances are they’re already a transitive dependency in your project
I definitely did not see it all at first sight. Here are the implementations I would not recommend.
False hopes for checksum free from formatting
In core Ruby there is RubyVM::AbstractSyntaxTree
module, which provides methods to parse Ruby code into abstract syntax trees. Unfortunately, the output includes line and column information, making it unfit for checksumming independent of source formatting. Thus it is not better in any aspect than hexdigest on plain source code.
# spec/models/user_spec.rb
require "rails_helper"
RSpec.describe "User" do
specify "#reload method is overridden based on framework implementation" do
expect(checksum_of_actual_reload_implementation).to eq(
checksum_of_expected_reload_implementation,
),
failure_message
end
private
def checksum_of_actual_reload_implementation
Digest::SHA256.hexdigest(
RubyVM::AbstractSyntaxTree.parse(
ActiveRecord::Persistence.instance_method(:reload).source,
).pretty_print_inspect,
)
end
def checksum_of_expected_reload_implementation
"ed2f4fdf62aece74173a44a65d8919ecf3e0fca7a5d38e2cefb9e51c408a4ab4"
end
end
No checksum for the added benefit of seeing actual implementation changes
In Ruby standard library we may also find Ripper
, a Ruby script parser. It parses the code into a symbolic expression tree. Unfortunately this too contains line and column information in the output. Perhaps with some additional post-processing step, we could get rid of it. I prefer comparing s-expressions to checksums — the test framework has a chance to show differences in the compared syntax trees. Which is a nice bonus!
# spec/models/user_spec.rb
require "rails_helper"
RSpec.describe "User" do
specify "#reload method is overridden based on framework implementation" do
expect(actual_find_record_implementation).to eq(
expected_find_record_implementation,
),
failure_message
end
private
def actual_reload_implementation
Ripper.sexp(ActiveRecord::Persistence.instance_method(:reload).source)
end
def expected_reload_implementation
[
:program,
[
[
:def,
[:@ident, "reload", [1, 8]],
[
:paren,
[
:params,
nil,
[
[
[:@ident, "options", [1, 15]],
[:var_ref, [:@kw, "nil", [1, 25]]],
],
],
nil,
nil,
nil,
nil,
nil,
],
],
[
:bodystmt,
[
[
:call,
[
:call,
[
:call,
[:var_ref, [:@kw, "self", [2, 6]]],
[:@period, ".", [2, 10]],
[:@ident, "class", [2, 11]],
],
[:@period, ".", [2, 16]],
[:@ident, "connection", [2, 17]],
],
[:@period, ".", [2, 27]],
[:@ident, "clear_query_cache", [2, 28]],
],
[
:assign,
[:var_field, [:@ident, "fresh_object", [4, 6]]],
[
:if,
[
:method_add_arg,
[:fcall, [:@ident, "apply_scoping?", [4, 24]]],
[
:arg_paren,
[
:args_add_block,
[[:var_ref, [:@ident, "options", [4, 39]]]],
false,
],
],
],
[
[
:method_add_arg,
[:fcall, [:@ident, "_find_record", [5, 8]]],
[
:arg_paren,
[
:args_add_block,
[[:var_ref, [:@ident, "options", [5, 21]]]],
false,
],
],
],
],
[
:else,
[
[
:method_add_block,
[
:call,
[
:call,
[:var_ref, [:@kw, "self", [7, 8]]],
[:@period, ".", [7, 12]],
[:@ident, "class", [7, 13]],
],
[:@period, ".", [7, 18]],
[:@ident, "unscoped", [7, 19]],
],
[
:brace_block,
nil,
[
[
:method_add_arg,
[:fcall, [:@ident, "_find_record", [7, 30]]],
[
:arg_paren,
[
:args_add_block,
[[:var_ref, [:@ident, "options", [7, 43]]]],
false,
],
],
],
],
],
],
],
],
],
],
[
:assign,
[:var_field, [:@ivar, "@association_cache", [10, 6]]],
[
:method_add_arg,
[
:call,
[:var_ref, [:@ident, "fresh_object", [10, 27]]],
[:@period, ".", [10, 39]],
[:@ident, "instance_variable_get", [10, 40]],
],
[
:arg_paren,
[
:args_add_block,
[
[
:symbol_literal,
[:symbol, [:@ivar, "@association_cache", [10, 63]]],
],
],
false,
],
],
],
],
[
:assign,
[:var_field, [:@ivar, "@attributes", [11, 6]]],
[
:method_add_arg,
[
:call,
[:var_ref, [:@ident, "fresh_object", [11, 20]]],
[:@period, ".", [11, 32]],
[:@ident, "instance_variable_get", [11, 33]],
],
[
:arg_paren,
[
:args_add_block,
[
[
:symbol_literal,
[:symbol, [:@ivar, "@attributes", [11, 56]]],
],
],
false,
],
],
],
],
[
:assign,
[:var_field, [:@ivar, "@new_record", [12, 6]]],
[:var_ref, [:@kw, "false", [12, 20]]],
],
[
:assign,
[:var_field, [:@ivar, "@previously_new_record", [13, 6]]],
[:var_ref, [:@kw, "false", [13, 31]]],
],
[:var_ref, [:@kw, "self", [14, 6]]],
],
nil,
nil,
nil,
],
],
],
]
end
def failure_message
# ...
end
end
The final boss
Final, “pragmatic” implementation that I’m sticking with. It depends on parser
and method_source
gems. I’ve made peace with them, as they’re already in the project via pry
, mutant
and rubocop
additions.
require "parser/current"
RSpec.describe "User" do
include AST::Sexp
specify "#reload method is overridden based on framework implementation" do
expect(actual_find_record_implementation).to eq(
expected_find_record_implementation,
),
failure_message
end
private
def actual_reload_implementation
Parser::CurrentRuby.parse(
ActiveRecord::Persistence.instance_method(:reload).source,
)
end
def expected_reload_implementation
s(
:def,
:reload,
s(:args, s(:optarg, :options, s(:nil))),
s(
:begin,
s(
:send,
s(:send, s(:send, s(:self), :class), :connection),
:clear_query_cache,
),
s(
:lvasgn,
:fresh_object,
s(
:if,
s(:send, nil, :apply_scoping?, s(:lvar, :options)),
s(:send, nil, :_find_record, s(:lvar, :options)),
s(
:block,
s(:send, s(:send, s(:self), :class), :unscoped),
s(:args),
s(:send, nil, :_find_record, s(:lvar, :options)),
),
),
),
s(
:ivasgn,
:@association_cache,
s(
:send,
s(:lvar, :fresh_object),
:instance_variable_get,
s(:sym, :@association_cache),
),
),
s(
:ivasgn,
:@attributes,
s(
:send,
s(:lvar, :fresh_object),
:instance_variable_get,
s(:sym, :@attributes),
),
),
s(:ivasgn, :@new_record, s(:false)),
s(:ivasgn, :@previously_new_record, s(:false)),
s(:self),
),
)
end
def failure_message
# ...
end
end
As you can see, there are no line or column references in the output.
For the portability, I wish those dependencies weren’t needed. Hopefully one day this all will be easier in the future Ruby:
Fingers crossed 🤞