Making race condition tests deterministic with Concurrent::CyclicBarrier and seam
… and check why 5600+ Rails engineers read also this
Making race condition tests deterministic with Concurrent::CyclicBarrier and seam
During pair programming we wanted to use advisory lock to prevent race conditions while implementing an order splitting feature. One of the obvious question that can be ask during such session is: “How do we know this lock is really needed?”
The answer seemed simple: Write a test that FAILS without the lock and PASSES with it.
Easier said than done.
The code we wanted to test
The important bit of this code is the projection that could be called at almost the same time, causing items to be transferred into two different target orders.
def perform(event)
case event
when OrderSplitIntoTwo
source_order_id = event.data[:source_order_id]
target_order_id = event.data[:target_order_id]
ApplicationRecord.with_advisory_lock('transfer_items', source_order_id) do
items = projection.items_in_order(source_order_id) # calculates which items should be transferred
transfer(items, source_order_id, target_order_id)
end
end
end
The lock protects against this scenario: two threads read the same projection state, then both transfer the same items to different orders. Without the lock, items can end up in multiple orders simultaneously.
But how do you prove this lock is necessary with a test?
Attempt 1: Run threads concurrently using Concurrent::CyclicBarrier to synchronize them
it 'prevents duplicate transfers' do
barrier = Concurrent::CyclicBarrier.new(2)
threads = [
Thread.new do
barrier.wait(1);
subject.perform(split_event_1)
end,
Thread.new do
barrier.wait(1);
subject.perform(split_event_2)
end,
]
threads.each(&:join)
items_in_target_1 = order_items(target_order_1_id)
items_in_target_2 = order_items(target_order_2_id)
expect(items_in_target_1 & items_in_target_2).to be_empty
end
Concurrent::CyclicBarrier is an amazing class for race condition testing. Usually I would start with this
simple approach and get the result that I need. However, this test didn’t fail.
It simply didn’t match the right timing for the race condition to take a place.
One could say, just delay it with some sleeps
Adding sleep statements makes it worse - slower and flaky. There’s nothing more frustrating than random failures in CI. I didn’t want to go this direction.
The insight: We have a seam. Lets use it.
Ok. Let me explain. A concept of seam was explained by from Michael Feathers’ in his famous book “Working Effectively with Legacy Code”.
“A seam is a place where you can alter behavior in your program without editing in that place.”
Looking at our code, the seam is the projection method:
def projection
@projection ||= OrderItemsProjection.new
end
This method is our enabling point. In tests, we can replace this projection with a controlled version that synchronizes threads - without changing production code.
The solution: Exploit the seam to inject synchronization
Here’s the key insight: the race happens when both threads READ the same projection state, then both try to WRITE transfers. We need both threads to:
- Read the projection
- Wait for each other (synchronization point)
- Then race to perform transfers
Concurrent::CyclicBarrier gives us exactly this synchronization primitive:
it 'fails without advisory lock, proving it is needed' do
barrier = Concurrent::CyclicBarrier.new(2)
shared_projection = OrderItemsProjection.new
original_method = shared_projection.method(:items_in_order)
# Intercept projection reads to synchronize both threads
allow(shared_projection).to receive(:items_in_order) do |order_id|
items = original_method.call(order_id)
# Both threads meet here after reading
barrier.wait(1)
items
end
subject_1 = TransferItems.new
subject_2 = TransferItems.new
# Inject controlled projection via the seam
allow(subject_1).to receive(:projection).and_return(shared_projection)
allow(subject_2).to receive(:projection).and_return(shared_projection)
results = Concurrent::Array.new
threads = [
Thread.new { subject_1.perform(split_event_1); results << :success_1 },
Thread.new { subject_2.perform(split_event_2); results << :success_2 }
]
threads.each(&:join)
items_in_target_1 = order_items(target_order_1_id)
items_in_target_2 = order_items(target_order_2_id)
expect(results.size).to eq(2) # make sure both threads finished what they had to do
expect(items_in_target_1 & items_in_target_2).to be_empty # the desired business state
end
One more word about tested scenario
In this specific case the race condition was related to transferring items between orders. Calculating proper target_id and moving those items ONCE was essential.
However, there are different cases in which this pattern could be helpful. You might want to test
concurrent database writes. Then, you have to keep in mind to decorate your test with uses_transaction.
The uses_transaction method prevents wrapping specified test in transaction. Therefore, keep in mind that you have
to take care about cleanup on your own.
uses_transaction('to prove advisory lock is needed')
it 'prevents duplicate transfers' do
barrier = Concurrent::CyclicBarrier.new(2)
threads = [
Thread.new do
barrier.wait(1);
subject.perform(split_event_1)
end,
Thread.new do
barrier.wait(1);
subject.perform(split_event_2)
end,
]
threads.each(&:join)
items_in_target_1 = order_items(target_order_1_id)
items_in_target_2 = order_items(target_order_2_id)
expect(items_in_target_1 & items_in_target_2).to be_empty
end
How CyclicBarrier synchronizes threads
The barrier is initialized with a count of 2 (the number of threads we want to synchronize). Here’s what happens:
- Thread 1 reads items that will be transferred
- Thread 1 calls
wait(1)and blocks, waiting for the other thread - Thread 2 reads items that will be transferred
- Thread 2 calls
wait(1)- the barrier count is now satisfied - Both threads are released simultaneously and proceed with identical projection state
- They race to perform transfers
This makes the race condition deterministic and reproducible.
Why this pattern works
Without changing a single line of production code, we:
- Found a seam - the projection method that returns a projection instance
- Changed method implementation by stubbing it
- Injected synchronization logic - CyclicBarrier coordination
- Made the race deterministic - both threads guaranteed to see the same state
The production code remains clean. No test hooks, no debug flags, no conditional logic.
The proof
Comment out the advisory lock:
# ApplicationRecord.with_advisory_lock('transfer_items', source_order_id) do
items = projection.items_in_order(source_order_id)
transfer(items, source_order_id, target_order_id)
# end
Run the test:
Failure/Error: expect(items_in_target_1 & items_in_target_2).to be_empty
expected: []
got: ["item_A", "item_B"]
(items appeared in both target orders - race condition detected!)
Perfect. The test proves the lock is necessary.
The takeaway
Testing race conditions might force you to synchronize the code that is tested. I don’t like to modify my production code to achieve that state. Instead, I prefer to slightly modify it to expose the race condition. CyclicBarrier is one of the ways in which you coordinate threads to reproduce race condition.
This pattern works for any projection-based race condition. Find your seam (usually dependency injection or method extraction), inject synchronization, and prove your concurrency protection actually works.