How to use Algolia without coupling to ActiveRecord::Base
… and check why 5600+ Rails engineers read also this
How to use Algolia without coupling to ActiveRecord::Base
In my video course, I present using Algolia with Rails using the more direct integration provided by algoliasearch-rails
gem. Like many gems in Rails ecosystem, the integration relies on ActiveRecord::Base
and its callbacks. And while it certainly can be very convenient and fast to add to your app, there is also a certain amount of magic involved. Ie, when your classes are loaded, they send HTTP request to Algolia with the index settings. And for me, that’s a big no-no. I prefer the more explicit approach in which I treat those settings as database schema and update it in migrations so there is a history in the code.
But Algolia made a good decision by splitting their solution into 2 gems. There is algoliasearch gem written in Ruby and not coupled at all to Rails. And there is algoliasearch-rails
which integrates with the Rails ecosystem and ActiveRecord::Base
in particular. And you are free to just not use it :) You don’t like Rails magic? You can opt out from it. I like it!
Use algoliasearch gem instead of algoliasearch-rails
https://github.com/algolia/algoliasearch-client-ruby
The first thing you need to know is how to configure your search indexes.
require 'algoliasearch'
Algolia.init(
application_id: '...',
api_key: '...',
)
freeride = Algolia::Index.new("freeride")
freeride.set_settings({
searchableAttributes: %w[title subtitle unordered(description)],
attributesToSnippet: %w[description],
attributesForFaceting: %w[category filterOnly(starts_at) filterOnly(ends_at)],
replicas: ["freeride_by_starts_at_asc_development"],
}, {
forwardToReplicas: true,
})
f_replica = Algolia::Index.new("freeride_by_starts_at_asc_development")
f_replica.set_settings({
ranking: ["custom"],
customRanking: ["asc(starts_at)"],
})
If you want to make sure not a single value changes over time due to some defaults that Algolia might introduce, then use get_settings method to obtain all possible config values with their defaults and provide all of them:
freeride = Algolia::Index.new("freeride")
freeride.set_settings({
:replicas=>["freeride_by_starts_at_asc_development"],
:attributesForFaceting=> ["category", "filterOnly(starts_at)", "filterOnly(ends_at)"],
:attributesToSnippet=>["description:10"],
:searchableAttributes=>["title", "subtitle", "unordered(description)"],
:minWordSizefor1Typo=>4,
:minWordSizefor2Typos=>8,
:hitsPerPage=>20,
:maxValuesPerFacet=>100,
:version=>2,
:numericAttributesToIndex=>nil,
:attributesToRetrieve=>nil,
:unretrievableAttributes=>nil,
:optionalWords=>nil,
:attributesToHighlight=>nil,
:paginationLimitedTo=>1000,
:attributeForDistinct=>nil,
:exactOnSingleWordQuery=>"attribute",
:ranking=>
["typo",
"geo",
"words",
"filters",
"proximity",
"attribute",
"exact",
"custom"],
:customRanking=>nil,
:separatorsToIndex=>"",
:removeWordsIfNoResults=>"none",
:queryType=>"prefixLast",
:highlightPreTag=>"<em>",
:highlightPostTag=>"</em>",
:snippetEllipsisText=>"",
:alternativesAsExact=>["ignorePlurals", "singleWordSynonym"]
})
If you have many indices to configure it’s OK to create a class for building those configs more dynamically:
module Searching
class EventIndexConfiguration
def initialize(env: Rails.env)
@env_name = env.to_s
end
def fetch(index_name)
settings.fetch(index_name)
end
def index_names
settings.keys
end
def event
baseline_configuration.merge({
"replicas" => [
"#{env_name}_event_starts_at_asc",
"#{env_name}_event_starts_at_desc",
]
})
end
def event_starts_at_asc
set_ranking("asc(starts_at)")
end
def event_starts_at_desc
set_ranking("desc(starts_at)")
end
def primary_index_name
"#{env_name}_event"
end
def sortable_index_names
index_names.map { |index_name| "#{env_name}_#{index_name}" }
end
private
def settings
{
event: event,
event_price_asc: event_starts_at_asc,
event_price_desc: event_starts_at_desc,
}.with_indifferent_access
end
def env_name
@env_name
end
def set_ranking(field)
replica_configuration.merge({
"customRanking" => field,
})
end
def replica_configuration
baseline_configuration.merge({
"primary" => primary_index_name,
"ranking" => [
"custom",
"typo",
"geo",
"words",
"filters",
"proximity",
"attribute",
"exact",
]
})
end
def baseline_configuration
{
:attributesForFaceting=> ["category", "filterOnly(starts_at)", "filterOnly(ends_at)"],
:attributesToSnippet=>["description:10"],
:searchableAttributes=>["title", "subtitle", "unordered(description)"],
:minWordSizefor1Typo=>4,
:minWordSizefor2Typos=>8,
:hitsPerPage=>20,
:maxValuesPerFacet=>100,
:version=>2,
:numericAttributesToIndex=>nil,
:attributesToRetrieve=>nil,
:unretrievableAttributes=>nil,
:optionalWords=>nil,
:attributesToHighlight=>nil,
:paginationLimitedTo=>1000,
:attributeForDistinct=>nil,
:exactOnSingleWordQuery=>"attribute",
:ranking=>
["typo",
"geo",
"words",
"filters",
"proximity",
"attribute",
"exact",
"custom"],
:customRanking=>nil,
:separatorsToIndex=>"",
:removeWordsIfNoResults=>"none",
:queryType=>"prefixLast",
:highlightPreTag=>"<em>",
:highlightPostTag=>"</em>",
:snippetEllipsisText=>"",
:alternativesAsExact=>["ignorePlurals", "singleWordSynonym"]
}
end
end
end
And that’s how you can work with index settings without putting them into ActiveRecord class like algoliasearch-rails
does:
class Event < ApplicationRecord
include AlgoliaSearch
algoliasearch do
searchableAttributes %w[title subtitle unordered(description)]
attributesToSnippet %w[description]
attributesForFaceting %w[category filterOnly(starts_at) filterOnly(ends_at)]
add_replica STARTS_AT_ASC_INDEX, inherit: true do
ranking ['custom']
customRanking ['asc(starts_at)']
end
That’s the 1st step to decoupling this code from ActiveRecord.
Integrate using domain events and handlers
Now we need something to use instead of ActiveRecord callbacks to trigger indexing of our records.
We can use meaningful domain events and event handlers over callbacks. This can be done with RailsEventStore
, or anything else that you use like RabbitMQ
or Kafka
or SQS
. The premise is identical. Publish info about changes happening in your application and in reaction update the search index.
There are 2 approaches that can you can go with. Full reindexing all the time or partial updates. Full reindexing is usually a safer approach. You use the domain events only as a trigger to gather all the data and send an updated version of your object to Elastic
or Algolia
. It’s easy to handle retries in case of a networking error because you can just build a new version of your object again based on latest data and send it. But the downside is that you need to have a way of collecting all the necessary data. Sometimes it might be simple and it could be just mapping your active record attributes to a proper json
. But sometimes you might have an event sourced aggregate and you don’t want it to expose its internal fields. Or your search object (remember: read model) might have attributes coming from multiple objects from the write-side of your app. For example, the party in your search index can have data from the part, from its organizer, from attendees, from the venue, etc etc.
If you go with full reindexing you are usually going to have a mapper converting the attributes and their format from your domain object to search object.
module Searching
class EventToAlgoliaMapper
def to_hash(event)
{
starts_at: event.starts_at.utc.to_i,
ends_at: event.ends_at.utc.to_i,
title: event.title,
description: event.description,
category: event.category,
state: event.state,
image_url: event.image.url,
}
end
end
class EventToAlgolia < ActiveJob::Base
def perform(fact)
fact = YAML.load(fact)
index = Algolia::Index.new(EventIndexConfiguration.new.primary_index_name)
mapper = EventToAlgoliaMapper.new
index.add_object(mapper.to_hash(Event.find(fact.data.fetch(:event_id))))
end
end
end
Rails.
configuration.
event_store.
subscribe(Searching::EventToAlgolia, [EventAdded])
As you can see, you put composed the solution together on your own, instead of putting everything into Event class like you would do with algoliasearch-rails
gem:
class Event < ApplicationRecord
include AlgoliaSearch
STARTS_AT_ASC_INDEX = "Event_by_starts_at_asc_#{Rails.env}"
ADMIN_INDEX_NAME = "Admin_Event_#{Rails.env}"
algoliasearch enqueue: true, per_environment: true do
attribute :title, :description, :category,
:state, :image_url
attribute :starts_at do
starts_at.to_i
end
attribute :ends_at do
ends_at.to_i
end
searchableAttributes %w[title subtitle unordered(description)]
attributesToSnippet %w[description]
attributesForFaceting %w[category filterOnly(starts_at) filterOnly(ends_at)]
add_replica STARTS_AT_ASC_INDEX, inherit: true do
ranking ['custom']
customRanking ['asc(starts_at)']
end
There is a class for index configuration, there are domain events triggering the indexing and re-indexing, there is a mapper for mapping attributes (ie Time to Unix timestamp in UTC), and there is a handler actually invoking the Algolia API. Everything under your control, and everything in plain sight. No direct coupling with ActiveRecord::Base
, just your app reactively updating the search index.
Partial updates, on the other hand, can be more convenient sometimes, especially when published events contain all the information necessary to perform an update, without the need to load domain object and map all fields. Imagine that you publish a domain event (fact) when someone moves a party to a different date. In such case you can add a handler reacting to the fact, which only updates 2 fields in the search index:
Rails.
configuration.
event_store.
subscribe(Searching::UpdatedEventDatesInAlgolia, [EventMoved])
module Searching
class UpdatedEventDatesInAlgolia < ActiveJob::Base
def perform(fact)
fact = YAML.load(fact)
index = Algolia::Index.new(EventIndexConfiguration.new.primary_index_name)
index.partial_update_object(
objectID: fact.event_id,
starts_at: fact.starts_at.utc.to_i,
ends_at: fact.ends_at.utc.to_i,
)
end
end
end
Someplace in your code, you publish the fact:
fact = EventMoved.new(data: {
event_id: 1,
starts_at: Time.utc(2018, 1, 13, 12),
ends_at: Time.utc(2018, 1, 13, 22)
})
event_store.publish_event(fact, stream_name: "Event$1")
P.S. If your domain is all about events (conferences, parties, exhibitions, concerts etc) then I like to use the synonym fact for domain events (which you publish and save in a DB).
Are you also feeling the pain of building search pages from scratch every time? Or maybe you just want to learn how to deal with it upfront? We have a video course that can help :)