When I first built job-iteration, the idea was elegant: let job authors return an ActiveRecord relation from build_enumerable, and the framework would handle the rest. Simple interface, powerful abstraction.
class MyJob
  include Jobs::Iteration

  def build_enumerable(params)
    Shop.where(active: true) # Just return a relation
  end

  def each_record(shop, params)
    # Framework handles batching, cursors, interruption
  end
end
The framework would introspect the relation, figure out cursor columns, handle batching, and preserve state across interruptions. Job authors stayed blissfully unaware of the complexity.
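To make "preserve state across interruptions" concrete, here is a minimal sketch of cursor-based batching in plain Ruby. It is not the framework's code: `ROWS` stands in for a table, and `each_batch_after` is a hypothetical helper that assumes rows are ordered by a unique `id`.

```ruby
# Hypothetical stand-in for a table: rows ordered by primary key.
ROWS = (1..7).map { |id| { id: id, name: "shop-#{id}" } }.freeze

# Yields each batch of rows with id greater than the cursor, together with
# the cursor value a job would persist once the batch is done.
def each_batch_after(rows, cursor:, batch_size:)
  loop do
    batch = rows.select { |row| cursor.nil? || row[:id] > cursor }.first(batch_size)
    break if batch.empty?

    cursor = batch.last[:id]
    yield batch, cursor
  end
end

# First run: process one batch, then simulate an interruption.
saved_cursor = nil
each_batch_after(ROWS, cursor: nil, batch_size: 3) do |_batch, cursor|
  saved_cursor = cursor
  break
end

# Second run: resume from the saved cursor instead of starting over.
resumed_ids = []
each_batch_after(ROWS, cursor: saved_cursor, batch_size: 3) do |batch, _cursor|
  resumed_ids.concat(batch.map { |row| row[:id] })
end
```

The only state that has to survive the interruption is the cursor; everything else can be rebuilt from it.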
The Growing Beast
But then reality happened. Shopify needed more than just simple relations:
- Sharded relations that required special database handling
- Plain arrays for non-ActiveRecord data
- Different cursor strategies for different column types
Jobs::Iteration grew to handle them all. Before long, it had become this:
enum SupportedEnumerable<R> {
  ARRelation(ActiveRecord::Relation<R>)             # Regular AR relations
  ShardedARRelation(ActiveRecord::Relation<R>)      # Sharded AR relations
  PopEnumerable(LockQueue<R>)                       # Queue-based iteration
  PlainEnumerable(Array<R where R: NotActiveRecord>) # Plain arrays
  # Plus some other variant I'm missing
}
And the actual branching logic that handled them:
if collection.is_a?(ActiveRecord::Relation)
  cursor_columns = Array.wrap(iteration_options[:columns] || "#{collection.table_name}.#{collection.primary_key}")
  options = iteration_options.merge(
    columns: cursor_columns,
    start: cursor_position,
    batch_size: batch_size,
  )
  find_in_batches(collection, options) do |records|
    # Different logic for each_record vs each_batch
    # Different cursor position tracking
  end
elsif collection.is_a?(Array)
  # Handle plain Array
elsif collection.is_a?(CSV)
  # Completely different path for arrays, queues, etc.
elsif ...
end
Guillaume's Challenge
In August 2017, my colleague Guillaume looked at this and said: "I'm not smart enough to understand Jobs::Iteration."
But instead of walking away, he made a provocative argument. What if the complexity wasn't inevitable? What if Jobs::Iteration was trying to be too smart?
His insight was about types and interfaces. Instead of:
# Jobs::Iteration knowing about everything
if collection.is_a?(ActiveRecord::Relation)
  # complex cursor logic
  # sharding logic
  # batching logic
elsif collection.is_a?(Array)
  # different logic
end
Move to:
# Jobs::Iteration knowing about nothing
enumerable.records.each do |record, cursor|
  # just consume tuples
end
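A rough sketch of what that tuple contract could look like in plain Ruby follows. All names here (`array_enumerator`, `rows_enumerator`, `iterate`) are hypothetical and only illustrate the shape of the idea: each strategy decides what a cursor means, and the core loop only ever sees `[record, cursor]` pairs.

```ruby
# Hypothetical strategy: iterate a plain array, using the index as the cursor.
def array_enumerator(items, cursor:)
  start = cursor.nil? ? 0 : cursor + 1
  Enumerator.new do |yielder|
    items.each_with_index.drop(start).each do |item, index|
      yielder.yield item, index
    end
  end
end

# Hypothetical strategy: iterate id-keyed rows, using the id as the cursor.
def rows_enumerator(rows, cursor:)
  Enumerator.new do |yielder|
    rows.sort_by { |row| row[:id] }.each do |row|
      next if cursor && row[:id] <= cursor

      yielder.yield row, row[:id]
    end
  end
end

# Core loop: consumes (record, cursor) tuples and knows nothing about their
# source. max_records simulates an interruption after a number of records.
def iterate(enumerator, max_records: nil)
  processed = []
  last_cursor = nil
  enumerator.each do |record, cursor|
    processed << record
    last_cursor = cursor
    break if max_records && processed.size >= max_records
  end
  [processed, last_cursor]
end

letters = %w[a b c d]
first_run, saved = iterate(array_enumerator(letters, cursor: nil), max_records: 2)
second_run, _ = iterate(array_enumerator(letters, cursor: saved))
```

Note that `iterate` contains no `is_a?` checks at all; swapping `array_enumerator` for `rows_enumerator` changes nothing in the core.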
The Resistance Point
My natural resistance was: "Why make job authors deal with enumerators instead of just passing relations?"
Guillaume's answer was essentially about cognitive load distribution. Yes, individual jobs become slightly more complex, but the system's total complexity decreases because:
- Each enumeration strategy is isolated
- The strategies become independently testable
- The core iteration logic becomes trivial
- New enumeration types don't require touching the core
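As an illustration of the isolation and testability points, here is a hedged sketch of a queue-draining strategy written as a standalone Ruby enumerator. `pop_enumerator` is a hypothetical name, and a standard-library `Queue` stands in for whatever queue type a real job would use. Because the strategy is just an `Enumerator`, it can be exercised without any job or framework in the picture.

```ruby
# Hypothetical strategy: drain a queue until it is empty. A popped item cannot
# be revisited, so there is no resumable position and the cursor is nil.
def pop_enumerator(queue)
  Enumerator.new do |yielder|
    loop do
      item = begin
        queue.pop(true) # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break
      end
      yielder.yield item, nil
    end
  end
end

queue = Queue.new
queue << :first
queue << :second
drained = pop_enumerator(queue).to_a
```

Adding this strategy required no change to any iteration core, which is the property the list above describes.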
The Conclusion
The decision taught me something counterintuitive: sometimes more verbose code is better code.
Instead of one magical method that "just works":
def build_enumerable(params)
  Shop.where(active: true)
end
Job authors would now write:
def build_enumerable(params, cursor:)
  enumerator_builder.active_record_on_records(
    Shop.where(active: true),
    cursor: cursor
  )
end
More lines? Yes. More explicit dependencies? Absolutely. But also more honest about what's actually happening, more testable, and paradoxically simpler because each piece does one thing well.
The lesson was about admitting when an elegant abstraction becomes cognitive overload, and choosing explicitness over magic, even when the magic was convenient.