Skip to content

Commit

Permalink
Support BatchEnumerator collections
Browse files Browse the repository at this point in the history
  • Loading branch information
adrianna-chang-shopify committed May 10, 2021
1 parent 1917ecb commit f39ef68
Show file tree
Hide file tree
Showing 8 changed files with 83 additions and 6 deletions.
33 changes: 33 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,39 @@ title,content
My Title,Hello World!
```

### Processing Batch Collections

The Maintenance Tasks gem supports processing Active Records in batches. This
can reduce the number of calls your Task makes to the database. Use
`ActiveRecord::Batches#in_batches` to specify that your Task should process
records in batches. The default batch size is 1000, but a custom size can be
specified.

```ruby
# app/tasks/maintenance/update_posts_in_batches_task.rb
module Maintenance
class UpdatePostsInBatchesTask < MaintenanceTasks::Task
def collection
Post.in_batches
end

def count
collection.count
end

def process(batch_of_posts)
batch_of_posts.update_all(content: "New content added on #{Time.now.utc}")
end
end
end
```

Ensure that you've implemented the following methods:
* `collection`: return an `ActiveRecord::Batches::BatchEnumerator`.
* `process`: do the work of your Task on a batch (`ActiveRecord::Relation`).
* `count`: return the number of rows that will be iterated over (this should
still be the number of records to be processed, not the number of batches).

### Throttling

Maintenance Tasks often modify a lot of data and can be taxing on your database.
Expand Down
12 changes: 11 additions & 1 deletion app/jobs/concerns/maintenance_tasks/task_job_concern.rb
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,15 @@ def build_enumerator(_run, cursor:)
collection_enum = case collection
when ActiveRecord::Relation
enumerator_builder.active_record_on_records(collection, cursor: cursor)
when ActiveRecord::Batches::BatchEnumerator
relation = collection.instance_variable_get(:@relation)
batch_size = collection.instance_variable_get(:@of)
enumerator_builder.active_record_on_batches(
relation,
cursor: cursor,
batch_size: batch_size,
as_relation: true
)
when Array
enumerator_builder.build_array_enumerator(collection, cursor: cursor)
when CSV
Expand All @@ -58,7 +67,8 @@ def build_enumerator(_run, cursor:)
def each_iteration(input, _run)
throw(:abort, :skip_complete_callbacks) if @run.stopping?
task_iteration(input)
@ticker.tick
batch_size = input.is_a?(ActiveRecord::Relation) ? input.size : 1
@ticker.tick(batch_size)
@run.reload_status
end

Expand Down
12 changes: 7 additions & 5 deletions app/models/maintenance_tasks/ticker.rb
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,13 @@ def initialize(throttle_duration, &persist)
@ticks_recorded = 0
end

# Increments the tick count by one, and may persist the new value if the
# threshold duration has passed since initialization or the tick count was
# last persisted.
def tick
@ticks_recorded += 1
# Increments the tick count, which defaults to one. It may persist the new
# value if the threshold duration has passed since initialization or the
# tick count was last persisted.
#
# @param increment [Integer] amount to increment ticks by
def tick(increment = 1)
@ticks_recorded += increment
persist if persist?
end

Expand Down
16 changes: 16 additions & 0 deletions test/dummy/app/tasks/maintenance/update_posts_in_batches_task.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# frozen_string_literal: true
module Maintenance
class UpdatePostsInBatchesTask < MaintenanceTasks::Task
def collection
Post.in_batches(of: 5)
end

def count
collection.count
end

def process(batch_of_posts)
batch_of_posts.update_all(content: "New content added on #{Time.now.utc}")
end
end
end
13 changes: 13 additions & 0 deletions test/jobs/maintenance_tasks/task_job_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -355,5 +355,18 @@ class << self

assert_predicate run.reload, :succeeded?
end

test ".perform_now handles batch relation tasks" do
5.times do |i|
Post.create!(title: "Another Post ##{i}", content: "Content ##{i}")
end
# We expect 2 batches (7 posts => 5 + 2)
Maintenance::UpdatePostsInBatchesTask.any_instance.expects(:process).twice

run = Run.create!(task_name: "Maintenance::UpdatePostsInBatchesTask")
TaskJob.perform_now(run)

assert_equal Post.count, run.reload.tick_count
end
end
end
1 change: 1 addition & 0 deletions test/models/maintenance_tasks/task_data_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ class TaskDataTest < ActiveSupport::TestCase
"Maintenance::ErrorTask",
"Maintenance::ImportPostsTask",
"Maintenance::TestTask",
"Maintenance::UpdatePostsInBatchesTask",
"Maintenance::UpdatePostsTask",
"Maintenance::UpdatePostsThrottledTask",
]
Expand Down
1 change: 1 addition & 0 deletions test/system/maintenance_tasks/tasks_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ class TasksTest < ApplicationSystemTestCase
"Maintenance::ErrorTask\nNew",
"Maintenance::ImportPostsTask\nNew",
"Maintenance::TestTask\nNew",
"Maintenance::UpdatePostsInBatchesTask\nNew",
"Maintenance::UpdatePostsThrottledTask\nNew",
"Completed Tasks",
"Maintenance::UpdatePostsTask\nSucceeded",
Expand Down
1 change: 1 addition & 0 deletions test/tasks/maintenance_tasks/task_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ class TaskTest < ActiveSupport::TestCase
"Maintenance::ErrorTask",
"Maintenance::ImportPostsTask",
"Maintenance::TestTask",
"Maintenance::UpdatePostsInBatchesTask",
"Maintenance::UpdatePostsTask",
"Maintenance::UpdatePostsThrottledTask",
]
Expand Down

0 comments on commit f39ef68

Please sign in to comment.