Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch active job queue adapter #478

Merged
merged 9 commits into from
Sep 30, 2023
Merged

Switch active job queue adapter #478

merged 9 commits into from
Sep 30, 2023

Conversation

robbevp
Copy link
Member

@robbevp robbevp commented Sep 9, 2023

Fixes #473

This PR switches the active job backend from DelayedJob to GoodJob. DelayedJob will be removed later, since we don't have a migration planned for the jobs that are currently queued in a production environment.

I've created explicit job classes for all things which currently got scheduled with delay(). Since these classes only exist to call a method on the model, I didn't add tests for these. (I can add them, but I wouldn't want to test the actual model logic there, so they would simply test whether a method exists).

In some cases I think it could make sense to move more code to the job class, but I haven't done this for now. (For CalculateContentLengthJob and ConvertTranscodeJob most - if not all - of the model code could probably also exist in the job class.)

My last two commits are optional and can be removed if we don't want those. State before those two commits
My reasoning for them:

035f7b3 - Queue structure

I renamed/reorganised these queue to time based queues. We switch to a fixed number of queues, based on the maximum time a job should allowed to be enqueued for, before being picked up by a worker. This makes the ranking and/or priority a lot more explicit, without going done to minute details like priority per queue leads to. The exact numbers don't matter that much (and could be changed), but I went for:

  • within_30_seconds
  • within_5_minutes
  • within_30_minutes
  • within_24_hours

A deployment can then configure their workers, so that none of these queues ever overflow and don't respect their time limits. (This can be done with different workers, or with a thread limit within one worker, see: https://github.com/bensheldon/good_job#optimize-queues-threads-and-processes)
This approach is based on Nate Berkopec's advice about running Sidekiq, but also applies here.

fd74c89 - Mount engine

Good job includes a decent dashboard to monitor the jobs and processes/workers. I use this dashboard quite a bit at work (to monitor a bit and to retry failed jobs when relevant), but I don't know if we would want this in accentor.
I've currently mounted it at /good_job, but wasn't sure if we would want that.
Two considerations:

  • We need to re-add some middleware to make this work
  • We need to roll some kind of authentication. Maybe basic auth with nginx is good enough here?

We could also make this an option (either in rails or in the nixos module), but I'd hesitate adding the complexity if no-one ends up using this.

@robbevp robbevp requested a review from chvp September 9, 2023 18:21
@robbevp robbevp self-assigned this Sep 9, 2023
@robbevp robbevp added the chore Repository or build maintenance label Sep 9, 2023
@chvp chvp changed the title Chore/switch active job backend Switch active job backend Sep 9, 2023
@robbevp robbevp force-pushed the chore/switch-active-job-backend branch from e7dda9d to fd74c89 Compare September 10, 2023 07:00
@robbevp
Copy link
Member Author

robbevp commented Sep 10, 2023

Fixed the failing test and updated my comment to refer to the right commits

@robbevp robbevp changed the title Switch active job backend Switch active job queue adapter Sep 10, 2023
This was referenced Sep 10, 2023
@chvp
Copy link
Member

chvp commented Sep 10, 2023

I'd not add the dashboard. As you mentioned, it needs auth and a bunch of middleware we currently don't have. If at any point in the future we notice missing the dashboard we can still add it.

@robbevp robbevp force-pushed the chore/switch-active-job-backend branch from fd74c89 to 035f7b3 Compare September 10, 2023 11:57
@robbevp
Copy link
Member Author

robbevp commented Sep 10, 2023

I'd not add the dashboard. As you mentioned, it needs auth and a bunch of middleware we currently don't have. If at any point in the future we notice missing the dashboard we can still add it.

Dropped this commit

Copy link
Member

@chvp chvp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a matching PR on the flake repo to change the way the runner is executed? Or I can do it as well, if you don't want to.

@@ -22,7 +22,7 @@ def self.destroy_all_and_recalculate
next unless Rails.application.config.recalculate_content_length_if.call af

CodecConversion.find_each do |cc|
af.delay(queue: :content_lengths_backlog).calc_audio_length(cc)
CalculateContentLengthJob.set(queue: :within_30_minutes).perform_later(af, cc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's entirely infeasible to recalculate all content lengths (that qualify) within 30 minutes. This happens very rarely, but if it does it takes at least a few days. Is it possible to create a "no deadline" queue where these jobs can be scheduled to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've renamed :within_24_hours to :whenever (as I don't think we need the distinction between those two, and the latter is more explicit).

Due to this change, I've put the TranscodeCacheCleanJob in :within_30_minutes as it should be possible for this job to be picked up rather quickly.

Copy link
Member

@chvp chvp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since (as far as I can tell at least), the priority of the queues is not defined in-repo anymore, we should at least mention them in the setup instructions I think?

app/jobs/transcode_cache_clean_job.rb Outdated Show resolved Hide resolved
@robbevp
Copy link
Member Author

robbevp commented Sep 17, 2023

Since (as far as I can tell at least), the priority of the queues is not defined in-repo anymore, we should at least mention them in the setup instructions I think?

Good suggestion. We didn't have anything about starting/managing background workers, but I've added a section.

Note: I've also changed Rails' default queue to be within_5_minutes (AFAIK this should only be ActiveStorage::AnalyseJoband ActiveStorage::PurgeJob for us - if need be, we can specify this per job in the config docs)

@robbevp robbevp force-pushed the chore/switch-active-job-backend branch from 95b680a to 11bd944 Compare September 17, 2023 07:03
@robbevp robbevp requested a review from chvp September 17, 2023 07:13
@chvp chvp merged commit 934ad47 into main Sep 30, 2023
5 checks passed
@chvp chvp deleted the chore/switch-active-job-backend branch September 30, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore Repository or build maintenance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch active job backend to GoodJob
2 participants