More aggressive cleanup of failed delayed jobs #3346

@kathap kathap commented Jul 11, 2023

The idea is to add an absolute limit on the number of failed jobs, so that they get deleted even before the 14d failed_jobs.cutoff_age_in_days is reached. The limit should be configurable.

Therefore, we implement a new parameter for the maximum number of failed delayed_jobs allowed in the delayed_jobs table. If that number is exceeded, entries are deleted, keeping the ones with the higher ids. The failed_jobs cleanup job will now run more often than once per day.
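A minimal sketch of the count-based limit described above, written against an in-memory array rather than the real delayed_jobs table. The `Job` struct, `prune_failed_jobs`, and the `max_number_of_failed_delayed_jobs` argument are hypothetical names chosen for illustration and are not taken from this PR's code; the actual cleanup job works on the database.

```ruby
# Hypothetical in-memory stand-in for a row in the delayed_jobs table.
Job = Struct.new(:id, :failed_at, keyword_init: true)

# Returns the jobs that remain after enforcing the limit.
# Jobs that have not failed are never touched; among failed jobs,
# the ones with the highest ids are kept.
def prune_failed_jobs(jobs, max_number_of_failed_delayed_jobs)
  failed = jobs.reject { |job| job.failed_at.nil? }
  return jobs if failed.size <= max_number_of_failed_delayed_jobs

  # Array#max(n) returns the n largest elements.
  keep_ids = failed.map(&:id).max(max_number_of_failed_delayed_jobs)
  jobs.select { |job| job.failed_at.nil? || keep_ids.include?(job.id) }
end

now = Time.now
jobs = [
  Job.new(id: 1, failed_at: now),
  Job.new(id: 2, failed_at: nil), # still pending, never pruned
  Job.new(id: 3, failed_at: now),
  Job.new(id: 4, failed_at: now)
]

remaining = prune_failed_jobs(jobs, 2)
puts remaining.map(&:id).inspect
```

With a limit of 2, the failed job with the lowest id (1) is dropped, while the pending job and the two failed jobs with the highest ids stay.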

Currently, failed delayed_jobs are deleted after 14d (configurable), so that some information about failed jobs remains available for debugging: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/app/jobs/runtime/failed_jobs_cleanup.rb
This can still lead to a very large number of delayed_jobs records, which slows down DB queries on delayed_jobs (also addressed by an index, ccng #3324).

Related to cloudfoundry/capi-release#328

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@kathap kathap marked this pull request as draft July 11, 2023 14:21
@kathap kathap force-pushed the more-aggressive-cleanup-of-failed-delayed-jobs branch 6 times, most recently from e1d545a to d6e61e1 Compare July 19, 2023 15:03
@kathap kathap force-pushed the more-aggressive-cleanup-of-failed-delayed-jobs branch 4 times, most recently from aa6a656 to ec136ff Compare July 25, 2023 17:13
Currently, failed delayed_jobs are deleted after 14d (configurable) to keep some info about failed jobs that helps debugging: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/app/jobs/runtime/failed_jobs_cleanup.rb

This can still lead to a very large number of delayed_jobs records, which slows down DB queries on delayed_jobs (also addressed by an index, ccng cloudfoundry#3324). The idea was to add an absolute limit on the number of failed jobs, so that they get deleted even before the 14d failed_jobs.cutoff_age_in_days is reached. The limit should be configurable.

Change the start_frequent_jobs method to always use all configured parameters except frequency_in_seconds; change expiration_in_seconds from a positional to a keyword parameter.
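The positional-to-keyword change can be illustrated roughly as follows. This is only a sketch: `build_job_positional`, `build_job_keyword`, and the shape of the `config` hash are hypothetical and do not reproduce the real `start_frequent_jobs` code.

```ruby
# Hypothetical "before" signature: expiration is a positional argument,
# so callers have to know the argument order.
def build_job_positional(name, expiration_in_seconds = nil)
  { name: name, expiration_in_seconds: expiration_in_seconds }
end

# Hypothetical "after" signature: expiration is a keyword argument,
# so a hash of configured options can be passed through with **.
def build_job_keyword(name, expiration_in_seconds: nil)
  { name: name, expiration_in_seconds: expiration_in_seconds }
end

# Assumed config entry for one frequent job. Everything except
# frequency_in_seconds (used only for scheduling) is forwarded to the job.
config = { frequency_in_seconds: 300, expiration_in_seconds: 60 }
options = config.reject { |key, _value| key == :frequency_in_seconds }

job = build_job_keyword('failed_jobs', **options)
puts job.inspect
```

With keyword parameters, adding another configured option only requires a new keyword on the receiving method; the forwarding code stays unchanged.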
@kathap kathap force-pushed the more-aggressive-cleanup-of-failed-delayed-jobs branch from e6d929b to 68ab9b0 Compare July 27, 2023 07:11
@kathap kathap marked this pull request as ready for review July 27, 2023 10:31
app/jobs/runtime/failed_jobs_cleanup.rb (review thread outdated, resolved)
app/jobs/runtime/failed_jobs_cleanup.rb (review thread outdated, resolved)