Lock not getting properly cleared for some jobs #560
If you change your worker configuration to something like:

SidekiqUniqueJobs.config.lock_info = true

module InboxManagement
  module Job
    class ProcessSendQueue
      include Sidekiq::Worker

      sidekiq_options lock: :until_executed, lock_info: true, lock_prefix: :psq, on_conflict: :log
    end
  end
end

There should be a lock_info page where you can check what is going on with the locks. From there you should be able to provide some more details. The …
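Once lock_info is on, the digests can also be inspected from a console. A hedged sketch of that, assuming the v6-style SidekiqUniqueJobs::Digests API that the death handler later in this thread uses, and the :psq prefix from the options above (pattern and count are illustrative):

# Hedged sketch, not from the thread: list digests currently held, assuming
# the v6-style Digests API and the :psq lock_prefix configured above.
# Run from a console where the gem is already loaded.
SidekiqUniqueJobs::Digests.all(pattern: 'psq:*', count: 1_000).each do |digest|
  puts digest
end

The lock_info page itself is the one the gem adds to the Sidekiq web UI when its web extension is loaded alongside Sidekiq::Web (require 'sidekiq_unique_jobs/web' in the routes file, per the gem's README).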
Further info, there was a fix in …
Thanks for the suggestions, @mhenrixon. Settings are:

Sidekiq.configure_server do |config|
  config.death_handlers << lambda { |job, ex|
    puts "*** DEAD JOB *** #{job['class']} #{job['jid']} just died with error #{ex.message}."
    SidekiqUniqueJobs::Digests.del(digest: job['unique_digest']) if job['unique_digest']
  }
end
For v7 it is …
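For reference, a hedged sketch of what a v7-style death handler generally looks like, based on the pattern in the gem's v7 README where the digest is stored under lock_digest and deletion goes through an instance method; treat the exact calls as assumptions:

# Hedged sketch of a v7-style death handler (assumed API: digest stored
# under 'lock_digest', deleted via Digests#delete_by_digest).
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, _ex) do
    digest = job['lock_digest']
    SidekiqUniqueJobs::Digests.new.delete_by_digest(digest) if digest
  end
end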
As I commented here: #524 (comment), you can provide a …
#571
Thanks, @mhenrixon! For now, we fixed the death handler config and jobs seem to be running smoothly, so we'll stick to …
Describe the bug
The locks for some jobs were not removed after the jobs completed. This led to new jobs being rejected even though they were unique, with no other similar jobs in the queue.
This problem only affected some jobs immediately after we did a release adding the lock on the worker. We suspect the bug is related to the fact that we had many duplicate jobs scheduled and enqueued at the time of the release.
Expected behavior
New jobs should not be rejected when they are unique.
Having duplicate jobs already scheduled/enqueued when the lock is introduced should not cause locks to get stuck. Alternatively, there should be a warning in the documentation if this is a known problem.
Current behavior
The situation is similar to #379.
We have a cron job running every 10 minutes, as well as other processes in the app, scheduling jobs like so:
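Purely as a hypothetical illustration of that kind of call (the class name comes from the worker discussed above; the argument is invented):

# Hypothetical illustration only: a periodic task enqueueing the worker.
inbox_id = 42 # invented example argument
InboxManagement::Job::ProcessSendQueue.perform_async(inbox_id)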
The job also re-enqueues itself like so:
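Hypothetically, that re-enqueue usually sits at the end of perform; a minimal sketch in which the 10-minute delay, the argument, and the class body are assumptions, with only the class name taken from the thread:

# Hypothetical sketch only: the worker re-enqueues itself when it finishes.
require 'sidekiq'

module InboxManagement
  module Job
    class ProcessSendQueue
      include Sidekiq::Worker

      def perform(inbox_id)
        # ... process the send queue for this inbox ...
        self.class.perform_in(10 * 60, inbox_id) # perform_in takes seconds
      end
    end
  end
end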
As explained above, we noticed a number of jobs had not run for hours after introducing the lock. This was critical, so we didn't have much time to debug further. We took the following steps to solve the immediate problem:
We rolled back the release, manually removed the uniquejobs Redis keys, and released again. Jobs seem to be running fine for now. However, we're not 100% sure what caused the bug and definitely want to prevent jobs from getting stuck this way again.
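For anyone landing in the same state, a hedged sketch of clearing stale digests from a console instead of rolling back, assuming the v6-style SidekiqUniqueJobs::Digests API used in the death handler above; the pattern and count are illustrative:

# Hedged sketch (assumed v6-style Digests API): inspect, then delete,
# digests left behind by jobs that already completed.
stale = SidekiqUniqueJobs::Digests.all(pattern: 'uniquejobs:*', count: 1_000)
stale.each { |digest| SidekiqUniqueJobs::Digests.del(digest: digest) }

# Or, delete by pattern in a single call:
# SidekiqUniqueJobs::Digests.del(pattern: 'uniquejobs:*', count: 1_000)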
Worker class
Additional context
We also found many warnings like these in the server logs right after the release:
One of the jobs affected by the bug is listed there, uniquejobs:1c3f9b4b86ec4a62d650a48096f90d97. We found it odd that these warnings do not have the corresponding INFO log saying the job will be skipped, and that the lock type until_executed is not displayed before the digest.