
Unique job needs to be unlocked manually? #261

Closed
lephyrius opened this issue Apr 2, 2018 · 19 comments

@lephyrius

I have started to see these messages in my log:
FATAL: the unique_key: "som unique key" needs to be unlocked manually

What does it mean? How do I unlock the job manually?

@tmlee

tmlee commented Apr 21, 2018

Noticing this too

@adamof

adamof commented Apr 23, 2018

We're seeing these too. Some more information would be great!

@rezonant

Hello, for those of you seeing this message: be aware that Sidekiq is not running those jobs. We just got bit by this.

@mhenrixon mhenrixon self-assigned this Jun 6, 2018
@mhenrixon
Owner

mhenrixon commented Jun 7, 2018

Can you all provide the gem version you are using, what version of Sidekiq and which version of redis? @lephyrius @tmlee @adamof @rezonant

@rezonant

Sidekiq 4.2.9, Sidekiq-unique-jobs 5.0.10, Sidekiq-scheduler 2.1.2, Rails 5.1.6 -- Redis is Elasticache on AWS, engine compat 3.2.4.

@mhenrixon
Owner

Version 6 won't have this problem and it will take care of the upgrade automatically.

Keep an eye out on the README and CHANGELOG for more information about the changes.

@mhenrixon mhenrixon added this to the Version 6.0 milestone Jun 26, 2018
@smock

smock commented Aug 8, 2018

This seems to still happen on version 6 when a worker process dies -- is that expected? What's the recommended way to handle?

@mhenrixon
Owner

There is something I still need to look into, but it is hard to test. I don't think I have integration-test coverage for jobs coming from the scheduled and retry sets, so I don't want to sound too sure about anything.

The way I intended the gem to work, though, is that when a worker process crashes and Sidekiq (hopefully) puts the job back, the lock remains so that no duplicate jobs are allowed.

When the job is considered dead by sidekiq you could add a death handler to also delete the lock. See: https://github.com/mhenrixon/sidekiq-unique-jobs#cleanup-dead-locks
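A death handler along the lines of the linked README section might look like this (a sketch based on the v6 docs; the `unique_digest` job key and the `SidekiqUniqueJobs::Digests.delete_by_digest` call are taken from those docs, so verify them against the gem version you are actually running):

```ruby
# Register a Sidekiq death handler that clears the unique lock
# once Sidekiq gives up on a job and moves it to the dead set.
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, _exception) do
    digest = job['unique_digest']
    SidekiqUniqueJobs::Digests.delete_by_digest(digest) if digest
  end
end
```

This only fires when a job dies (exhausts its retries or has retry: 0 and fails), so it does not cover jobs that are pushed back to the queue on shutdown.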

@smock

smock commented Aug 8, 2018

I do have the handler set up, and it works perfectly for jobs which generate an exception. However, whenever I stop or start workers I see many messages like:

FATAL: the unique_key: "som unique key" needs to be unlocked manually

For further context, I don't retry jobs (but I do have retry set to 0 in sidekiq, not null, which triggers the death handler appropriately on exceptions), so I'm not sure that Sidekiq puts jobs back in this scenario.

@mhenrixon
Owner

Thanks for the info @smock I'll add some coverage later today.

@smock

smock commented Aug 13, 2018

Just wanted to check in on this, any updates?

@mhenrixon
Owner

@smock I did add some coverage and couldn't find any problem. The error message is generated from here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/lock/base_lock.rb#L102

What it means is that while the server is executing the worker's #perform method, it receives the Sidekiq::Shutdown exception, i.e. the job was not done executing when we had to abort. If the job is lost we could unlock, but if the job is pushed back to the retry queue or job queue then we should keep the lock to avoid duplicates.

What I have not investigated yet is if the job is put back or not.
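The keep-the-lock-on-shutdown behavior described above can be sketched in plain Ruby. This is not the gem's actual code (the SketchLock class is invented for illustration, and Sidekiq::Shutdown is stubbed so the snippet is self-contained):

```ruby
# Stub of the exception Sidekiq raises into busy threads at shutdown.
module Sidekiq
  class Shutdown < Interrupt; end
end

# Illustrative lock wrapper: unlock after a clean run, but keep the
# lock when interrupted mid-execution, because Sidekiq pushes the job
# back and the re-run still needs the lock to exist.
class SketchLock
  attr_reader :locked

  def initialize
    @locked = true
  end

  def execute
    yield
    unlock
  rescue Sidekiq::Shutdown
    warn 'the unique_key needs to be unlocked manually' # lock kept on purpose
    raise
  end

  def unlock
    @locked = false
  end
end
```

A clean run ends with the lock released; a run aborted by Sidekiq::Shutdown logs the warning and leaves the lock in place.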

@adeubank

@mhenrixon To add some more context to this, somehow the jobs are never re-queued when Sidekiq is shutting down. Hopefully this is helpful.

It seems that my job with lock: :until_and_while_executing doesn't get re-queued correctly if it's in the middle of processing and the server shuts down. The worker that does not have any unique options gets re-queued just fine.

Here is what I have on MyWorkerClass:

sidekiq_options retry: true, lock: :until_and_while_executing, log_duplicate_payload: true

Log output during shutdown:
2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Shutting down
2018-08-13T22:52:58.274Z 1 TID-gpe208zth INFO: Terminating quiet workers
2018-08-13T22:52:58.274Z 1 TID-gpe2y4bwd INFO: Scheduler exiting...
2018-08-13T22:52:58.775Z 1 TID-gpe208zth INFO: Pausing to allow workers to finish...
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Terminating 3 busy worker threads
2018-08-13T22:53:05.778Z 1 TID-gpe208zth WARN: Work still in progress [
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:default", job="...">,
#<struct Sidekiq::BasicFetch::UnitOfWork queue="queue:my_other_queue", job="">]
2018-08-13T22:53:05.780Z 1 TID-gpe208zth INFO: Pushed 3 jobs back to Redis
2018-08-13T22:53:05.781Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 FATAL: the unique_key: uniquejobs:unique_digest_1:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2y465p MyWorkerClass JID-7f038e3738663e7bee45c6e1 INFO: fail: 45.018 sec
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 FATAL: the unique_key: uniquejobs:unique_digest_2:RUN needs to be unlocked manually
2018-08-13T22:53:05.783Z 1 TID-gpe2q7aul MyWorkerClass JID-eaad6b51d0870d8d8488beb5 INFO: fail: 60.75 sec
2018-08-13T22:53:05.785Z 1 TID-gpe208zth INFO: Bye!

Let me know how else I can help or if you can point me in the right direction. This gem saves me a lot of headaches and am interested in seeing this fixed.

@mhenrixon
Owner

According to everything I can find, the job should be pushed back on shutdown: https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/manager.rb#L128. I haven't confirmed exactly what is happening. I think it might be that the first lock needs removal, but I would have to add some test cases for this.

Will let you know what I can find!

@tsauerwein

I also had these errors today, when I made some existing queues unique. We are doing a rolling update, so while there were still some clients (without the unique configuration) pushing jobs, workers with the unique configuration were already trying to process those jobs. Once all clients and workers were running with the same configuration, the error did not pop up anymore.

I don't know if it's related to the rolling update having clients/workers with different configurations, or to the workers shutting down.

@mhenrixon
Owner

I need to improve the logging...

@mhenrixon
Owner

It isn't an error. I am changing this to warning level because in most cases it is just a warning (Sidekiq pushes the job back to the queue).

@adeubank

Just wanted to loop back around on this, as I probably confused some of the options available in this gem with my specific worker requirements.

I have a worker that needs to be unique until it starts executing (:until_executing), but must also only allow one instance to execute at a time. :until_executing alone fails that requirement: if my application enqueues another job, I only need one of them to run. With the strategy below, Sidekiq is able to enqueue the job (even when shutting down), since the conflict strategy doesn't care whether a job with the same unique digest is already in the queues.

My fix was to add a runtime lock using the redis-lock gem. The finished class looks something like this.

# Requires the redis-lock gem, which adds Redis#lock.
class HardWorker
  include Sidekiq::Worker
  sidekiq_options retry:                 true,
                  lock:                  :until_executing,
                  on_conflict:           :replace,
                  log_duplicate_payload: true

  def perform(params)
    client = Redis.new
    client.lock("my_hard_worker_lock") do
      puts 'Doing hard work!'
      sleep 5
    end
  rescue Redis::Lock::LockNotAcquired
    # Try again later, replacing any job already in the queue
    new_job_id = HardWorker.perform_in(rand(10..1.minutes), params)
    puts "Could not grab the lock. Enqueued again. new_job_id=#{new_job_id}"
  end
end

Hopefully I didn't confuse people more or miss something blatantly obvious.

@mhenrixon
Owner

@adeubank you don't need the extra gem. All you need to do is change your lock type to lock: :until_and_while_executing
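Applying that suggestion to the HardWorker example above would look roughly like this (a sketch, not tested against the gem; the comment reflects the lock behavior described earlier in this thread):

```ruby
class HardWorker
  include Sidekiq::Worker
  # :until_and_while_executing locks from enqueue until the job starts,
  # then holds a second runtime lock (the :RUN digest seen in the logs
  # above) while it executes, so no extra Redis locking gem is needed.
  sidekiq_options retry:                 true,
                  lock:                  :until_and_while_executing,
                  log_duplicate_payload: true

  def perform(params)
    puts 'Doing hard work!'
    sleep 5
  end
end
```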
