Crash handling #14
I've been thinking about this. What you bring up is of course not ideal, but I'm not sure it's possible for us to handle it. Have you configured retry on those jobs? |
Retry is worse. When a job is in the retry set, it can also subsequently be rescheduled normally, as if there were no unique key. |
@cpuguy83 Regarding the retry queue, I wrote some code which checks whether the job is already in that queue:

```ruby
def is_retried?(*args)
  # unique jobs will not check if the item is in the retry queue
  retries = Sidekiq::RetrySet.new
  # check if the job exists: compare each of the arguments, building a list of
  # boolean results, then check that all of the results are true
  retries.any? do |retri|
    retri.klass == self.class.name &&
      args.each_with_index.all? { |item, index| item == retri.args[index] }
  end
end
```

The code is not pretty, but it seems to work. I am looking for a more robust solution, and will probably look to encapsulate uniqueness within the application. |
When I delete a queue with jobs pending, they cannot be added again, probably because of unique-jobs. Why doesn't unique-jobs use Sidekiq entries to determine uniqueness? |
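As a rough illustration of that idea (editorial sketch, not code from this thread), uniqueness could be decided from what is actually sitting in the queue via Sidekiq's own API; the queue name, class name, and arguments below are placeholders.

```ruby
require "sidekiq/api"

# Sketch: treat a job as a duplicate only if an identical entry is actually
# present in the queue, instead of relying on a separately stored unique key.
# Note this iterates the whole queue, so it is O(n) and not race-free.
def already_enqueued?(queue_name, klass_name, args)
  Sidekiq::Queue.new(queue_name).any? do |job|
    job.klass == klass_name && job.args == args
  end
end

# Example with placeholder values:
# already_enqueued?("default", "HardWorker", [42])
```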
Quick fix:
|
I think this could be resolved by using the new Redis SCAN feature instead of unique keys.
|
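A minimal sketch of the SCAN idea for cleaning out orphaned keys; the `uniquejobs:` key prefix is an assumption about how the gem names its keys, so check your Redis keyspace before running anything like this.

```ruby
require "redis"

# Sketch: enumerate the gem's unique-key entries with SCAN (non-blocking,
# unlike KEYS) so orphaned keys left behind by crashes or deleted queues
# can be inspected or removed. The "uniquejobs:*" pattern is an assumption.
redis = Redis.new
redis.scan_each(match: "uniquejobs:*") do |key|
  ttl = redis.ttl(key) # -1 means the key never expires
  puts "#{key} (ttl: #{ttl})"
  # redis.del(key) # uncomment to actually clear a stale key
end
```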
Thanks guys I'll have a look at redis scan and see if we can handle failures and deleted queues better. |
@marclennox hi, is this still an open issue? |
I don't think this was one I was involved in. |
+1 Waiting for this solution |
+1 |
@mhenrixon @cpuguy83 do you guys know if this is still an issue? |
Our workers crash every now and then, but we are not experiencing this issue. We did a lot of work this fall to make sure that unique arguments are cleared as jobs are deleted from all queues, so it should at least be less of a problem. |
Is closing the issue justified? In either case, I think you should maybe post a comment (or update the README) about the protections sidekiq-unique-jobs has in place (are they atomic in Redis? does sidekiq-unique-jobs poll for staleness?). It would certainly help with my peace of mind, and would help a bunch of people make an informed decision. @mperham has been staunch in keeping uniqueness out of Sidekiq core, so it would be good to know how you've addressed what he sees as a very difficult problem. |
I honestly don't maintain the app I was using this for anymore, so I couldn't say if it works now or not. |
I work on a Rails app that uses Sidekiq Pro, and we have run into issues with our jobs not being unique when a Sidekiq Pro process crashes and it pulls from the reliable fetch queue. We have also had problems with jobs not being unique on retries. We ended up using some code similar to @kfalconer's code above that checks for retried jobs, but we also check for older actively running jobs. Here's a gist of the class we made: https://gist.github.com/draffensperger/a90e54426be60769f621 (we just call |
I am definitely open to having a `duplicate?` check as part of uniqueness. I think the code you gisted could be pretty much copied into unique jobs. Pull requests much appreciated :) |
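For what it's worth, here is a minimal sketch of what such a `duplicate?` check could look like, assuming it combines the retry set with currently running jobs the way the gist does; this is illustrative, not the gist's code, and the method/parameter names are made up.

```ruby
require "json"
require "sidekiq/api"

# Sketch: a job counts as a duplicate if an identical class/args pair is
# waiting in the retry set or is currently being worked on.
def duplicate?(klass_name, args)
  in_retry_set = Sidekiq::RetrySet.new.any? do |job|
    job.klass == klass_name && job.args == args
  end

  running = Sidekiq::Workers.new.any? do |_process_id, _thread_id, work|
    payload = work["payload"]
    # Older Sidekiq versions store the payload as a JSON string.
    payload = JSON.parse(payload) if payload.is_a?(String)
    payload["class"] == klass_name && payload["args"] == args
  end

  in_retry_set || running
end
```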
Sounds good. I'll try to do a PR for it in the next couple weeks. |
This should be fixed. If not open a new issue with a failing test. |
@mhenrixon thanks for fixing it, and sorry that I didn't get to the PR... I was meaning to and got started at one point and then got caught up with other things. |
I think I have reproduced this issue on my worker:
Steps to reproduce:
Now we can't enqueue the same job for the next 30 minutes. Can we configure this 30-minute expiry somehow? Is this 30-minute expiry the intended solution to the actual problem of crashes not being handled at all? |
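Not an authoritative answer, but in the gem versions from around this time the expiry could apparently be overridden per worker in `sidekiq_options`. The option name below (`unique_job_expiration`, in seconds) is my recollection rather than something confirmed in this thread, and the worker class and argument are placeholders; verify against the README for the version you run.

```ruby
class ProcessPaymentWorker
  include Sidekiq::Worker

  # Assumed option name: unique_job_expiration (seconds). The default is
  # reportedly 30 minutes; check the gem's README before relying on this.
  sidekiq_options unique: true, unique_job_expiration: 60 * 60

  def perform(payment_id)
    # ...
  end
end
```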
Currently, Unique Jobs does not handle crashes. If a Sidekiq worker crashes, whatever it was doing is lost (except with Sidekiq Pro). When using a worker with uniqueness enabled, those unique job keys persist in Redis with no way of clearing them out (short of deleting them manually).