Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash handling #14

Closed
cpuguy83 opened this issue May 31, 2013 · 21 comments
Closed

Crash handling #14

cpuguy83 opened this issue May 31, 2013 · 21 comments

Comments

@cpuguy83
Copy link

Currently Unique Jobs does not handle crashes. So if a sidekiq worker crashes whatever it was doing is lost (except for Sidekiq Pro). When using a worker with unique those unique job keys persist in Redis with no way of clearing them out (short of deleting them manually).

@mhenrixon
Copy link
Owner

I've been thinking about this, what you bring up is of course not ideal but I'm not sure it's possible for us to handle it. Have you configured retry on those jobs?

@cpuguy83
Copy link
Author

Retry is worse. When a job is in the retry status it can also subsequently be rescheduled normally, as if there was no unique key.

@kfalconer
Copy link

@cpuguy83 In regards to the retry queue, I wrote some code which will check if the job is already in that queue.

  def is_retried?(*args)
    # unique jobs will not check if the item is in the retry queue
    retries = Sidekiq::RetrySet.new

    # check if the job exists
    # the second comparison compares each of the arguments builing a list of boolean results, then it checks that all of the results are true
    retries.any? { |retri| retri.klass == self.class.name && args.enum_for(:each_with_index).collect {|item, index| item == retri.args[index]}.all? {|result| result == true }  }
  end

The code is not pretty, but it seems to work. I am looking for a more robust solution, and will probably look to encapsulate uniqueness within the application.

@sheerun
Copy link
Contributor

sheerun commented Jan 7, 2014

When I delete queue with jobs pending, they cannot be added again, probably because of unique-jobs.. Why unique-jobs doesn't use sidekiq entries to determine uniqueness?

@sheerun
Copy link
Contributor

sheerun commented Jan 7, 2014

Quick fix:

Redis.current.keys.select { |k| k.include?("sidekiq_unique") }.each { |k| Redis.current.del(k) }

@cpuguy83
Copy link
Author

cpuguy83 commented Jan 7, 2014

I think this could be resolved by instead of using unique keys use the new redis scan feature.

Brian Goff

On Tue, Jan 7, 2014 at 9:18 AM, Adam Stankiewicz [email protected]
wrote:

Quick fix:

Redis.current.keys.select { |k| k.include?("sidekiq_unique") }.each { |k| Redis.current.del(k) }

Reply to this email directly or view it on GitHub:
#14 (comment)

@mhenrixon
Copy link
Owner

Thanks guys I'll have a look at redis scan and see if we can handle failures and deleted queues better.

@sahin
Copy link

sahin commented Aug 9, 2014

@marclennox hi, is this still an open issue?

@marclennox
Copy link
Contributor

I don't think this was one I was involved in.

@ksaveras
Copy link

+1 Waiting for this solution

@didil
Copy link

didil commented Nov 6, 2014

+1

@vincentwoo
Copy link

@mhenrixon @cpuguy83 do you guys know if this is still an issue?

@mhenrixon
Copy link
Owner

our workers crash every now and then but we are not experiencing this issue. We did a lot of work this fall to make sure that unique arguments are cleared as the jobs are deleted in all queues so it should at least be less of a problem.

@vincentwoo
Copy link

Is closing the issue justified? In either case, I think you should maybe post a comment (or update the readme) about the protections sidekiq-unique-jobs has in place (are they atomic in redis? does sidekiq-unique-jobs poll for staleness?).

It'd certainly help with my peace of mind, and would help a bunch of people make an informed decision. @mperham has been staunch in keeping uniqueness out of sidekiq core, so it'd be good to know how you've addressed what he seees as a very difficult problem.

@cpuguy83
Copy link
Author

I honestly don't maintain the app I was using this for anymore, so I couldn't say if it works now or not.

@draffensperger
Copy link

I work on a Rails app that uses Sidekiq pro and we have run into issues with our jobs not being unique when a Sidekiq pro process crashes and it pulls from the reliable fetch queue. We have also had problems with jobs not being unique on retries. We ended up using some code similar to the @kfalconer code above that checks for retried jobs but we also check for older actively running jobs. Here's a gist of the class we made: https://gist.github.com/draffensperger/a90e54426be60769f621 (we just call return if duplicate_job?(args) in our workers that include it). Job uniqueness under crashes and retries is helpful to us and I'm open to collaborating to get it into siekiq-unique-jobs. @mhenrixon, how complex do you think it would be to get uniqueness on crash restarts and retries and are you open to having it added in?

@mhenrixon
Copy link
Owner

I am definitely open to having a duplicate? check as part of uniqueness I think the code you gisted could be pretty much copied into unique jobs. Pull requests much appreciated :)

@draffensperger
Copy link

Sounds good. I'll try to do a PR for it in the next couple weeks.

@mhenrixon
Copy link
Owner

This should be fixed. If not open a new issue with a failing test.

@draffensperger
Copy link

@mhenrixon thanks for fixing it, and sorry that I didn't get to the PR... I was meaning to and got started at one point and then got caught up with other things.

@salmanasiddiqui
Copy link

I think I have reproduce this issue on
sidekiq (3.5.4)
sidekiq-unique-jobs (4.0.18)

my worker:

  sidekiq_options retry: false,
                  unique: :until_executed,
                  unique_args: -> (args) { [args.first] }

STR:

  1. queue any long running job
  2. run sidekiq process
  3. kill it

Now we can't enqueue the same job for next 30 mins. Can we configure this 30 mins expiry time somehow?

Is this 30 mins expiry is the solution to the actual problem of not handling crashes at all?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests