Problem with releasing uniquejobs locks after timeout expires #169

davehartnoll · 2016-03-02T15:54:37Z

The redis/aquire_lock.lua script sets two keys in redis for each unique job; the first has an expiration time:

if redis.pcall('set', unique_key, job_id, 'nx', 'ex', expires) then
    redis.pcall('hsetnx', 'uniquejobs', job_id, unique_key)
    return 1
else
    return 0
end

The redis/release_lock.lua script contains this to delete the same two keys:

if redis.pcall('del', unique_key) then
    redis.pcall('hdel', 'uniquejobs', job_id)
    return 1
end

However, if the key for the job has already expired by the time the release_lock script is called, then the first redis.pcall will return false and the second one never gets executed. As a result, some entries stay in the uniquejobs hash forever and it just keeps growing. In my case it consumed all available memory and caused Sidekiq to reject all additional jobs, even though the underlying queues were apparently empty.

One simple fix may be to always remove the key from the uniquejobs hash before testing whether the timed-out key can also be deleted, but I'll leave it to someone who understands the locking mechanism better to decide if it's good:

redis.pcall('hdel', 'uniquejobs', job_id)
if redis.pcall('del', unique_key) then
    return 1
end
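
To make the difference between the two release orderings concrete, here is a minimal in-memory sketch in Ruby. It does not talk to Redis: FakeStore and its method names are hypothetical, and it models the behaviour as reported above rather than Redis's actual reply conversion (an integer reply of 0 from DEL is still a truthy number in Lua), so treat it as an illustration of the symptom, not of the library's code.

```ruby
# In-memory stand-in for the two Redis structures described above.
# FakeStore and both release methods are hypothetical names for illustration.
class FakeStore
  attr_reader :keys, :uniquejobs

  def initialize
    @keys = {}        # unique_key => job_id (the key that carries the expiry)
    @uniquejobs = {}  # job_id => unique_key (the hash, which has no expiry)
  end

  # Mirrors the acquire script: set the key only if absent, then record it.
  def acquire(unique_key, job_id)
    return 0 if @keys.key?(unique_key)
    @keys[unique_key] = job_id
    @uniquejobs[job_id] ||= unique_key
    1
  end

  # Simulates the timeout firing: only the expiring key disappears.
  def expire(unique_key)
    @keys.delete(unique_key)
  end

  # Release as described in the report: the hash cleanup is skipped
  # when the expiring key is already gone.
  def release_conditional(unique_key, job_id)
    return 0 unless @keys.delete(unique_key)
    @uniquejobs.delete(job_id)
    1
  end

  # Release as proposed: always clean the hash first.
  def release_always(unique_key, job_id)
    @uniquejobs.delete(job_id)
    @keys.delete(unique_key) ? 1 : 0
  end
end

leaky = FakeStore.new
leaky.acquire("uniquejobs:abc", "jid-1")
leaky.expire("uniquejobs:abc")
leaky.release_conditional("uniquejobs:abc", "jid-1")
puts leaky.uniquejobs.size  # the hash entry is left behind

fixed = FakeStore.new
fixed.acquire("uniquejobs:abc", "jid-1")
fixed.expire("uniquejobs:abc")
fixed.release_always("uniquejobs:abc", "jid-1")
puts fixed.uniquejobs.size  # the hash entry is removed
```

In this model the conditional release leaves one orphaned hash entry per expired lock, while the unconditional one leaves none.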

My workaround (to avoid having to change the library) is to add a configuration setting to increase the value of the default timeout:

SidekiqUniqueJobs.configure do |config|
  config.default_queue_lock_expiration = 24 * 60 * 60
end
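
A longer timeout would not by itself remove entries that are already stuck in the uniquejobs hash. A one-off cleanup along the following lines might help; this is only a sketch, and purge_orphaned_uniquejobs is a hypothetical helper, not part of the library. It assumes the hash maps job_id to unique_key as in the scripts above, and a client that responds to hgetall, exists? and hdel (redis-rb 4.2 and later; older versions expose exists instead).

```ruby
# Hypothetical helper: removes entries from the uniquejobs hash whose
# expiring lock key no longer exists.
# `redis` is any client responding to hgetall, exists? and hdel.
def purge_orphaned_uniquejobs(redis, hash_name = "uniquejobs")
  removed = 0
  redis.hgetall(hash_name).each do |job_id, unique_key|
    # Keep entries whose lock key is still present.
    next if redis.exists?(unique_key)
    redis.hdel(hash_name, job_id)
    removed += 1
  end
  removed # number of orphaned entries deleted
end
```

The check and the delete are separate calls, so this is not atomic, and hgetall loads the whole hash at once. It is probably best run while workers are idle, and a very large hash may need to be scanned incrementally instead.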

P.S.: the correct spelling of 'aquire' is 'acquire'


mhenrixon · 2016-03-02T15:56:41Z

Thanks for reporting! I never thought that part of the code would be an issue. I'll make sure both calls are always made.

mhenrixon · 2016-03-02T16:29:10Z

Fixed by c22a5a3

mathieujobin · 2016-03-04T20:36:57Z

Thank you very much! I was waiting for this.

ecnalyr · 2016-03-04T20:45:41Z

+1

mathieujobin · 2016-03-05T00:06:10Z

Unfortunately, this does not solve our problem...

mhenrixon · 2016-03-05T06:34:41Z

Bah, I know why. Sorry about that, I forgot about the check for the key's existence.

davehartnoll · 2016-03-05T11:57:49Z

The fix doesn't actually solve my problem either. Should this issue be reopened?

stephankaag · 2016-03-11T13:56:27Z

Any idea how to fix this, @mhenrixon?

mhenrixon closed this as completed Mar 2, 2016
