Been trying to hunt down some mysteriously stalling dynos on our Heroku app, and have traced back the source of our woes:
https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/connectors/sidekiq_redis.rb#L5
The redis connector classes should not be returning the connections for use outside the `#with` or `#redis` blocks - that block is used to guarantee exclusive access to the connection and prevent other threads from touching the connection while it's working. This would blow up a lot more massively, but the redis connections themselves are intended to be thread safe, so the bugs end up being a lot more subtle:
1. https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/middleware/client/strategies/unique.rb#L49-L51 - the connection the multi is started on will not necessarily be the same one setex is called on. To be thread safe, redis throws a separate mutex around multi blocks (https://github.com/redis/redis-rb/blob/master/lib/redis.rb#L2147) and every other command as well - so it's possible to call setex on a connection that's currently locked for a multi, while the code inside that multi is itself blocked on another connection, waiting for the setex to unlock.
2. Potential race conditions allowing jobs to be added multiple times - watch needs to be called on the same connection you call multi on, but since `#conn` is potentially a different connection from the pool every time, there's no guarantee this happens.
There might be other issues stemming from this as well. I was able to reproduce the error case we're seeing with the following script: https://gist.github.com/adstage-david/d1057fb6e4b1a676cce4
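
To illustrate the connection mix-up described above, here is a minimal sketch that doesn't need Redis at all. `FakeConnection`, `TinyPool` and `#leaked_conn` are hypothetical stand-ins I made up for the example (they are not the gem's classes or the connection_pool API); the only assumption is that the pool hands connections out in rotation, so a connection returned from inside a `#with` block may not be the one you get on the next checkout.

```ruby
# Hypothetical stand-in for a Redis connection: it only records commands.
class FakeConnection
  attr_reader :id, :commands

  def initialize(id)
    @id = id
    @commands = []
  end

  # Record the command and return self so calls can be chained/inspected.
  def call(command)
    @commands << command
    self
  end
end

# Hypothetical minimal pool: a connection is held exclusively only for
# the duration of the #with block, then goes back to the queue.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { |i| @queue << FakeConnection.new(i) }
  end

  def with
    conn = @queue.pop
    yield conn
  ensure
    @queue << conn if conn
  end

  # Anti-pattern: hands the connection back to the caller after the
  # block has already released it to the pool.
  def leaked_conn
    with { |conn| conn }
  end
end

pool = TinyPool.new(2)

# Leaky usage: WATCH and MULTI can land on different connections.
watch_conn = pool.leaked_conn.call(:watch)
multi_conn = pool.leaked_conn.call(:multi)
puts "leaky: same connection? #{watch_conn.equal?(multi_conn)}"

# Safe usage: every command of the transaction shares one checkout.
safe_ids = pool.with do |conn|
  %i[watch multi setex exec].map { |cmd| conn.call(cmd).id }.uniq
end
puts "safe: connection ids used: #{safe_ids.inspect}"
```

The fix this points at is to keep the whole watch/multi/setex sequence inside a single `#with` (or `#redis`) block, rather than fetching a connection per call.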