Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConnectionPool used incorrectly - causes deadlocks #66

Closed
adstage-david opened this issue Dec 12, 2014 · 0 comments
Closed

ConnectionPool used incorrectly - causes deadlocks #66

adstage-david opened this issue Dec 12, 2014 · 0 comments

Comments

@adstage-david
Copy link

Been trying to hunt down some mysteriously stalling dynos on our heroku app, and have traced back the source of our woes:

https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/connectors/sidekiq_redis.rb#L5

The redis connector classes should not be returning the connections for use outside the #with or #redis blocks - that block is used to guarantee exclusive access to the connection and prevent other threads from touching the connection while it's working. This would blow up a lot more massively, but the redis connections themselves are intended to be thread safe, so the bugs end up being a lot more subtle:

  1. The deadlocks I've been hunting down:
    https://github.com/mhenrixon/sidekiq-unique-jobs/blob/master/lib/sidekiq_unique_jobs/middleware/client/strategies/unique.rb#L49-L51 the connection the multi is started on will not necessarily be the same one setex is called on. To be thread safe, redis throws a separate mutex around multi blocks (https://github.com/redis/redis-rb/blob/master/lib/redis.rb#L2147) and every other command as well - so it's possible you try to call setex on a connection that's currently locked for a multi, and then inside that multi locks on the connection waiting for the setex to unlock.
  2. Potential race conditions allowing jobs to be added multiple times - watch needs to be called on the same connection you call multi on, but since #conn is potentially a different connection from the pool every time, there's no guarantee this happens.

There might be other issues stemming from this as well. I was able to reproduce the error case we're seeing with the following script: https://gist.github.com/adstage-david/d1057fb6e4b1a676cce4

mhenrixon added a commit that referenced this issue Dec 15, 2014
Use ConnectionPool blocks to ensure exclusive connection. Closes #66.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant