Subscribe to switch master message is necessary #24

Open
isaiah opened this issue Sep 10, 2013 · 4 comments

Comments

isaiah commented Sep 10, 2013

If there is a false alarm and the sentinels promote a slave to master even though the master is still alive, the client stays connected to the old master, which becomes a slave as a result of the switch.

@flyerhzm Why was the code that subscribes to the +switch-master event removed, and can we bring it back?
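
For illustration, here is a minimal sketch of how a client might react to a +switch-master message. Sentinel publishes the payload as "<master name> <old ip> <old port> <new ip> <new port>". The names `SwitchMaster`, `parse_switch_master` and `handle_switch_master` are hypothetical and are not taken from the redis-sentinel gem.

```ruby
# Hypothetical value object for one +switch-master event.
SwitchMaster = Struct.new(:name, :old_host, :old_port, :new_host, :new_port)

# Parse the payload "<master name> <old ip> <old port> <new ip> <new port>".
# Returns nil when the payload does not have the expected shape.
def parse_switch_master(message)
  name, old_host, old_port, new_host, new_port = message.to_s.split(" ")
  return nil if new_port.nil?

  SwitchMaster.new(name, old_host, Integer(old_port), new_host, Integer(new_port))
rescue ArgumentError
  nil # a port field was not numeric
end

# Yield the new master address only when the event concerns the master
# this client is watching. Returns the parsed event, or nil if ignored.
def handle_switch_master(message, watched_master)
  event = parse_switch_master(message)
  return nil unless event && event.name == watched_master

  yield event.new_host, event.new_port
  event
end

# Assumed wiring with redis-rb (not executed here; `sentinel` would be a
# Redis connection to a sentinel, and `reconnect_to` is hypothetical):
#
#   sentinel.subscribe("+switch-master") do |on|
#     on.message do |_channel, message|
#       handle_switch_master(message, "mymaster") do |host, port|
#         reconnect_to(host, port)
#       end
#     end
#   end
```

Because `subscribe` blocks its connection, a sketch like this would need a dedicated sentinel connection, typically in a background thread.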

@flyerhzm
Owner

@isaiah I forgot the reason, but you're right, I should add it back.

@sheldonh

I just saw this in testing. I added an action to my Rack app to test writability with a simple SET and GET, then spammed that action while a background process sent "SENTINEL failover mymaster" to a sentinel every 20 seconds. Eventually, the SET operation fails.

It occurs to me that I could have the resilience I want without +switch-master event monitoring, if the redis-sentinel gem responded to failures as follows:

  • Issue the INFO command.
  • If it succeeds, and the output indicates that the instance is not master, rediscover current master and retry.

Rediscovery would probably benefit from a configurable delay.
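
The steps above could be sketched roughly as follows. This is only an illustration: `MasterGuard`, `discover_master`, `rediscovery_delay`, `max_retries` and `sleeper` are hypothetical names, and the connection is assumed to expose `info` returning a Hash with a "role" key, as redis-rb does. The case where INFO itself fails is not covered by the proposal, so the sketch simply re-raises the original error.

```ruby
# Hypothetical wrapper: on failure, check INFO, and if the instance is no
# longer master, rediscover the current master and retry the operation.
class MasterGuard
  def initialize(discover_master:, rediscovery_delay: 0.5, max_retries: 1,
                 sleeper: ->(seconds) { sleep(seconds) })
    @discover_master = discover_master     # callable returning a connection
    @rediscovery_delay = rediscovery_delay # configurable delay, in seconds
    @max_retries = max_retries
    @sleeper = sleeper                     # injectable so tests need not sleep
    @conn = discover_master.call
  end

  # Run the block against the current connection, retrying after rediscovery
  # when the failure coincides with the instance having been demoted.
  def call
    attempts = 0
    begin
      yield @conn
    rescue StandardError => error
      raise error if attempts >= @max_retries
      raise error unless demoted?(@conn)

      attempts += 1
      @sleeper.call(@rediscovery_delay)
      @conn = @discover_master.call
      retry
    end
  end

  private

  # True only when INFO succeeds and reports a role other than master.
  def demoted?(conn)
    conn.info["role"] != "master"
  rescue StandardError
    false # INFO failed too; treat the original error as unrelated to a switch
  end
end
```

A caller would wrap each command, for example `guard.call { |redis| redis.set("key", "value") }`, with `discover_master` asking the sentinels for the current master address.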

What do you think?

reist commented May 13, 2014

Hi, I just had this happen with a resque-only redis in a production environment. down-after-milliseconds seems to have been set a bit too low, as sentinel failed it over at random after 3 days of running with no problems.
This is a pretty serious issue: after the switch, all I could do was restart every process using redis in the cluster (which is most of them) and try to rescue missed jobs.

Is there any plan to track sentinel promotions again?

flyerhzm added a commit that referenced this issue May 17, 2014
@flyerhzm
Owner

@isaiah @sheldonh @reist sorry for the late commit. I finally added "subscribe +switch-master" back; please let me know if it works for you.
