Connection handling improvements. #66
Changes from all commits: 259b540, c7fea67, 2e43b9e, ac17ae0, af4054e, f570399, 88dfab5, df4ff01
```diff
@@ -17,3 +17,4 @@ test/version_tmp
 tmp
 tags
 .DS_Store
+vendor/
```
```diff
@@ -51,6 +51,7 @@ def call(command, &block)
     # @option options [String] :password password for redis nodes
     # @option options [String] :db database to use for redis nodes
     # @option options [String] :namespace namespace for redis nodes
+    # @option options [String] :trace_id trace string tag logged for client debugging
     # @option options [Logger] :logger logger override
     # @option options [Boolean] :retry_failure indicates if failures are retried
     # @option options [Integer] :max_retries max retries for a failure
```
```diff
@@ -61,6 +62,7 @@ def call(command, &block)
     # @return [RedisFailover::Client]
     def initialize(options = {})
       Util.logger = options[:logger] if options[:logger]
+      @trace_id = options[:trace_id]
       @master = nil
       @slaves = []
       @node_addresses = {}
```
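The new `:trace_id` option is purely a debugging aid: the caller supplies a tag at construction time and every client log line carries it, so interleaved logs from multiple workers can be told apart. A minimal, self-contained sketch of the pattern (the `TaggedClient` class and its option names are illustrative, not the gem's API):

```ruby
require 'logger'
require 'stringio'

# Illustrative sketch of the trace-id tagging pattern this PR adds to
# RedisFailover::Client. TaggedClient is a hypothetical stand-in.
class TaggedClient
  def initialize(options = {})
    @trace_id = options[:trace_id]
    @logger   = options[:logger] || Logger.new($stdout)
  end

  def log(message)
    # Every line carries the [trace_id] tag, mirroring the PR's log format.
    @logger.info("#{message} [#{@trace_id}]")
  end

  def inspect
    "#<TaggedClient [#{@trace_id}]>"
  end
end

buffer = StringIO.new
client = TaggedClient.new(:trace_id => 'worker-42', :logger => Logger.new(buffer))
client.log('Purging current redis clients')
```

With two workers constructed with different trace ids, a shared log stream becomes attributable line by line.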
```diff
@@ -130,7 +132,7 @@ def respond_to_missing?(method, include_private)

     # @return [String] a string representation of the client
     def inspect
-      "#<RedisFailover::Client (db: #{@db.to_i}, master: #{master_name}, slaves: #{slave_names})>"
+      "#<RedisFailover::Client [#{@trace_id}] (db: #{@db.to_i}, master: #{master_name}, slaves: #{slave_names})>"
     end
     alias_method :to_s, :inspect
```
```diff
@@ -157,13 +159,12 @@ def shutdown
       purge_clients
     end

-    # Reconnect will first perform a shutdown of the underlying redis clients.
-    # Next, it attempts to reopen the ZooKeeper client and re-create the redis
-    # clients after it fetches the most up-to-date list from ZooKeeper.
+    # Reconnect method needed for compatibility with 3rd party libs that expect this for redis client objects.
    def reconnect
-      purge_clients
-      @zk ? @zk.reopen : setup_zk
-      build_clients
+      #NOTE: Explicit/manual reconnects are no longer needed or desired, and
+      #triggered kernel mutex deadlocks in forking env (unicorn & resque) [ruby 1.9]
+      #Resque automatically calls this method on job fork.
+      #We now auto-detect underlying zk & redis client InheritedError's and reconnect automatically as needed.
     end

     # Retrieves the current redis master.
```

> **Review comment:** no no no, do not do this please, for the love of god. we've fixed the deadlocks. don't do a buggy workaround.

> **Review comment:** This PR does not change behavior wrt use (or not) of ZK.install_fork_hook.
```diff
@@ -235,16 +236,21 @@ def dispatch(method, *args, &block)
       verify_supported!(method)
       tries = 0
       begin
-        client_for(method).send(method, *args, &block)
+        redis = client_for(method)
+        redis.send(method, *args, &block)
+      rescue ::Redis::InheritedError => ex
+        logger.debug( "Caught #{ex.class} - reconnecting [#{@trace_id}] #{redis.inspect}" )
+        redis.client.reconnect
+        retry
       rescue *CONNECTIVITY_ERRORS => ex
         logger.error("Error while handling `#{method}` - #{ex.inspect}")
         logger.error(ex.backtrace.join("\n"))

         if tries < @max_retries
           tries += 1
           free_client
-          build_clients
           sleep(RETRY_WAIT_TIME)
+          build_clients
           retry
         end
         raise
```
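The reworked `dispatch` separates two failure modes: `Redis::InheritedError` (a forked child reusing the parent's socket) gets an immediate reconnect-and-retry, while ordinary connectivity errors go through the bounded retry loop. A runnable sketch of that control flow, with hypothetical stub classes standing in for the real redis client:

```ruby
# Stub standing in for ::Redis::InheritedError (raised when a forked child
# inherits the parent's socket).
class InheritedError < StandardError; end

# Stub client: fails once with InheritedError, then succeeds after reconnect.
class StubRedis
  attr_reader :reconnects

  def initialize
    @inherited  = true
    @reconnects = 0
  end

  def reconnect
    @reconnects += 1
    @inherited = false
  end

  def get(key)
    raise InheritedError, 'socket inherited across fork' if @inherited
    "value-of-#{key}"
  end
end

# Mirrors the PR's dispatch logic: on an inherited connection the fix is
# always a reconnect, so retry immediately rather than counting it as a
# connectivity failure.
def dispatch(redis, method, *args)
  begin
    redis.send(method, *args)
  rescue InheritedError
    redis.reconnect
    retry
  end
end

redis  = StubRedis.new
result = dispatch(redis, :get, 'foo')
```

The key design point is that an inherited socket is not a transient network fault: retrying without reconnecting would fail forever, which is why it gets its own rescue clause ahead of `CONNECTIVITY_ERRORS`.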
```diff
@@ -288,7 +294,7 @@ def build_clients
       return unless nodes_changed?(nodes)

       purge_clients
-      logger.info("Building new clients for nodes #{nodes.inspect}")
+      logger.info("Building new clients for nodes [#{@trace_id}] #{nodes.inspect}")
       new_master = new_clients_for(nodes[:master]).first if nodes[:master]
       new_slaves = new_clients_for(*nodes[:slaves])
       @master = new_master
```
```diff
@@ -320,19 +326,37 @@ def should_notify?
     #
     # @return [Hash] the known master/slave redis servers
     def fetch_nodes
-      data = @zk.get(redis_nodes_path, :watch => true).first
-      nodes = symbolize_keys(decode(data))
-      logger.debug("Fetched nodes: #{nodes.inspect}")
-
-      nodes
-    rescue Zookeeper::Exceptions::InheritedConnectionError, ZK::Exceptions::InterruptedSession => ex
-      logger.debug { "Caught #{ex.class} '#{ex.message}' - reopening ZK client" }
-      @zk.reopen
-      retry
-    rescue *ZK_ERRORS => ex
-      logger.warn { "Caught #{ex.class} '#{ex.message}' - retrying" }
-      sleep(RETRY_WAIT_TIME)
-      retry
+      tries = 0
+      begin
+        data = @zk.get(redis_nodes_path, :watch => true).first
+        nodes = symbolize_keys(decode(data))
+        logger.debug("Fetched nodes: #{nodes.inspect}")
+        nodes
+      rescue Zookeeper::Exceptions::InheritedConnectionError, ZK::Exceptions::InterruptedSession => ex
+        logger.debug { "Caught #{ex.class} '#{ex.message}' - reopening ZK client [#{@trace_id}]" }
+        sleep 1 if ex.kind_of?(ZK::Exceptions::InterruptedSession)
+        @zk.reopen
+        retry
+      rescue *ZK_ERRORS => ex
+        logger.error { "Caught #{ex.class} '#{ex.message}' - retrying ... [#{@trace_id}]" }
+        sleep(RETRY_WAIT_TIME)
+
+        if tries < @max_retries
+          tries += 1
+          retry
+        elsif tries < (@max_retries * 2)
+          tries += 1
+          logger.error { "Hmmm, more than [#{@max_retries}] retries: reopening ZK client [#{@trace_id}]" }
+          @zk.reopen
+          retry
+        else
+          tries = 0
+          logger.error { "Oops, more than [#{@max_retries * 2}] retries: establishing fresh ZK client [#{@trace_id}]" }
+          @zk.close!
+          setup_zk
+          retry
+        end
+      end
     end

     # Builds new Redis clients for the specified nodes.
```
```diff
@@ -434,7 +458,7 @@ def disconnect(*redis_clients)
     # Disconnects current redis clients.
     def purge_clients
       @lock.synchronize do
-        logger.info("Purging current redis clients")
+        logger.info("Purging current redis clients [#{@trace_id}]")
         disconnect(@master, *@slaves)
         @master = nil
         @slaves = []
```
> **Review comment:** What version of zk and zookeeper were you using? In the past couple of months I made a lot of fixes that improved some of these situations.

> **Review comment:** ...and not calling the fork hooks will likely lead to segfaults (which is why this stuff is here in the first place). The issue is that the Apache ZooKeeper C library does not provide a way to say "clean up this zookeeper connection without issuing a close", so the child can't clean up the parent's connection without closing the parent's connection.

> **Review comment:** This is insanely dangerous and will break worse with this disabled. Safe operation requires the use of the fork hooks to ensure that the mutexes are owned by the thread calling fork() and that the child is able to clean up the connection properly. This is likely just masking the error or causing a different kind of broken behavior. The latest releases, zk-1.9.3 and zookeeper-1.4.8, have fixes for the event delivery system that we believe caused the deadlocks (@eric has been running them with success in production for several weeks). You must install the fork hooks for correct operation in forking environments.

> **Review comment:** I've addressed this in my longer comment below, but to be clear, redis_failover has never installed ZK.install_fork_hook. My PR simply adds a comment about our experiences with it. P.S. I'm happy to remove the comment, especially now that we have new zk/zookeeper releases.
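The fork-safety debate above turns on how an "inherited" connection is detected at all. A common approach, and roughly the idea underlying redis-rb's `Redis::InheritedError`, is to record the pid that opened the socket and treat any use from a different pid as inherited. A self-contained sketch (this class is illustrative, not the actual gem code; the forked child is simulated rather than actually forked):

```ruby
# Illustrative pid-based detection of a connection inherited across fork().
# A real client would raise (as redis-rb does) or transparently reconnect,
# as this PR's dispatch method does.
class ForkAwareConnection
  attr_reader :connects

  def initialize
    @connects = 0
    connect!
  end

  def connect!
    @owner_pid = Process.pid   # remember which process opened the socket
    @connects += 1
  end

  def inherited?
    @owner_pid != Process.pid  # true only when running in a forked child
  end

  def call(command)
    connect! if inherited?     # child transparently rebuilds its own connection
    "ok: #{command}"
  end
end

conn = ForkAwareConnection.new
conn.call('PING')                                        # parent: no reconnect

# Simulate being in a forked child by faking a different owner pid.
conn.instance_variable_set(:@owner_pid, Process.pid - 1)
conn.call('PING')                                        # triggers a reconnect
```

This per-call check is what lets the PR drop explicit `reconnect` bookkeeping: the child never has to be told it forked, because the first command it issues reveals the pid mismatch.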