When the client is initialized with multiple hosts, it makes sense to retry a failed request on a different host
You can specify how many times the client should retry the request before it raises an exception.
Elasticsearch by default dynamically discovers new nodes in the cluster. You can leverage this in the client, and periodically check for new nodes to spread the load.
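The options described above might be configured like this (a sketch against the elasticsearch-transport Ruby client; the host names are placeholders):

```ruby
require 'elasticsearch'

# Hypothetical three-node cluster; with multiple hosts the client can
# retry a failed request on a different node.
client = Elasticsearch::Client.new(
  hosts: ['es1.example.com:9200', 'es2.example.com:9200', 'es3.example.com:9200'],
  retry_on_failure:   3,     # retry up to 3 times before raising
  reload_on_failure:  true,  # reload the connection list after a failure
  reload_connections: 1_000  # also re-sniff the cluster every 1,000 requests
)
```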
Depending on the error we get different results. If ES actually returns a response with some kind of failure, the request is retried 3 times (`max_retries`). But if the connection is refused (the host can't even respond), the query is executed on different hosts as many times as there are nodes, so in theory it will be tried against every node to see if one responds. The connections are also reloaded as many times as there are nodes.
`reload_on_failure` makes it so that `max_retries` is never checked. If `max_retries` is 3, `reload_on_failure` is true, and we have 20 ES hosts, the query is retried 20 times, reloading the connections before each retry.
This happens because `max_retries` is only checked after the `reload_on_failure` logic here: if execution enters the `reload_on_failure && tries < connections.all.size` branch, it calls `retry` and nothing else is checked.
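The control flow can be reproduced with a small, self-contained sketch (simplified and renamed from the rescue clause in the gem's `Transport::Base#perform_request`; `connections` here is just an array standing in for the real connection collection):

```ruby
# Simplified sketch of the rescue logic, NOT the real gem code. It shows why
# reload_on_failure bypasses max_retries: the reload branch calls `retry`
# before the max_retries check is ever reached.
MAX_RETRIES = 3

def perform_request(connections, reload_on_failure: true)
  tries    = 0
  attempts = 0
  begin
    tries    += 1
    attempts += 1
    raise Errno::ECONNREFUSED # every host refuses the connection
  rescue Errno::ECONNREFUSED
    # Branch 1, checked FIRST: reload and retry; max_retries never consulted.
    if reload_on_failure && tries < connections.size
      # (the real code calls reload_connections! here to rebuild the list)
      retry
    end
    # Branch 2: only reached once tries >= connections.size.
    retry if tries <= MAX_RETRIES
    attempts
  end
end

# With 20 hosts and max_retries = 3, the request is attempted 20 times
# (19 reloads) before the max_retries branch gets a chance to stop it.
puts perform_request(Array.new(20), reload_on_failure: true)   # prints 20
puts perform_request(Array.new(20), reload_on_failure: false)  # prints 4
```

With `reload_on_failure` disabled, the same loop stops after the initial attempt plus `MAX_RETRIES` retries, which is the behaviour the option name suggests.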
PS:
This is an edge case because it requires all ES nodes to be completely down, or under such heavy load that no connection can get in, but it has happened to us in production many times due to the size of our cluster and the queries we run.
I'm unsure whether this is an actual bug or the intended behaviour. It kind of feels like a bug, but at the same time it doesn't.