Specify multiple nodes #162
Hi, sorry for the delay. tl;dr: yes, it's probably a useful feature for Tire.

This is something which has been debated over and over, mainly on IRC with @clintongormley and others. It has even been said Tire «misses the point of elasticsearch» because it does not handle high availability in the client :) Now, I have a couple of things to say to this. Let me explain step by step.

First, let me say that I think that handling high availability/failover in the client “completely misses the point of elasticsearch” :) One of the best things about elasticsearch is that it talks HTTP. Handling availability in HTTP stacks is what we, web developers, do all day. Putting some kind of proxy (HAProxy, Varnish, a custom Ruby proxy with em-proxy) in front of an HTTP-based backend is easy, right? Most proxies can handle external sources for backend URLs, can handle backend health, etc., right? That's what proxies are for, and people smarter than me have written them. So, what you should do, generally speaking, is to invest your intellectual energy into building a robust stack, where you put a proxy in front of the elasticsearch cluster which automatically does round-robin and health checks. Then you can talk to elasticsearch from whatever client. Obviously, it could be argued that you've just introduced a single point of failure into your stack, but there's no free lunch.

Second, I've written one or two proxies in my life (see eg. https://github.com/igrigorik/em-proxy/blob/master/examples/balancing.rb), and while it's CS101-level programming, there are a couple of interesting questions:
These are all questions you have to go over before you start writing code.

Third, let's review what's possible now, as we speak: thanks to @dylanahsmith's patch at 94e7b89, it's now possible to rescue a failed request in your application and retry it against a different URL. It is obviously an ugly solution, and the library should support you much better here. The most trivial start of this journey would be to store the URLs from the configuration in the client and rotate through them.

Fourth, all this said, three things make a feature like this worth exploring and implementing:
Still with me? So, here's how I think the feature should be approached and done, and I'll probably want it myself sooner or later. Everything should be designed as a separate, pluggable component.

Initially, when you pass a URL (or URLs) to Tire in the configure block (please load your YAML yourself and pass an Array, thank you! :), Tire performs the initial cluster state check, and retrieves & stores URLs for healthy nodes in a variable. Then, when you perform a search, nodes are queried in a round-robin strategy, in succession. When a failure occurs because a node is not available, Tire will a) kick the node out of its set of healthy URLs, b) launch a background process which will retrieve fresh cluster state, and continue with the next node in its set. If all nodes in the set fail, in succession, it will give up and raise an exception.

Notice how, in this implementation, we don't have to perform background health checks: we perform them during “normal” searches. We kick out dead nodes when they fail to respond to a search (or other operation).

Now, the tricky part is of course the inter-process communication between the main code (your application) and the background process. I've done my share of process programming (see eg. https://gist.github.com/486161), but I still trip over from time to time when wearing sandals. I hope it will be doable without storing the nodes info in some external resource (a file on disk, Redis, ...), but I'm definitely not sure. Obviously, to make it work nicely, we must first a) implement the cluster API in the core library, b) change the concept of a single configured URL.

To sum up: when done like this, it would be hidden from the regular library user, as you'd just seed Tire with the initial URL(s), and the library takes it from there. For developers, it would be transparent what's happening, given we can just log all the events in the main Tire log.

Update: New in ES master, external multicast discovery, elastic/elasticsearch#1532. |
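To make the strategy described above a bit more concrete, here is a minimal Ruby sketch of such a node pool. The class and method names are invented for illustration, it leaves out the background cluster-state refresh, and it is not Tire's actual implementation.

```ruby
require 'net/http'
require 'uri'

# A rough sketch of the round-robin pool described above: rotate through the
# healthy URLs, kick out a node which fails to respond, and give up with an
# exception once the pool is exhausted. Names are illustrative only.
class NodePool
  class NoNodesAvailable < StandardError; end

  def initialize(urls)
    @urls  = urls.dup
    @index = 0
  end

  # Yields node URLs in round-robin fashion until the block succeeds
  # or no healthy nodes remain.
  def perform
    while @urls.any?
      url = @urls[@index % @urls.size]
      begin
        return yield(url)
      rescue Errno::ECONNREFUSED, Timeout::Error
        @urls.delete(url) # kick the dead node out of the set of healthy URLs
        # (this is where a background cluster state refresh would be launched)
      ensure
        @index += 1
      end
    end
    raise NoNodesAvailable, 'All elasticsearch nodes failed'
  end
end

pool = NodePool.new(%w[ http://localhost:9200 http://localhost:9201 ])
json = pool.perform { |url| Net::HTTP.get(URI("#{url}/articles/_search?q=title:T*")) }
```

Note that the dead-node detection happens as a side effect of regular requests, as described above, so no separate health-check loop is needed.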
Let's break it down a bit (the logic), at least in a very simplistic manner (and putting the 1532 elasticsearch issue aside). Tire should be initialized with a list of seed URLs and round-robin around them. There should be an option to "sniff" the rest of the cluster using the nodes info API, by going to elasticsearch, getting all the nodes and using them as the URLs to round-robin against. One simple option is to have a list of live nodes that are always round-robined through the API. A scheduled background job will go over the seed nodes and replace the live nodes. When sniffing is disabled, it means a simple "ping" (a HEAD request against each node). Make sense? |
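A hedged Ruby sketch of the two pieces described here: "sniffing" the cluster via the nodes info API, and the HEAD "ping" used when sniffing is disabled. It assumes the nodes info API answers at `/_cluster/nodes` and reports each node's `http_address` in the `inet[/127.0.0.1:9200]` form; both details depend on the elasticsearch version, and the helper names are made up.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Ask the seed nodes, one by one, for the list of all nodes in the cluster and
# return their HTTP addresses; fall back to the seeds if every seed is down.
def sniff_nodes(seed_urls)
  seed_urls.each do |seed|
    begin
      response = Net::HTTP.get_response(URI("#{seed}/_cluster/nodes"))
      next unless response.is_a?(Net::HTTPSuccess)

      nodes = JSON.parse(response.body)['nodes'].values
      # "inet[/127.0.0.1:9200]" -> "http://127.0.0.1:9200"
      return nodes.map { |node| "http://#{node['http_address'][/\/([^\]]+)/, 1]}" }
    rescue Errno::ECONNREFUSED, Timeout::Error
      next # this seed is down, try the next one
    end
  end
  seed_urls
end

# The "ping" used when sniffing is disabled: a HEAD request against the node.
def alive?(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.head('/').is_a?(Net::HTTPSuccess) }
rescue Errno::ECONNREFUSED, Timeout::Error, SocketError
  false
end
```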
Yes, definitely makes sense, that's the vision I have, and close to what I have described above, thanks for chiming in, @kimchy. I'm still unsure about the precise mechanics -- I'd like to be able to leave scheduled background jobs out, but of course, it's something which is at hand. Some Rake task which the user can schedule, etc. One problem with the solution I've outlined (starting a poller on node failure) is that it handles failover well, but cannot easily add new nodes unless one of them fails. |
@karmi I don't see the need for a background process to check which nodes are up - it's really fast, so I'd just do it synchronously. Makes things much less complex. In the Perl API, I accept 3 parameters:
A request does the following:
- if […]
  - we try each server in turn (or in parallel for the async backends) to get a list of the live nodes
- if the error is not a […]
  - if we still have potentially "good" servers in the list, then use the next server in the list to retry the request, otherwise repopulate from the […]
    - success: rerun the request
    - failure: throw an error

(Damn markdown doesn't render the above bullets correctly - hope this is still readable)

Couple of things I plan on changing:
|
@clintongormley Hi, sorry for the delay with the response, Clint! What you're saying makes perfect sense; I was thinking that keeping track of some counter and issuing cluster node checks within regular operations could be the way to go. It would certainly be more robust than the background polling and message passing, at the cost of a possibly very small overhead when checking with ES. Thanks!! |
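A small sketch of that counter idea: instead of a background poller, the list of live nodes is refreshed synchronously every N requests. The class, the `MAX_REQUESTS` constant, and the `sniff_nodes` helper (from the sniffing sketch earlier) are all invented for illustration.

```ruby
# Refresh the list of live nodes in-band, every MAX_REQUESTS requests,
# instead of relying on a scheduled background job.
class CountingPool
  MAX_REQUESTS = 10_000

  def initialize(seed_urls)
    @seeds   = seed_urls
    @nodes   = sniff_nodes(seed_urls) # see the sniffing sketch above
    @counter = 0
  end

  # Returns the node to use for the next request, re-sniffing the cluster
  # once the counter rolls over.
  def next_node
    @counter += 1
    @nodes = sniff_nodes(@seeds) if @counter % MAX_REQUESTS == 0
    @nodes[@counter % @nodes.size]
  end
end
```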
Any updates on this? Just wondering because the most recent comments were made around two months ago. Do you recommend going ahead with the rescue solution (94e7b89) or proxy for now? |
No updates yet -- I guess putting a proxy in front of ES if you're interested in high availability makes sense for now. |
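If you go the proxy route but want to stay in Ruby, here is a rough sketch of the "custom Ruby proxy" idea mentioned earlier, written as a tiny Rack application. The backend URLs are placeholders, headers are simplified, and a real deployment would more likely use Nginx or HAProxy.

```ruby
# config.ru -- forward every request to one of the elasticsearch nodes,
# rotating through them and skipping nodes that refuse connections.
require 'rack'
require 'net/http'
require 'uri'

BACKENDS = %w[ http://127.0.0.1:9200 http://127.0.0.1:9201 ].map { |url| URI(url) }
COUNTER  = [0]
MUTEX    = Mutex.new

run lambda { |env|
  request = Rack::Request.new(env)
  payload = request.body.read

  BACKENDS.size.times do
    backend = MUTEX.synchronize { BACKENDS[(COUNTER[0] += 1) % BACKENDS.size] }
    begin
      response = Net::HTTP.new(backend.host, backend.port).send_request(
        request.request_method, request.fullpath, payload.empty? ? nil : payload
      )
      # Response headers are simplified to keep the sketch short.
      return [response.code.to_i, { 'Content-Type' => 'application/json' }, [response.body.to_s]]
    rescue Errno::ECONNREFUSED, Timeout::Error
      next # dead node, try the next one
    end
  end

  [502, { 'Content-Type' => 'text/plain' }, ['All elasticsearch nodes are down']]
}
```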
One option, instead of creating a specific proxy, is running an elasticsearch instance configured as a client-only node (holding no data), which then acts as a smart proxy to the rest of the cluster. |
Any updates on this? |
Would it be feasible to use DNS to handle failover? Route 53 has health checks and failover support. |
@mkdynamic The best solution would be to use a real proxy, such as HAProxy, Nginx, etc. You can use Nginx as a round-robin proxy with keepalive pretty easily; see eg. https://gist.github.com/karmi/0a2b0e0df83813a4045f for a config example. |
Thanks, that's helpful. The main concern with a proxy is that it introduces a single point of failure. Obviously there are potential performance and cost downsides too. |
I wouldn't describe an Nginx-based proxy as having "obvious potential performance and cost downsides". But yes, every proxy would be a "single point of failure", though I'd hesitate to describe it as a practical problem. Robust, extensible multi-node support directly in Tire or its successors is definitely planned. |
Is there a way to specify multiple URLs or hosts?
If our master node goes down, we want the ability to fail over to another node. If this does not exist, we will probably be adding this functionality, so any direction from you would be great.
So far, from browsing Tire, I do not see anything that loads in a config YAML file. Is this correct?
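Regarding the YAML question, Tire itself doesn't load a config file and currently accepts a single URL, so a hedged sketch of the workaround (assuming a hypothetical config/elasticsearch.yml listing the node URLs) is to load the file yourself and pick one of the nodes:

```ruby
require 'yaml'
require 'tire'

# config/elasticsearch.yml (hypothetical):
#   urls:
#     - http://es1.example.com:9200
#     - http://es2.example.com:9200
urls = YAML.load_file('config/elasticsearch.yml')['urls']

Tire.configure do
  url urls.first # Tire currently takes a single URL; failover is up to you (or a proxy)
end
```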