way to map/query across all workers #2
Do you mean a map/reduce implementation in pigato? The best approach, for me, is to create a worker that handles this kind of request. I think there should be no need to know how many workers are connected. Please tell me if I misunderstood your thoughts. Paolo
On 15 Oct 2014 17:59, "Bradley Meck" [email protected] wrote:
Yes, this is a map/reduce across workers, where the workers are the source of state. In our usage we have workers sitting on boxes monitoring processes. Since the processes are tied to physical machines we cannot reach out to a shared DB etc. to get a single source of data, so we need to query all workers somehow. We can do the reduce client side or server side; I would prefer not to send a reduce method over the wire like many map/reduce implementations do. Renaming the issue to just be a global map().
@prdn starting the impl; it will need a new kind of queueing on the controller.
I'm just going to have map send normal-looking requests to all workers; they should be indistinguishable to the worker, unless you can think of a reason why workers need to know that a specific request is a map request.
I think that a worker shouldn't know that a request is a map request. I have a couple of questions to better suggest a solution. How do you know that all required workers are connected to the broker? We might consider a slightly different pattern: for example, a broadcast request from the client to a specific service, where all the workers answer that request. Let me know what you think.
How do we know that all required workers are connected? You don't; just query what you have and see if the results have the needed info.
How many workers are required? I was just going to queue the request to all currently connected workers for a service.
Who performs the reduce? Reduce will be left to the client. Since the request is not seen as a map request, the broker needs to tag responses for multiplexing purposes.
Incomplete replies? Unsure what you mean; if we are pulling multiple streams of data, the concept of a complete reply would be difficult. We may not know when a stream ends (until it does).
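The scheme described here, with the broker tagging responses and the reduce left to the client, can be sketched in plain JavaScript. This is a minimal illustration, not pigato's API; the function names and the shape of the tagged messages are assumptions.

```javascript
// Hypothetical sketch: the broker tags every reply chunk with the id of
// the worker it came from, so one client connection can carry many
// reply streams and the reduce can stay entirely on the client.
function demultiplex(taggedReplies) {
  const byWorker = new Map();
  for (const { workerId, data } of taggedReplies) {
    if (!byWorker.has(workerId)) byWorker.set(workerId, []);
    byWorker.get(workerId).push(data);
  }
  return byWorker;
}

// Client-side reduce: here we simply sum the partial counts that each
// worker streamed back.
function reduceReplies(byWorker) {
  let total = 0;
  for (const chunks of byWorker.values()) {
    for (const n of chunks) total += n;
  }
  return total;
}

// Three tagged chunks from two workers, arriving interleaved:
const tagged = [
  { workerId: 'w1', data: 2 },
  { workerId: 'w2', data: 3 },
  { workerId: 'w1', data: 5 },
];
console.log(reduceReplies(demultiplex(tagged))); // 10
```

Because the tag travels with each chunk, the client never needs to know in advance when any single worker's stream ends; it just folds chunks into the running reduce as they arrive.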
Ok, now it's clearer. Thank you. If I've understood correctly, the client receives a single multiplexed stream from the broker that is the result of all the worker streams. Is that correct?
yes
Ok, so the client doesn't need any built-in reduce logic; it receives a stream as usual.
I have to think about why we need a new queue.
@prdn let me know if you have any thoughts on how to avoid the new queue. Right now we need a new queue because we can only queue based upon service name and cannot target requests to specific workers. If we waited for a dispatch on a service and saw that the queued request is not for our worker id, we would need to skip that request (but keep it at the front of the queue to avoid starvation). I did not see a sanctioned way to do this.
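The queueing problem described here, entries that may target a specific worker, where dispatch must skip non-matching entries without dropping or reordering them, can be sketched as a small data structure. This is a stand-alone illustration with invented names, not pigato's broker code.

```javascript
// Hypothetical sketch: a per-service queue whose entries may target a
// specific worker id. Dispatch for a given worker skips (but keeps, in
// order) entries addressed to other workers, so targeted requests are
// never starved and untargeted ones go to whichever worker is free first.
class ServiceQueue {
  constructor() {
    this.items = [];
  }

  // targetWorkerId === null means "any worker of this service".
  push(request, targetWorkerId = null) {
    this.items.push({ request, targetWorkerId });
  }

  // Pop the first entry this worker may handle; leave the rest in place.
  dispatch(workerId) {
    for (let i = 0; i < this.items.length; i++) {
      const entry = this.items[i];
      if (entry.targetWorkerId === null || entry.targetWorkerId === workerId) {
        this.items.splice(i, 1);
        return entry.request;
      }
    }
    return null; // nothing dispatchable for this worker right now
  }
}

const q = new ServiceQueue();
q.push('map-req-for-w2', 'w2');
q.push('plain-req');
console.log(q.dispatch('w1')); // 'plain-req' (w2's entry stays queued)
console.log(q.dispatch('w2')); // 'map-req-for-w2'
```

The linear scan keeps skipped entries at the front, which is the anti-starvation property the comment asks for; a production queue would index entries per worker id instead of scanning.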
Ok. Thank you
Maybe I'm wrong, but isn't the ventilator/sink pattern a better fit for this?
@alexeygolev it would be a better fit in the sense of doing parallel tasks, but we start to get into territory where zmq is not exactly a good fit, in my experience. We would need a multiplexed sink and ventilator per service on the broker, preferably without opening a port per sink/ventilator. If you know a good way to do this I would be game, and it would make things easier.
@bmeck sounds like a mission... will research/play around
@alexeygolev one thing to note is that service workers may contain state, so we need to be sure that the query hits all the workers.
Guys,
to implement this scenario we only have to:
This approach is interesting to me because it requires only small modifications to the broker and lets us move the complexity to specialised workers. What do you think about this?
As long as there is a way to query all of the workers reliably, that's fine.
On Fri, Jan 9, 2015 at 5:17 AM, Paolo Ardoino [email protected]
For the Client this is totally transparent.
On Fri Jan 09 2015 at 12:55:51 PM Bradley Meck [email protected]
@prdn are we tied to the special-worker approach, or would a PR allowing normal workers to act as part of a reduce be fine?
We are not tied to that approach. I'm truly interested in seeing your PR.
@bmeck could you explain a bit what the proposed modifications to the worker are and how they could impact the current behaviour?
This would be completely transparent to the worker. It would require that the broker be able to multiplex multiple streams received from different workers. The worker would have no need to know that it is being asked as part of a batch request. The map function is a bit more complicated, though; we can perform it client side or via a well-known service. In a pseudocode sort of summary it looks like:
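The pseudocode referred to above did not survive in this thread, so the following is a hedged reconstruction of the shape being described: the broker fans a batch request out to every connected worker of a service and tags each reply stream. Every name here is an illustrative stand-in, not a pigato internal.

```javascript
// Hypothetical reconstruction: broker-side fan-out with reply tagging.
function batchRequest(broker, serviceName, request, sendToClient) {
  const workers = broker.workersByService.get(serviceName) || [];
  for (const workerId of workers) {
    // To the worker this is indistinguishable from a normal request.
    broker.sendToWorker(workerId, request);
  }
  // Replies arrive asynchronously; the broker tags each one with its
  // worker id so the client can demultiplex the streams.
  broker.onWorkerReply = (workerId, reply) =>
    sendToClient({ workerId, reply });
  return workers.length; // how many reply streams the client can expect
}

// Tiny in-memory stand-in for broker state, just to exercise the flow:
const sent = [];
const broker = {
  workersByService: new Map([['monitor', ['w1', 'w2']]]),
  sendToWorker: (id, req) => sent.push([id, req]),
};
const streams = batchRequest(broker, 'monitor', 'status?', () => {});
console.log(streams, sent.length); // 2 2
```

The key point the comment makes survives in the sketch: only `batchRequest` and the tagging callback are new, while the path each individual request takes to a worker is unchanged.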
In this case the broker and the client need to be altered, but not the worker. This means that you can send batch requests without any issues. Now onto mapping... Mapping is a bit interesting because I don't think it should be in the broker at all. For now I think it should live on the client, and the client should just deal with the multiple replies as they come in. We could do the map on the broker, or make a well-known function signature for how to accept a map request; but I think that should be put off until the ups/downs of this are more visible.
As a side note, we are already faking this in a similar way to do multiplexed responses from a worker, sending back Server-Sent Events; we would just be using this on the broker for multiplexing worker responses rather than parts of a single worker's reply stream.
If I understand correctly, the Broker should offer a way to retrieve all workers available for a given service. For example, let's say that the Broker answers a Client request called $directory with all the workerIds belonging to a given service. Does this correspond to your scenario?
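The two-step flow proposed here, a directory lookup followed by one request per worker id, can be sketched as follows. Only the `$directory` name comes from the discussion; the broker object and its `request` method are plain in-memory stand-ins, not the pigato client API.

```javascript
// Hypothetical sketch: query a well-known '$directory' service for the
// worker ids of a service, then issue one request per worker id.
const fakeBroker = {
  services: new Map([['monitor', ['w7', 'w9']]]),
  request(service, payload) {
    if (service === '$directory') {
      // Directory lookup: answer with the worker ids for a service.
      return this.services.get(payload) || [];
    }
    // A normal request, routed to one specific worker.
    return `reply-from-${payload.targetWorker}`;
  },
};

const workerIds = fakeBroker.request('$directory', 'monitor');
const replies = workerIds.map((id) =>
  fakeBroker.request('monitor', { targetWorker: id, query: 'status' })
);
console.log(workerIds); // [ 'w7', 'w9' ]
console.log(replies);   // [ 'reply-from-w7', 'reply-from-w9' ]
```

This is the client-driven variant: the directory answer is only a snapshot, so a worker joining between the lookup and the fan-out would be missed, which is the trade-off against broker-side fan-out raised earlier in the thread.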
In my scenario the client uses 1 connection and the broker makes multiple requests. We could make the client do a request for all the current workers of a service.
On Sat, Apr 11, 2015 at 5:23 AM, Paolo Ardoino [email protected]
In my suggested scenario the client uses a single connection but sends multiple requests on that connection. The broker maintains its simplicity and lets requests flow as normal. If it is not too much effort for you, would you mind if we start with a client demultiplexer?
We can do it client-multiplexed first, but then we will need to expose what the worker ids are, which is fine.
@bmeck please take a look at this commit f7127f2, this file https://github.com/prdn/pigato/blob/master/services/Directory.js and this test https://github.com/prdn/pigato/blob/master/test/directory.js. Basically, I have now added a new socket to the Broker (inproc) that the Broker uses to publish its internal status to subscribed core services. Now we should have all the pieces for the map-reduce. Let me know what you think about this.
design / bikeshed
This is a common problem, but being able to run a reduce across workers is important.
Right now we can use a registry separate from the broker to generate a list of workers and run a reduce across them.
Being able to wait on / run a reduce across all workers of a service would be a big win. Technically, all that is needed is the ability to:
Since pigato is more than just a simple omdp, we should discuss whether this should be a service or built into the broker.