This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Necessity of linger on exit for servers that time out #42
Comments
This man page for socket close suggests it may not be possible through the existing NNG interface: https://nng.nanomsg.org/man/tip/nng_close.3.html
In the case of long-running computation, it seems like this would matter most when sending the result of a completed task back to the client, rather than receiving data for a new task. And in the former case, would it be possible for the server to pause its idle timers etc. before initiating a send? Unless I am missing something, it seems like this would just be a matter of expressing the timer logic differently in R.
The issue is that we can do what we like prior to the send, or afterwards for that matter, but we simply do not know when it has finished. That is an interplay between the C process and the system TCP stack, which R has no access to at present.
That makes sense. By the way, this discussion made me concerned that a server could exit and lose the data long before the client has a chance to download it. I am happy to see that the results of lightweight tasks seem to remain available somewhere well after the server exits. On my company's cluster, I started a dispatcher on one node:
library(mirai)
url <- sprintf("ws://%s:57000", getip::getip())
print(url)
daemons(
n = 1L,
url = url,
dispatcher = TRUE,
token = FALSE
)
while (!is.matrix(daemons()$daemons)) {
Sys.sleep(0.1)
}
while (daemons()$daemons[, "online"] < 1L) {
Sys.sleep(0.1)
}
tasks <- replicate(4, mirai(rnorm(n = 1)))
Sys.sleep(4)
print(as.numeric(lapply(tasks, function(task) task$data)))
During the
The server visibly came and went, and the client did not attempt to collect the data until a couple of seconds after that. Yet no result went missing!
print(as.numeric(lapply(tasks, function(task) task$data)))
#> [1] 1.3502759 -0.2049120 0.1465165 -0.5801425
This is really amazing. Where do the results live between the server exit and the moment the client starts to collect them?
Ha yes TCP is surprisingly resilient.
The send is eager, so it is done while the server is still alive. This assumes, though, that it finishes transmitting before the 'exitlinger' period ends and the process dies.
I believe the data is just buffered at the client (listener) TCP socket, so it can be collected at any time by NNG.
Seems like there would have to be new logic. Just for the sake of thinking out loud:
Is this all possible? Am I missing something? I'm not sure if (4) is possible because the dispatcher is non-polling. Without polling, I suppose a callback mechanism would be needed, and from #42 (comment) it sounds like a callback mechanism does not exist at the NNG level.
It's just a question of efficiency. You can always do something like have the dispatcher send an ack when it receives the result from the server, and have the server wait for that ack before exiting. Just sending messages will be more efficient than establishing a new pipe as in (4). However, this means having a 'receive task' state at the server, followed by a 'receive ack' state. Probably robust, but likely 'something they did 30 years ago'... And I think this would have to be done for every task; I don't think there's a good way for the server to signal 'I want to exit, send an ack next time'.
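To make the ack idea above concrete, here is a minimal sketch using nanonext directly rather than mirai. It is not how mirai implements anything: the 'pair' protocol, the inproc URL, the "ack" string and the 1000 ms timeouts are all assumptions chosen so that both ends can run in a single R session.

```r
library(nanonext)

# Assumed setup: both ends live in one process over an in-process transport.
url <- "inproc://ack-sketch"
dispatcher <- socket("pair", listen = url)
server <- socket("pair", dial = url)

# Dispatcher hands out a task (here just a number to take the square root of).
send(dispatcher, 4, block = 1000L)

# Server, 'receive task' state: receive, compute, send the result back.
task <- recv(server, block = 1000L)
send(server, sqrt(task), block = 1000L)

# Dispatcher receives the result and acknowledges it (hypothetical "ack" message).
received <- recv(dispatcher, block = 1000L)
send(dispatcher, "ack", block = 1000L)

# Server, 'receive ack' state: exit only once the ack has arrived.
ack <- recv(server, block = 1000L)
safe_to_exit <- !is_error_value(ack) && identical(ack, "ack")

close(server)
close(dispatcher)
```

If the ack does not arrive within the timeout, `recv()` returns an error value and `safe_to_exit` stays `FALSE`, so in this sketch the server would have to fall back to some other policy, such as lingering.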
As servers have the option to time out or 'task out' after a set number of tasks, it would be ideal to exit the process immediately thereafter. At present, however, this is only possible after an 'exitlinger' period, which is set to 1s by default. This should be sufficient for sending objects of ~1 GB in size.
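As a rough sanity check on that figure: flushing ~1 GB within a 1s linger implies a sustained throughput of about 1 GB/s, so slower links would need a proportionally longer linger. The helper below is hypothetical (it is not part of mirai or nanonext), and the throughput values are assumptions picked purely for illustration.

```r
# Hypothetical helper: minimum linger (in ms) needed to flush a payload,
# assuming a constant sustained throughput and ignoring protocol overhead.
linger_needed_ms <- function(payload_bytes, bytes_per_sec) {
  stopifnot(payload_bytes >= 0, bytes_per_sec > 0)
  ceiling(1000 * payload_bytes / bytes_per_sec)
}

# 1 GB at an assumed 1 GB/s: fits within the 1s default.
fast_link <- linger_needed_ms(1e9, 1e9)

# The same 1 GB at an assumed 125 MB/s (a 1 Gbit/s link): needs about 8s.
slow_link <- linger_needed_ms(1e9, 125e6)

print(c(fast_link = fast_link, slow_link = slow_link))
```

Under these assumptions a fixed linger is only a heuristic, which is why making exit conditional on send completion would be preferable.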
What is currently not possible is for exit to be conditional upon the send being completed.
This is, I believe, due to:
It would be great if a solution can be found.