Necessity of linger on exit for servers that time out #42

shikokuchuo · 2023-03-23T19:54:45Z

Because servers can time out, or 'task out' after a set number of tasks, it would be ideal to exit the process immediately afterwards. At present, however, this is only possible after an 'exitlinger' period, which defaults to 1s. This should be sufficient for sending objects of ~1 GB.

What is currently not possible is for exit to be conditional upon the send being completed.

This is, I believe, due to:

- If no linger period is implemented in R, the interpreter considers execution finished and reaps all child threads, even though the send is still in progress asynchronously at the C level.
- The C functions in the NNG library do not help, because NNG records a send as complete once the socket accepts the message for transport. In other words, NNG's "complete" only means that responsibility has been handed to the system sockets. It does not guarantee that the send actually finishes if the process is reaped in the meantime.

It would be great if a solution can be found.

shikokuchuo · 2023-03-24T17:38:03Z

This man page for socket close suggests it may not be possible through the existing NNG interface: https://nng.nanomsg.org/man/tip/nng_close.3.html

> Closing the socket while data is in transmission will likely lead to loss of that data. There is no automatic linger or flush to ensure that the socket send buffers have completely transmitted. It is recommended to wait a brief period after calling nng_send() or similar functions, before calling this function.
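The workaround the man page recommends (send, wait a fixed period, then close) matches how the 'exitlinger' period is described at the top of this issue. A minimal sketch in plain R, assuming exitlinger is given in milliseconds (as the `exitlinger = 1000` call later in this thread suggests for the 1s default). `linger_then_exit()` and `send_fn` are hypothetical stand-ins, not mirai or nanonext functions. The point it shows is that the wait is unconditional: nothing in it knows whether the bytes have actually left the machine.

```r
# Hypothetical sketch: `send_fn` stands in for an asynchronous send that
# returns as soon as the message is queued; it is not a mirai/nanonext API.
linger_then_exit <- function(send_fn, result, exitlinger = 1000L) {
  start <- Sys.time()
  send_fn(result)               # returns once queued, not once delivered
  Sys.sleep(exitlinger / 1000)  # fixed wait (ms -> s); no check that the send finished
  # A real server would close its socket and quit at this point.
  as.numeric(difftime(Sys.time(), start, units = "secs"))
}

queued <- list()
elapsed <- linger_then_exit(function(x) queued[[1L]] <<- x, 42, exitlinger = 200L)
```

Whether 200 ms (or the 1s default) is long enough depends on message size and network conditions, which is why the issue asks for exit to be conditional on completion instead.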

wlandau · 2023-05-08T15:13:44Z

In the case of long-running computation, it seems like this would matter most when sending the result of a completed task back to the client, rather than receiving data for a new task. And in the former case, would it be possible for the server to pause its idle timers etc. before initiating a send? Unless I am missing something, it seems like this would just be a matter of expressing the timer logic differently in R.

shikokuchuo · 2023-05-08T15:40:45Z

The issue is that we can do what we like before the send, or after it for that matter, but we simply do not know when it has finished. That depends on the interplay between the C code and the system TCP stack, which R currently has no access to.

wlandau · 2023-05-08T19:19:27Z

That makes sense.

By the way, this discussion made me concerned that a server could exit and lose the data far before the client has a chance to download it. I am happy to see that lightweight tasks seem to be available somewhere well after the server exits. On my company's cluster, I started a dispatcher on one node:

library(mirai)
url <- sprintf("ws://%s:57000", getip::getip())
print(url)
daemons(
  n = 1L,
  url = url,
  dispatcher = TRUE,
  token = FALSE
)
while (!is.matrix(daemons()$daemons)) {
  Sys.sleep(0.1)
}
while (daemons()$daemons[, "online"] < 1L) {
  Sys.sleep(0.1)
}
tasks <- replicate(4, mirai(rnorm(n = 1)))
Sys.sleep(4)
print(as.numeric(lapply(tasks, function(task) task$data)))

During the while() loop with daemons()$daemons[, "online"], I launched a server on a different node on the local network:

R -e 'mirai::server(url = "ws://x.x.x.x:57000", idletime = 1000, exitlinger = 1000)'

The server visibly came and went, and the client did not make an attempt to collect the data until a couple seconds after that. But yet no result went missing!

print(as.numeric(lapply(tasks, function(task) task$data)))
#> [1]  1.3502759 -0.2049120  0.1465165 -0.5801425

This is really amazing. Where do the results live between the server exit and the moment the client starts to collect them?

shikokuchuo · 2023-05-08T19:25:10Z

> That makes sense.
>
> By the way, this discussion made me concerned that a server could exit and lose the data far before the client has a chance to download it. I am happy to see that lightweight tasks seem to be available somewhere well after the server exits. On my company's cluster, I started a dispatcher on one node:

Ha, yes, TCP is surprisingly resilient.

> During the while() loop with daemons()$daemons[, "online"], I launched a server on a different node on the local network:
> R -e 'mirai::server(url = "ws://x.x.x.x:57000", idletime = 1000, exitlinger = 1000)'
> The server visibly came and went, and the client did not make an attempt to collect the data until a couple seconds after that. But yet no result went missing!

The send is eager, so it happens while the server is still alive. This, though, assumes that transmission finishes before the 'exitlinger' period ends and the process dies.

> This is really amazing. Where do the results live between the server exit and the moment the client starts to collect them?

I believe the data is just buffered at the client (listener) TCP socket, so it can be collected at any time by NNG.
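The buffering described above can be observed with base R's own (non-NNG) sockets. A minimal sketch, assuming R >= 4.0.0 (for `serverSocket()` and `socketAccept()`) and that local port 45678 is free: the sending side writes a line and closes straight away, and the listening side reads it only afterwards. It illustrates only that bytes already handed to the kernel survive the sender going away; it does not reproduce the failure mode in this issue, where the send may still be in progress asynchronously at the C level when the process is reaped.

```r
# Requires R >= 4.0.0 for serverSocket() / socketAccept().
port <- 45678L  # assumption: an unused local port

listener <- serverSocket(port)  # plays the client/listener role
sender <- socketConnection("127.0.0.1", port, blocking = TRUE, open = "r+")
receiver <- socketAccept(listener, blocking = TRUE, open = "r+")

writeLines("result: 1.35", sender)  # the 'server' sends its result...
close(sender)                       # ...and goes away straight afterwards

Sys.sleep(0.5)                      # the listener only collects it later
msg <- readLines(receiver, n = 1L)  # still there, held in the receive buffer
close(receiver)
close(listener)
print(msg)
```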

wlandau · 2023-05-19T15:17:28Z

Seems like there would have to be new logic. Just for the sake of thinking out loud:

1. Server: when beginning a send, increment a statistic like sends.
2. Server: create a new condition variable to count dispatcher-side receives.
3. Dispatcher: check for incoming data without actually downloading it, similar to .unresolved() (is this possible?)
4. Dispatcher: in the event loop, if (3) shows that the data is completely ready for download from the listener TCP socket, then trigger a pipe event to increment the server-side receives condition variable.
5. Server: if the sends statistic and receives CV are equal to each other, then it is safe to exit.

Is this all possible? Am I missing something? I'm not sure if (4) is possible because the dispatcher is non-polling. Without polling, I suppose a callback mechanism would be needed, and from #42 (comment) it sounds like a callback mechanism does not exist at the NNG level.
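The bookkeeping in steps 1, 2 and 5 can be sketched in plain R. This is a hypothetical model only: ordinary counters in an environment stand in for the `sends` statistic and the server-side condition variable, the dispatcher's notification (step 4) is a direct function call rather than a pipe event, and the open question in step 3, peeking at incoming data without downloading it, is not modelled at all. None of these functions exist in mirai.

```r
# Hypothetical model of the proposal; none of these are mirai functions.
new_server_state <- function() {
  state <- new.env()
  state$sends <- 0L     # step 1: incremented when the server begins a send
  state$receives <- 0L  # step 2: stands in for the receives condition variable
  state
}

begin_send <- function(state) state$sends <- state$sends + 1L

# Step 4: in the proposal this is triggered by the dispatcher once the result
# is fully available at the listener; here it is just a direct call.
notify_received <- function(state) state$receives <- state$receives + 1L

# Step 5: exit only when every send has been confirmed.
safe_to_exit <- function(state) state$sends == state$receives

s <- new_server_state()
begin_send(s)
begin_send(s)
notify_received(s)
before <- safe_to_exit(s)  # one result still unconfirmed
notify_received(s)
after <- safe_to_exit(s)
```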

shikokuchuo · 2023-05-19T22:31:30Z

It's just a question of efficiency. You can always do something like send a 'received' ack when the dispatcher receives the result from the server, and have the server wait for that. Simply sending messages will be more efficient than establishing a new pipe as in (4).

However, this will mean having a 'receive task' state at the server, followed by a 'receive ack' state. Probably robust, but likely 'something they did 30 years ago'...

And I think this would have to be done for every task. I don't think there's a good way for the server to signal 'I want to exit, send an ack next time'.
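The ack handshake described above can be sketched as a server loop in plain R. This is a hypothetical model: `run_server()`, `send_result()` and `wait_for_ack()` are stand-ins (not mirai or nanonext functions), and the fake dispatcher below acknowledges instantly. It shows the two states per task and why the server only exits once the last result it sent has been acknowledged.

```r
# Hypothetical sketch: send_result() and wait_for_ack() are stand-ins.
run_server <- function(tasks, send_result, wait_for_ack, maxtasks = length(tasks)) {
  completed <- 0L
  for (task in tasks) {
    result <- task()          # 'receive task' state: evaluate the received task
    send_result(result)       # eager send; R cannot tell when it has finished
    acked <- wait_for_ack()   # 'receive ack' state: block until the dispatcher confirms
    stopifnot(acked == completed + 1L)
    completed <- completed + 1L
    if (completed >= maxtasks) break  # every result sent so far has been acknowledged
  }
  completed  # the server can now exit without lingering
}

# Fake dispatcher that records results and acknowledges them immediately.
dispatcher <- new.env()
dispatcher$received <- list()
send_result <- function(x) {
  dispatcher$received[[length(dispatcher$received) + 1L]] <- x
}
wait_for_ack <- function() length(dispatcher$received)  # ack = count of results received

tasks <- list(function() 1, function() 2, function() 3)
n <- run_server(tasks, send_result, wait_for_ack, maxtasks = 2L)
```

As noted above, the handshake runs for every task, since the server has no way to announce in advance that its next result will be its last.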

shikokuchuo added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Mar 23, 2023

wlandau mentioned this issue May 11, 2023

Hanging tasks on Github Actions Ubuntu Runners (R CMD Check) #53 (closed)

wlandau mentioned this issue Jun 20, 2023

Incorrect cumulative “complete” stats on Linux wlandau/crew#90 (closed)

Repository owner locked and limited conversation to collaborators Jun 27, 2023

shikokuchuo converted this issue into discussion #63 Jun 27, 2023


This issue was moved to a discussion.
