make control socket more robust and support larger messages #67

smemsh · 2024-08-19T13:48:10Z

This patch series makes a number of changes in communications.c to make sending and receiving of messages on the unix control socket more robust against failures and transient conditions. It also removes limitations on the content and size of the messages.

Previously, the control socket was augmented in #43 to extend the size of response that could be sent by receive_message() back to the sender -- ie to send_message() -- which was awaiting response to queries, or output from executed commands. This handled the case where there was a large output (eg, sdorfehs -c windows), but did not cover the case of large input messages, as noted in #66.

Included in this PR are patches which factor out the earlier code from #43 into a common function which is now shared by both the receiver (getting a command) and the sender (getting the response, after sending the command). The message buffer is now gathered on the heap and processed whole, rather than one byte at a time. We also cover partial reads, and interrupted reads, which can happen when an [asynchronously] forked child process (such as with sdorfehs' exec command) happens to finish during message processing, and the SIGCHLD handler interrupts message IO system calls. Error handling is made somewhat more robust as a side effect.

There was already existing code to process multi-line messages for sdorfehs -c echo command: it would draw a window that was sized according to the presence of newlines. However, the message receive code had been changed to use a newline as message delimiter ever since sdorfehs was augmented to use a UNIX domain socket (noted in #66). That limitation is now removed, and we go until the end of the submitted buffer. This is safe because:

we will keep getting bytes until the socket is closed or EWOULDBLOCK (ie EAGAIN), so we know the message is either still proceeding or finished
these bytes originated as C strings (such as sdorfehs -c coming from the argument vector), which we know are NULL terminated, and have taken care to include trailing NULL in the message
we keep reading until we got everything, or discard and cancel
we zero all the bytes in our buffers as we grow it, before writing into it
we have a safety check that the last byte in the message is indeed a NULL (this can be removed later)

Now we can do things like:

sdorfehs -c "echo $(ncal 2024)"

which extends into 3 message quanta (via realloc()). It works well in testing.

Lastly, we also fixed a stray leak of the control socket FD. All forked procs (such as terminals) were getting a copy of the FD due to UNIX semantics preserving open file descriptors across execv().

probably the intention here was to support characters with values > 127 (ie iso_8859_1 chars), but there's no reason to use unsigned for this: in C the string functions work just fine as their default type (char) and don't require casting to support high-ascii characters. of course, it still won't work with chars > 255, as before. this change was tested with "sdorfehs -c $'echo \uab'" which correctly shows the double-arrow, "left-pointing double angle quotation mark"

Rather than process one buffer at a time and do partial outputs each loop on every partial response (sent to us from receive_command()), we make a buffer on the heap and keep growing it as long as there's still data coming in the response. So we get the whole message before attempting to process it. As before, we go until either the other end closes (as it does at the end of the transmission), or there's no more data to read (ie, another read would block). The latter happens when we received the whole response buffer, but the receiver hasn't yet finished closing the client socket. Either of these scenarios means we're at the end of the message. The reason for separating receipt and disposition is so that processing of the response buffer's contents becomes independent, out-of-band from filling the buffer itself. This way, we can make the message-receive loop into a utility function of its own, and re-use it in receive_command() so it can be made to process arbitrarily sized send_command() messages; it's currently limited to a single BUFSZ buffer and is read one character at a time. Already, we are *sending* the complete command argument (eg, from "sdorfehs -c"), but it will get truncated by receive_command() if it exceeds the buffer size on the other side. (Actually, we try to send the whole thing in a single write() and will bail the whole WM on failure, which we probably need to fix as well...)

smemsh · 2024-08-19T13:51:15Z

Works for me. Let me know if any style or logic changes are needed...

communications.c

jcs · 2024-08-19T14:49:41Z

otherwise looks ok, thanks for this

smemsh · 2024-08-19T15:07:59Z

@jcs Let's wait for @kit-la to comment that it works for him, and let me run it for a few days locally, just to make sure nothing funny happens. I'll follow up when ready, thanks.

Previously, we used a newline to mark the end of the message sent from send_command(), which restricted the characters we could send across (ie, we couldn't send a newline). This was a new limitation of sdorfehs that was not present in ratpoison. In particular, "ratpoison -c echo" was instrumented to be able to handle newlines in the message and draw variable-height windows to show them (for example, "ratpoison -c $(ncal)". After the sdorfehs change and switch to unix domain socket, sdorfehs now uses a newline to mark the end of the command, and will therefore silently truncate the sent message at the first newline. This limitation can be removed; it's safe enough to go until we see a NULL because it's always there: - there's only one caller of send_command(), ie main() coming from "sdorfehs -c" processing; since that comes from the program's argument vector, we know it will always terminated by a NULL itself - our buffer was fully cleared at entry, so a small read will be terminated by the trailing initialized bytes - if the sender exceeds our buffer size, it's already handled in the code by checking that we reached the end and aborting with a "bogus command length" message (this was verified by stepping through with gdb using 'sdorfehs -c "$(ncal 2024)"') Note that the end of buffer content comes before we might expect it because the full command string is included: the "interactive" byte, the command word itself, and the entire argument string. Note also that code on both ends (sender and receiver) has to take special care with the first byte, because it determines if the command is interactive. While this patch removes the newline limitation, this receiver code is still limited by being able to use only a single buffer, and doing character-at-a-time reads therefrom. To be addressed in following commits.

Here we make a new function recv_unix() that encapsulates the logic of receiving a message from a unix domain socket (ie our control socket) and copying it into a dynamically-sized heap buffer for disposition by the caller. This will let us use it from other functions, in particular from receive_command(), which is currently using a single buffer and reading from the socket one character at a time. Since the logic for doing this robustly was already implemented in send_command(), we can just factor it out and use in both places.

This re-uses the code factored out from send_command() to receive the sent command in multiple chunks into a dynamic heap buffer. Now we are able to receive arbitrarily sized commands that span multiple BUFSZs. This restores a regression from the original Ratpoison. Example: 'sdorfehs -c "echo $(ncal 2024)"' will print the full year's calendar as a message, it's over 2k bytes, so it will span 3 buffer quanta (BUFSZ == 1024 at time of writing).

There are some transient conditions which are normal, such as an interrupted system call, which can happen if sdorfehs received a signal during processing, such as SIGCHLD when a forked process exited (a completely normal situation). Of course, we should retry the syscall in that case . We can retry these after a small sleep of 200ms both send_message() and receive_message()

Both send_message() and receive_message() use writes: the former to send to the latter, and then a reply is sent back from the latter to the former. Both of them expected a single write to work unconditionally without being able to tolerate any transient issues. This commit factors out the writes into a common function called send_unix() which serves as the companion to the recently added recv_unix(). The send call is modified to handle both partial writes and interrupted writes. Especially the latter is not that unlikely: it can happen when we're in the middle of writing a message and then we get interrupted by a SIGCHLD. This is not a difficult thing to happen during the ordinary course of operations because the native 'exec' is asynchronous. It was observed to happen during testing. Partial writes might also be possible if some of the buffer was sent and then got interrupted by a signal, but this probably can't happen at the small BUFSZ we use. Nonetheless it will make the code more robust to support the possibility of partial writes. Any error besides EINTR is considered a fatal error for send_message() because this is only called by a "client" invocation, ie "sdorfehs -c". The "server" (ie receve_message()) only issues a warning and then aborts processing the signal. Otherwise, it would die. This matches the existing use of err() and warn() that was used after a mismatch in written and wrote buffer sizes in the respective writers, prior to this patch.

All the exec'ed processes would get copies of the control socket descriptors, because of the default UNIX semantics of fd inherit across execv(). This was easily seen on my system where an 'lsof' revealed that all my terminals and shells had a copy. To fix this we add the SOCK_CLOEXEC flag whilst opening the listener socket(), avoiding this problem. We also need to set CLOEXEC on the socket returned by accept(): "accept() ... extracts the first connection request on the queue of pending connections ... creates a new connected socket, and returns a new file descriptor" The new descriptor does not inherit CLOEXEC flag from the listener. The "bar" descriptor already handles this by setting O_CLOEXEC in its own open() call on the FIFO.

smemsh · 2024-08-21T12:56:15Z

Found two bugs:

Control socket was still leaking. It's not enough to set SOCK_CLOEXEC on the socket(); also the accept()s get their own new descriptor, which does not inherit this flag from the socket and thus need their own fcntl() to set it. Actually there is an accept4() that has a flags argument, avoiding the separate call, but this requires _GNU_SOURCE to be defined on Linux, so I just used a separate call. FreeBSD has this without any special flags since 10.0. It looked like completions.c already had this define to pull in strcasestr() but I didn't want to pollute the source and sdorfehs is not multithreaded so there is no race condition.
The prepared reply to sender in receive_message() still was attaching newlines to the end of response messages, even though they weren't needed anymore. Since they weren't getting stripped off anymore they were causing spurious newlines on stdout after any sdorfehs -c command.

These were just one-liner changes so fixed up the commits rather than make new ones.

Still hoping @kit-la can acknowledge, and will bake a few more days.

kit-la · 2024-08-27T21:21:51Z

Sorry for the late response, it does work as intended, thank you!

smemsh · 2024-08-28T04:34:36Z

@jcs been running with no issues since then, comfortable shipping it when you are, thanks

smemsh added 2 commits August 17, 2024 02:08

smemsh mentioned this pull request Aug 19, 2024

communications.c: Added support for messages spanning multiple lines. #66

Closed

jcs reviewed Aug 19, 2024

View reviewed changes

communications.c Show resolved Hide resolved

smemsh force-pushed the large-cmd-messages branch from 306e727 to b03b491 Compare August 19, 2024 15:02

smemsh added 6 commits August 21, 2024 05:48

smemsh force-pushed the large-cmd-messages branch from b03b491 to 90ba682 Compare August 21, 2024 12:49

jcs merged commit 50c0635 into jcs:master Aug 29, 2024
1 check passed

smemsh deleted the large-cmd-messages branch September 10, 2024 00:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make control socket more robust and support larger messages #67

make control socket more robust and support larger messages #67

smemsh commented Aug 19, 2024 •

edited

Loading

smemsh commented Aug 19, 2024

jcs commented Aug 19, 2024

smemsh commented Aug 19, 2024

smemsh commented Aug 21, 2024 •

edited

Loading

kit-la commented Aug 27, 2024

smemsh commented Aug 28, 2024

make control socket more robust and support larger messages #67

make control socket more robust and support larger messages #67

Conversation

smemsh commented Aug 19, 2024 • edited Loading

smemsh commented Aug 19, 2024

jcs commented Aug 19, 2024

smemsh commented Aug 19, 2024

smemsh commented Aug 21, 2024 • edited Loading

kit-la commented Aug 27, 2024

smemsh commented Aug 28, 2024

smemsh commented Aug 19, 2024 •

edited

Loading

smemsh commented Aug 21, 2024 •

edited

Loading