Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make control socket more robust and support larger messages #67

Merged
merged 8 commits into from
Aug 29, 2024

Conversation

smemsh
Copy link
Contributor

@smemsh smemsh commented Aug 19, 2024

This patch series makes a number of changes in communications.c to make sending and receiving of messages on the unix control socket more robust against failures and transient conditions. It also removes limitations on the content and size of the messages.

Previously, the control socket was augmented in #43 to extend the size of response that could be sent by receive_message() back to the sender -- ie to send_message() -- which was awaiting response to queries, or output from executed commands. This handled the case where there was a large output (eg, sdorfehs -c windows), but did not cover the case of large input messages, as noted in #66.

Included in this PR are patches which factor out the earlier code from #43 into a common function which is now shared by both the receiver (getting a command) and the sender (getting the response, after sending the command). The message buffer is now gathered on the heap and processed whole, rather than one byte at a time. We also cover partial reads, and interrupted reads, which can happen when an [asynchronously] forked child process (such as with sdorfehs' exec command) happens to finish during message processing, and the SIGCHLD handler interrupts message IO system calls. Error handling is made somewhat more robust as a side effect.

There was already existing code to process multi-line messages for sdorfehs -c echo command: it would draw a window that was sized according to the presence of newlines. However, the message receive code had been changed to use a newline as message delimiter ever since sdorfehs was augmented to use a UNIX domain socket (noted in #66). That limitation is now removed, and we go until the end of the submitted buffer. This is safe because:

  • we will keep getting bytes until the socket is closed or EWOULDBLOCK (ie EAGAIN), so we know the message is either still proceeding or finished
  • these bytes originated as C strings (such as sdorfehs -c coming from the argument vector), which we know are NULL terminated, and have taken care to include trailing NULL in the message
  • we keep reading until we got everything, or discard and cancel
  • we zero all the bytes in our buffers as we grow it, before writing into it
  • we have a safety check that the last byte in the message is indeed a NULL (this can be removed later)

Now we can do things like:

sdorfehs -c "echo $(ncal 2024)"

which extends into 3 message quanta (via realloc()). It works well in testing.

Lastly, we also fixed a stray leak of the control socket FD. All forked procs (such as terminals) were getting a copy of the FD due to UNIX semantics preserving open file descriptors across execv().

smemsh added 2 commits August 17, 2024 02:08
probably the intention here was to support characters with values > 127
(ie iso_8859_1 chars), but there's no reason to use unsigned for this:
in C the string functions work just fine as their default type (char)
and don't require casting to support high-ascii characters.

of course, it still won't work with chars > 255, as before.

this change was tested with "sdorfehs -c $'echo \uab'" which correctly
shows the double-arrow, "left-pointing double angle quotation mark"
Rather than process one buffer at a time and do partial outputs each
loop on every partial response (sent to us from receive_command()), we
make a buffer on the heap and keep growing it as long as there's still
data coming in the response.  So we get the whole message before
attempting to process it.

As before, we go until either the other end closes (as it does at the
end of the transmission), or there's no more data to read (ie, another
read would block).  The latter happens when we received the whole
response buffer, but the receiver hasn't yet finished closing the client
socket.  Either of these scenarios means we're at the end of the
message.

The reason for separating receipt and disposition is so that processing
of the response buffer's contents becomes independent, out-of-band from
filling the buffer itself.  This way, we can make the message-receive
loop into a utility function of its own, and re-use it in
receive_command() so it can be made to process arbitrarily sized
send_command() messages; it's currently limited to a single BUFSZ buffer
and is read one character at a time.  Already, we are *sending* the
complete command argument (eg, from "sdorfehs -c"), but it will get
truncated by receive_command() if it exceeds the buffer size on the
other side.  (Actually, we try to send the whole thing in a single
write() and will bail the whole WM on failure, which we probably need to
fix as well...)
@smemsh
Copy link
Contributor Author

smemsh commented Aug 19, 2024

Works for me. Let me know if any style or logic changes are needed...

communications.c Show resolved Hide resolved
@jcs
Copy link
Owner

jcs commented Aug 19, 2024

otherwise looks ok, thanks for this

@smemsh smemsh force-pushed the large-cmd-messages branch from 306e727 to b03b491 Compare August 19, 2024 15:02
@smemsh
Copy link
Contributor Author

smemsh commented Aug 19, 2024

@jcs Let's wait for @kit-la to comment that it works for him, and let me run it for a few days locally, just to make sure nothing funny happens. I'll follow up when ready, thanks.

smemsh added 6 commits August 21, 2024 05:48
Previously, we used a newline to mark the end of the message sent from
send_command(), which restricted the characters we could send across
(ie, we couldn't send a newline).  This was a new limitation of sdorfehs
that was not present in ratpoison.  In particular, "ratpoison -c echo"
was instrumented to be able to handle newlines in the message and draw
variable-height windows to show them (for example, "ratpoison -c
$(ncal)".

After the sdorfehs change and switch to unix domain socket, sdorfehs now
uses a newline to mark the end of the command, and will therefore
silently truncate the sent message at the first newline.

This limitation can be removed; it's safe enough to go until we see a
NULL because it's always there:

  - there's only one caller of send_command(), ie main() coming from
    "sdorfehs -c" processing; since that comes from the program's
    argument vector, we know it will always terminated by a NULL itself

  - our buffer was fully cleared at entry, so a small read will be
    terminated by the trailing initialized bytes

  - if the sender exceeds our buffer size, it's already handled in the
    code by checking that we reached the end and aborting with a
    "bogus command length" message (this was verified by stepping
    through with gdb using 'sdorfehs -c "$(ncal 2024)"')

Note that the end of buffer content comes before we might expect it
because the full command string is included: the "interactive" byte, the
command word itself, and the entire argument string.

Note also that code on both ends (sender and receiver) has to take
special care with the first byte, because it determines if the command
is interactive.

While this patch removes the newline limitation, this receiver code is
still limited by being able to use only a single buffer, and doing
character-at-a-time reads therefrom.  To be addressed in following
commits.
Here we make a new function recv_unix() that encapsulates the logic of
receiving a message from a unix domain socket (ie our control socket)
and copying it into a dynamically-sized heap buffer for disposition by
the caller.  This will let us use it from other functions, in particular
from receive_command(), which is currently using a single buffer and
reading from the socket one character at a time.  Since the logic for
doing this robustly was already implemented in send_command(), we can
just factor it out and use in both places.
This re-uses the code factored out from send_command() to receive the
sent command in multiple chunks into a dynamic heap buffer.  Now we are
able to receive arbitrarily sized commands that span multiple BUFSZs.
This restores a regression from the original Ratpoison.

Example: 'sdorfehs -c "echo $(ncal 2024)"' will print the full year's
calendar as a message, it's over 2k bytes, so it will span 3 buffer
quanta (BUFSZ == 1024 at time of writing).
There are some transient conditions which are normal, such as an
interrupted system call, which can happen if sdorfehs received a signal
during processing, such as SIGCHLD when a forked process exited (a
completely normal situation).  Of course, we should retry the syscall in
that case

.  We can retry these after a small sleep of
200ms

both send_message() and receive_message()
Both send_message() and receive_message() use writes: the former to send
to the latter, and then a reply is sent back from the latter to the
former.  Both of them expected a single write to work unconditionally
without being able to tolerate any transient issues.

This commit factors out the writes into a common function called
send_unix() which serves as the companion to the recently added
recv_unix().  The send call is modified to handle both partial writes
and interrupted writes.  Especially the latter is not that unlikely: it
can happen when we're in the middle of writing a message and then we get
interrupted by a SIGCHLD.  This is not a difficult thing to happen
during the ordinary course of operations because the native 'exec' is
asynchronous.  It was observed to happen during testing.

Partial writes might also be possible if some of the buffer was sent and
then got interrupted by a signal, but this probably can't happen at
the small BUFSZ we use.  Nonetheless it will make the code more robust
to support the possibility of partial writes.

Any error besides EINTR is considered a fatal error for send_message()
because this is only called by a "client" invocation, ie "sdorfehs -c".
The "server" (ie receve_message()) only issues a warning and then aborts
processing the signal.  Otherwise, it would die.  This matches the
existing use of err() and warn() that was used after a mismatch in
written and wrote buffer sizes in the respective writers, prior to this
patch.
All the exec'ed processes would get copies of the control socket
descriptors, because of the default UNIX semantics of fd inherit across
execv().  This was easily seen on my system where an 'lsof' revealed
that all my terminals and shells had a copy.

To fix this we add the SOCK_CLOEXEC flag whilst opening the listener
socket(), avoiding this problem.  We also need to set CLOEXEC on the
socket returned by accept():

  "accept() ... extracts the first connection request on the queue of
  pending connections ... creates a new connected socket, and returns a
  new file descriptor"

The new descriptor does not inherit CLOEXEC flag from the listener.

The "bar" descriptor already handles this by setting O_CLOEXEC in its
own open() call on the FIFO.
@smemsh smemsh force-pushed the large-cmd-messages branch from b03b491 to 90ba682 Compare August 21, 2024 12:49
@smemsh
Copy link
Contributor Author

smemsh commented Aug 21, 2024

Found two bugs:

  • Control socket was still leaking. It's not enough to set SOCK_CLOEXEC on the socket(); also the accept()s get their own new descriptor, which does not inherit this flag from the socket and thus need their own fcntl() to set it. Actually there is an accept4() that has a flags argument, avoiding the separate call, but this requires _GNU_SOURCE to be defined on Linux, so I just used a separate call. FreeBSD has this without any special flags since 10.0. It looked like completions.c already had this define to pull in strcasestr() but I didn't want to pollute the source and sdorfehs is not multithreaded so there is no race condition.
  • The prepared reply to sender in receive_message() still was attaching newlines to the end of response messages, even though they weren't needed anymore. Since they weren't getting stripped off anymore they were causing spurious newlines on stdout after any sdorfehs -c command.

These were just one-liner changes so fixed up the commits rather than make new ones.

Still hoping @kit-la can acknowledge, and will bake a few more days.

@kit-la
Copy link

kit-la commented Aug 27, 2024

Sorry for the late response, it does work as intended, thank you!

@smemsh
Copy link
Contributor Author

smemsh commented Aug 28, 2024

@jcs been running with no issues since then, comfortable shipping it when you are, thanks

@jcs jcs merged commit 50c0635 into jcs:master Aug 29, 2024
1 check passed
@smemsh smemsh deleted the large-cmd-messages branch September 10, 2024 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants