Flush stdout/err in Dockerfile-workers before replacing the current process #14195
Conversation
…er base config to show when it already exists.
…on successful tests.
Spent a few minutes tracking down the failure. Looks like the Alias flake.
Text-searching the raw logs for the ❌ found the error. The logs are huge, which is why I recommend reverting that commit.
Not sure about some of this.
I think a cleaner fix might be to call sys.stdout.flush() and sys.stderr.flush() before we call os.execle/os.execv/os.execve?
Co-authored-by: David Robertson <[email protected]>
I admit to coming across those in my research about this. I did not think they would work without a companion
A quick set of tests yielded odd results. I won't rehash the bulk of what I tried out in combinations, but the most notable was that leaving
I'm pretty sure I stand by my change here. Having
If you like, I can include a comment above the change about why the flush was added?
Yes and no. If someone in the future makes a
I don't think this is an anomaly in buffer handling; this is something that you simply are expected to do when replacing the current process with an exec call.
So I think explicitly flushing before exec-ing is the more robust (and arguably idiomatic) change here.
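For illustration, a minimal sketch of the suggested pattern, assuming an exec-style handoff; the binary and arguments below are placeholders, not the entrypoint's real command:

import os
import sys

# os.execv replaces the current process image without running Python's normal
# shutdown, so anything still sitting in the stdio buffers would be lost.
sys.stdout.flush()
sys.stderr.flush()
os.execv("/usr/local/bin/supervisord", ["supervisord", "-c", "/etc/supervisor/supervisord.conf"])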
Well, I suppose that's a good point. I would argue that if they've heard of those things, they would look and see what others have done before in a file they are changing. But still, a valid point.
And this is me learning something new today. I had not come across that detail in previous research. It would have saved me a few hours, I'm sure. Very well, I'll add
Or perhaps just make both of them
I think having
Otherwise sounds good!
I went for (essentially):
and then added:
above each exec call.
Would you like another set of
Try changing
…stdout." This reverts commit 7210a97.
Any objections to a helper function? Here's what I'm looking at duplicating across both files:

Utility functions
import sys
from typing import NoReturn


def log(txt: str) -> None:
    """
    Log something to the stdout.

    Args:
        txt: The text to log.
    """
    print(txt)


def error(txt: str) -> NoReturn:
    """
    Log something to the stderr and exit with an error code.

    Args:
        txt: The text to log in error.
    """
    print(txt, file=sys.stderr)
    sys.exit(2)


def flush_buffers() -> None:
    """
    Flush stdout and stderr buffers
    """
    sys.stdout.flush()
    sys.stderr.flush()

Then can just call flush_buffers().
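A hypothetical call site for the helper; the exec path and arguments are placeholders, not the real entrypoint invocation:

import os

# Flush buffered output before the exec replaces this process image,
# otherwise the buffered log lines never reach the container's output.
flush_buffers()
os.execv("/usr/local/bin/python", ["python", "/start.py"])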
… use the new helper.
Any objections to using
So I read a bit on
title above all the
These are simple bootstrapping scripts: we don't need thorough docstrings here (particularly when the functions are short and the docstrings are longer than the functions they describe). Otherwise no objections to helpers.
They were in
…go south it's LOUD.
Thanks, LGTM.
Pull Request Checklist
Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
Be a short description of your change which makes sense to users (e.g. not "Moved X method from EventStore to EventWorkerStore.").
Use markdown where necessary, mostly for code blocks.
Pull request includes a sign off
Code style is correct (run the linters)
Fixes Dockerfile-workers not showing logging in early output #14194
Additional log output there.
Some logged output from the entrypoint of Dockerfile-workers is not displayed when running a container as a background process, i.e. in detached mode, with -d.

-d and detached mode are explained here, but a quick summary is that it is used for any process or service meant to run in the background and not accept input from a command line, like a standard HTTP server. Many VPS and NAS systems (such as unRAID and fly.io) that run Docker images have this option built into the scripting that runs Docker images as 'apps'; it is not something easily changeable.

Explained somewhat here and even more so here, the culprit is line 'buffering' of output in Python. It appears that the buffer for stdout isn't being flushed and the output just gets 'lost'. Figuring out where it disappeared to is beyond the scope of this pull request. The simplest way to flush this buffer is to pass the flush keyword to print. More about print() and its flush keyword. The keyword was added in Python 3.3 and therefore isn't in danger of being missing from any version of Python used or tested today.

As observed inside start.py, the issue is mitigated by sending logged output to stderr instead of stdout. This works, but doesn't feel like the right way to handle this. Just flush the buffers. If someone would rather I switch to using stderr, leave me a comment to that effect and I'll adjust.
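A minimal, standalone illustration of the flush keyword described above (not the PR's actual logging code):

import sys

# With flush=True the text is written to stdout immediately instead of
# waiting in the buffer until it fills or the interpreter exits cleanly.
print("Generating base homeserver config", flush=True)

# The same effect with an explicit flush after a normal print:
print("Generating base homeserver config")
sys.stdout.flush()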
Other options explored:

The -u flag, such as python -u configure_workers_and_start.py

Neither seemed viable, and both felt intrusive as a fix for one tiny oversight in the way buffered output is handled.
There are two commits in this pull request that can be reverted. One is harmless and probably would be good to keep: it gives the "Generating base homeserver config" log line a mate for when the config already exists. I added that in before collecting logs while diagnosing the issue. Look for "Homeserver config already exists at:".

The other is bad and shouldn't be used long term on GitHub (tons of log spam): it adds COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 to the Complement CI workflow, so that when the tests get run you won't have to wait for a flake or an error to actually see the log output.

GitHub CI Complement snippet output from before:

After:

And yes, it's literally only two lines different. Apparently, the GitHub Complement suites generate their own configuration elsewhere, or hide the log output of start.py if it's run. This doesn't happen with the script run from the command line. I tested that by creating an obviously wrong worker type and feeding it through (see the issue at the top). The configuration was not created (which is the correct response), but because that part of configure_workers_and_start.py never gets touched (at least in these logs), it didn't produce the information that it was an incorrect worker type. I'm assuming that is something configurable with Go or Complement, and is a rabbit hole for another day.
Output in docker logs synapse works as expected.

Signed-off-by: Jason Little [email protected]