-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
many: deal with incorrect status from osbuild better #827
base: main
Are you sure you want to change the base?
Conversation
bib/pkg/progress/progress.go
Outdated
defer func() { | ||
// ensure osbuild is stopped even if we exit early | ||
if cmd.Process != nil { | ||
cmd.Process.Kill() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may need a bit more care, maybe only do it in hte error case (as the happy case does a cmd.Wait already so no killing needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine as is but I would strongly recommend using PR_SET_PDEATHSIG if it's not already in use. That's a kernel-enforced way to lifecycle bind the child to the parent, such that if e.g. the parent dies due to SIGSEGV or whatever, the child gets killed too.
0f80abb
to
837d891
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
PR_SET_PDEATHSIG
feels like a nice follow-up, right, @mvo5 ?
837d891
to
43c4bdf
Compare
Yeah, I think we can look into this in a followup, while writing this I realized its actually more complicated because killing osbuild hard will probably leave a bunch of (dandling) bind mounts around. So the current approach is to send a SIGINT and hope for the best, osbuild should cleanup after itself. I'm not sure we have good options here but fortunately the error condition (bad progress reported) is rare. The other thing we could try is to make sure that the osbuild status scanner always advances, then we could also keep trying until we get valid json again (replacing the current bufio scanner essentially, however in the general case when we never get valid output again we would still appear hung for an extended period of time, but maybe a worthwhle tradeoff over erroring?) |
This commit changes the progress parser (again) to deal with errors from the osbuild json progress scanner. On errors we will now exit right away and potentially kill osbuild but provide an error message that hints how to workaround the issue. The original code assumed we get transient errors like json decode issues. However it turns out that this is not always the case, some errors from a bufio.Scanner are unrecoverable (like ErrTooBig) and trying again just leads to an endless loop. We can also not "break" and wait for the build to finish because that would appear like the progress is broken forever and we would still have to report an error (just much later).
This commit ensures that we do restrict the memory of the bib test conatiner to catch excessive memory usage. This is prompted by a memory leak when dealing with unrecoverable status messages that lead to failures in konflux.
This commit changes the centos-9 test to run with `podman -t` so that we have a test-case that uses the `terminal` progress. This is prompted by: a) we have no integration test currently that uses the terminal progress for the full build b) a konflux failure/memory leak that showed because there the test is run with `-t`
43c4bdf
to
c77149b
Compare
go.mod: upadate to latest
images
to get PR#1200This commit updates the images library to pull in the fix [0] for
the overly long messages from osbuild. This should test the
(previously) failing test that runs the centos-9 ISO with an
attached terminal.
[0] osbuild/images#1200
test: run the centos-9 test with an attached terminal
This commit changes the centos-9 test to run with
podman -t
sothat we have a test-case that uses the
terminal
progress. Thisis prompted by:
a) we have no integration test currently that uses the terminal
progress for the full build
b) a konflux failure/memory leak that showed because there the
test is run with
-t
This commit ensures that we do restrict the memory of the bib
test conatiner to catch excessive memory usage. This is prompted
by a memory leak when dealing with unrecoverable status messages
that lead to failures in konflux.
This commit changes the progress parser (again) to deal with
errors from the osbuild json progress scanner. On errors we
will now exit right away and potentially kill osbuild but
provide an error message that hints how to workaround the
issue.
The original code assumed we get transient errors like json
decode issues. However it turns out that this is not always
the case, some errors from a bufio.Scanner are unrecoverable
(like ErrTooBig) and trying again just leads to an endless
loop. We can also not "break" wait for the build to finish
because that would appear like the progress is broken
forever and we would still have to report an error (just
much later).