flakey/broken detection of criu version #18856
Comments
Likely an underlying error in the RPC protocol between podman and criu. I think the first step is to return the original error instead of treating every error as "minimum version not matched".
There is a weird issue, containers#18856, which causes the version check to fail. Return the underlying error in these cases so we can see it and debug it. Signed-off-by: Paul Holzinger <[email protected]>
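To illustrate the idea of returning the original error rather than collapsing everything into a version mismatch, here is a minimal sketch. It is not podman's actual code: `checkMinVersion`, `errVersionTooOld`, `errBrokenPipe`, and the `getVersion` callback are hypothetical names, and the callback stands in for whatever queries criu over RPC.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel for a genuine "version too old" result.
var errVersionTooOld = errors.New("criu version is older than the required minimum")

// Hypothetical stand-in for the RPC failure seen in the flake.
var errBrokenPipe = errors.New("write unix: broken pipe")

// checkMinVersion separates two cases that the thread says were being
// conflated: the version query itself failing, and the version being too old.
// getVersion is an assumed callback that asks criu for its version.
func checkMinVersion(getVersion func() (int, error), minVersion int) error {
	version, err := getVersion()
	if err != nil {
		// Keep the underlying failure visible instead of reporting a mismatch.
		return fmt.Errorf("checking criu version: %w", err)
	}
	if version < minVersion {
		return fmt.Errorf("%w: found %d, need at least %d", errVersionTooOld, version, minVersion)
	}
	return nil
}

func main() {
	// Simulate the flake: the query fails, so no version is known at all.
	err := checkMinVersion(func() (int, error) { return 0, errBrokenPipe }, 31500)
	fmt.Println(err)
	// The caller can now tell an RPC failure apart from a real mismatch.
	fmt.Println(errors.Is(err, errVersionTooOld))
}
```

With this shape, a log line for the flake would show the broken pipe rather than a misleading claim about the installed criu version.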
Here we go! f38 remote:
A friendly reminder that this issue had no activity for 30 days.
I assume we never saw it again, so I am closing it.
:(
Today in system tests, rawhide remote:
In the podman CI we are seeing a weird flake during criu version detection[1]. The write to the socket just fails with broken pipe. The logical thing to assume here is that the child exited. However the current code never reports back the child error from wait nor does it try to capture the output from it. This fixes both. The cleanup error is now added to the returned error so the caller sees both. As errors.Join is used from the std lib bump the minimum go version to 1.20. [1] containers/podman#18856 Signed-off-by: Paul Holzinger <[email protected]>
In the podman CI we are seeing a weird flake during criu version detection[1]. The write to the socket just fails with broken pipe. The logical thing to assume here is that the child exited. However the current code never reports back the child error from wait nor does it try to capture the output from it. This fixes both. The cleanup error is now added to the returned error so the caller sees both. As errors.Join is used from the std lib bump the minimum go version to 1.20. [1] containers/podman#18856 Signed-off-by: Paul Holzinger <[email protected]> Signed-off-by: Radostin Stoyanov <[email protected]>
In the podman CI we are seeing a weird flake during criu version detection[1]. The write to the socket just fails with broken pipe. The logical thing to assume here is that the child exited. However the current code never reports back the child error from wait. The cleanup error is now added to the returned error so the caller sees both. The output is not captured as this causes hangs when the fds are passed into child processes. As errors.Join is used from the std lib bump the minimum go version to 1.20. [1] containers/podman#18856 Signed-off-by: Paul Holzinger <[email protected]>
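The commit messages above describe adding the cleanup (child wait) error to the returned error with errors.Join. A minimal sketch of that pattern follows. It is not the go-criu implementation: `doRequest`, `send`, `cleanup`, and the two sample errors are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the two failures discussed in the thread.
var (
	errBrokenPipe = errors.New("write unix: broken pipe")
	errChildExit  = errors.New("criu child: exit status 1")
)

// doRequest runs send and then always runs cleanup. The cleanup error
// (for example, the result of waiting on the child) is joined onto whatever
// send returned, so the caller sees both instead of only the broken pipe.
func doRequest(send, cleanup func() error) (err error) {
	defer func() {
		// errors.Join (Go 1.20+) drops nil values and returns nil
		// when every argument is nil.
		err = errors.Join(err, cleanup())
	}()
	if sendErr := send(); sendErr != nil {
		return fmt.Errorf("sending request: %w", sendErr)
	}
	return nil
}

func main() {
	err := doRequest(
		func() error { return errBrokenPipe },
		func() error { return errChildExit },
	)
	// The joined error prints each wrapped error on its own line.
	fmt.Println(err)
	// Both causes remain matchable by the caller.
	fmt.Println(errors.Is(err, errBrokenPipe), errors.Is(err, errChildExit))
}
```

The deferred join means the child's exit status is reported even when the write fails first, which is the case the flake appears to hit.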
As far as flakes go, this is a pretty sweet & mellow one: infrequent, easily recognized & categorized. One more, f40.
There is no new version yet, but we would like to use the new code[1] to debug a flake[2] in the podman CI. It will not fix it, but the new error might give us a better idea of what is going on. [1] checkpoint-restore/go-criu#175 [2] containers#18856 Signed-off-by: Paul Holzinger <[email protected]>
Not much more helpful, I fear. Getting the actual stderr from criu would be useful, but it seems to be impossible to implement correctly.
Bizarre new flake:
All other checkpoint/restore tests pass on that same task, so obviously this is a false error.
Only one instance in my logs: f37 remote
First step, I would suggest, might be to instrument libpod/container_internal_common.go:checkpointRestoreSupported() so it emits the criu version it thinks it's finding.
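The kind of instrumentation suggested here could look roughly like the sketch below. It does not reproduce podman's checkpointRestoreSupported(): `checkpointRestoreSupportedDebug`, `describeCriuVersion`, and the `getVersion` callback are hypothetical, the standard `log` package is used only to keep the example self-contained, and the integer encoding (major*10000 + minor*100 + sublevel) is an assumption about how the version is represented.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// describeCriuVersion renders a version integer in readable form.
// Assumption: the integer is major*10000 + minor*100 + sublevel.
func describeCriuVersion(v int) string {
	return fmt.Sprintf("%d.%d.%d (%d)", v/10000, (v/100)%100, v%100, v)
}

// checkpointRestoreSupportedDebug is a hypothetical instrumented check.
// It logs what was detected before deciding, so a flaky run leaves evidence
// of whether a version was found at all.
func checkpointRestoreSupportedDebug(getVersion func() (int, error), minVersion int) error {
	version, err := getVersion()
	if err != nil {
		log.Printf("criu version check failed before any version was read: %v", err)
		return fmt.Errorf("getting criu version: %w", err)
	}
	log.Printf("criu version check: detected %s, minimum %s",
		describeCriuVersion(version), describeCriuVersion(minVersion))
	if version < minVersion {
		return fmt.Errorf("criu %s is older than the required %s",
			describeCriuVersion(version), describeCriuVersion(minVersion))
	}
	return nil
}

func main() {
	// A healthy run: the detected version is logged and the check passes.
	_ = checkpointRestoreSupportedDebug(func() (int, error) { return 31601, nil }, 31500)

	// A flaky run: the query fails, and the log shows no version was found.
	err := checkpointRestoreSupportedDebug(
		func() (int, error) { return 0, errors.New("broken pipe") }, 31500)
	fmt.Println(err)
}
```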