pasta udp tests: new bytecheck helper #24238
c7fb508 to 5fb4046
Cockpit tests failed for commit 5fb40466bedc4c76f3ad0705b3c49da26109cc0b. @martinpitt, @jelly, @mvollmer please check.
LGTM
Hmm, testCheckpointRestore started to fail rather often on Fedora 41 a few weeks ago. It's definitely a flake (most podman PRs are fine, and retries work), and we don't see this on our CI weather report at all. The screenshot shows the error message, but it's not very detailed -- calling
That file isn't captured in test artifacts, though. I upgraded a local F41 VM with all packages from podman-next COPR and ran the test 50 times successfully. So maybe the testing farm environment is different, but I also never saw that failure in our cockpit-podman PRs (i.e. with distro packages instead of podman-next). For now I'll make a PR that attaches that dump.log as an artifact in case of failures. Do you have any other ideas about how to debug this?
See also #24230. There's a new version of criu out there, and it's broken.
See cockpit-project/cockpit-podman#1878 for the "collect criu dump.log" bit.
5fb4046 to f3daa63
Ephemeral COPR build failed. @containers/packit-build please check.
f3daa63 to a647991
Cockpit tests failed for commit a64799191ade12605f9c81af5cc5c3a80e816026. @martinpitt, @jelly, @mvollmer please check.
@containers/podman-maintainers PTAL. Merging this will make it much much easier to test zstd. The pasta helper is less important now that @sbrivio-rh has a reproducer for #24147, but it's still a nice-to-have for tomorrow's bugs.
test/NEW-IMAGES
@@ -12,3 +12,4 @@
 #
 # Format is one FQIN per line. Enumerate them below:
 #
+quay.io/libpod/testimage:20241011
Shouldn't we get this into the CI image first? This will cause quay.io flakes...
It already is in the latest CI images! (The zstd ones. Getting podman to work with zstd is just a SMOP).
Oh all right. I'll build new images with only that change...
Yep, I know it is annoying, but I think we have established that we cannot pull from quay.io reliably enough for podman CI usage. It is not a question of if but when it will flake. If I knew that the zstd image would be ready quickly, I might be fine with carrying this, but given the number of bugs I doubt it will happen fast enough.
A-ha!
@edsantiago That looks distinctly different from #24230. Also, this is a flake, it only fails in like 30% of the cases. At first this makes no sense, though: either iptables is installed or not, so why wouldn't that be a persistent failure? Or is the iptables stuff not actually fatal and would always happen, and the real error is something else and hidden?
Ah, or are you saying it's persistently broken in this PR only, due to bumping the test image? (I am not sure now if I saw that failure in other PRs)
@martinpitt I'm sorry, I'm a bit lost. Are you suggesting that this PR is somehow breaking cockpit? That seems unlikely to me. My mail filters autodelete all
@edsantiago I don't have conclusions yet. I'm saying that there is a bug in runc or criu that sometimes breaks Above you mentioned #24230 as a possible issue for a broken criu, but that looks distinctly different. That's why I mentioned you specifically.
This PR doesn't change any podman code, only our own tests, so there is no way this breaks the cockpit tests unless you somehow depend on our tests. I clicked rerun to see if this is indeed just a flake.
Passed now, thanks @Luap99 for retrying! I can reproduce the failure if I remove the
...for debugging containers#24147, because "md5sum mismatch" is not the best way to troubleshoot bytestream differences. socat is run on the container, so this requires building a new testimage (20241011). Bump to new CI VMs[1] which include it. [1] containers/automation_images#389 Signed-off-by: Ed Santiago <[email protected]>
I'm assuming this was buildah#5595: the COMMENT field moved around. Deal with it, and add a few more checks while we're at it. Signed-off-by: Ed Santiago <[email protected]>
a647991 to fe96c84
Done. The VM bump is IMO low risk: no significant changes since yesterday's VM bump (#24270)
Same criu checkpoint failure just happened in #24300, so it's confirmed independent. This is fallout from containers-common-extra dropping its iptables dependency and netavark now depending on nftables. This uncovered a missing iptables dependency in criu, which I reported at https://bugzilla.redhat.com/show_bug.cgi?id=2319310 (it has already affected RHEL 10 for a while). I added a workaround in cockpit-project/cockpit-podman#1883 which will hopefully put out this particular piece of noise.
LGTM
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: edsantiago, giuseppe, Luap99
...for debugging #24147, because "md5sum mismatch" is not
the best way to troubleshoot bytestream differences.
socat is run on the container, so this requires building a
new testimage (20241011). Bump to new VMs which include it:
containers/automation_images#389
This new image breaks APIv2 tests, almost certainly due
to containers/buildah#5595. Fix those tests, and add a
few new ones while we're at it.
Signed-off-by: Ed Santiago [email protected]
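
To illustrate the point in the description about "md5sum mismatch" being a poor troubleshooting aid: a checksum only says that two byte streams differ, whereas a byte-level comparison can say where and how. The sketch below is in Python and is not the helper this PR adds (that one runs on the container via socat in the new testimage); the function name `first_mismatch` and the message format are hypothetical, chosen only for illustration.

```python
from typing import Optional


def first_mismatch(expected: bytes, actual: bytes) -> Optional[str]:
    """Describe the first difference between two byte streams, or return None.

    Hypothetical illustration only; not the bytecheck helper from this PR.
    """
    # Walk the common prefix and stop at the first differing byte.
    for offset, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return f"offset {offset}: expected 0x{want:02x}, got 0x{got:02x}"
    # No differing byte in the common prefix, so only the lengths can differ.
    if len(expected) != len(actual):
        return f"length mismatch: expected {len(expected)} bytes, got {len(actual)}"
    return None


if __name__ == "__main__":
    # Simulate a transfer in which a single byte was corrupted in flight.
    sent = bytes(range(256)) * 4
    received = sent[:700] + b"\x00" + sent[701:]
    print(first_mismatch(sent, received))
```

With a report like this, a failing test shows the offset of the first bad byte or a truncation, which is the kind of detail an md5sum comparison cannot provide.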