-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: test overlay and vfs #20161
CI: test overlay and vfs #20161
Conversation
00ab112
to
1246120
Compare
Draft because I'm pretty sure this is going to fail due to missing settings in .cirrus.yml |
Awright, who wants to speculate on why overlay make (only) podman-remote slower? |
@edsantiago can you share a link to the timing results? |
Durrr. Sorry.
|
Are you sure they're slower and did not hang+timeout? |
Good question. No, I'm not sure, but my reasoning is that all the test logs end at about 2900s, which is ~48m, which is close to the 50m limit. And the tail of each log is different (different tests). Granted, buffering means that we probably don't get to see the very very tail ... but in non-overlay tests, we never get to 2900s, so I still think something is slowing down. If I were more talented/hardworking/capable I'd write a minitool to check timestamps on each line and look for big jumps; or to compare against a regular vfs run. Maybe I'll do so today. |
1246120
to
9903e2f
Compare
Here we go. This is a comparison of a good (standard) CI run against a bad one (this PR). Some tests run faster with overlay, but the huge majority run slower. I'm still trying to chase this down, but ITM would anyone like to look at the table below and see if they can figure out a common factor among the slow tests?
Methodology: I grabbed timing results from |
bb006e9
to
f0f3b0b
Compare
Cockpit tests failed for commit f0f3b0be736b96993c2d720ebc30ccad9a03fa32. @martinpitt, @jelly, @mvollmer please check. |
f0f3b0b
to
bb7634b
Compare
I need help with this permissions-check failure:
I cannot reproduce on 1minutetip, because on a default 1minutetip the permissions are correct:
Somehow, under Cirrus, the There's also a |
test/e2e/common_test.go
Outdated
// When using overlay, podman leaves a stray mount behind. This | ||
// leak causes remote tests to take a loooooong time, and time out. | ||
overlayPath := path + "/root/overlay" | ||
if _, err := os.Stat(overlayPath); err == nil { | ||
umount := exec.Command("umount", overlayPath) | ||
umount.Stdout = GinkgoWriter | ||
umount.Stderr = GinkgoWriter | ||
if err = umount.Run(); err != nil { | ||
GinkgoWriter.Printf("Error umounting %s: %v\n", overlayPath, err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the test hang.
We actually have a long history with this issue of leaked mounts: https://github.com/containers/buildah/blob/0d717cd0a52376c80cb527a344a1d7446b0d56e7/tests/helpers.bash#L116-L123
That dates to 2020. See containers/buildah#1991 .
Is there really no way to fix this leak?
8e93f67
to
6d2e4b6
Compare
@giuseppe PTAL |
Eyeballs welcome, but this is not ready to merge. I'm seeing a LOT of new flakes, like this new container-cp one. |
e33f221
to
3a2dea7
Compare
3a2dea7
to
e7b22f0
Compare
2bbf478
to
254b9e5
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit 254b9e59427822d4b4f040931a585329f332563b. @martinpitt, @jelly, @mvollmer please check. |
254b9e5
to
fe0b71a
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
Cockpit tests failed for commit fe0b71afabe599b17ebfbeba5fb7473d899b7ade. @martinpitt, @jelly, @mvollmer please check. |
fe0b71a
to
2df168e
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
14247ae
to
082d582
Compare
This is as ready as I can make it. Any feedback on my flake question?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I fine to merge is as. If it flakes to often we can always skip later.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
We're only testing vfs in CI. That's bad. containers#18822 tried to remedy that but that only worked on system tests, not e2e. Here we introduce CI_DESIRED_STORAGE, to be set in .cirrus.yml in the same vein as all the other CI_DESIRED_X. Since it's 2023 we default to overlay, testing vfs only in priorfedora. Fixes required: - e2e tests: - in cleanup, umount ROOT/overlay to avoid leaking mounts - system tests: - fix a few badly-written tests that assumed/hardcoded overlay - buildx test: add weird exception to device-number test - mount tests: add special case code for vfs - unprivileged test: disable one section that is N/A on vfs Signed-off-by: Ed Santiago <[email protected]>
082d582
to
a10b88c
Compare
/lgtm |
We're only testing vfs in CI. That's bad. #18822 tried to
remedy that but that only worked for system tests, not e2e.
Here we introduce CI_DESIRED_STORAGE, to be set in .cirrus.yml
in the same vein as all the other CI_DESIRED_X. Since it's 2023
we default to overlay, testing vfs only in priorfedora.
Skip the "split imagestore" test under overlay (#19748)
Signed-off-by: Ed Santiago [email protected]