New buildah/timeout/external container flakes #18631
Flakes being seen in the wild, outside of the treadmill. Next guess: something to do with the …
Everything points to something broken in containers/storage.
All the 13808 failures below are from the buildah treadmill PR. All the 17831 failures below are from my hammer-CI PR, during a period this weekend when I had cherry-picked #18634, a test PR that included a new containers/storage. That is why I believe the problem is in containers/storage. Today I reverted that PR from my hammer-CI PR, and am no longer seeing those buildah failures.
Uh-oh. This might be the same bug, *without* the new containers/storage. It's really hard to tell, because nobody has been submitting PRs these last few weeks.
It seems to me that the test is buggy — it claims to test concurrent removal, but it actually tests concurrent (build + removal); and if an image is removed while another is being built from the same layers, I can imagine that failing (a build doesn’t hold a race-free “reference” to a layer that would protect it against concurrent removals). The lack of a race-free “reference” is arguably a bug in its own right, but it’s not the bug the test purports to test; #9266 only intended to make removals not fail against concurrent removals.
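To make the race concrete, here is a minimal shell sketch of the pattern being exercised. It is illustrative only: the image names and Containerfile are hypothetical, and the outcome depends entirely on timing.

```sh
#!/usr/bin/env bash
# Hypothetical reproducer for the build-vs-remove race described above.
# Assumes a local image localhost/base already exists; all names are made up.

cat > Containerfile <<'EOF'
FROM localhost/base
RUN echo hello > /tmp/hello
EOF

# Start a build that reads layers belonging to localhost/base...
podman build -t localhost/built -f Containerfile . &

# ...while concurrently removing the image that owns those layers.
podman rmi localhost/base &

wait
# Because the build holds no race-free reference to the shared layers,
# either command may fail depending on scheduling.
```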
Thank you @mtrmac. I just panicked without looking closely.
@edsantiago I do think that build&rm one is worth tracking and fixing.
Filed #18659 for my concurrent …
Aside from containers/buildah#4813, I think containers/storage#1571 is causing podman (with a containers/storage that incorporates it) and the system-default version of buildah (built with a containers/storage that didn't incorporate it) to default to different storage drivers, so they see different sets of layers and images. There are multiple options for getting them to agree. Set …
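One plausible way to get them to agree, sketched here as an assumption rather than a recommendation from this thread, is to pin the driver in the shared storage configuration so neither tool falls back to its own default. The path and driver choice below are illustrative.

```sh
# Illustrative sketch: pin the storage driver in the config file that
# both podman and buildah read, so they stop defaulting differently.
# Assumes "overlay" is the driver you want; rootless users would edit
# ~/.config/containers/storage.conf instead.
cat > /etc/containers/storage.conf <<'EOF'
[storage]
driver = "overlay"
EOF
```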
@mtrmac I was testing it with …
Regarding the following tests: …
I think …
I'm having trouble following the thread, and I don't have a way to play with Debian. But I just think it's important to point out that system tests do not (and must not) muck around with setting …
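For what it's worth, a quick way to see which driver each tool actually resolved, without changing any configuration (my own suggestion, not something proposed in the thread):

```sh
# Ask podman which storage driver it resolved.
podman info --format '{{.Store.GraphDriverName}}'

# buildah reports the same field in its `info` output; grepping the JSON
# avoids depending on a particular template path.
buildah info | grep -i graphdrivername
```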
A friendly reminder that this issue had no activity for 30 days.
@edsantiago, any update on this issue?
I think this is similar to what we saw in the treadmill, which was recently fixed by a workaround for Debian. But I think @edsantiago can confirm this.
I do not understand the status of this bug. As best I can tell, #18822 changed the CI setup something something overlay/vfs/Debian. Since then I have only run the buildah treadmill 2-3 times, because there have been very few buildah PRs, and I have not seen these errors. To me that means the bug has been swept under the carpet, not fixed. Opinions differ on this: some may believe the issue is fixed; some may believe it's a difficult situation with no real good fix. I will step out of this discussion for now. If someone truly believes this is fixed, feel free to close this issue. If at all possible, please include a comment defending your actions.
My very imprecise impression, just skimming the conversations and not checking the details myself, is that Nalin diagnosed this in #18631 (comment), and that @flouthoc fixed the c/storage behavior in containers/storage#1618 / containers/storage#1637. Those c/storage changes have been included in Podman since e5399aa. @flouthoc PTAL and correct me if I’m wrong.
@mtrmac Yes. Regarding #18822, we had a discussion at a watercooler that it is hard to ensure this compatibility for older versions between …
#13808 is passing again. Closing.
New weird flakes seen in the buildah treadmill. I'm filing them all under one issue, but they might be unrelated: …
(ISTR something changing in buildah signal handling recently)
And: …
And: …
All of these are buildah-related. I haven't seen them in any other PRs, but this has been the lightest PR week in memory. If we don't see these in regular podman CI, it might be something in the new buildah vendoring.