stop podman.service: inspecting object: no such object: "top_etc" #17904
Comments
From the last few days
Most recent: last night, f36 root, in a CI run that included my ignore-enoent patch. |
OK, I think we have a really bad problem in the e2e tests which may have been hidden by the 3 restarts. Setting environment variables impacts all goroutines. So doing an Instead, we need to pass environment variables via @Luap99 can you double check my theory? I am still a bit slow these days. |
I am not 100% sure about this, but I do not think this is a problem; if it were, we would see much weirder things. Reading https://onsi.github.io/ginkgo/#spec-cleanup-aftereach-and-defercleanup shows a pattern similar to ours.
Therefore each node has its own set of envs and they should not collide. However, if a test forgets Unsetenv(), then yes, we are leaking the env into the following tests.
Would it help to instrument ginkgo such that it dumps ENV on failure? Noisy, but if someone points me at a starting point I'll add it to my test PR. |
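As a possible starting point for that instrumentation, here is a stdlib-only sketch of a helper that prints the environment in sorted order. `formatEnv` and `dumpEnv` are hypothetical names. Wiring it into the suite is an assumption on my part: in Ginkgo v2 it could plausibly be called from a JustAfterEach node when CurrentSpecReport().Failed() is true, writing to GinkgoWriter.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

// formatEnv returns the KEY=VALUE entries sorted, one per line.
// Sorting makes two dumps from different runs easy to diff.
func formatEnv(environ []string) string {
	if len(environ) == 0 {
		return ""
	}
	sorted := append([]string(nil), environ...) // copy; do not reorder the input
	sort.Strings(sorted)
	return strings.Join(sorted, "\n") + "\n"
}

// dumpEnv (hypothetical helper) writes the formatted environment to w.
func dumpEnv(w io.Writer, environ []string) {
	fmt.Fprint(w, formatEnv(environ))
}

func main() {
	// In a test suite, w would be the test framework's writer.
	dumpEnv(os.Stderr, os.Environ())
}
```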
Here we go - this is why we can't enable sqlite system tests. This is a failure in my enable-sqlite PR. It's an int failure, not sys, and it's a triple-ginkgo failure, and this is what I see when sqlite is enabled globally via As I mentioned in comment 0, but this is important and needs repeating, the podman that runs in this test is not the usual e2e podman. It does not have the long boring series of Anyhow. #17954 can't merge until this is fixed. |
That is great news actually. I couldn't put the pieces together when thinking this was a flake. A consistent failure makes more sense. |
It is a flake. If it were consistent, none of the |
What the ... I didn't take a look yet, but why would it flake with containers.conf being set? That is something static. Given the tests don't have an unexpected interference (see my theory on CONTAINERS_CONF), then it may be a bug/race in libpod?
@edsantiago can you please try cherry-picking #17937 on top of your PR? I have a theory that WAL mode may not write/sync immediately such that the container is kept in memory but isn't written to disk yet. That means that |
I did. Not in #17954, but in #17831. I ran that all day yesterday, with #17937, and still see the flake. You can confirm by scrolling to the top, clicking the Build ID, and clicking the Commit SHA. Note that this is a single-fail, because in that PR I disable triple-retries. |
Apologies, I feel almost like abusing you trying out different ideas, @edsantiago. To test whether tests interfere, could we try running with |
Oh, good idea. Will try that and report back. Thanks. |
I am more or less thinking out loud at the moment and hope for our collective swarm brain to find a solution. One thing that strikes me:
While the @edsantiago, can you confirm that it's always the |
New variant (I think): seeing in the
I think that's the EBUSY flake. I've seen it a while ago in this PR (#17904 (comment)). |
Yes, it's unlinkat/EBUSY ... but this is happening in Anyhow, this does not give me any useful information, it's just a hmmmm. |
While debugging containers#17904 we found the test to be missing the common podman flags. Add them to the podman invocations and remove some clutter. Signed-off-by: Valentin Rothberg <[email protected]>
With #18056 merged, let's close this issue. |
In setup, write a containers.conf.d file with db_backend as specified in .cirrus.yml. This is actually much scarier and more achy-breaky than merely "sqlite system tests": it enables sqlite in e2e tests. ("But wait, we already do that!" -- no, not really. sqlite in e2e is being done via the --db-backend option, and some podman commands in e2e do not use the standard options. See containers#17904.) This is unlikely to get merged any time soon (March, maybe even April) because sqlite is still too fragile; this will trigger more flakes than are currently acceptable. Also, the nasty auto-update flake seems to trigger much more reliably with sqlite. We need that one fixed. Signed-off-by: Ed Santiago <[email protected]>
Happens very, very often when using sqlite:
Important note: this is an "Execing" invocation of podman, completely different from the normal e2e test "Running" invocation. It does not have the usual e2e
--this --that
options. sqlite is selected via containers.conf in this setup.