Reenable remote system tests #7111
eb216a0 to f27d71e
This is uncomfortable: the remote test timed out but with no actual log of how far it got:
Restarted in hopes that it's a purely coincidental infrastructure flake.
@cevich I need help here please. The failures look like real timeouts, but the logs are useless (incomplete). I have to assume there's a buffering problem, as if stdout/stderr aren't getting flushed properly. Does that sound familiar? Can you think of any way we can get full logs?
@edsantiago taking a peek...
@edsantiago this is a tricky one! The main thing that stands out to me here is the size of the onion. Maybe try peeling back a few layers? For example, the wait/retry loop in a makefile is two complexities wrapped in an enigma. I know make has its fingers in the stdio and exit-code pie; what happens w/o
63e9478 to d7bcb71
I think the Ubuntu issue is solved: Ubuntu I see two alarming failures in special_testing_rootless:
The first: where is that The second, well, I'm just going to disable that test for podman-remote - but what is happening here too? It's just a
d7bcb71 to b51cd63
Okay, I don't think podman-remote is ready for prime time. These failures are weird. Giving up for the week.
I can't tell you the number of times this catches me...a genuine PITA for no good reason IMHO...or at least no good reason I'm aware of. I'm glad to read that you got past the odd hanging/garbage problem.
Also odd, I wonder if we're running into our old IFS bats bug again? IIRC, that was causing command-line args to not get passed through properly. In any case, it should be reasonably easy to run the command manually and see if it reproduces, no?
That occurred to me as well, like perhaps it's exiting just at the wrong time. Could it be due to a hiccup talking to quay.io maybe? Otherwise, given its proximity to the first failure, I wonder if the two are somehow related. Is it possible the remote end got wedged or crashed in some way, causing both failures?
podman-remote is in better shape now. Let's see what needs to be done to reenable remote system tests.

 - logs test: skip multilog, it doesn't work remote
 - diff test: use -l only when local, not with remote
 - many other tests: skip_if_remote, with 'FIXME: pending #xxxx' where xxxx is a filed issue.

Unrelated: added new helper to skip_if_remote and _if_rootless, where we check if the source message includes "remote"/"rootless" and insert it if missing. This is a minor usability enhancement to make it easier to understand at-a-glance why a skip triggers.

Signed-off-by: Ed Santiago <[email protected]>
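
The message handling described in the commit message above can be sketched roughly as follows. This is only an illustration of the idea, not the code from this PR: the function name `skip_message` and its argument order are hypothetical, and a real BATS helper would pass the resulting string to `skip` rather than print it.

```shell
# Hypothetical helper (name and arguments are assumptions, not the PR's code).
# Builds a skip reason that always mentions KEYWORD ("remote" or "rootless"):
#   - no reason given         -> just the keyword
#   - reason mentions keyword -> reason unchanged
#   - otherwise               -> "keyword: reason"
skip_message() {
    local keyword="$1"
    local reason="${2:-}"

    if [ -z "$reason" ]; then
        printf '%s\n' "$keyword"
        return
    fi

    case "$reason" in
        *"$keyword"*) printf '%s\n' "$reason" ;;
        *)            printf '%s: %s\n' "$keyword" "$reason" ;;
    esac
}

skip_message remote "FIXME: pending #xxxx"              # prints: remote: FIXME: pending #xxxx
skip_message remote "does not work with podman-remote"  # prints: does not work with podman-remote
```

With something like this in place, a skipped test's output says at a glance whether the skip is due to remote or rootless mode, even when the caller only wrote a FIXME note.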
b51cd63 to a4fcf09
This is ready for review - or at least as close as I can get it. Unfortunately, I just don't think podman-remote is ready for the world: it's not reliable. This PR has seen a history of flakes, most particularly what I believe is a race condition in which the client never sees output (#7195). Until that is resolved, I think it's a bad idea to enable podman-remote tests in CI: we will get a ton of flakes. But I also think it's a bad idea to leave podman-remote untested. Comments welcome.
Thanks, @edsantiago! Your findings are concerning and suggest that we should dedicate some extra attention to podman-remote soon.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@rhatdan I'm ambivalent. If we merge it, there is going to be a lot of developer pain because of frequent flakes and reruns. But I also don't see any incentive to fix podman-remote, so if we don't merge it I can see us going another six months without testing and with more bugs creeping into podman-remote. I lean toward merging, I just want everyone to know that it's going to be painful if we do. (Until the flake bug is fixed).
/lgtm