system test flake -- tcp connection test #16916

Closed
cdoern opened this issue Dec 21, 2022 · 4 comments · Fixed by #16929
Assignees: cdoern
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug.), locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments.)

Comments

@cdoern (Contributor) commented Dec 21, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I thought this test was failing on my SSH PR because of my changes, but now I am seeing it in two PRs of very different styles.

The podman system connection tcp test flakes:
https://github.com/containers/podman/pull/16601/checks?check_run_id=10234988773
https://cirrus-ci.com/task/4577298783928320

The output is:

Cannot connect to Podman. Please verify your connection to the Linux system using podman system connection list, or try podman machine init and podman machine start to manage a new Linux VM Error: unable to connect to Podman socket: Get "http://d/v4.4.0/libpod/_ping": dial tcp [::1]:63371: connect: connection refused

or something similar with different ports. It could be something that changed recently in `podman system connection add`.

I have only seen it in the aarch64 system tests and the Fedora 37 root system tests.

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 21, 2022
@cdoern cdoern self-assigned this Dec 21, 2022
@cdoern cdoern added the flakes Flakes from Continuous Integration label Dec 21, 2022
@vrothberg (Member) commented:

@cdoern, I suspect that `wait_for_port` is not enough to make sure the service is running (see https://github.com/containers/podman/blob/main/test/system/272-system-connection.bats#L115-L117).

Waiting for the port to be available does not imply that the system service is ready to serve requests, so I think we have a race condition in the test. Instead, I suggest doing an exponential back-off while waiting for `podman info` to succeed.

WDYT?
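
For illustration, a minimal sketch of the suggested back-off, assuming a hypothetical `wait_for_service` helper that reaches the freshly started service via `podman --url` (the retry count and delays are arbitrary, not existing test code):

```bash
# Hypothetical helper, not existing test code: retry `podman info` with
# exponential back-off until the API service answers, or give up.
wait_for_service() {
    local url=$1    # e.g. tcp://localhost:63371
    local delay=1
    local attempt
    for attempt in 1 2 3 4 5; do
        # Succeeds only once the service is serving requests,
        # not merely once the TCP port is bound.
        if podman --url "$url" info >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        delay=$((delay * 2))
    done
    echo "service at $url never became ready" >&2
    return 1
}
```

Such a helper would be called right after starting `podman system service` and before any client command that talks to the port.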

@rhatdan (Member) commented Dec 22, 2022

I think the service supports sd_notify; we could also attempt to implement something with that and have a read on the leaked socket. But I guess that would hang.

@vrothberg (Member) commented:

> I think the service supports sd_notify; we could also attempt to implement something with that and have a read on the leaked socket. But I guess that would hang.

Right. The service must run in parallel with the test, so there's always some race while waiting for it to be ready.

@vrothberg (Member) commented:

I opened #16929 to address the flake, as it is hitting hard at the moment.

vrothberg added a commit to vrothberg/libpod that referenced this issue on Jan 2, 2023:
The test was only waiting for the port to be ready but that doesn't
imply the server being ready to serve requests.  Hence, add a loop
waiting for the `info` call to succeed.

Fixes: containers#16916
Signed-off-by: Valentin Rothberg <[email protected]>
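
For reference, a sketch of the kind of readiness loop the commit message describes, written in the style of the repository's bats helpers (the `$port` variable and the 10-second budget are assumptions; this is not the verbatim patch from #16929):

```bash
# Illustrative sketch only: after wait_for_port confirms the port is bound,
# keep retrying a cheap API call until the service actually responds.
# run_podman '?' tolerates any exit status and records it in $status.
wait_for_port localhost $port
timeout=10
while ((timeout > 0)); do
    run_podman '?' info --format '{{.Host.RemoteSocket.Path}}'
    if [[ $status -eq 0 ]]; then
        break
    fi
    sleep 1
    timeout=$((timeout - 1))
done
```
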
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 6, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2023