Nightly tests did not succeed on fedora-38/podman-next: testHealthcheckUser timeout #1443

Closed
cockpituous opened this issue Oct 11, 2023 · 3 comments · Fixed by #1447

@cockpituous
Contributor

cockpituous commented Oct 11, 2023

Tests failed on 008ac87

log

@martinpitt martinpitt changed the title Nightly tests did not succeed on fedora-38/podman-next Nightly tests did not succeed on fedora-38/podman-next: testHealthcheckUser timeout Oct 11, 2023
@martinpitt
Member

I've seen the same failure in containers/podman#20322 (comment) (earlier version, not current). This could be a new race condition, and needs investigation.

@martinpitt
Member

I tried to reproduce this locally by running

for i in `seq 5`; do test/check-application TestApplication.testHealthcheckUser $RUNC || break; done

(with ...System running under $RUNC2 in parallel). The loop passes reliably on current F38.

Then I switched to podman-next:

dnf -y copr enable rhcontainerbot/podman-next
dnf update --repo='copr*' -y

This updates podman-5:4.7.0-1.fc38.x86_64 to podman-102:4.8.0~dev-1.20231011135052610412.main.2121.d437ca8fd.fc38.x86_64, along with a lot of related packages (netavark, crun, etc.), but I believe healthchecks are handled by podman itself. With the update, both *User and *System fail reliably, but at a later point than where the test failed on CI:

  File "/var/home/martin/upstream/cockpit-podman/test/check-application", line 2419, in testHealthcheckUser
    self._testHealthcheck(False)
  File "/var/home/martin/upstream/cockpit-podman/test/check-application", line 2347, in _testHealthcheck
    b.wait_visible(".ct-listing-panel-body tbody tr:nth-child(2)")
[...]
wait_js_cond(ph_is_present(".ct-listing-panel-body tbody tr:nth-child(2)")): Uncaught (in promise) Error: condition did not become true

Indeed, clicking "Run health check" no longer updates anything in the UI, and neither does running `podman healthcheck run healthy`. I have to reload the page to make the results appear.

One important difference is that on current F38, I get an additional exec_died event when a healthcheck finishes:

2023-10-11 15:40:26.5563557 +0000 UTC container exec_died e781541fc1204729e2b36d2c5fabc21beb6d00e05d8f89ef953b81cead9fc8db (image=localhost/test-busybox:latest, name=healthy)
2023-10-11 15:40:26.568280289 +0000 UTC container health_status e781541fc1204729e2b36d2c5fabc21beb6d00e05d8f89ef953b81cead9fc8db (image=localhost/test-busybox:latest, name=healthy, health_status=healthy)

while with podman-next, I just get:

2023-10-11 15:42:08.858067536 +0000 UTC container health_status 1507d14a25b3c9d424d6dba58f9c27072ebf48f4682ed4ab0f382ba23341f3ff (image=localhost/test-busybox:latest, name=healthy, health_status=healthy)
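To compare the two event streams mechanically, a minimal sketch along these lines could help. It assumes the log lines have exactly the layout quoted above (four timestamp tokens, then type, status, container ID, and a parenthesized attribute list); `parse_event` and `statuses_for` are hypothetical helper names, not part of podman or cockpit-podman.

```python
import re

# Assumed layout of the event lines quoted in this comment:
# "<date> <time> <offset> <tz> <type> <status> <id> (<key>=<value>, ...)"
EVENT_RE = re.compile(
    r"^(?P<time>\S+ \S+ \S+ \S+) "
    r"(?P<type>\S+) (?P<status>\S+) (?P<id>[0-9a-f]+) "
    r"\((?P<attrs>.*)\)$"
)


def parse_event(line):
    """Parse one event line into a dict, or return None if it does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    attrs = dict(
        item.split("=", 1)
        for item in m.group("attrs").split(", ")
        if "=" in item
    )
    return {
        "time": m.group("time"),
        "type": m.group("type"),
        "status": m.group("status"),
        "id": m.group("id"),
        "attrs": attrs,
    }


def statuses_for(lines, name):
    """Return the event statuses, in order, for the container called `name`."""
    events = (parse_event(line) for line in lines)
    return [e["status"] for e in events if e and e["attrs"].get("name") == name]
```

Feeding it the F38 lines and the podman-next line for the `healthy` container makes the missing `exec_died` event show up as a difference between the two status lists.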

Indeed, we ignore the `health_status` event. That was a workaround for containers/podman#19237, which was recently fixed in podman.

When I fix/relax the workaround, it seems to work fine.
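The idea behind relaxing the workaround can be sketched roughly as follows. This is only an illustration in Python, not the actual cockpit-podman code: the event dict keys (`type`, `status`, `id`), the `REFRESH_STATUSES` set, and the `refresh` callback are all hypothetical stand-ins.

```python
# Statuses that should trigger a container status refresh.
# "exec_died" is kept for older podman releases; "health_status" covers
# podman-next, which no longer emits exec_died after a healthcheck.
REFRESH_STATUSES = {"exec_died", "health_status"}


def handle_container_event(event, refresh):
    """Call refresh(container_id) if the event may change the health state.

    `event` is a hypothetical dict with "type", "status" and "id" keys;
    `refresh` is a hypothetical callback that re-reads the container.
    Returns True if a refresh was triggered, False otherwise.
    """
    if event.get("type") != "container":
        return False
    if event.get("status") not in REFRESH_STATUSES:
        return False
    refresh(event["id"])
    return True
```

On older podman, where both events arrive for one healthcheck run, this would refresh the container twice, which is redundant but harmless as long as the refresh is idempotent.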

martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Oct 11, 2023
We previously didn't react to `health_status` events as they were
broken, and only reacted to `exec_died` instead. With the upcoming
podman release, `health_status` events are reliable, and they will also
not be accompanied by an `exec_died` event any more. So start updating
the container status on them.

Still keep listening to `exec_died` to support older podman releases.

Fixes cockpit-project#1443
martinpitt added a commit to martinpitt/cockpit-podman that referenced this issue Oct 12, 2023
jelly pushed a commit that referenced this issue Oct 12, 2023