checkpoint tests time out under $CONTAINER #15015

Closed
edsantiago opened this issue Jul 21, 2022 · 9 comments · Fixed by #19449
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Member

All checkpoint-related tests are failing in the containerized environment in CI (note: that is not a colorized/hyperlinked log; it is impossible to read without my greasemonkey extension).

Command timed out after 90s.   (basically, every test that uses podman checkpoint/restore)
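For context, a minimal sketch of the kind of checkpoint/restore sequence these tests exercise; the container name and image below are illustrative, not taken from the test code:

```sh
# Start a long-running container, checkpoint it, then restore it.
podman run -d --name ckpt-demo quay.io/libpod/alpine:latest top
podman container checkpoint ckpt-demo   # the step that hangs past the 90s timeout in containerized CI
podman container restore ckpt-demo
```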
edsantiago added the kind/bug label on Jul 21, 2022
edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 22, 2022
...and enable the at-test-time confirmation, the one that
double-checks that if CI requests runc we actually use runc.
This exposed a nasty surprise in our setup: there are steps to
define $OCI_RUNTIME, but that's actually a total fakeout!
OCI_RUNTIME is used only in e2e tests; it has no effect
whatsoever on podman itself as invoked via the command
line, such as in system tests. Solution: use containers.conf
(sketched below, after this commit message).

Given how fragile all this runtime stuff is, I've also added
new tests (e2e and system) that will check $CI_DESIRED_RUNTIME.

Image source: containers/automation_images#146

Since we haven't actually been testing with runc, we need
to fix a few tests:

  - handle an error-message change (make it work in both crun and runc)
  - skip one system test, "survive service stop", that doesn't
    work with runc and that I don't think we care about.

...and skip a bunch, filing issues for each:

  - containers#15013 pod create --share-parent
  - containers#15014 timeout in dd
  - containers#15015 checkpoint tests time out under $CONTAINER
  - containers#15017 networking timeout with registry
  - containers#15018 restore --pod gripes about missing --pod
  - containers#15025 run --uidmap broken
  - containers#15027 pod inspect cgrouppath broken
  - ...and a bunch more ("podman pause") that probably don't
    even merit filing an issue.

Also, use /dev/urandom in one test (was: /dev/random) because
the test is timing out and /dev/urandom does not block. (But
the test is still timing out anyway, even with this change)

Also, as part of the VM switch we are now using go 1.18 (up
from 1.17) and this broke the gitlab tests. Thanks to @Luap99
for a quick fix.

Also, a slight tweak to containers#15021: include the timeout value, and
reword the message so the command string is at the end.

Also, fixed a misspelling in a test name.

Fixes: containers#14833

Signed-off-by: Ed Santiago <[email protected]>
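On the containers.conf point in the commit message above: unlike $OCI_RUNTIME, which only the e2e suite reads, a runtime set in containers.conf applies to every podman invocation, including those made by the system tests. A minimal sketch, assuming a scratch config selected via $CONTAINERS_CONF (the actual CI setup may write the system-wide /etc/containers/containers.conf instead):

```sh
# Point podman at a scratch containers.conf that pins the OCI runtime.
cat > /tmp/ci-containers.conf <<'EOF'
[engine]
runtime = "runc"
EOF
export CONTAINERS_CONF=/tmp/ci-containers.conf

# Every podman command now uses runc, not just the e2e tests.
podman info --format '{{.Host.OCIRuntime.Name}}'   # expected output: runc
```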
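On the /dev/urandom change above: reads from /dev/random can block when the kernel entropy pool is not ready, while /dev/urandom always returns data, so a dd-based test cannot stall on the read itself. An illustration only; the size and path are placeholders, not taken from the actual test:

```sh
# Non-blocking source of random bytes; /dev/random could block here.
dd if=/dev/urandom of=/tmp/rand.bin bs=1M count=1
```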
@vrothberg
Member

@edsantiago did the tests pass at some point before?

@edsantiago
Member Author

Yes, they all worked fine prior to #14972. That is the recent PR that did the VM switcheroo in CI. Here's an example of a PR that ran before 14972. Search in-page for " checkpoint" (space-checkpoint, to eliminate other checkpoint strings from podman info).

14972 is a huge monster, so it's impossible to know what changed, but the likely culprit is something different in the f36 image. Maybe criu, maybe the kernel; I really can't begin to guess.

@vrothberg
Member

Thanks, @edsantiago !

@rst0git
Contributor

rst0git commented Aug 3, 2022

@edsantiago Is this problem with checkpoint/restore tests still present?

@edsantiago
Member Author

@rst0git I assume so. The tests are completely disabled, so there's no way to find out except to reenable them. Since the VM images are unchanged, I don't think that would give us any information we don't already have.

@github-actions

github-actions bot commented Sep 3, 2022

A friendly reminder that this issue had no activity for 30 days.

@github-actions

github-actions bot commented Oct 6, 2022

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 29, 2023

@edsantiago should we reenable the test, to see if a miracle happened?

@edsantiago
Member Author

I hate relying on miracles... but both f37 and f38 had a successful CI run in my hammer-sqlite PR, with questionable but valid timings (42:18 and 37:13 respectively, compared to 37:23 / 33:03 on a PR in main). An extra five minutes seems a little concerning, but I guess checkpointing is expensive?

I verified that the tests ran by grepping for 'checkpoint' in the summary lines and eyeballing the results, and by comparing the "N tests skipped" count in the bottom summary against another PR without the skips removed. About 35 tests that were being skipped no longer are, which is consistent with grep -wc It test/e2e/checkpoint_{,image_}test.go.
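For reference, the brace expansion in that last command spells out to the two checkpoint e2e files; counting whole-word "It" occurrences approximates the number of Ginkgo test cases:

```sh
# Same check as above, with the brace expansion written out.
grep -wc It test/e2e/checkpoint_test.go test/e2e/checkpoint_image_test.go
```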

Real PR now in the works.

edsantiago added a commit to edsantiago/libpod that referenced this issue Jul 31, 2023
And lo, a miracle occurred. Containerized checkpoint tests are
no longer hanging. Reenable them.

(Followup miracle: tests are still passing, after a year of not
running!)

Closes: containers#15015

Signed-off-by: Ed Santiago <[email protected]>
github-actions bot added the locked - please file new issue/PR label on Oct 30, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Oct 30, 2023