CI: debian remote startup is very slow #19010

Closed
edsantiago opened this issue Jun 27, 2023 · 9 comments · Fixed by #19182

edsantiago (Member) commented Jun 27, 2023

[Split from #15272]

Podman-remote e2e tests on debian are very slow.

Take for example these results: f38 and debian (the containers.conf usage cgroupns test, chosen pseudorandomly, no special reason). 2.760s on f38, 4.650s on debian.

Almost every part of the Ginkgo task is slower on debian. Looking at the Exit lines on each, we see:

f38      debian
435ms    1.026s    Exit [BeforeEach] TOP-LEVEL
312ms    861ms     Exit [BeforeEach] Verify podman containers.conf usage
1.651s   2.476s    Exit [It] cgroupns
360ms    285ms     Exit [AfterEach] TOP-LEVEL
2.760s   4.650s    Total

I downloaded the raw logs, ran a little perl/find/math on them, and time spent in BeforeEach on debian is 2122s, compared to 908s on f38. That's pretty huge. (Time spent in It and in AfterEach is comparable.)
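
For the record, here is a rough sketch of the kind of summing involved (the real script was perl; the log file name and the exact Ginkgo line format below are assumptions based on the Exit lines above):

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"time"
)

func main() {
	// ginkgo-debian.log is a hypothetical name for one of the downloaded raw logs.
	f, err := os.Open("ginkgo-debian.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Matches lines like "Exit [BeforeEach] TOP-LEVEL 435ms" or "... 1.026s".
	re := regexp.MustCompile(`Exit \[BeforeEach\].* ([0-9.]+(?:ms|s))$`)

	var total time.Duration
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if m := re.FindStringSubmatch(sc.Text()); m != nil {
			if d, err := time.ParseDuration(m[1]); err == nil {
				total += d
			}
		}
	}
	fmt.Println("total BeforeEach time:", total)
}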

I think there's something broken in podman-remote startup on debian; it's taking too long. Can someone with access to a debian VM PTAL?

edsantiago added the flakes (Flakes from Continuous Integration) label Jun 27, 2023
Luap99 (Member) commented Jun 27, 2023

Do we have a good matrix somewhere of what is different between these setups?
runc vs crun
cgroupv1 vs v2
netavark vs cni
etc...

That might help to nail it down.
Also, if I remember correctly, there was a Go issue with the Ubuntu version a while back that @baude found which caused all test runs to be slower.

edsantiago (Member, Author) commented

I don't know of any matrix. But... why would only remote be slow?

I wonder... could it be the podman info that's run as part of every remote test? podman info is super slow on Gentoo, and I once tracked it down to the package-version code. Could someone with access to a debian VM please try starting a server and running time podman info?

edsantiago (Member, Author) commented

Followup: podman info is not the culprit. I changed it to podman version, and no difference (52 minutes).

edsantiago (Member, Author) commented

Might be apparmor. I instrumented my PR so the server runs with trace (don't try this at home, kids!) and although the log timestamp is only one-second resolution, there is a very strong pattern of the form:

...
time=".....:41Z" level=debug msg="..."
time=".....:42Z" level=debug msg="Found apparmor_parser binary in /sbin/apparmor_parser"
time=".....:42Z" level=debug msg="..."

(sometimes the one-second jump is on the line after apparmor_parser instead.)

I think I'm going to try rm -f /sbin/apparmor_parser as part of debian VM setup. But not tonight.

edsantiago (Member, Author) commented

Nope, no difference. Giving up on this for now. This is going to require someone with access to a debian VM.

Luap99 (Member) commented Jul 10, 2023

I tried it with hack/get_ci_vm.sh and yes, debian is incredibly slow for some reason.

# time bin/podman-remote version
...
real    0m0.966s
user    0m0.025s
sys     0m0.021s

while local:

# time bin/podman version
...

real    0m0.043s
user    0m0.014s
sys     0m0.033s

However, for info it is slow for both:

# time bin/podman info
...
real    0m0.755s
user    0m0.420s
sys     0m0.357s
# time bin/podman-remote info
...
real    0m0.718s
user    0m0.024s
sys     0m0.023s

And no idea how podman-remote version manages to be slower than info.

Luap99 (Member) commented Jul 10, 2023

OK, it is likely due to the version querying.
The reason version is even slower is that the remote endpoint first queries the normal info, then redundantly queries the conmon/OCI runtime version again, which it could just read off the info.

edsantiago (Member, Author) commented

but........ but podman version does not show those versions? podman info queries the runtime, and conmon, and slirp4netns, and pasta, and everything, and I would expect podman info to be very slow. podman version only shows compiled-in strings, and should not need to exec or do anything expensive? I am obviously missing something very basic.

Luap99 (Member) commented Jul 10, 2023

see pkg/api/handlers/compat/version.go

There is no libpod-only endpoint, so it reuses whatever the docker version endpoint produces.
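
Roughly, the flow looks like this (a simplified, hypothetical sketch of the pattern only, not the actual handler code; getInfo and getConmonVersion are placeholder names, not the libpod API):

package main

import "fmt"

// Placeholder type standing in for the real info result; this only
// illustrates the redundancy described above.
type hostInfo struct {
	ConmonVersion  string
	RuntimeVersion string
}

// getInfo is the expensive call: it execs conmon, the OCI runtime, etc.
func getInfo() hostInfo {
	return hostInfo{ConmonVersion: "conmon 2.x", RuntimeVersion: "crun 1.x"}
}

// getConmonVersion execs `conmon --version` a second time.
func getConmonVersion() string {
	return "conmon 2.x"
}

// compatVersion mimics the shape of the docker-compat version handler:
// it gathers info, then redundantly queries a component version again
// instead of reading it off the info it already has.
func compatVersion() {
	info := getInfo()            // expensive exec round #1
	conmon := getConmonVersion() // expensive exec round #2, already in info.ConmonVersion
	fmt.Println(info.RuntimeVersion, conmon)
}

func main() { compatVersion() }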

Luap99 self-assigned this Jul 10, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 10, 2023
Do not use podman info/version as they are expensive and clutter the log
for no reason. Just checking if we can connect to the socket should be
good enough and much faster.

Fix the non-existent error checking, so that we actually see a useful
error when this does not work.

Also change the interval: why wait 2s for a retry? Let's take 100ms steps
instead.

Fixes containers#19010

Signed-off-by: Paul Holzinger <[email protected]>
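
A minimal sketch of the kind of readiness check the commit describes (socket path, timeout value, and function name here are assumptions, not the actual PR code):

package main

import (
	"fmt"
	"net"
	"time"
)

// waitForSocket polls a unix socket until a connection succeeds, retrying
// every 100ms, instead of running an expensive podman info/version.
func waitForSocket(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("unix", path, time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			// Return the real error instead of silently ignoring it.
			return fmt.Errorf("podman socket %s not ready: %w", path, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	// The socket path below is only an example.
	if err := waitForSocket("/run/podman/podman.sock", 5*time.Second); err != nil {
		fmt.Println(err)
	}
}
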
ashley-cui pushed a commit to ashley-cui/podman that referenced this issue Jul 13, 2023
github-actions bot added the locked - please file new issue/PR label Oct 9, 2023
github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023