CI: debian remote startup is very slow #19010

Closed
edsantiago opened this issue Jun 27, 2023 · 9 comments · Fixed by #19182

edsantiago (Member) commented Jun 27, 2023

[Split from #15272]

Podman-remote e2e tests on debian are very slow.

Take for example these results: f38 and debian (the containers.conf usage cgroupns test, chosen pseudorandomly, no special reason). 2.760s on f38, 4.650s on debian.

Almost every part of the Ginkgo task is slower on debian. Looking at the Exit lines on each, we see:

f38      debian
435ms    1.026s    Exit [BeforeEach] TOP-LEVEL
312ms    861ms     Exit [BeforeEach] Verify podman containers.conf usage
1.651s   2.476s    Exit [It] cgroupns
360ms    285ms     Exit [AfterEach] TOP-LEVEL
2.760s   4.650s    Total

I downloaded the raw logs, ran a little perl/find/math on them, and time spent in BeforeEach on debian is 2122s, compared to 908s on f38. That's pretty huge. (Time spent in It and in AfterEach is comparable.)
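
For the record, here is a rough sketch of the kind of summing involved (the real script was perl; the log file name and the exact Ginkgo line format below are assumptions based on the Exit lines above):

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"time"
)

func main() {
	// ginkgo-debian.log is a hypothetical name for one of the downloaded raw logs.
	f, err := os.Open("ginkgo-debian.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Matches lines like "Exit [BeforeEach] TOP-LEVEL 435ms" or "... 1.026s".
	re := regexp.MustCompile(`Exit \[BeforeEach\].* ([0-9.]+(?:ms|s))$`)

	var total time.Duration
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if m := re.FindStringSubmatch(sc.Text()); m != nil {
			if d, err := time.ParseDuration(m[1]); err == nil {
				total += d
			}
		}
	}
	fmt.Println("total BeforeEach time:", total)
}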

I think there's something broken in podman-remote startup on debian; it's taking too long. Can someone with access to a debian VM PTAL?

edsantiago added the flakes (Flakes from Continuous Integration) label Jun 27, 2023
Luap99 (Member) commented Jun 27, 2023

Do we have a good matrix somewhere of what is different between these setups?
runc vs crun
cgroupv1 vs v2
netavark vs cni
etc...

That might help to nail it down.
Also, if I remember correctly, there was a Go issue with the Ubuntu version a while back that @baude found which caused all test runs to be slower.

edsantiago (Member, Author) commented

I don't know of any matrix. But... why would only remote be slow?

I wonder... could it be the podman info that's run as part of every remote test? podman info is super slow on Gentoo, and I once tracked it down to the package-version code. Could someone with access to a debian VM please try starting a server and running time podman info?

edsantiago (Member, Author) commented

Followup: podman info is not the culprit. I changed it to podman version, and no difference (52 minutes).

edsantiago (Member, Author) commented

Might be apparmor. I instrumented my PR so the server runs with trace (don't try this at home, kids!) and although the log timestamp is only one-second resolution, there is a very strong pattern of the form:

...
time=".....:41Z" level=debug msg="..."
time=".....:42Z" level=debug msg="Found apparmor_parser binary in /sbin/apparmor_parser"
time=".....:42Z" level=debug msg="..."

(sometimes the one-second jump is on the line after apparmor_parser instead.)

I think I'm going to try rm -f /sbin/apparmor_parser as part of debian VM setup. But not tonight.

edsantiago (Member, Author) commented

Nope, no difference. Giving up on this for now. This is going to require someone with access to a debian VM.

Luap99 (Member) commented Jul 10, 2023

I tried it with hack/get_ci_vm.sh and yes, debian is incredibly slow for some reason.

# time bin/podman-remote version
...
real    0m0.966s
user    0m0.025s
sys     0m0.021s

while local:

# time bin/podman version
...

real    0m0.043s
user    0m0.014s
sys     0m0.033s

However, for info it is slow for both:

# time bin/podman info
...
real    0m0.755s
user    0m0.420s
sys     0m0.357s
# time bin/podman-remote info
...
real    0m0.718s
user    0m0.024s
sys     0m0.023s

And no idea how podman-remote version manages to be slower than info.

Luap99 (Member) commented Jul 10, 2023

OK, it is likely due to the version querying.
The reason version is even slower is that the remote endpoint first queries the normal info, then redundantly queries the conmon/OCI runtime version again, which it could just read off the info.

edsantiago (Member, Author) commented

but........ but podman version does not show those versions? podman info queries the runtime, and conmon, and slirp4netns, and pasta, and everything, and I would expect podman info to be very slow. podman version only shows compiled-in strings, and should not need to exec or do anything expensive? I am obviously missing something very basic.

Luap99 (Member) commented Jul 10, 2023

see pkg/api/handlers/compat/version.go

There is no libpod-only endpoint, so it reuses whatever the docker version endpoint produces.
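
Roughly, the flow looks like this (a simplified, hypothetical sketch of the pattern only, not the actual handler code; getInfo and getConmonVersion are placeholder names, not the libpod API):

package main

import "fmt"

// Placeholder type standing in for the real info result; this only
// illustrates the redundancy described above.
type hostInfo struct {
	ConmonVersion  string
	RuntimeVersion string
}

// getInfo is the expensive call: it execs conmon, the OCI runtime, etc.
func getInfo() hostInfo {
	return hostInfo{ConmonVersion: "conmon 2.x", RuntimeVersion: "crun 1.x"}
}

// getConmonVersion execs `conmon --version` a second time.
func getConmonVersion() string {
	return "conmon 2.x"
}

// compatVersion mimics the shape of the docker-compat version handler:
// it gathers info, then redundantly queries a component version again
// instead of reading it off the info it already has.
func compatVersion() {
	info := getInfo()            // expensive exec round #1
	conmon := getConmonVersion() // expensive exec round #2, already in info.ConmonVersion
	fmt.Println(info.RuntimeVersion, conmon)
}

func main() { compatVersion() }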

Luap99 self-assigned this Jul 10, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 10, 2023
Do not use podman info/version as they are expensive and clutter the log
for no reason. Just checking if we can connect to the socket should be
good enough and much faster.

Fix the non-existent error checking, so that we actually see a useful
error when this does not work.

Also change the interval: why wait 2s for a retry? Let's take 100ms steps
instead.

Fixes containers#19010

Signed-off-by: Paul Holzinger <[email protected]>
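
A minimal sketch of the kind of readiness check the commit describes (socket path, timeout value, and function name here are assumptions, not the actual PR code):

package main

import (
	"fmt"
	"net"
	"time"
)

// waitForSocket polls a unix socket until a connection succeeds, retrying
// every 100ms, instead of running an expensive podman info/version.
func waitForSocket(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("unix", path, time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			// Return the real error instead of silently ignoring it.
			return fmt.Errorf("podman socket %s not ready: %w", path, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	// The socket path below is only an example.
	if err := waitForSocket("/run/podman/podman.sock", 5*time.Second); err != nil {
		fmt.Println(err)
	}
}
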
ashley-cui pushed a commit to ashley-cui/podman that referenced this issue Jul 13, 2023
github-actions bot added the locked - please file new issue/PR label Oct 9, 2023
github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2023