Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Podman pod create --share-parent vs --share=cgroup #12930

Merged
merged 1 commit into from
Feb 4, 2022

Conversation

cdoern
Copy link
Contributor

@cdoern cdoern commented Jan 19, 2022

separated cgroupNS sharing from setting the pod as the cgroup parent,
made a new flag --share-parent which sets the pod as the cgroup parent for all
containers entering the pod

resolves #12765

Signed-off-by: cdoern [email protected]

@TomSweeneyRedHat
Copy link
Member

All kinds of test redness @cdoern

@cdoern
Copy link
Contributor Author

cdoern commented Jan 20, 2022

the failures seem to be coming from the mixture of --share=cgroup and --share-parent should these be mutually exclusive @mheon? that would mean we remove cgroup from the defaultkernelnamespaces (as it basically was before since we were just sharing a parent)

@cdoern cdoern force-pushed the podCgroup branch 2 times, most recently from 649a394 to 679d95b Compare January 21, 2022 15:14
@cdoern
Copy link
Contributor Author

cdoern commented Jan 24, 2022

@mheon @giuseppe any idea what this failure on the rootless tests means: Error: OCI runtime error: runc: container_linux.go:380: starting container process caused: process_linux.go:402: getting the final child's pid from pipe caused: EOF

@mheon
Copy link
Member

mheon commented Jan 24, 2022

That's Conmon failing - potentially segfaulting? We'd need to look in syslog to figure out the details, that's the only place it logs to (and then, only with --log-level=debug active)

@cdoern cdoern force-pushed the podCgroup branch 3 times, most recently from 7495ef1 to 98114ce Compare January 26, 2022 17:53
@cdoern
Copy link
Contributor Author

cdoern commented Jan 27, 2022

So the issue is that we can't read from the conmon pidfile, it is returning -1 rather than the conmon pid I am unsure why as I cannot reproduce this locally, could it be the testing suite @mheon ?

@rhatdan
Copy link
Member

rhatdan commented Jan 27, 2022

@cdoern Is this something we need to get into podman 4.0, or can it wait for podman 4.1?

@cdoern
Copy link
Contributor Author

cdoern commented Jan 27, 2022

@rhatdan I think @mheon said this should be 4.0 since --share currently does not work as expected

@rhatdan rhatdan added the 4.0 label Jan 27, 2022
@mheon
Copy link
Member

mheon commented Jan 27, 2022

Ideally 4.0 since this could theoretically be viewed as a breaking change.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 28, 2022

@giuseppe @mheon I have been racking my brain trying to figure out why this is blowing up. My only thought left is this scenario of sharing the cgroupNS in a pod (since it has never actually happened) breaks something unnecessarily within conmon. Either that or within podman libpod/container_validate.go we need some other stipulation about pid namespaces and cgroup namespaces if coming from a pod.

@rhatdan
Copy link
Member

rhatdan commented Jan 28, 2022

@giuseppe PTAL

@cdoern cdoern force-pushed the podCgroup branch 2 times, most recently from d470cbd to f6eedc1 Compare February 2, 2022 03:53
@cdoern
Copy link
Contributor Author

cdoern commented Feb 2, 2022

I am putting a bunch of debugs on where I think this is failing to try and figure out why, this is not reproducible for me, @giuseppe does this happen on your machine?

@cdoern
Copy link
Contributor Author

cdoern commented Feb 2, 2022

@giuseppe I think this portion is the issue but I am unfamiliar with how these pipes work... https://github.com/containers/podman/blob/f6eedc1fe86090a9e388e1ac8e962dcb4e09d4eb/libpod/oci_conmon_linux.go#L1585-L1598 could this all be because of user.slice as the cgroup parent?

@giuseppe
Copy link
Member

giuseppe commented Feb 2, 2022

I am putting a bunch of debugs on where I think this is failing to try and figure out why, this is not reproducible for me, @giuseppe does this happen on your machine?

please drop the debugging statements and the "--syslog=true" so we can see where it is failing in the CI.

If I run locally, it fails here:

		inspect := podmanTest.InspectContainer(ctrCreate.OutputToString())
		Expect(data.CgroupPath).To(HaveLen(0))
		Expect(inspect[0].HostConfig.CgroupParent).ToNot(Equal(data.CgroupPath))

Both inspect[0].HostConfig.CgroupParent and data.CgroupPath are empty strings.

Since the second check expects data.CgroupPath to be an empty string, could we rewrite for clarity the second one to not depend on data.CgroupPath? Is it expected to not have len 0?

@cdoern
Copy link
Contributor Author

cdoern commented Feb 2, 2022

@giuseppe I am talking about this: https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/5437340771418112/html/int-podman-fedora-34-rootless-host.log.html#t--podman-pod-create-share-parent-test--3 failure. The syslogs are in because all I was seeing before was could not get pid from pipe, it is returning -1. The failure you are getting fails locally for me now too but I think that is an easy fix, this one that only happens in the testing site is stumping me.

@giuseppe
Copy link
Member

giuseppe commented Feb 2, 2022

@giuseppe I am talking about this: https://storage.googleapis.com/cirrus-ci-6707778565701632-fcae48/artifacts/containers/podman/5437340771418112/html/int-podman-fedora-34-rootless-host.log.html#t--podman-pod-create-share-parent-test--3 failure. The syslogs are in because all I was seeing before was could not get pid from pipe, it is returning -1. The failure you are getting fails locally for me now too but I think that is an easy fix, this one that only happens in the testing site is stumping me.

I think fedora-34 is configured with cgroup v1, and rootless cannot use cgroups with cgroup v1. I think you just need to skip the test when it is rootless on cgroupv1 (SkipIfRootlessCgroupsV1(...))

@cdoern
Copy link
Contributor Author

cdoern commented Feb 2, 2022

ok thanks @giuseppe that makes sense, I will fix the other test now and remove all of the debugs.

@cdoern cdoern force-pushed the podCgroup branch 2 times, most recently from 9602945 to a46a8b9 Compare February 3, 2022 04:44
separated cgroupNS sharing from setting the pod as the cgroup parent,
made a new flag --share-parent which sets the pod as the cgroup parent for all
containers entering the pod

remove cgroup from the default kernel namespaces since we want the same default behavior as before which is just the cgroup parent.

resolves containers#12765

Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
Signed-off-by: cdoern <[email protected]>
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2022
@rhatdan
Copy link
Member

rhatdan commented Feb 4, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2022
@openshift-merge-robot openshift-merge-robot merged commit 956664f into containers:main Feb 4, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroup is not displayed as shared namespace in pod inspection
6 participants