
Do not reset storage when running inside of a container #9240

Merged
merged 1 commit into containers:master from rhatdan:reset on Feb 16, 2021

Conversation

rhatdan
Member

@rhatdan rhatdan commented Feb 5, 2021

Currently, if the host shares container storage with a container
running podman, the podman inside of the container resets the
storage on the host. This can cause issues on the host, as
well as cause the podman command running the container to
fail to unmount /dev/shm.

podman run -ti --rm --privileged -v /var/lib/containers:/var/lib/containers quay.io/podman/stable podman run alpine echo hello
* unlinkat /var/lib/containers/storage/overlay-containers/a7f3c9deb0656f8de1d107e7ddff2d3c3c279c11c1635f233a0bffb16051fb2c/userdata/shm: device or resource busy
* unlinkat /var/lib/containers/storage/overlay-containers/a7f3c9deb0656f8de1d107e7ddff2d3c3c279c11c1635f233a0bffb16051fb2c/userdata/shm: device or resource busy

Since podman is volume-mounting in the graphroot, it will add a flag to
/run/.containerenv to tell the podman inside the container whether to reset storage or not.

Since the inner podman is running inside of a container, there is no reason to assume this is a fresh reboot, so if the "container" environment variable is set, then skip
the reset of storage.

Also added tests to make sure /run/.containerenv is working correctly.

Fixes: #9191

Signed-off-by: Daniel J Walsh [email protected]

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 5, 2021
@rhatdan
Member Author

rhatdan commented Feb 5, 2021

Setting [NO TESTS NEEDED] since I do not know of an easy way to test this. @edsantiago Any ideas? We need a configured podman inside of a container to test.

@edsantiago
Member

@rhatdan one of the CI steps is integration tests in container (see the aqua box near top center). Would it be possible to do something like:

  • Create a $TMPDIR/fake-var-lib-containers
    • inside there, create a file that non-container podman would normally delete
  • run podman run -v $TMPDIR/fake-var-lib-containers:/var/lib/containers ....
  • Then the test forks: if running with $container=SET, confirm that the file is untouched; otherwise, confirm that file is removed
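The forked assertion in that last step can be sketched in Go (the project's e2e tests are Go). The `pass` helper and the sentinel file name are hypothetical; the real CI harness would wire this up around an actual podman run rather than the no-op shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pass implements the forked assertion: inside a container the
// sentinel file must survive the inner podman run; on the host a
// storage refresh would have deleted it.
func pass(insideContainer, sentinelExists bool) bool {
	if insideContainer {
		return sentinelExists
	}
	return !sentinelExists
}

func main() {
	// Stand-in for $TMPDIR/fake-var-lib-containers, holding a
	// sentinel file that a host-side refresh would normally delete.
	dir, err := os.MkdirTemp("", "fake-var-lib-containers")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	sentinel := filepath.Join(dir, "sentinel")
	if err := os.WriteFile(sentinel, []byte("keep me"), 0o644); err != nil {
		panic(err)
	}

	_, statErr := os.Stat(sentinel)
	exists := statErr == nil

	// Nothing has run podman here, so the sentinel still exists;
	// the containerized branch of the fork passes, the host branch
	// (which requires deletion) does not.
	fmt.Println(pass(true, exists))
	fmt.Println(pass(false, exists))
}
```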

```go
			if os.Getenv("container") == "" {
				doRefresh = true
			}
		} else {
			return errors.Wrapf(err, "error reading runtime status file %s", runtimeAliveFile)
```
Member

We're just trading one set of issues for another here - if you actually are the first Podman call after a reboot, things are going to be massively messed up.

Member Author

The first podman would have been the outer shell, which would have done the reset. I think the other cases (podman running within Docker, or CRI-O or Buildah being the first caller after boot) are unlikely.

@rhatdan rhatdan force-pushed the reset branch 5 times, most recently from 98bdedd to 9bcb012 on February 13, 2021 12:09
@rhatdan
Member Author

rhatdan commented Feb 15, 2021

@mheon
Member

mheon commented Feb 15, 2021

I'm still opposed on the grounds that this is potentially very harmful. If the user runs Podman in an environment where the CONTAINER environment variable is set and no refresh is performed, we can end up with deadlocks (refresh includes recreating the SHM locks and reacquiring lock ownership - without that any newly-created containers or locks will re-use existing lock IDs).

@rhatdan
Member Author

rhatdan commented Feb 15, 2021

But this is preventing a far more likely case, which is an actual real-world occurrence where customers hit this by mounting their /var/lib/containers into a container, and it screwed up their database. The use case you are worried about is difficult for me to imagine, whereas we have blogs describing that users should do this to share their images into privileged containers.

@rhatdan rhatdan changed the title [NO TESTS NEEDED] Do not reset storage when running inside of a container Do not reset storage when running inside of a container Feb 16, 2021
@mheon
Member

mheon commented Feb 16, 2021

New version LGTM

@TomSweeneyRedHat
Member

As @mheon is hip with the changes,
LGTM

@rhatdan rhatdan added the lgtm Indicates that a PR is ready to be merged. label Feb 16, 2021
@openshift-merge-robot openshift-merge-robot merged commit 5004212 into containers:master Feb 16, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

Successfully merging this pull request may close these issues.

Open mounts all over the place on userdata/shm