Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rootless: use a pause process to keep namespaces alive #3084

Merged

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented May 8, 2019

use a pause process to keep the user and mount namespace alive.

The pause process is created immediately on reload, and all successive Podman processes will refer to it for joining the user&mount namespace.

This solves all the race conditions we had on joining the correct namespaces using the conmon processes.

As a fallback if the join fails for any reason (e.g. the pause process was killed), then we try to join the running containers as we were doing before.

This PR also adds an alternative way of joining the namespaces. We can now attempt to join them as soon as Podman starts, before the Go runtime kicks in, so that we don't need to re-exec and use just one process 🎉

This version doesn't try to recover and re-create a pause process if for any reason it terminated, but prints an error. This is not trivial to recover from, as there can be more podman processes already running and will require some sort of locking.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 8, 2019
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L labels May 8, 2019
@giuseppe giuseppe force-pushed the rootless-pause-process branch 8 times, most recently from 0353ec2 to 3d4f40f Compare May 8, 2019 20:28
@giuseppe giuseppe force-pushed the rootless-pause-process branch 3 times, most recently from 20f2b54 to 9c8ffc5 Compare May 9, 2019 09:24
@giuseppe giuseppe changed the title [WIP] rootless: introduce a pause process to keep namespaces alive rootless: introduce a pause process to keep namespaces alive May 9, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2019
@giuseppe giuseppe changed the title rootless: introduce a pause process to keep namespaces alive rootless: use a pause process to keep namespaces alive May 9, 2019
@giuseppe giuseppe force-pushed the rootless-pause-process branch 3 times, most recently from 3766e4f to e80d342 Compare May 9, 2019 13:41
@giuseppe giuseppe changed the title rootless: use a pause process to keep namespaces alive [WIP] rootless: use a pause process to keep namespaces alive May 9, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2019
@giuseppe giuseppe force-pushed the rootless-pause-process branch 3 times, most recently from 74fe761 to 7d7f7cb Compare May 9, 2019 19:19
@giuseppe giuseppe force-pushed the rootless-pause-process branch 2 times, most recently from dfa45eb to a8f585c Compare May 10, 2019 08:17
@giuseppe giuseppe changed the title [WIP] rootless: use a pause process to keep namespaces alive rootless: use a pause process to keep namespaces alive May 10, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2019
@giuseppe giuseppe force-pushed the rootless-pause-process branch from a8f585c to e3a2152 Compare May 13, 2019 08:48
@giuseppe
Copy link
Member Author

tests are passing here, @mheon if there are no objections, I'd like to get this into the next Podman release so that we have some time to play with it before 8.1

@mheon
Copy link
Member

mheon commented May 13, 2019

I'm reluctant to get this into 1.3.1, which is supposed to be a small bugfix release... Maybe we cut a 1.4 very soon (before end of month) to get this out quickly?

@@ -113,6 +114,35 @@ func setupRootless(cmd *cobra.Command, args []string) error {
MainGlobalOpts,
remoteclient,
}

pausePidPath, err := util.GetRootlessPauseProcessPidPath()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we start moving things like this block of code out of cmd/podman and consolidating them someplace? Maybe pkg/rootless? I want to make it easier to integrate this into other Libpod consumers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree. The only reason I've left it here for now, is that the logic for detecting running containers and try to join their namespaces (and this is particularly important to not break rootless containers once again on an upgrade) cannot be easily moved around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at it now, but there are parts that cannot be easily moved down to the pkg/rootless package. For example we depend on util.GetRootlessRuntimeDir() and in the case of not having a pause process running, we need to retrieve a list of conmon processes.

@rh-atomic-bot
Copy link
Collaborator

☔ The latest upstream changes (presumably #3104) made this pull request unmergeable. Please resolve the merge conflicts.

giuseppe added 5 commits May 17, 2019 20:47
this leaves the containers stopped but we won't risk to use the wrong
user namespace.

Signed-off-by: Giuseppe Scrivano <[email protected]>
use a pause process to keep the user and mount namespace alive.

The pause process is created immediately on reload, and all successive
Podman processes will refer to it for joining the user&mount
namespace.

This solves all the race conditions we had on joining the correct
namespaces using the conmon processes.

As a fallback if the join fails for any reason (e.g. the pause process
was killed), then we try to join the running containers as we were
doing before.

Signed-off-by: Giuseppe Scrivano <[email protected]>
add a shortcut for joining immediately the namespace so we don't need
to re-exec Podman.

With the pause process simplificaton, we can now attempt to join the
namespaces as soon as Podman starts (and before the Go runtime kicks
in), so that we don't need to re-exec and use just one process.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the rootless-pause-process branch from e3a2152 to 53a7622 Compare May 17, 2019 18:48
@giuseppe
Copy link
Member Author

@mheon @rhatdan any other comment on this PR or could we merge it?


/* There can be another process at this point trying to configure the user namespace and the pause
process, do not override the pid file if it already exists. */
if (renameat2 (AT_FDCWD, tmp_file_path, AT_FDCWD, pause_pid_file_path, RENAME_NOREPLACE) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this doesn't really look racy, but I think it's still worth asking: should we consider a file lock here to ensure only one process is trying to make the PID file and become the init process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to avoid the lock using only a renameat2(RENAME_NOREPLACE). If the file already exists, then it means another process managed to create it faster as it wasn't there earlier. If we fail at this point then Podman attempts to join the pause process instead (which we assume was created by the process who won the race).

If the file exists but the process is not running, then Podman raises an error and asks the user to fix it by deleting the file. We can improve this in future, but for now I preferred to have a clear error message

@mheon
Copy link
Member

mheon commented May 21, 2019

One question, but overall I think we're pretty good to merge. LGTM

@baude
Copy link
Member

baude commented May 21, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2019
@openshift-merge-robot openshift-merge-robot merged commit 536fd6a into containers:master May 21, 2019
@rufoa
Copy link

rufoa commented May 27, 2019

@giuseppe
I'm running into a problem which I believe is related to this change.

It appears that pause.pid is not being deleted after use, so next time I run podman I get errors like this:

cannot join pause process pid 18740.  You may need to remove /run/user/111/libpod/pause.pid and stop all containers
you can use `system migrate` to recreate the pause process
open /proc/18740/ns/user: no such file or directory

note:

  • process 18740 had indeed exited (cleanly, I believe), but the pid file was left behind
  • I'm running podman rootlessly as uid 111
  • Deleting pause.pid resolves the problem temporarily, but then the same problem reoccurs next run

I'm still trying to figure out the root cause. It occurs reliably every time a particular CI job of mine runs, which executes various podman commands in quick succession (build, run, top, inspect, rm).

Do you have a hunch what might be the cause of this, so I can investigate further? Where in podman is the call which is supposed to delete pause.pid? Many thanks.

@mheon
Copy link
Member

mheon commented May 27, 2019

The pause process does not appear to be pausing - which is our problem. If it's exiting cleanly, it's not doing its job of living as long as a user's session to preserve namespaces

@giuseppe
Copy link
Member Author

the pause process should be left alive after the first execution. It is implemented to avoid some race conditions trying to join the existing namespaces.

Does it exist immediately after you run podman? What do you see if you run something like the following?

$ podman images
$ pgrep -fa pause

If you exit the current user session, the pause process might be killed by systemd. There is a fix for this issue in: #3188

In particular this patch: 30ef6ba

You can try it by yourself with loginctl enable-linger $UID and see if it makes any difference

@rufoa
Copy link

rufoa commented May 28, 2019

Thanks for the info! It appears to work correctly now.

I had to manually enable-linger for my user as root, because Debian appears not to install policykit by default, so non-root users cannot enable-linger for themselves.

@rufoa
Copy link

rufoa commented May 28, 2019

Perhaps the deb package should recommend/suggest policykit-1 as a dependency

@giuseppe
Copy link
Member Author

@lsm5 can we add that dependency?

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants