Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch podman image scp from depending on machinectl to just os/exec #12867

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

cdoern
Copy link
Contributor

@cdoern cdoern commented Jan 14, 2022

switch podman image scp from depending on machinectl to just os/exec

machinectl does not propogate error messages and adds extra lines in the output, exec.Cmd is able to clear the env besides PATH and TERM, and use the given UID and GID to execute the command properly.

machinectl is still used to create a user session. Ubuntu support is limited by this.

Signed-off-by: cdoern [email protected]

@cdoern cdoern changed the title [NO TESTS NEEDED] switch podman image scp from depending on machinectl to systemd-run [NO NEW TESTS NEEDED] switch podman image scp from depending on machinectl to systemd-run Jan 14, 2022
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work because systemd-run does not create the required login session.

I think you have to use both, first run machinectl ... sleep inf to make sure that the login session is created (just start the process), then run your systemd-run command, when that is done kill the machinectl process.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 14, 2022

@Luap99 think I got all of the necessary changes in there, just made a utils function to start the machinectl sleep.

@eriksjolund
Copy link
Contributor

This does not work because systemd-run does not create the required login session.

I think you have to use both, first run machinectl ... sleep inf to make sure that the login session is created (just start the process), then run your systemd-run command, when that is done kill the machinectl process.

I wonder if machinectl ... sleep inf is really needed.
It seems the command
systemd-run --machine=test11@ --quiet --user --collect --pipe --wait ...
starts a login session.

[root@laptop ~]# useradd test11
[root@laptop ~]# systemd-run --machine=test11@ --quiet --user --collect --pipe --wait loginctl
SESSION  UID USER     SEAT  TTY
      2 1000 esjolund seat0 tty2
     32 1019 test11         

2 sessions listed.
[root@laptop ~]# systemd-run --machine=test11@ --quiet --user --collect --pipe --wait systemctl --user status
● laptop
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Sat 2022-01-15 08:10:23 CET; 131ms ago
   CGroup: /user.slice/user-1019.slice/[email protected]
           ├─session.slice 
           │ └─dbus-broker.service 
           │   ├─6371 /usr/bin/dbus-broker-launch --scope user
           │   └─6374 dbus-broker --log 4 --controller 9 --machine-id c341780e88a04800982a41eea47639d3 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
           ├─app.slice 
           │ └─run-u0.service 
           │   ├─6376 systemctl --user status
           │   └─6378 less
           └─init.scope 
             ├─6346 /usr/lib/systemd/systemd --user
             └─6350 (sd-pam)
[root@laptop ~]# cat /etc/fedora-release 
Fedora release 35 (Thirty Five)
[root@laptop ~]# 

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

This does not work because systemd-run does not create the required login session.
I think you have to use both, first run machinectl ... sleep inf to make sure that the login session is created (just start the process), then run your systemd-run command, when that is done kill the machinectl process.

I wonder if machinectl ... sleep inf is really needed. It seems the command systemd-run --machine=test11@ --quiet --user --collect --pipe --wait ... starts a login session.

[root@laptop ~]# useradd test11
[root@laptop ~]# systemd-run --machine=test11@ --quiet --user --collect --pipe --wait loginctl
SESSION  UID USER     SEAT  TTY
      2 1000 esjolund seat0 tty2
     32 1019 test11         

2 sessions listed.
[root@laptop ~]# systemd-run --machine=test11@ --quiet --user --collect --pipe --wait systemctl --user status
● laptop
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Sat 2022-01-15 08:10:23 CET; 131ms ago
   CGroup: /user.slice/user-1019.slice/[email protected]
           ├─session.slice 
           │ └─dbus-broker.service 
           │   ├─6371 /usr/bin/dbus-broker-launch --scope user
           │   └─6374 dbus-broker --log 4 --controller 9 --machine-id c341780e88a04800982a41eea47639d3 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
           ├─app.slice 
           │ └─run-u0.service 
           │   ├─6376 systemctl --user status
           │   └─6378 less
           └─init.scope 
             ├─6346 /usr/lib/systemd/systemd --user
             └─6350 (sd-pam)
[root@laptop ~]# cat /etc/fedora-release 
Fedora release 35 (Thirty Five)
[root@laptop ~]# 

not sure if executing on a container is what we are looking for here, this functionality is similar to that of machinectl and I am unsure if this would propagate errors the way we want to.

--machine operates much differently than --uid if I am not mistaken, @Luap99 @edsantiago WDYT?

@edsantiago
Copy link
Member

I'm old and crotchety, and I really, really do not want to learn about systemd-run and all this newfangled crap. Can someone ELI5 why sudo or su -l -c "%s" USER aren't enough? (preferably documenting this in the code itself, so noone else has to ask again)

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

I think --machine is what we want because it seems to create the login session. I just was never able to make it work on my test vm. You should still get the correct exit code according to the man page.

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

I'm old and crotchety, and I really, really do not want to learn about systemd-run and all this newfangled crap. Can someone ELI5 why sudo or su -l -c "%s" USER aren't enough? (preferably documenting this in the code itself, so noone else has to ask again)

Because both sudo and su do not create a login session they just change the uid/gid. Podman needs the systemd session to function properly. The only way su/sudo will work when you set enable-linger for that user set.

@edsantiago
Copy link
Member

The only way su/sudo will work when you set enable-linger for that user set.

But podman is completely unusable without linger, so in what possible situation would anyone use podman image scp on a user that doesn't have the magic linger?

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

You do not need linger!!!! If you login in you systemd as a user systemd will create the user session for you and you can use podman just fine. If you logout the session is destroyed so all containers are stopped.
Linger is only needed when you want to run containers at boot or after logout.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

also @Luap99 I am having the same issue as you, --machine does not work on my machine and reports: Failed to get machine PTY: No machine 'charliedoern@' same fails with [email protected]

even when using --collect --pipe:

Failed to connect to bus: $DBUS_SESSION_BUS_ADDRESS and $XDG_RUNTIME_DIR not defined (consider using --machine=<user>@.host --user to connect to bus of other user)

those env variables are in fact set too...

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

@cdoern On which os did you test this? I tested on f34 and it failed but I just tried with f35 and @eriksjolund commands worked there.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

@Luap99 ubuntu and f34

@eriksjolund
Copy link
Contributor

I tested a few examples on Fedora 35 with a brand new user. (linger is off). It seems to work.

[root@laptop ~]# cat /etc/fedora-release 
Fedora release 35 (Thirty Five)
[root@laptop ~]# useradd test26
[root@laptop ~]# systemd-run --machine=test26@ --quiet --user --collect --pipe --wait podman run --rm docker.io/library/alpine echo hello
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 59bf1c3509f3 done  
Copying config c059bfaa84 done  
Writing manifest to image destination
Storing signatures
hello
[root@laptop ~]# systemd-run --machine=test26@ --quiet --user --collect --pipe --wait /bin/false ; echo $?
1
[root@laptop ~]# systemd-run --machine=test26@ --quiet --user --collect --pipe --wait /bin/true ; echo $?
0
[root@laptop ~]# systemd-run --machine=test26@ --quiet --user --collect --pipe --wait ls -l /non-existent ; echo $?
ls: cannot access '/non-existent': No such file or directory
2
[root@laptop ~]# systemd-run --machine=test26@ --quiet --user --collect --pipe --wait /bin/bash -c "exit 34" ; echo $?
34
[root@laptop ~]# 

Relevant PR
systemd/systemd#17967
hence systemd v248

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

@eriksjolund I think we are going to stick with the current approach as a fix for the moment, but I am going to look into if there is a better way to start the login session. Your approach is preferable but does not work on f34 or ubuntu.

@cdoern cdoern force-pushed the scp branch 2 times, most recently from 9f4ed80 to e6d0958 Compare January 17, 2022 16:49
@edsantiago
Copy link
Member

It does not seem to be working when run as root (on my f35 laptop):

$ sudo bin/podman image scp foo.bar/nonesuch/c_zr5sx1le5n:mytag esm@localhost::
Running as unit: run-re976e3d02b7c4edea1205642220f85b8.service
Running as unit: run-r04f0570cc1cd4e778ed6368f137c8032.service
$ bin/podman images
[ does not show foo.bar/nonesuch/anything]

Note also that the expected output of image scp is 'Copying blob...Writing manifest...Storing signatures...' but none of that appears, only Running as unit. Could you test this as root?

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

sorry @edsantiago left an extra argument in execSystemd works now as expected.

@edsantiago
Copy link
Member

Now I get:

$ sudo bin/podman image scp foo.bar/nonesuch/c_zr5sx1le5n:mytag esm@localhost::
Error: exit status 208

Adding --log-level=debug yields:

DEBU[0000] Executing via systemd: "/usr/bin/sudo /usr/bin/systemd-run --pty -q --wait --uid 0 /home/esm/src/atomic/2018-02.podman/libpod/bin/podman --log-level=debug save --output /var/tmp/podman997723794 foo.bar/nonesuch/c_zr5sx1le5n:mytag"

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

@edsantiago this seems to be an selinux issue, https://bugzilla.redhat.com/show_bug.cgi?id=1559409 not sure how to get around this one...

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

/usr/bin/sudo /usr/bin/systemd-run --pty -q --wait --uid 0 /home/esm/src/atomic/2018-02.podman/libpod/bin/podman --log-level=debug save --output /var/tmp/podman997723794 foo.bar/nonesuch/c_zr5sx1le5n:mytag The command looks wrong should't it be --uid <uid from esm> instead of --uid 0 which is pointless since it should load the image as user.

@Luap99
Copy link
Member

Luap99 commented Jan 17, 2022

Also why do you use sudo when you are already root

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

@Luap99 in the example @edsantiago is giving the save is --uid 0 and the load is --uid <edsantiago>

issue here is exit code 208 from systemd-run this works on ubuntu meaning the syntax is correct, the incorrect part is the selinux permssions on f35

@cdoern
Copy link
Contributor Author

cdoern commented Jan 17, 2022

--collect --pipe works on fedora rather than --pty, let me try that

SeLinux blocks the executing of the podman binary....... via exit code 203

@cdoern cdoern marked this pull request as ready for review January 20, 2022 03:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 20, 2022
@cdoern cdoern removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2022
@cdoern
Copy link
Contributor Author

cdoern commented Jan 24, 2022

@edsantiago @Luap99 thoughts on this? I am not sure if runuser creates a login session so I left the machinectl login as is.

@edsantiago
Copy link
Member

I'm sorry, I can't help. "User session" is not a UNIX concept, it's a systemd concept, and one that I've never been able to understand. @giuseppe might know.

@rhatdan
Copy link
Member

rhatdan commented Jan 24, 2022

runuser will NOT create a user session it is just setting the UID and perhaps running the bash init scripts.
User Sessions require pam_systemd to run, and runuser specifically does not execute the PAM stack.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 24, 2022

Ok, that makes sense @rhatdan I think this is all set then @containers/podman-maintainers PTAL

pkg/domain/infra/abi/images.go Outdated Show resolved Hide resolved
Comment on lines 867 to 875
verb := runUser
args := []string{"-l", execUser.Username, "-c"}
cmd := utils.CreateSCPCommand(exec.Command(verb, args...), []string{strings.Join(command, " ")})
Copy link
Member

@giuseppe giuseppe Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completely untested, but wouldn't something like the following:

	cmd := exec.Command(command[0], command[1:]...)
	cmd.Env = os.Environ()
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	uid, err := strconv.ParseInt(execUser.Uid, 10, 32)
	if err != nil {
		return err
	}
	gid, err := strconv.ParseInt(execUser.Gid, 10, 32)
	if err != nil {
		return err
	}
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Credential: &syscall.Credential{
			Uid:         uint32(uid),
			Gid:         uint32(gid),
			Groups:      nil,
			NoSetGroups: false,
		},
	}

run the command with the specified UID/GID without requiring runuser?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this leaks the rootful environment variables such as XDG_RUNTIME_DIR and thus make the rootless podman fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't that leak already in CreateSCPCommand?

AFAICS runuser doesn't clean up the environment:

$ sudo XDG_RUNTIME_DIR=wrong runuser -u gscrivano printenv XDG_RUNTIME_DIR
wrong

In any case, you are right, and I think we need to drop the Environ block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should block the environment.

Copy link
Contributor Author

@cdoern cdoern Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean the environment is leaked? Doesnt -l clear all env variables except for those specified? @Luap99 clears all the environment variables except for TERM and variables specified by --whitelist-environment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay, sorry I was looking at this on mobile. From what i have tried over the iterations, directly calling a n exec.Cmd via a different user using the sysprocattr basically never works and always leaves something behind from the previous session.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you specify some Env not nil? From the doc:

	// If Env is nil, the new process uses the current process's
	// environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm maybe I should try to keep $TERM like runuser does and see if that works?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would set $TERM and $PATH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to work, I am getting a storage error regarding the graph driver: ERRO[0000] User-selected graph driver "vfs" overwritten by graph driver "overlay" from database - delete libpod local files to resolve but the command finishes successfully. This could be my machine so I will push here and let the tests run

@cdoern cdoern force-pushed the scp branch 2 times, most recently from 3d7c8ab to 74d4186 Compare January 25, 2022 15:29
@cdoern
Copy link
Contributor Author

cdoern commented Jan 25, 2022

I think it might be possible to consolidate transferRootless and transferRootful now but on my first try it failed due to the uid/gid aspect of transferRootful, I think after this merges I'll open a separate PR to work on that.

@cdoern
Copy link
Contributor Author

cdoern commented Jan 26, 2022

@giuseppe any idea why this is failing? This error has happened to me before in my initial iterations of scp, do I need to UID/GID map as well in the SysProcAttr?

this works on f34 and f35 for me as is when using runc as the runtime

@cdoern cdoern changed the title switch podman image scp from depending on machinectl to runuser switch podman image scp from depending on machinectl to just os/exec Jan 26, 2022
@giuseppe
Copy link
Member

@giuseppe any idea why this is failing? This error has happened to me before in my initial iterations of scp, do I need to UID/GID map as well in the SysProcAttr?

no we don't neet to specify the map, it has simply to run with the specified UID/GID

@giuseppe
Copy link
Member

the tests are failing because it doesn't find the newuidmap executable. Please propagate also PATH in addition to TERM

@cdoern cdoern force-pushed the scp branch 2 times, most recently from 95b7619 to 30271bc Compare January 26, 2022 16:33
@cdoern
Copy link
Contributor Author

cdoern commented Jan 26, 2022

I initially tried to do this with PATH as well and got the same error, could be because I was not doing PATH=... I will try that

machinectl does not propogate error messages and adds extra lines in the output, exec.Cmd is able to clear the env besides PATH and TERM,
and use the given UID and GID to execute the command properly.

machinectl is still used to create a user session. Ubuntu support is limited by this.

Signed-off-by: cdoern <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Jan 27, 2022

/approve
/lgtm

I am going to merge, since I believe this is an improvement over what we currently have, and we should get it into podman 4.0.
@cdoern continue working on improving it. and fixing other bugs in podman image scp.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2022
@openshift-merge-robot openshift-merge-robot merged commit 5659b07 into containers:main Jan 27, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants