Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass Systemd LISTEN_* environment to the container #10448

Closed
wants to merge 1 commit into from

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented May 24, 2021

If a container is running within a systemd service and it is socket
activated, we need to leak the LISTEN_* environment variables into the
container.

Fixes: #10443

Signed-off-by: Daniel J Walsh [email protected]

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 24, 2021
@rhatdan
Copy link
Member Author

rhatdan commented May 24, 2021

@eriksjolund I would love to work with you on this. I think this PR is needed to move this forward.

Basically this PR passes the environment variables into the container so PID1 of the container can use them. We force the LISTEN_PID=1 since systemd will originally set it to the Podman PID.

I am hoping that conmon and crun/runc handle down the open file descriptors correctly. If you could build a simple test script to test this, I could finish it up.

@rhatdan
Copy link
Member Author

rhatdan commented May 24, 2021

@giuseppe PTAL

@rhatdan
Copy link
Member Author

rhatdan commented May 24, 2021

@haircommander @giuseppe I believe we have to update the --preservefds as well when executing conmon.

@eriksjolund
Copy link
Contributor

@rhatdan Sure, I'll create a simple test script. I could try it out during the weekend.

@rhatdan rhatdan force-pushed the systemd branch 4 times, most recently from ba439bc to b479da3 Compare May 27, 2021 18:56
libpod/oci_conmon_linux.go Outdated Show resolved Hide resolved
// If the podman service is setup as socket activated, we don't want any containers
// to inherit these settings from the service.
for _, lEnv := range []string{"LISTEN_PID", "LISTEN_FDS", "LISTEN_FDNAMES"} {
os.Unsetenv(lEnv)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also unset in the coreos/go-systemd library when the service is started. See

that is called during NewServer...() before Serve()

Copy link
Member Author

@rhatdan rhatdan May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except I don't see activation.Files(true) ever being called.

 grep -r 'activation\.Files(' . 
./vendor/github.com/docker/go-plugins-helpers/sdk/unix_listener_systemd.go:	listenFds := activation.Files(false)
./libpod/oci_conmon_linux.go:			fds := activation.Files(false)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From activation.Listeners() which is called in NewServer...()

If a container is running within a systemd service and it is socket
activated, we need to leak the LISTEN_* environment variables into the
container.

Fixes: containers#10443

Signed-off-by: Daniel J Walsh <[email protected]>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan rhatdan changed the title [WIP] Pass Systemd LISTEN_* environment to the container Pass Systemd LISTEN_* environment to the container Jun 30, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 30, 2021
@rhatdan
Copy link
Member Author

rhatdan commented Jun 30, 2021

@eriksjolund PTAL

@rhatdan
Copy link
Member Author

rhatdan commented Jun 30, 2021

@giuseppe @mheon @vrothberg @edsantiago PTAL, Anyone have a simple test to see if this works?

@eriksjolund
Copy link
Contributor

Just some thoughts. (Disclaimer: sending this quite late in the evening so it might not be crystal clear).

crun takes the LISTEN_FDS information from the environment

grep -r LISTEN_FDS .
./src/create.c:  if (getenv ("LISTEN_FDS"))
./src/create.c:    crun_context.preserve_fds += strtoll (getenv ("LISTEN_FDS"), NULL, 10);
./src/run.c:  if (getenv ("LISTEN_FDS"))
./src/run.c:    crun_context.preserve_fds += strtoll (getenv ("LISTEN_FDS"), NULL, 10);
./src/exec.c:  if (getenv ("LISTEN_FDS"))
./src/exec.c:    crun_context.preserve_fds += strtoll (getenv ("LISTEN_FDS"), NULL, 10);

That looks good.

I think we need to keep the preserve_fds information and LISTEN_FDS information apart in the Podman and Conmon code.

In the source code for Podman and Conmon the term preserve_fds could for instance be "reserved" to always mean the user-provided argument --preserve-fds from podman run and podman exec. It would be a constant value that would not change.

One thing to think about

Quote from: https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html

#define SD_LISTEN_FDS_START 3

In other words. The first LISTEN_FDS file descriptor needs to come directly after stdin, stdout, stderr.

Comment on lines +1064 to +1066
if uint(listenFds) > ctr.config.PreserveFDs {
ctr.config.PreserveFDs = uint(listenFds)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

		if uint(listenFds) > ctr.config.PreserveFDs {
			ctr.config.PreserveFDs = uint(listenFds)
		}

seems to be
(pseudo code)

ctr.config.PreserveFDs := max(listenFds, ctr.config.PreserveFDs)

I don't understand this code. Why should it be a max value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just guessing here, that if the user specified --preserve_fds=1 and LISTEN_FDS=2, then we need to leak at least 2 FDs into the container. But I am not really sure what should be the behaviour. Perhaps we should ignore the LISTEN_FDS and force the user to tell us how many FDS to leak.

Copy link
Contributor

@eriksjolund eriksjolund Jul 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first sight runc and crun are doing the same:
See the runc example:

 LISTEN_PID=$pid_of_runc LISTEN_FDS=3 runc run --preserve-fds 5 <container>

from https://github.com/opencontainers/runc/blob/master/docs/terminals.md

The first 3 (3, 4, and 5) were passed due to LISTEN_FDS and the other 5 (6, 7, 8, 9, and 10)
were passed due to --preserve-fds


Quote: basically, it can just leave the FDs and $LISTEN_FDS untouched
https://systemd.io/CONTAINER_INTERFACE/

I guess runc and crun could have ignored LISTEN_FDS and just make it the responsibility of
the caller to specify a --preserve-fds argument that includes the LISTEN_FDS.

But that is not how it is done right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should do nothing with them except leak them into the OCI Runtimes. Will crun and runc rename the LISTEN_PID to pid1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. crun and runc preserve the FDS but only runc sets LISTEN_PID=1 (Cc: @giuseppe @flouthoc).

@rhatdan
Copy link
Member Author

rhatdan commented Jul 1, 2021

I don't want to add another option about leaked FDs into conmon, since conmon would not know which one to take priority with.
@giuseppe @vrothberg @haircommander WDYT?

@eriksjolund
Copy link
Contributor

@rhatdan Regarding the test scripts, I gave it a try but I haven't finished it yet.
I wanted to do a simple server that echoed the input back.

Here is some code that might be useful if anyone is trying to do the same

$ egrep -r 'echo.sock|5555' . | grep service
./oneshot-test.service:ExecStart=bash -c "echo abc | socat tcp:localhost:5555 - > %t/reply.txt"
./echo.service:#ExecStart=socat TCP4-LISTEN:5555,fork EXEC:/bin/cat
./echo.service:ExecStart=socat UNIX-LISTEN:%t/echo.sock,fork EXEC:/bin/cat
./socket-activated-echo.service: #ExecStart=/usr/lib/systemd/systemd-socket-proxyd localhost:5555
./socket-activated-echo.service: ExecStart=/usr/lib/systemd/systemd-socket-proxyd %t/echo.sock

I don't remember what was not working. (Adding podman to the commands would be the next step).

@rhatdan
Copy link
Member Author

rhatdan commented Jul 12, 2021

@eriksjolund any progress?

@eriksjolund
Copy link
Contributor

@rhatdan No, I have been busy with other things.

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

@vrothberg
Copy link
Member

Closing, @rhatdan aksed me to take over.

@vrothberg vrothberg closed this Aug 24, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. stale-pr
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Socket activation works with TCP socket but not with Unix Domain socket
4 participants