systemd multi seat support #1105

Closed
totaam opened this issue Jan 29, 2016 · 31 comments

Comments

totaam commented Jan 29, 2016

See also #1129.

The seat with vt number may be challenging since we don't have a vt number...

totaam commented Jun 1, 2016

r12722 (+fixes in r12756 + r12761 + r12753 for osx) adds an "xpra" pam service so we can call pam_open_session early (before daemonizing) when starting a server.
The pam_systemd module should also ensure that the directories are present for #1129.
We'll see if this is enough to prevent us from getting killed.

See also: Is linux-PAM session same as linux process session?: The short answer is no, they're different things, but processes that handle login sessions should handle both of them.
We're not a login session per-se, but as close as can be.

systemd-devel: The whole su/pkexec session debate: This way, screen will keep an "active" reference to the session and systemd-logind will not mark it as "closing". So the session that was initiated by sshd will be kept open by "screen". Note that pam_open_session() without pam_authenticate() will not create a new session but only attach to the current session.

totaam commented Jun 1, 2016

Wait, as per [https://lists.freedesktop.org/archives/systemd-devel/2013-December/014996.html]: The session is still marked as "closing" but because processes still exist it never quite dies. And yes, the kill processes option (which is a nice thing to enable if possible) would indeed kill the screen.

@jonathan.underwood: How on earth are we supposed to fix this thing?
We don't want or need root, just tell logind to move the process into its own session.

totaam commented Jun 18, 2016

2016-06-18 17:27:19: jonathan.underwood commented


Well, I am no expert here :) But this is a somewhat hot topic at the moment. I very much think xpra is in the same boat as Screen and tmux. In case you missed it, this is a nice summary of why it's a hot topic:

http://lwn.net/Articles/689732/

The best thing xpra could do, I think, is start in a new process tree. Quite what the right mechanism for that is remains unclear - I expect you don't want to do the dbus dance to talk to the systemd daemon to create a new session and control group (which would be the systemd maintainers' preferred route).

Something along the lines of this comment might be one way to go:

http://lwn.net/Articles/690795/

This also makes for interesting reading:

tmux/tmux#428

ps. Sorry for the late reply and lack of packaging activity in recent weeks - have changed jobs. I should be getting back to packaging now though.

totaam commented Jun 18, 2016

2016-06-18 17:32:37: jonathan.underwood commented


Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:

https://www.freedesktop.org/software/systemd/man/systemd-run.html

totaam commented Aug 10, 2016

the "right" way to go on systems using systemd is to use systemd-run to launch xpra

Users shouldn't really need to care about this low-level plumbing, so when they issue an "xpra start", they expect it to survive their current session (be it an ssh session, or even a full desktop environment). That's especially true of ssh sessions started with "xpra start ssh:HOST --start=xterm".

So we would need to do this from "xpra start ...":

  • find out if systemd is pid1 - how?
  • optionally, find out if KillUserProcesses=yes and skip the workaround if it isn't needed?
  • call systemd-run --scope --user xpra _start $@ (and make "xpra _start" the same as before)
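
A rough sketch of the three steps above, not the code that ended up in xpra. It assumes that the existence of /run/systemd/system is a good enough "is systemd running" test (this is the check documented for sd_booted(3)), and that reading a single logind.conf is enough to guess KillUserProcesses (drop-in directories and the compiled-in default are ignored). All the function names are hypothetical:

```python
import os
import shlex

def is_systemd_running(root="/"):
    # same test as documented for sd_booted(3):
    # this directory only exists when systemd is the init system
    return os.path.isdir(os.path.join(root, "run/systemd/system"))

def kill_user_processes(conf_text, default=False):
    # crude logind.conf parser: the last uncommented
    # "KillUserProcesses=" line wins, anything else is ignored
    value = default
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "KillUserProcesses":
            value = val.strip().lower() in ("yes", "y", "true", "t", "1", "on")
    return value

def wrap_with_systemd_run(argv, extra_args=""):
    # argv is the original server command, ie: ["xpra", "_start", ...]
    # extra_args is a shell-style string of extra systemd-run options
    return ["systemd-run", "--scope", "--user"] + shlex.split(extra_args) + list(argv)
```

The wrapper only builds the command line, the caller would still have to exec it and cope with systemd-run failing.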

I tried to test this using a guest account:

  • set KillUserProcesses=yes
  • ran loginctl disable-linger
  • ssh guest@localhost
  • start xpra server
  • exit ssh

And the xpra server survived... Fedora 24 all up to date.
What am I missing?
@jonathan.underwood: see also #1129 comment:21

totaam commented Aug 17, 2016

2016-08-17 08:58:48: antoine uploaded file systemd-run.patch (5.0 KiB)

wrap xpra server command with systemd-run automatically

totaam commented Aug 17, 2016

Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:

As of r13378, we now run server commands via systemd-run:

$ xpra start --start=xterm --no-daemon --systemd-run-args="-p MemoryAccounting=true -p MemoryLimit=64M" 
using systemd-run to wrap 'start' server command
'systemd-run' '--scope' '--user' '-p' 'MemoryAccounting=true' '-p' 'MemoryLimit=64M' '/usr/bin/xpra' \
    'start' '--start=xterm' '--systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M' '--daemon=no'
Running scope as unit run-rd905fbd12caf4ec8b400030991401a14.scope.
(...)
● run-rd905fbd12caf4ec8b400030991401a14.scope - /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemo
   Loaded: loaded
Transient: yes
  Drop-In: /run/user/1000/systemd/user/run-rd905fbd12caf4ec8b400030991401a14.scope.d
           └─50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf
   Active: active (running) since Wed 2016-08-17 16:09:09 ICT; 51s ago
   CGroup: /user.slice/user-1000.slice/user@1000.service/run-rd905fbd12caf4ec8b400030991401a14.scope
           ├─25491 /bin/python /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon=no
           ├─25502 /usr/libexec/Xorg -noreset -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth /run/user/1000/gdm/Xauthority -logfi
           ├─25509 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
           ├─25639 xterm
           └─25641 bash

 Aug 17 16:09:09 desktop systemd[1417]: Started /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): pam-systemd initializing
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Asking logind to create session: uid=1000 pid=25491 service=xpra type=x11 class=user d
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Failed to create session: Access denied

So we end up with a cgroup for the session, but there are problems:

  • the parent scope is still wrong (and likely to get killed on user logout which will clear the user slice):
$ systemd-cgls
Control group /:
-.slice
├─init.scope
(...)
└─user.slice
  └─user-1000.slice
    ├─session-1.scope
    └─user@1000.service
      (...)
      ├─run-r450625a9be2343f0bfb2034b01db64ee.scope
      │ ├─13164 /bin/python /usr/bin/xpra start --start=xterm --bind-tcp=0.0....
      │ ├─13180 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 -...
      │ ├─13306 xterm
      │ └─13308 bash
python: pam_systemd(xpra:session): Asking logind to create session: uid=1000 pid=11564 service=xpra type=x11 class=user desktop=xpra seat= vtnr=0 tty= display=#001 remote=no remote_user= remote_host=
python: pam_systemd(xpra:session): Failed to create session: Access denied

(re-tested after the r13505 pam fix for xauth data)

See also #1335

totaam commented Oct 9, 2016

r14062 disables pam_open for now because it causes the service (#1335) to run in a user-0 slice instead of the system slice.

totaam commented Oct 29, 2016

Instead of ensuring that the session survives, this seems to have the exact opposite effect (and worse - requiring a reboot to properly clear things), details in #1348.
I've tested both with KillUserProcesses=no and KillUserProcesses=yes with the same result.

xpra does get killed unceremoniously but worst of all this seems to have an effect on ssh making the next login attempt take forever. (looks similar to systemd issue 2863)

I've asked for help on the systemd-devel mailing list: PAM session hooks for independent session

Alternatively, we could expand the proxy server to start new sessions on behalf of other users. The proxy server runs as root and should have sufficient privileges to invoke logind's CreateSession method. Downsides: we don't currently require the proxy server to be running and this may slow down session startup.

totaam commented Nov 16, 2016

2016-11-16 06:33:42: antoine uploaded file polkit.patch (4.6 KiB)

start polkit automatically (requires session management)

totaam commented Nov 22, 2016

The answer from the systemd mailing list is that we do need a suid binary to do the registration: https://lists.freedesktop.org/archives/systemd-devel/2016-November/037700.html

Too late to start messing with the suid / socket activation approaches now.

totaam commented May 18, 2017

2017-05-18 17:23:07: antoine commented


Some related changes:

  • r15819 allows xpra to run even if stdin / stdout / stderr are closed
  • r15882 can quieten systemd-run

r15810 added uid and gid support when running as root (added benefits: can listen to ports below 1024 without running as root or using iptables)
So theoretically we could ask the root proxy server to start sessions for us and do the pam / logind registration. (that bit seems to work?)
The permissions could be restricted using regular authentication or even SO_PEERCRED / SCM_CREDENTIALS (probably the former).
So far so good.
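
For reference, a minimal sketch of the SO_PEERCRED check mentioned above (Linux only, not the actual xpra auth module). It assumes the proxy accepts connections on a unix domain socket; `get_peercred` and `is_allowed` are hypothetical names:

```python
import os
import socket
import struct

def get_peercred(sock):
    # Linux only: the kernel fills in a "struct ucred",
    # which is three native ints: pid, uid and gid of the peer
    fmt = "3i"
    data = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    pid, uid, gid = struct.unpack(fmt, data)
    return pid, uid, gid

def is_allowed(sock, allowed_uids):
    # only trust the uid reported by the kernel,
    # never one supplied by the client itself
    _, uid, _ = get_peercred(sock)
    return uid in allowed_uids
```

Because the kernel supplies the credentials, the root proxy can tell which user is asking for a session without any password exchange.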

But then I found:

  • KillUserProcesses in logind.conf is broken in Fedora 26? xpra survives out of the box - at least for now:
$ sudo loginctl disable-linger guest
$ loginctl show-user | grep Kill
KillUserProcesses=yes
$ sudo loginctl show-user guest | grep Linger
Linger=no
$ loginctl user-status
guest (1001)
           Since: Thu 2017-05-18 22:23:32 +07; 25min ago
           State: active
        Sessions: *10
          Linger: no
            Unit: user-1001.slice
                  ├─session-10.scope
                  │ ├─3122 sshd: guest [priv]
                  │ ├─3174 sshd: guest@pts/1
                  │ ├─3196 -bash
                  │ ├─4244 loginctl user-status
                  │ └─4245 less
                  ├─session-4.scope
                  │ ├─1658 /usr/bin/python2 /usr/bin/xpra --bind-tcp=0.0.0.0:10000 start :10 --start=xterm --systemd-ru
                  │ ├─1659 /usr/libexec/Xorg-for-Xpra-:10 -noreset -novtswitch -nolisten tcp +extension GLX +extension 
                  │ ├─1686 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session
                  │ ├─1932 pulseaudio --start -n --daemonize=false --system=false --exit-idle-time=-1 --load=module-sus
                  │ ├─2134 /usr/libexec/gvfsd
                  │ ├─2162 xterm
                  │ └─2164 bash
                  └─user@1001.service
                    └─init.scope
                      ├─3127 /usr/lib/systemd/systemd --user
                      └─3155 (sd-pam)

Despite the documentation ([https://www.freedesktop.org/software/systemd/man/logind.conf.html]) stating that: Note that setting KillUserProcesses=yes will break tools like screen(1) and tmux(1), unless they are moved out of the session scope. See example in systemd-run(1). - EDIT: seems to work on another system...

  • systemd-run --user --scope ... doesn't work with unified cgroup hierarchy: this one is a blocker because it means that we cannot start xpra instances over SSH logins! (ouch, this is still the recommended way of "Creating Transient Cgroups with systemd-run":
    [https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Resource_Management_Guide/chap-Using_Control_Groups.html])
    systemd-run either needs to be disabled by default (only on the distributions affected? could be kernel configuration related...), or changed to "auto" so we can check before trying (or even fallback after failing?)
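
One possible shape for the "auto" idea, as a sketch only: probe with a no-op command in a transient user scope and fall back to a plain start if that fails. The "yes" / "no" / "auto" values and both function names are assumptions for illustration:

```python
import subprocess

PROBE = ["systemd-run", "--user", "--scope", "true"]

def systemd_run_works(probe=PROBE, timeout=5):
    # run a no-op command in a transient user scope and see if it succeeds,
    # a missing binary or a hang both count as "does not work"
    try:
        proc = subprocess.run(probe, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0

def resolve_systemd_run(setting, probe=systemd_run_works):
    # setting is the (hypothetical) option value: "yes", "no" or "auto"
    if setting == "auto":
        return probe()
    return setting == "yes"
```

The probe costs an extra process spawn on every start, so caching the result or only probing after a first failure may be preferable.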

Links:

  • Fedora 26 Changes: KillUserProcesses by default / ticket: [https://bugzilla.redhat.com/show_bug.cgi?id=1357426], the wiki page says: work with upstream authors and Fedora maintainers of programs like screen and tmux to implement the ability to automatically start them in a way that survives a user session, and if the system policy does not allow that, to warn the user.
  • #1600 F25 System Wide Change: KillUserProcesses=yes by default: I think the feature is premature considering it doesn't completely fix the problem it's intended to solve, while excessively burdening a still as yet unknown number of programs with needing systemd specific changes.

totaam commented May 21, 2017

2017-05-21 19:49:34: antoine commented


Socket activation has been added (partial), see #1521.
Minor improvements to the system-wide proxy server in r15899, r15897, r15894.
Preparatory work in r15901, r15902, r15903, r15904.
Merged hidden "request-start" subcommand in r15906, ie:

xpra request-start --start=xterm :100

Will connect to the system-wide proxy server and make it start this session.

There are two ways of changing uid:

  • using xpra's "--uid=" and "--gid=" switches, this works but the session scope ends up belonging to root: user-0.slice / session-cNNN.scope - maybe we should use systemd-run without uid here?
  • (code commented out): using "systemd-run --uid= --gid=", but then the xpra process doesn't have the privileges required for doing the pam logind dance... on the plus side, the xpra server instance ends up in a system.slice / run-$UUID.scope. (not sure we could ask the system proxy to do it on our behalf)
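
With the first option, anything that needs root (like the pam registration) has to happen before the uid change. A sketch of the usual ordering for the change itself, which is not necessarily what the "--uid=" / "--gid=" switches do internally (`drop_privileges` and its `_os` parameter are hypothetical, the latter only exists to make the ordering testable):

```python
import os

def drop_privileges(uid, gid, _os=os):
    # order matters: once setuid() has run we are no longer root
    # and can no longer change the groups
    _os.setgroups([gid])    # drop root's supplementary groups
    _os.setgid(gid)
    _os.setuid(uid)
```

A real implementation would probably want os.initgroups() with the username instead of a single-entry group list, so the user keeps their supplementary groups.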

Issues:

  • pam registration works, but the scope is still wrong..
  • pulseaudio doesn't seem to get killed? (and sometimes also dbus, etc)
  • we pass too many arguments to the new instances: ['/usr/bin/xpra', 'start', '--csc-modules=all', '--packet-encoders=rencode, bencode, yaml', '--video-decoders=all', '--encodings=all', '--compressors=lz4, lzo, zlib', '--start=xterm', '--video-encoders=all', '--env=XPRA_PROXY_START_UUID=3f2cc30518ea4e2498cd85c68c87f3ae', '--systemd-run=no', '--uid=1000', '--gid=1000'] (most of the values are equivalent to the defaults)
  • XDG_RUNTIME_DIR problems... again (Use XDG_RUNTIME_DIR #1129), ie: error: XDG_RUNTIME_DIR not set in the environment.
  • restarting the system proxy should not kill the sessions it has started!
  • the Xorg process ends up in the wrong scope! (it doesn't with the "systemd-run" option above..)
  • maybe the proxy instance process should also be in the new session scope (and it should have a better command description in the process list)
  • after the client requests a new session, we connect to it through the proxy which is unnecessary (and slower) - we should be able to tell the client where it needs to connect directly instead (and the client can then decide to continue or not, ie: it may have to continue if the direct connection to the server fails)
  • xauth errors: xauth: unable to generate an authority file name, ie:
    Error running "xauth add :2 MIT-MAGIC-COOKIE-1 8194181c038c4086bcb206ee7610e98d": non-zero exit code: 1
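
For the "too many arguments" issue, one simple approach would be to drop every "--name=value" argument whose value matches the default before spawning the new instance. This is a sketch under that assumption, the `defaults` mapping would have to come from wherever the real defaults are defined:

```python
def strip_default_args(argv, defaults):
    # defaults maps an option name (without the leading dashes)
    # to the string form of its default value
    kept = []
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, _, value = arg[2:].partition("=")
            if defaults.get(name) == value:
                # same as the default: no need to pass it along
                continue
        kept.append(arg)
    return kept
```

Comparing strings is naive (ie: "lz4, lzo, zlib" and "lz4,lzo,zlib" would not match), so values may need normalizing first.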

totaam commented May 22, 2017

2017-05-22 06:44:39: antoine uploaded file pam-session-v2.patch (9.7 KiB)

ask the proxy server to call pam_open on our behalf (ends up moving the proxy server process into the new session scope, not what we want..)

totaam commented May 22, 2017

2017-05-22 10:28:18: antoine commented


Mostly working as of r15907 + r15908 + r15909 via the new "request-start" subcommand, using "peercred" auth (#1524).
The xpra server process is started as root by the system proxy instance, it does the pam registration before changing uid, and updates the DISPLAY attribute once we have it.
We end up with a new session scope hanging off the user's slice:

Control group /:
-.slice
├─user.slice
│ ├─user-1000.slice
│ │ └─session-c32.scope
│ │   ├─31069 /bin/python /usr/bin/xpra start :100 --csc-modules=all ...
│ │   ├─31071 /usr/libexec/Xorg-for-Xpra-:100 -noreset -novtswitch -nolisten tcp ...
│ │   ├─31090 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session
│ │   ├─31318 pulseaudio --start -n --daemonize=false --system=false --exit-idle-time=-1 ...
│ │   ├─31457 xterm
│ │   ├─31459 bash
│ │   ├─31761 /usr/libexec/gvfsd
│ │   └─31767 /usr/libexec/gvfsd-fuse /home/antoine/.gvfs -f -o big_writes
...

And this is also shown as a session, without a seat or controlling TTY:

[antoine@desktop ~]$ loginctl list-sessions
   SESSION        UID USER             SEAT             TTY             
        c3         42 gdm              seat0            /dev/tty1       
       c32       1000 antoine                                           
        18       1000 antoine          seat0            /dev/tty2       

Exiting the xpra server terminates the whole session and all the processes get killed reliably.
Sessions started via ssh survive the logout too.
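
To check which scope a process landed in without going through systemd-cgls, the cgroup path can be read from /proc/PID/cgroup. A small sketch, assuming the "user-UID.slice/session-ID.scope" naming seen in the output above (`session_from_cgroup` is a hypothetical helper):

```python
import re

SCOPE_RE = re.compile(r"/user-(\d+)\.slice/session-([^/]+)\.scope")

def session_from_cgroup(text):
    # returns (uid, session id) for the first logind session scope found,
    # or None if the process is not in a session scope
    for line in text.splitlines():
        # each line is "hierarchy-id:controllers:path"
        path = line.split(":", 2)[-1]
        match = SCOPE_RE.search(path)
        if match:
            return int(match.group(1)), match.group(2)
    return None
```

Usage would be something like `session_from_cgroup(open("/proc/%i/cgroup" % pid).read())`; a server wrapped with "systemd-run --user --scope" returns None since it lives under the user manager's service instead.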

Still TODO:

  • socket activation for system proxy server #1521 selinux blocker
  • automatically use the system proxy with "xpra start" (and "start-desktop", "shadow"), fallback to systemd-run if unreachable / disabled (but not on F26 where "systemd-run --user --scope" is broken..)
  • this requires server "start" to use the client code... which is now separate (split client and server builds #1253)
  • maybe require "xpra" group membership to use the system proxy to start sessions?
  • too many places chown directories, dangerous
  • the proxy starts pretty quickly (50ms or so), but going via the system proxy is noticeably slower - maybe we can do better
  • if the session already exists, maybe we should connect to it rather than failing?
  • XDG_RUNTIME_DIR is not set? (try over TCP with a different auth module, instead of peercred + ssh)
  • better tools for using the system proxy instance? (listing all sessions, stopping them, etc)
    Still as per comment:15 :
  • proxy instance process description
  • too many arguments passed to subprocess
  • connect directly to new server

Some good documentation on control groups: LWN: Control groups series by Neil Brown

totaam commented May 24, 2017

2017-05-24 12:27:13: antoine commented


Debian packaging of the systemd service: #1530

totaam commented Jun 5, 2017

2017-06-05 15:16:37: antoine commented


Updates:

  • socket activation for system proxy server #1521 is done
  • r16021: adds "start-via-proxy" (on/off/auto), enabled by default - so we now transparently start sessions via the system-wide proxy server (and therefore can start the process as root and do the pam registration) - if the client package is not installed, it will fallback to the regular startup sequence
  • r16022 don't use "start-via-proxy" if "daemon=no" (the proxy is a daemon process)
  • r15911: better proxy process title
  • displayfd switch for xpra #1535: it would be nice to add "--displayfd" support to replace socket polling
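
The decision described in the first two items can be summed up as follows. This is a sketch of the rules as stated here, not the code from r16021 / r16022; it treats "on" and "auto" the same way, which is an assumption, and the function name is hypothetical:

```python
def should_start_via_proxy(start_via_proxy, daemon, client_installed):
    # start_via_proxy is the option value: "on", "off" or "auto"
    if start_via_proxy == "off":
        return False
    if not daemon:
        # the proxy is a daemon process, it cannot give us a foreground server
        return False
    if not client_installed:
        # asking the proxy requires the client code,
        # without it we use the regular startup sequence
        return False
    return True
```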

Still TODO:

totaam commented Jun 20, 2017

2017-06-20 18:02:03: antoine commented


Updates:

  • XDG_RUNTIME_DIR updates, see XDG_RUNTIME_DIR not created #1537#comment:5
  • r16101: fix start-via-proxy fallback case: we have to re-exec xpra with this feature disabled because the client code initializes GTK
  • r16097: longer client command timeout when requesting a new session (adding the time we allow for the server to start)
  • r16095: proxy filter env, use new user's cwd

Tested OK on Fedora 26 and centos 7.x

totaam commented Jun 20, 2017

2017-06-20 22:53:12: antoine commented


  • r16106: don't start a new session if the client is exiting because of a signal
  • r16107: if we had connected to the session successfully, we can ignore more exit codes
  • r16108 + r16109: tighten permissions

Audit of all chown, chmod and mkdir calls (see r16108):

  • in create_runtime_dir, we may mkdir the user's XDG_RUNTIME_DIR as 0o700 with the uid and gid of that user: the actual path is fixed and contains the uid itself: [/var]/run/user/$UID, so this is safe
  • write_pidfile uses the path specified (not used by default) - so someone able to modify the configuration used by root could cause more damage already, and I think we do want to save the file before we change uid (saving after would be safer but we could also fail to save it)
  • write_runner_shell_scripts no longer does any fchown or chmod - we never call it as non-root
  • find_log_dir may create some directories as 0o700 and chown them - this could be made tighter? (nothing should be needed when running as root since XDG_RUNTIME_DIR should exist already)
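
As an illustration of the first item, a sketch of creating the runtime directory with tight permissions. It shares the name create_runtime_dir with the xpra function but is not its actual code, and the signature is an assumption:

```python
import os
import stat

def create_runtime_dir(path, uid, gid):
    # path is expected to be the fixed per-user location, ie: /run/user/$UID
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        pass
    else:
        # mkdir's mode is filtered by the umask, so set it explicitly,
        # then hand the new directory over to the user
        os.chmod(path, 0o700)
        os.lchown(path, uid, gid)
    st = os.lstat(path)
    # lstat does not follow symlinks: refuse anything that is not
    # a real directory owned by the target user
    return stat.S_ISDIR(st.st_mode) and st.st_uid == uid
```

The chown is only applied to a directory this call has just created, so a pre-existing path planted by someone else is never handed over.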

Last remaining issue: daemon=yes from r16108 seems to cause problems. The process tree is killed. Ouch.

totaam commented Jun 21, 2017

2017-06-21 13:55:31: antoine commented


Updates:

  • r16112: get extra environment variables from pam session (ie: XDG_SESSION_ID, DBUS_SESSION_BUS_ADDRESS, ..) and avoid spawning a new dbus server if pam gave us one already
  • r16113: don't create and chown log directories as root, just hope we find one that exists: XDG_RUNTIME_DIR should always exist
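
The r16112 behaviour described above boils down to something like this sketch (hypothetical helper names, and the pam environment is assumed to already be available as a dictionary):

```python
def merge_pam_env(env, pam_env):
    # values provided by the pam session (XDG_SESSION_ID, etc)
    # take precedence over what we inherited
    merged = dict(env)
    merged.update(pam_env)
    return merged

def needs_dbus_launch(env):
    # only spawn our own dbus server if nothing gave us a session bus
    return not env.get("DBUS_SESSION_BUS_ADDRESS")
```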

The problem referred to in comment:20 is actually a systemd problem..
We correctly ask logind to create a new scope by calling pam open, but somehow things get messed up and systemd spews:

systemd-logind[1098]: Failed to start session scope session-3.scope: Unit session-3.scope already exists.
python[30562]: pam_systemd(xpra:session): Failed to create session: File exists

I've also seen this variant:

systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
pam_systemd(xpra:session): Failed to create session: Device or resource busy

(maybe after trying to cleanup the stale session file in /run/systemd/transient/?)

Problem is that the pam call returns success... but systemd does a quick session start followed by a shutdown, full log:

Jun 21 14:32:14 systemd[1]: Created slice User Slice of guest.
Jun 21 14:32:14 systemd[1]: Starting User Manager for UID 1001...
Jun 21 14:32:14 systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
Jun 21 14:32:14 audit[20751]: USER_START pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 python[20751]: pam_systemd(xpra:session): Failed to create session: Device or resource busy
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.701:1132): pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 audit[20753]: USER_ACCT pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1101 audit(1498048334.711:1133): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_ROLE_CHANGE pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=2300 audit(1498048334.770:1134): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1006 audit(1498048334.770:1135): pid=20753 uid=0 subj=system_u:system_r:init_t:s0 old-auid=4294967295 auid=1001 tty=(none) old-ses=4294967295 ses=12 res=1
Jun 21 14:32:14 systemd[20753]: pam_unix(systemd-user:session): session opened for user guest by (uid=0)
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.771:1136): pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_START pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[20753]: Reached target Paths.
Jun 21 14:32:14 systemd[20753]: Starting D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Timers.
Jun 21 14:32:14 systemd[20753]: Listening on D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Sockets.
Jun 21 14:32:14 systemd[20753]: Reached target Basic System.
Jun 21 14:32:14 systemd[20753]: Reached target Default.
Jun 21 14:32:14 systemd[20753]: Startup finished in 35ms.
Jun 21 14:32:14 systemd[1]: Started User Manager for UID 1001.
Jun 21 14:32:14 kernel: audit: type=1130 audit(1498048334.816:1137): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Stopping User Manager for UID 1001...
Jun 21 14:32:14 systemd[20753]: Stopped target Default.
Jun 21 14:32:14 systemd[20753]: Stopped target Basic System.
Jun 21 14:32:14 systemd[20753]: Stopped target Timers.
Jun 21 14:32:14 systemd[20753]: Stopped target Paths.
Jun 21 14:32:14 systemd[20753]: Stopped target Sockets.
Jun 21 14:32:14 systemd[20753]: Closed D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Shutdown.
Jun 21 14:32:14 systemd[20753]: Starting Exit the Session...
Jun 21 14:32:14 systemd[20753]: Received SIGRTMIN+24 from PID 20780 (kill).
Jun 21 14:32:14 systemd[20772]: pam_unix(systemd-user:session): session closed for user guest
Jun 21 14:32:14 systemd[1]: Stopped User Manager for UID 1001.
Jun 21 14:32:14 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1131 audit(1498048334.829:1138): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Removed slice User Slice of guest.

If we can't rely on logind to create a session... we have a serious problem.

totaam commented Jun 24, 2017

2017-06-24 23:53:10: antoine commented


Minor fixes in r16129.

The problem from comment:21 leaves the session running but inaccessible since the /run/user/$UID directory gets nuked. The session can still be accessed if the user is a member of the "xpra" group through its socket in /run/xpra but the other sockets and the log files are lost..
I will try to write a more easily reproducible test case for reporting / asking upstream: daemonize, pam open, (start vfb?), create sockets, redirect stdout+stderr, etc.. (run as root)

totaam commented Jun 25, 2017

  • important fix in r16130: could have caused errors when the display name is fixed
  • r16131 mirrors the server startup and exposes the problem

It turns out that the problem is not with the code or the pam module (though pam failures to call logind are not returned as errors): simply using a different service name fixes everything. (ie: "login" instead of "xpra")
So r16132 uses a more complete pam configuration file and the test now works... but the server still does not. sigh.

totaam commented Jun 26, 2017

2017-06-26 20:37:05: antoine commented


Finally all fixed (I think - for real, this time) in r16134: the final piece was that we must keep the pam file descriptor open when redirecting stdout / stderr to the log file.
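
To illustrate the point about file descriptors, here is a sketch (not the r16134 change itself) of redirecting output by duplicating the log file over stdout / stderr only, rather than closing every descriptor as daemonizing code often does. It assumes the pam session is tied to a descriptor that must stay open; the function name is hypothetical:

```python
import os

def redirect_to_logfile(logpath, targets=(1, 2)):
    # open the log file and duplicate it over the target descriptors
    # (stdout and stderr by default), every other descriptor,
    # including the one held for the pam session, is left untouched
    logfd = os.open(logpath, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o640)
    try:
        for fd in targets:
            os.dup2(logfd, fd)
    finally:
        if logfd not in targets:
            os.close(logfd)
```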

@smo: FYI, feel free to close. Sessions should be started via the system proxy on systems that have it activated (or socket activated), which means they will survive KillUserProcesses=yes.

(commit at 30000 feet - woot!)

totaam commented Jun 28, 2017

2017-06-28 10:14:47: antoine uploaded file run-cleanups-priority.patch (3.1 KiB)

run cleanups with a priority value so we could run pam.close last, but this cannot be used because we are no longer root and dbus sends the uid..
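
The idea in the patch is roughly the following (a sketch with hypothetical names, not the contents of run-cleanups-priority.patch): cleanups run in ascending priority order, so a high value makes pam.close run last.

```python
class Cleanups:
    def __init__(self):
        self._items = []

    def add(self, fn, priority=0):
        # keep the insertion index so that equal priorities
        # run in the order they were registered
        self._items.append((priority, len(self._items), fn))

    def run(self):
        for _, _, fn in sorted(self._items, key=lambda item: item[:2]):
            try:
                fn()
            except Exception as e:
                # one failing cleanup must not prevent the others
                print("cleanup error: %s" % e)
        self._items = []
```

As noted above, the ordering alone does not help here: by the time the cleanups run the process is no longer root.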

totaam commented Jul 4, 2017

Some related changes:

  • r16182: avoid further errors on pam_start failure (unlikely)
  • r16183: simplify, we have the username so no need to look it up
  • r16184: reuse the pam cython code from the pam auth module

totaam commented Jul 17, 2017

One minor bug: #1582, need to continue to honour user preferences

totaam commented Jul 17, 2017

Another fix for the ticket that keeps on giving: r16391 (chdir so the cwd is what we expect)

totaam commented Jul 20, 2017

crickets - works for me, also tested on Debian: #1530

totaam closed this as completed Jul 20, 2017

totaam commented Jul 25, 2017

Important fix in r16502, we really need #1535 to be able to simplify this awful code.

totaam commented Jul 27, 2017

Likely to have caused a regression due to missing environment variables: #1602.

totaam commented Feb 11, 2020

See also #2042, #2585, #1536.
