Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the log/event driver automatically fallback to file driver if journal is not readable by user ? #19600

Open
Romain-Geissler-1A opened this issue Aug 11, 2023 · 15 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Romain-Geissler-1A
Copy link
Contributor

Romain-Geissler-1A commented Aug 11, 2023

Issue Description

Hi,

This is an issue which honestly I am not sure is on podman side or on systemd side. This is the result of a real bug report we had inside Amadeus with one of our developer. We use a global ldap server to authenticate our employees, and this ldap server is also the source of user ids for all our shared linux development servers. Over time, we had many employee, and it seems that user ids have always been growing up on our side. Last week, a user came to us saying "podman logs" doesn't work for me, while it was working for absolutely all other users on the very same machine. That user had user id 61308. And then we discovered the existence of the "dynamic user id" range on systemd side: https://systemd.io/UIDS-GIDS/ (from 61184 to 65519, hardcoded at compilation time in systemd, impossible to change at run time).

So, the top level description of the problem is: when running a systemd system, users in the range of the system user id (by default < 1000 (except root or user in the journald groups), but configurable with /etc/login.defs) and users in the dynamic user id range have a non working "podman logs" command. It just returns no output, nothing, and no clear error about what's going on. It's because journalctl doesn't work for them, these users CANNOT read their own journal (which is actually the system journal). I am not sure if this is by design of systemd or not (I could find recently this pull request systemd/systemd#27961 which seems to enable journalctl for system users, but I tried this systemd version and it changes nothing to the journalctl issue for system users).

The question is, in this case (either by checking only the uids, either by "detecting" at runtime of a user is able to read it's own journal by trying to log some non empty string then trying to read it), should podman automatically switch to the file driver as a fallback (when no explicit log/event driver was specified) ?

Note: obivously, we wondered on our side if it makes sense that our ldap have users with user ids in the range of this "dynamic user id" range that we didn't know about. While new users are now created out of this range, it's definitely not easy on our side to migrate existing users from this range to another userid (as it means remapping all their files to the new userids on the all the machines they used already, or asking them to duplicate their linux account and manually ask them to migrate from the old one to the new one).

Cheers,
Romain

Steps to reproduce the issue

Steps to reproduce the issue

  1. Create an upstream podman image with systemd installed and configured. In this image, the user 1000 podman is already created (by your team), and I add to it systemuser (user id 500) and dynamicuser (user id 62000):
rgeissler@ncerndobedev6097:~/wk/tmp/systemd> cat Dockerfile
FROM quay.io/podman/upstream

RUN dnf install -y systemd procps util-linux passwd sed less vim

RUN useradd -u 500 -m systemuser
RUN usermod --add-subuids 200000-299999 --add-subgids 200000-299999 systemuser
RUN useradd -u 62000 -m --add-subids-for-system dynamicuser
RUN usermod --add-subuids 300000-399999 --add-subgids 300000-399999 dynamicuser

# Remove the hardcoded log/event driver from https://github.com/containers/podman/blob/main/contrib/podmanimage/stable/containers.conf
RUN sed -i -e 's/^log_driver/#log_driver/' -e 's/^events_logger/#events_logger/' /etc/containers/containers.conf

ENTRYPOINT /usr/sbin/init
  1. Build and run this image:
rgeissler@ncerndobedev6097:~/wk/tmp/systemd> podman run --privileged --detach $(podman build --pull --quiet .)
66ec0a8b366dcec773135d046349bbaee8901f2c256e821eb3ee2312a38089c1
  1. Enter this image as root, then try to run a simple container generating logs with all 3 users. It works for user podman, but not for dynamicuser nor systemuser. All 3 users use the systemd log driver:
rgeissler@ncerndobedev6097:~/wk/tmp/systemd> podman exec -t -i 66ec0a8b366dcec773135d046349bbaee8901f2c256e821eb3ee2312a38089c1 bash


# For systemuser:
[root@66ec0a8b366d /]# su systemuser
[systemuser@66ec0a8b366d /]$ cd
[systemuser@66ec0a8b366d ~]$ podman info |grep logDriver
  logDriver: journald
[systemuser@66ec0a8b366d ~]$ podman run --detach --rm --cidfile=container.id fedora sh -c 'set -x && sleep 9999999'
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 18ca996a454f done   |
Copying config 72c9e45642 done   |
Writing manifest to image destination
891b33d7f53c9a1a0fe0ed970462bf6a060c4cbcb4ba0e9f56a818bf238ef0e7
[systemuser@66ec0a8b366d ~]$ podman logs "$(cat container.id)"
[systemuser@66ec0a8b366d ~]$        <------ NO LOG HERE !
exit



# For podman:
[root@66ec0a8b366d /]# su podman
[podman@66ec0a8b366d /]$ cd
[podman@66ec0a8b366d ~]$ podman info |grep logDriver
  logDriver: journald
[podman@66ec0a8b366d ~]$ podman run --detach --rm --cidfile=container.id fedora sh -c 'set -x && sleep 9999999'
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 18ca996a454f done   |
Copying config 72c9e45642 done   |
Writing manifest to image destination
125fda6c641814354635b5a05aefb2123768062fdae95837398177f80383059e
[podman@66ec0a8b366d ~]$ podman logs "$(cat container.id)"
+ sleep 9999999  <---- EXPECTED LOGS HERE
[podman@66ec0a8b366d ~]$
exit




# For dynamicuser:
[root@66ec0a8b366d /]# su dynamicuser
[dynamicuser@66ec0a8b366d /]$ cd
[dynamicuser@66ec0a8b366d ~]$ podman info |grep logDriver
  logDriver: journald
[dynamicuser@66ec0a8b366d ~]$ podman run --detach --rm --cidfile=container.id fedora sh -c 'set -x && sleep 9999999'
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 18ca996a454f done   |
Copying config 72c9e45642 done   |
Writing manifest to image destination
7997c9ffec6bc789564d4ce31641712f1246962becc9e4895ca97df710acf503
[dynamicuser@66ec0a8b366d ~]$ podman logs "$(cat container.id)"
[dynamicuser@66ec0a8b366d ~]$        <------ NO LOG HERE !
exit

Describe the results you received

podman logs doesn't work for systemuser and dynamicuser.

Describe the results you expected

podman logs works for all users, by default, with no special configuration to make it work (ie automatic fallback to a file driver in these special cases ?)

podman info output

Not needed (upsteam podman)

Podman in a container

Yes

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

In our real life example (the case of the developer who opened us a bug report internally), we run a RHEL 9.2 server, with all the software updates applied. There is no "podman in podman" scenario like described in this bug report, but the issue is the same: no podman logs are seeable for this user in the dynamic user id range.

Additional information

No response

@Romain-Geissler-1A Romain-Geissler-1A added the kind/bug Categorizes issue or PR as related to a bug. label Aug 11, 2023
@Romain-Geissler-1A
Copy link
Contributor Author

Romain-Geissler-1A commented Aug 11, 2023

Note: even a dynamic user create "the normal way" via systemd cannot access it's own journal file:

[root@66ec0a8b366d /]# systemd-run -p DynamicUser=yes --shell
Running as unit: run-u32.service
Press ^] three times within 1s to disconnect TTY.
bash-5.2$ id
uid=64350(run-u32) gid=64350(run-u32) groups=64350(run-u32)
bash-5.2$ journalctl --user
No journal files were found.
-- No entries --
bash-5.2$ journalctl
Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
      Pass -q to turn off this notice.
No journal files were opened due to insufficient permissions.
bash-5.2$
exit
Finished with result: exit-code
Main processes terminated with: code=exited/status=1
Service runtime: 11.048s
CPU time consumed: 31ms

For another project in my Amadeus (improving a bit security on our Jenkins nodes), I also wish to use one day systemd dynamic users to isolate different Jenkins build running on the same machine. If podman logs doesn't work by default for these users, it means what I want to do is impossible.

@Luap99
Copy link
Member

Luap99 commented Aug 11, 2023

The question is, in this case (either by checking only the uids, either by "detecting" at runtime of a user is able to read it's own journal by trying to log some non empty string then trying to read it), should podman automatically switch to the file driver as a fallback (when no explicit log/event driver was specified) ?

I don't think hardcoding based on UID is a good idea, this would mean we need to track upstream changes and so for possible changes then different distros could change it and so on.
The writing/reading approach sounds good to me. We already have a dynamic check for permissions on /run/log/journal and /var/log/journal but that check can fall short here. It would also help in cases like this: #19581

@vrothberg
Copy link
Member

[...] but that check can fall short here.

The code in question is here.

Extending the linked check above may slow Podman init down yet only for a corner case. Shall we make podman logs a bit smarter to detect this issue when attempting to read from the journal?

@giuseppe WDYT?

@Luap99
Copy link
Member

Luap99 commented Aug 14, 2023

Extending the linked check above may slow Podman init down yet only for a corner case.

Well I wouldn't suggest to do that for every podman command. Only the first command after boot and then we could write the logger to a tmpfile so all commands after that can just read the tmpfile to know which driver is in use.

Shall we make podman logs a bit smarter to detect this issue when attempting to read from the journal?

We cannot detect that in all cases. We simply get no results which is equal to no logs. And by the time someone runs podman logs it is already to late as the container was already created with the wrong log driver.

@vrothberg
Copy link
Member

Well I wouldn't suggest to do that for every podman command. Only the first command after boot and then we could write the logger to a tmpfile so all commands after that can just read the tmpfile to know which driver is in use.

That's an improvement but would still slow down containers on boot. They either all have to do the check concurrently or block on the first process doing it. It's an important use case for auto and Edge etc.

But I am purely speculating at the moment, and have to idea how big of a cost the check would be.

And by the time someone runs podman logs it is already to late as the container was already created with the wrong log driver.

Absolutely. I should have elaborated more: if we decide to not pay the performance penalty we may still consider improving the UX.

@Romain-Geissler-1A
Copy link
Contributor Author

Romain-Geissler-1A commented Aug 14, 2023

I don't know anything about journald API (neither the official API, neither the go implementation (or binding) you use, but doesn't journal provide a way to know if a user can read its own journal ? I mean, of course I don't want to start hardcoding the system/dynamic user range on podman side, but it's hardcoded already in the C systemd libraries, so, if there is any official API to query that, it would still be fast (given the implementation on systemd side seems to really just check uid numbers)

@Luap99
Copy link
Member

Luap99 commented Aug 14, 2023

Well I wouldn't suggest to do that for every podman command. Only the first command after boot and then we could write the logger to a tmpfile so all commands after that can just read the tmpfile to know which driver is in use.

That's an improvement but would still slow down containers on boot. They either all have to do the check concurrently or block on the first process doing it. It's an important use case for auto and Edge etc.

Ah right and based on CI flakes write journal followed by read journald is not reliable at all so we would need to some sleep back off which would make it very slow and unacceptable for these use cases.


To my knowledge there is no systemd API to know if you can read logs or not. Based on your systemd-run example you can see that journalctl --user runs successfully so we must assume there are no logs from the container not that we cannot read logs.

@Romain-Geissler-1A
Copy link
Contributor Author

Based on your systemd-run example you can see that journalctl --user runs successfully so we must assume there are no logs from the container not that we cannot read logs.

Actually it reads No journal files were found. so, it's kind of a hint that reading anything from an empty collection is likely to fail ;) But indeed I don't know if the systemd API provides this information. For people working in Red Hat, maybe you know some colleague working on systemd who can bring their knowledge here ?

@giuseppe
Copy link
Member

@msekletar is there an easy way to detect whether a user has read access to the journal?

We care how long it takes to run the first container for some use cases, so I am also against a check that runs on every boot for detecting an unusual configuration as here.

More in general, I think it is not a good idea to use a UID that could collide with systemd internal IDs and you should avoid that. You could end up with an unprivileged user that gains access to a system service because systemd picked that same ID for a system configured with DynamicUser=yes

@Romain-Geissler-1A
Copy link
Contributor Author

Romain-Geissler-1A commented Aug 14, 2023

More in general, I think it is not a good idea to use a UID that could collide with systemd internal IDs and you should avoid that.

While I agree with this, in practice some organization like ours have allocated these user ids even before systemd came up with the idea of this dynamic user range, which we couldn't really foresee ;)

Also, one might want to use systemd-run with DynamicUser=yes and to be able to use podman in the resulting shell, typically the use case of Jenkins CI avoiding to share resources between two builds (while today Jenkins with containers usually relies on a shared rootful docker daemon). I am really thinking about implementing this for our own CI to increase security.

You could end up with an unprivileged user that gains access to a system service because systemd picked that same ID for a system configured with DynamicUser=yes

Technically in https://0pointer.net/blog/dynamic-users-with-systemd.html that Lennart wrote when this was introduced, it's written:

You might wonder what happens if you already used UIDs from the 61184–65519 range on your system for other purposes. systemd should handle that mostly fine, as long as that usage is properly registered in the user database: when allocating a dynamic user we pick a UID, see if it is currently used somehow, and if yes pick a different one, until we find a free one. Whether a UID is used right now or not is checked through NSS calls. Moreover the IPC object lists are checked to see if there are any objects owned by the UID we are about to pick. This means systemd will avoid using UIDs you have assigned otherwise. Note however that this of course makes the pool of available UIDs smaller, and in the worst cases this means that allocating a dynamic user might fail because there simply are no unused UIDs in the range.

@giuseppe
Copy link
Member

Technically in https://0pointer.net/blog/dynamic-users-with-systemd.html that Lennart wrote when this was introduced, it's written:

You might wonder what happens if you already used UIDs from the 61184–65519 range on your system for other purposes. systemd should handle that mostly fine, as long as that usage is properly registered in the user database: when allocating a dynamic user we pick a UID, see if it is currently used somehow, and if yes pick a different one, until we find a free one. Whether a UID is used right now or not is checked through NSS calls. Moreover the IPC object lists are checked to see if there are any objects owned by the UID we are about to pick. This means systemd will avoid using UIDs you have assigned otherwise. Note however that this of course makes the pool of available UIDs smaller, and in the worst cases this means that allocating a dynamic user might fail because there simply are no unused UIDs in the range.

what would happen if a user is added later on while there are already running services using that ID? Would these services be restarted?

@Romain-Geissler-1A
Copy link
Contributor Author

Romain-Geissler-1A commented Aug 14, 2023

what would happen if a user is added later on while there are already running services using that ID? Would these services be restarted?

I don't know (as I don't know the code), but indeed I would expect in that case a temporary user id collision, until the systemd service stops and cleans up the temporary user.

@msekletar
Copy link
Contributor

@msekletar is there an easy way to detect whether a user has read access to the journal?

@giuseppe You can check manually by looking at individual files but I don't think this is easy. I'd suggest to use sd-journal API and try to open journal files owned by the user. For example,

/* gcc -o user-journal-check user-journal-check.c -lsystemd */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <systemd/sd-journal.h>

int main(void) {
        sd_journal *j;
        int r;

        r = sd_journal_open(&j, SD_JOURNAL_CURRENT_USER);
        if (r < 0) {
                fprintf(stderr, "Failed to open journal: %s\n", strerror(-r));
                return EXIT_FAILURE;
        }

        if (sd_journal_has_runtime_files(j) == 0 &&
            sd_journal_has_persistent_files(j) == 0)
                printf("User has no journal files.\n");
                
        sd_journal_close(j);
        return 0;
}   

When we get no output we know user has some journal files, otherwise "User has no journal files." gets printed but what this tells us is really just what it says on the box. Maybe user had some journal files previously but they were deleted or user (non-root) never had any journal files and it never will because SplitMode=none in /etc/systemd/journald.conf.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Sep 14, 2023

@giuseppe @msekletar @Romain-Geissler-1A What should be done with this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

6 participants