conmon syslog proxy #176

Closed
goochjj opened this issue Jun 13, 2020 · 4 comments

Comments

@goochjj
Contributor

goochjj commented Jun 13, 2020

With respect to container logging and /dev/log, conmon is our proxy layer: it maintains the stderr/stdout/stdin fds, allows console connections and attachments, and proxies those logs from the container's cgroup and namespaces up into the host's journald/syslog/whatever. This preserves the appropriate metadata (i.e. systemd units and LogExtraFields) for messages generated by the conmon process.

Bind-mounting the host's /dev/log into the container causes systemd-journald to derive the metadata from the container process's cgroup under machine.slice, which doesn't carry the unit metadata. If we instead provide a simple proxy in conmon, which creates a "slproxy" datagram socket in the bundle folder and forwards anything written to it to /dev/log on the host, then those packets will originate from conmon, and the journal metadata will be correctly attributed to the unit.
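A minimal sketch of the proxy described above, in C since that is conmon's language. Only the socket name `slproxy` comes from the proposal; the helper names `slproxy_listen`, `slproxy_connect` and `slproxy_forward_one` are hypothetical and are not conmon's actual code. The key point is that conmon is the process calling `sendto()`, so the kernel stamps each forwarded datagram with conmon's credentials.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Sketch of the proposed slproxy: hypothetical helper names, not conmon's code. */

/* Fill a sockaddr_un for `path`; returns -1 if the path does not fit. */
static int make_addr(struct sockaddr_un *addr, const char *path) {
    assert(addr != NULL && path != NULL);
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    if (strlen(path) >= sizeof addr->sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    snprintf(addr->sun_path, sizeof addr->sun_path, "%s", path);
    return 0;
}

/* Bind the container-facing datagram socket, e.g. "<bundle>/slproxy". */
int slproxy_listen(const char *path) {
    struct sockaddr_un addr;
    if (make_addr(&addr, path) < 0) return -1;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    unlink(path); /* remove a stale socket left by a previous run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Connect a datagram socket to the host's syslog socket, normally "/dev/log". */
int slproxy_connect(const char *path) {
    struct sockaddr_un addr;
    if (make_addr(&addr, path) < 0) return -1;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Forward one datagram unchanged. The receiver sees *this* process as the
 * sender, which is what lets journald resolve conmon's unit. */
ssize_t slproxy_forward_one(int in_fd, int out_fd) {
    char buf[65536]; /* larger than typical syslog datagrams; longer ones are truncated */
    ssize_t n = recvfrom(in_fd, buf, sizeof buf, 0, NULL, NULL);
    if (n < 0) return -1;
    if (sendto(out_fd, buf, (size_t)n, 0, NULL, 0) < 0) return -1;
    return n;
}
```

In a real integration `slproxy_forward_one` would presumably be called whenever the listening fd becomes readable (for example from a watch registered on conmon's GLib main loop); that wiring is omitted here.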

Open to feedback on the PR; it's a proof of concept at this stage, but it seems to fit in conmon as our "proxy to the host" role. I'd REALLY like to also have CONTAINER_ID, CONTAINER_NAME and CONTAINER_TAG in the journal, but the only way I can think to do that would be to have conmon read the datagram, parse the syslog format, and then rebroadcast to the journal directly, and I think that's beyond the scope of what conmon should do.

@goochjj
Contributor Author

goochjj commented Jun 13, 2020

PR #177

@haircommander
Collaborator

In general the approach makes sense to me; one clarifying question:

> Open to feedback on the PR, it's a proof of concept at this stage, but it seems to fit in conmon as our "proxy to the host" role. I'd REALLY like to also have CONTAINER_ID and CONTAINER_NAME and CONTAINER_TAG in the journal, but the only way I can think to do that would be to have conmon read the dgram, parse syslog format, and then rebroadcast to the journal directly - and I'm thinking that's beyond the scope of what conmon should do.

Doesn't the journald logger add CONTAINER_{NAME,ID,TAG}? When proxying to syslog, can we not pass those values along too?

@goochjj
Contributor Author

goochjj commented Jun 15, 2020

It does, and it can do that because it passes fields directly to journald's native socket, NOT to /dev/log.

That is, it writes
MESSAGE=blah
PRIORITY=something
CONTAINER_ID=id
CONTAINER_NAME=name
CONTAINER_ID_FULL=fullid
\0

to /run/systemd/journal/socket.

In this way conmon purposely adds those tags (and CONTAINER_TAG too) to every message as it's written to the socket.
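For comparison, a sketch of building a native-protocol datagram like the one listed above. Under systemd's documented native journal protocol, each datagram is one journal entry made of `KEY=VALUE` lines, and the datagram boundary ends the entry. conmon itself may go through libsystemd (`sd_journal_sendv()`) rather than a raw socket; the helper names `journal_format` and `journal_send` are hypothetical, and values containing newlines (which need the protocol's binary, length-prefixed form) are simply rejected in this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helpers illustrating journald's native protocol. */

/* Serialize fields as "KEY=VALUE\n" lines. Returns the payload length, or -1
 * if the buffer is too small or a value contains '\n' (which would need the
 * binary length-prefixed encoding, not implemented here). */
int journal_format(char *buf, size_t cap, const char *const keys[],
                   const char *const vals[], size_t count) {
    assert(buf != NULL);
    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        if (strchr(vals[i], '\n') != NULL) return -1;
        int w = snprintf(buf + off, cap - off, "%s=%s\n", keys[i], vals[i]);
        if (w < 0 || (size_t)w >= cap - off) return -1;
        off += (size_t)w;
    }
    return (int)off;
}

/* Send one entry to journald's native socket (not /dev/log). */
int journal_send(const char *payload, size_t len) {
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path, "%s", "/run/systemd/journal/socket");
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    ssize_t r = sendto(fd, payload, len, 0, (struct sockaddr *)&addr, sizeof addr);
    close(fd);
    return r < 0 ? -1 : 0;
}
```

Because the container fields are just more `KEY=VALUE` lines here, adding CONTAINER_ID and friends is trivial on this path; the syslog path has no equivalent slot for them.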

The syslog format doesn't allow for that additional metadata; it's just
<fac|prio>timestamp hostname etc.....

And that etc can be completely freeform (which is why I'd really like to avoid parsing the messages myself). I COULD parse out the timestamp, throw away the hostname, strip out the SYSLOG_IDENTIFIER and PID, pull the rest of the message into MESSAGE, parse the fac|prio, and construct a message (with the container metadata) for journald. It just seemed like all that syslog parsing was outside the scope of conmon, while simple socket proxying seems more palatable.
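Of the parsing steps listed above, only the leading PRI field is fixed by the standards: RFC 3164 and RFC 5424 both encode it as `<facility * 8 + severity>`, and the severity corresponds to journald's `PRIORITY=` field. A minimal sketch with the hypothetical name `syslog_parse_pri`; the timestamp, hostname and tag that follow are exactly the freeform part it leaves alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parse the leading "<PRI>" of a syslog datagram
 * (e.g. "<14>..." is facility 1 "user", severity 6 "info"). Returns a pointer
 * just past '>', or NULL if the prefix is missing or PRI is outside 0..191. */
const char *syslog_parse_pri(const char *msg, int *facility, int *severity) {
    assert(msg != NULL && facility != NULL && severity != NULL);
    if (msg[0] != '<') return NULL;
    const char *p = msg + 1;
    int pri = 0, digits = 0;
    while (*p >= '0' && *p <= '9') {
        if (++digits > 3) return NULL; /* PRI has at most three digits */
        pri = pri * 10 + (*p - '0');
        p++;
    }
    if (digits == 0 || *p != '>' || pri > 191) return NULL;
    *facility = pri / 8; /* would become SYSLOG_FACILITY= */
    *severity = pri % 8; /* would become PRIORITY= */
    return p + 1;
}
```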

What systemd-journald then does, when it receives a message on /dev/log, is go searching for context. It pulls the UID and GID off the incoming socket, along with the PID of the sending process. It then looks up the cgroup of the sending process, which allows it to work out the systemd unit that spawned it (if possible). It can then read its own database and augment as necessary.

Which is why this whole PoC started... When I ran systemd-docker containers, I'd use --cgroup-parent=/system.slice/%n so all processes would be in the unit's systemd cgroup. That meant that when a process wrote to /dev/log, and /dev/log was bound to the host's /dev/log, systemd would read the cgroup of that PID, trace its way back to the unit that spawned the service, and tag the unit appropriately in the journal.

With podman, conmon works because it's in systemd's cgroup, so even syslog messages get tagged to the unit correctly (but without CONTAINER_ID and other metadata if using /dev/log). But child processes of the container get resolved back to /machine.slice/libpod-CID.scope, and systemd can't glean the unit from there. (I've similarly changed my setup to use cgroup-parent /machine-%n.slice now, so at least I can visually see the unit from which that container was spawned, but systemd is unaware of that convention.)

By having conmon simply recvfrom() and sendto() the messages, it offers up its own pid as the sender to the host process and the unit gets resolved properly. But it's still syslog format, so systemd can't introspect CONTAINER_ID and such, it doesn't know anything about them and conmon can't pass them along.
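The credential lookup journald performs is ordinary Unix-socket machinery: with `SO_PASSCRED` enabled on the receiving socket, the kernel attaches an `SCM_CREDENTIALS` control message with the sender's PID, UID and GID to each datagram, and the receiver can then read `/proc/<pid>/cgroup`. A minimal sketch of that receiving side, with hypothetical helper names (journald's real code does considerably more). It shows why the proxy works: whichever process calls `sendto()` is the one that gets looked up.

```c
#define _GNU_SOURCE /* struct ucred and SCM_CREDENTIALS are Linux-specific */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helpers illustrating how a receiver learns who sent a datagram. */

/* Ask the kernel to attach sender credentials to datagrams received on fd. */
int enable_passcred(int fd) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &on, sizeof on);
}

/* Receive one datagram and the sender's pid/uid/gid. Returns the payload
 * length, or -1 on error or if no credentials were attached. */
ssize_t recv_with_creds(int fd, char *buf, size_t len, struct ucred *cred) {
    assert(buf != NULL && cred != NULL);
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union {
        struct cmsghdr align; /* forces correct alignment of the buffer */
        char space[CMSG_SPACE(sizeof(struct ucred))];
    } ctrl;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.space;
    msg.msg_controllen = sizeof ctrl.space;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0) return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(cred, CMSG_DATA(c), sizeof *cred);
            return n;
        }
    }
    return -1;
}

/* Read the first line of /proc/<pid>/cgroup, e.g. "0::/machine.slice/libpod-<id>.scope". */
int read_cgroup_line(pid_t pid, char *out, size_t len) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cgroup", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *ok = fgets(out, (int)len, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}
```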

However... systemd DOES have a LogExtraFields= setting in the service unit, which adds extra fields to the journal entries logged for that unit.

So on my services, I can do podman --name %N and add the LogExtraFields=CONTAINER_NAME=%N, and it'll track that. When the service is spawned, it creates a file in /run/systemd/units/logextrafields:%n, which has all the extra fields in it. I can't inject CONTAINER_ID this way, because the ID hasn't been assigned yet. So my next trick may be getting the hook to add additional extra fields to that file - to pass in the container ID and NAME... But it seems like the OCI hook doesn't actually GET the name passed anywhere - it's neither in the state, nor in config.json. Conmon gets it from command line arguments, but it seems the runtime spec doesn't include Name? - which is weird.

Assuming I pull that from podman inspect in the hook (which I'd rather avoid since it couples the hook to the container engine), I still wouldn't know which unit to write it to, since that's also not passed in, because podman/conmon don't care what the unit name is... So I'd have to follow systemd's example and read the /proc/$$/cgroup text file and try to strip out the unit name, or call systemctl status $$ and parse that to get the unit name, so I can find the file, so I can inject the fields.

Or I just use LogExtraFields in the unit to populate CONTAINER_NAME and hope they match, and inject the CONTAINER_ID into the file, assuming I work out all the other parsing.
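A deliberately naive sketch of "strip out the unit name" from a `/proc/<pid>/cgroup` line, as discussed above: skip the `hierarchy-ID:controllers:` prefix, skip leading `.slice` components, and accept the next component only if it is a `.service` or `.scope`. The helper name is hypothetical, and systemd's own resolution handles many more cases (user sessions, name escaping). It also reproduces the problem from this thread: a container process resolves to `libpod-<id>.scope`, and a `/machine-<unit>.slice` parent is skipped like any other slice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if the n-byte string s ends with suffix (and has something before it). */
static int ends_with(const char *s, size_t n, const char *suffix) {
    size_t k = strlen(suffix);
    return n > k && memcmp(s + n - k, suffix, k) == 0;
}

/* Hypothetical helper: extract the unit from a line like
 * "0::/system.slice/web.service". Returns 0 and writes the unit name to out,
 * or -1 if no unit is found or out is too small. */
int unit_from_cgroup_line(const char *line, char *out, size_t len) {
    assert(line != NULL && out != NULL && len > 0);
    const char *p = strchr(line, ':');                  /* end of hierarchy ID */
    if (p == NULL || (p = strchr(p + 1, ':')) == NULL)  /* end of controller list */
        return -1;
    p++;                                                /* start of the cgroup path */
    while (*p == '/') p++;
    while (*p != '\0' && *p != '\n') {
        const char *end = p;
        while (*end != '\0' && *end != '/' && *end != '\n') end++;
        size_t n = (size_t)(end - p);
        if (!ends_with(p, n, ".slice")) {
            /* First non-slice component: accept it only if it names a unit. */
            if (!(ends_with(p, n, ".service") || ends_with(p, n, ".scope")) || n >= len)
                return -1;
            memcpy(out, p, n);
            out[n] = '\0';
            return 0;
        }
        p = end;
        while (*p == '/') p++;
    }
    return -1;
}
```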

UNLESS I've misunderstood any of this, though I've spent a bunch of time all over conmon and systemd-journald figuring all this out over the past week or so.

@goochjj
Contributor Author

goochjj commented Jun 18, 2020

Superseded by using a hook.
Also superseded by using containers/podman#6666 and conmon-delegated, with a bind mount.

@goochjj goochjj closed this as completed Jun 18, 2020