
journald logger: fix race condition #10431

Merged
merged 1 commit into containers:master on May 26, 2021

Conversation

vrothberg
Member

Fix a race in the journald driver. Following the logs implies streaming
until the container is dead. Streaming happened in one goroutine, while
waiting for the container to exit/die and signaling that event happened
in another goroutine.

Having two goroutines run simultaneously is pretty much the core of the
race condition: when the streaming goroutine received the signal that the
container had exited, it may not yet have read and written all of the
container's logs.

Fix this race by reading both the logs and the events of the container,
and stop streaming once the died/exited event has been read. The died
event is guaranteed to come after all logs in the journal, which guarantees
not only consistency but also deterministic behavior.

Note that the journald log driver now requires the journald event
backend to be set.

Fixes: #10323
Signed-off-by: Valentin Rothberg [email protected]
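
To make the single-reader approach concrete, below is a minimal, hypothetical Go sketch using the go-systemd sdjournal bindings: it reads the container's log entries and its container events from the same journal cursor and stops once a died/exited event appears. The journal field names (CONTAINER_ID_FULL, PODMAN_ID, PODMAN_EVENT) and the function names are illustrative assumptions, not Podman's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"time"

	"github.com/coreos/go-systemd/v22/sdjournal"
)

// followContainerLogs streams log lines for the given container until its
// "died"/"exited" event shows up in the journal. Reading logs and events
// through the same journal cursor removes the race: the died event is
// appended after the final log line, so streaming never stops early.
//
// NOTE: sketch only. The field names below (CONTAINER_ID_FULL, PODMAN_ID,
// PODMAN_EVENT) are assumptions and may not match Podman's journald driver.
func followContainerLogs(containerID string, write func(string)) error {
	j, err := sdjournal.NewJournal()
	if err != nil {
		return err
	}
	defer j.Close()

	// Match the container's log entries ...
	if err := j.AddMatch("CONTAINER_ID_FULL=" + containerID); err != nil {
		return err
	}
	// ... OR its container events (hypothetical field name).
	if err := j.AddDisjunction(); err != nil {
		return err
	}
	if err := j.AddMatch("PODMAN_ID=" + containerID); err != nil {
		return err
	}

	for {
		n, err := j.Next()
		if err != nil {
			return err
		}
		if n == 0 {
			// No new entry yet: block until the journal changes.
			j.Wait(time.Second)
			continue
		}
		entry, err := j.GetEntry()
		if err != nil {
			return err
		}
		if ev, ok := entry.Fields["PODMAN_EVENT"]; ok {
			if ev == "died" || ev == "exited" {
				return nil // the died event follows all log entries
			}
			continue // ignore other events (start, attach, ...)
		}
		write(entry.Fields["MESSAGE"])
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: follow <full-container-id>")
		os.Exit(1)
	}
	if err := followContainerLogs(os.Args[1], func(line string) { fmt.Println(line) }); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```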

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress and approved labels May 21, 2021
@vrothberg
Member Author

WIP for now, as I still need to add some error checks to enforce the journald log/event requirement.

I also want to go through the tests and run a bunch of tests with the journald logger.
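
As a rough sketch of the kind of check meant here (hypothetical names, not Podman's actual code): a container that uses the journald log driver must also have the journald events backend configured, otherwise following the logs could never see the died event.

```go
package main

import "fmt"

// validateJournaldLogging is a hypothetical sketch of the enforcement
// described above; the parameter names are illustrative, not Podman's API.
func validateJournaldLogging(logDriver, eventsBackend string) error {
	if logDriver == "journald" && eventsBackend != "journald" {
		return fmt.Errorf("--log-driver=journald requires the journald events backend (events_logger in containers.conf)")
	}
	return nil
}

func main() {
	fmt.Println(validateJournaldLogging("journald", "file"))     // error
	fmt.Println(validateJournaldLogging("journald", "journald")) // <nil>
}
```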

@vrothberg
Member Author

@mheon @edsantiago PTAL if you find time. Early feedback is good feedback.

The reproducer from #10323 is running well on my machine.

@edsantiago
Member

I can't look closely right now, sorry. Can you try clearing this and re-pushing?

# Force the k8s-file driver until #10323 is fixed.
run_podman run --log-driver=k8s-file -d $IMAGE sh -c \

@vrothberg
Member Author

vrothberg commented May 25, 2021

The origami continues :^) It turns out --events-backend is ignored (after validation) and is always set to the default from containers.conf.

Will spin up another PR to address that first.

EDIT: My bad; the events-backend is not stored in the container.

@vrothberg vrothberg force-pushed the journald-logs branch 2 times, most recently from 31576ae to 5a38974 Compare May 26, 2021 08:38
@vrothberg
Member Author

@edsantiago, can you take a look at the system-test changes?

@vrothberg vrothberg changed the title WIP - journald logger: fix race condition journald logger: fix race condition May 26, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label May 26, 2021
@vrothberg
Member Author

@containers/podman-maintainers PTAL

Member

@Luap99 Luap99 left a comment


LGTM

Member

@edsantiago edsantiago left a comment


Teeny fixes requested

test/system/035-logs.bats: three review comment threads (two outdated, all resolved)
@openshift-ci
Contributor

openshift-ci bot commented May 26, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrothberg
Member Author

The flake is stubborn

@edsantiago
Member

Six flake restarts later, tests are green. Tests LGTM.

@rhatdan
Member

rhatdan commented May 26, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 26, 2021
@openshift-merge-robot openshift-merge-robot merged commit 5b4ffc7 into containers:master May 26, 2021
@vrothberg vrothberg deleted the journald-logs branch May 27, 2021 06:23
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Linked issue: race condition in logs -f with journald driver (#10323)