podman logs -f: misses final container output #10675

edsantiago · 2021-06-14T16:16:55Z

This one smells a little like #10003 but it's podman-local, not -remote.

$ podman run -d quay.io/libpod/testimage:20210427 sh -c 'for i in 1 2 3 4 5 6 8 10 12 13 14 15 16 20 21 22 23 24 25 26 64; do trap "echo got: $i" $i; done;echo READY;while ! test -e /stop; do sleep 0.05; done;echo DONE'
$ podman kill -s X $cid
[lots more podman-kills]
$ podman exec $cid touch /stop
...container fails to echo DONE...

(The above is not a real reproducer, just an overview of the problem)

sys: podman kill - test signal handling in containers

fedora-33 : sys podman fedora-33 rootless host
fedora-34 : sys podman fedora-34 rootless host
ubuntu-2010 : sys podman ubuntu-2010 rootless host
- PR System tests: deal with crun 0.20.1 #10618
  - 06-09 13:33
- PR Always spawn a cleanup process with exec #10405
  - 06-11 11:49
ubuntu-2104 : sys podman ubuntu-2104 rootless host

The text was updated successfully, but these errors were encountered:

edsantiago · 2021-06-14T17:07:17Z

Here's a reproducer (copy-paste into foo.sh):

#!/bin/bash

while :;do
    cid=$(./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')

    ./bin/podman logs -f $cid >| /tmp/foo.log &
    ./bin/podman exec $cid touch /stop
    ./bin/podman wait $cid
    grep DONE /tmp/foo.log || exit 1

    ./bin/podman rm -a
done

What it looks like to me is a resurrection of an old bug: podman logs -f sees that the container has died, and it exits without emitting the final output from the container.

Note that when this script fails, podman logs -l (correctly) shows DONE as the last line, but /tmp/foo.log does not include DONE.

Connected to @vrothberg's recent work on events and logging?

[EDITED: simplified the reproducer. Failure has nothing to do with exec or signals]

edsantiago · 2021-06-14T19:22:39Z

UPDATE: I edited the reproducer, simplifying it because signals and exec have nothing to do with the failure. It's just logs. But I'm also now seeing weird ERROR[0000] <nil> messages spit out at random:

$ sh -x ./foo.sh                                                                                                                                  - (2)
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman exec 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb touch /stop
+ ./bin/podman logs -f 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman wait 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
ERRO[0000] <nil>
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ ./bin/podman exec 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4 touch /stop
+ ./bin/podman logs -f 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
ERRO[0000] <nil>
+ ./bin/podman wait 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
+ ./bin/podman exec 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37 touch /stop
+ ./bin/podman logs -f 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
^C+ ./bin/podman wait 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
0
+ grep DONE /tmp/foo.log
+ exit 1

edsantiago · 2021-06-15T12:03:55Z

Yeah, there is something broken in podman logs. That's where the red ERRO[0000] <nil> is coming from. Changing the above reproducer as follows:

-     ./bin/podman logs -f $cid >| /tmp/foo.log &
+     ./bin/podman logs -f $cid >| /tmp/foo.log 2>&1 &

...makes the ERRO message go away. Instead, it shows up in foo.log as:

time="2021-06-15T05:59:25-06:00" level=error msg="<nil>"

And, bizarrely, it takes many more iterations for the failure to happen (about 10-20, instead of 2-5).

Luap99 · 2021-06-15T12:51:08Z

@edsantiago Just to confirm you are using the file log driver?

edsantiago · 2021-06-15T12:54:33Z

I'm just using whatever is default, I haven't touched anything. I assume the CI systems (which are failing, which is the motivation for filing this issue) also use default.

On my system:

$ podman info --format json | gron | grep -i log
json.host.eventLogger = "journald";

Is eventLogger the same as log driver?

Luap99 · 2021-06-15T13:00:24Z

Yeah the default is k8s-file, eventLogger is not the same as log driver.
You can check the container with podman container inspect --format "{{.HostConfig.LogConfig.Type}}" ....

edsantiago · 2021-06-15T13:02:22Z

$ ./bin/podman inspect -l --format '{{.HostConfig.LogConfig.Type}}'
k8s-file

edsantiago · 2021-06-15T15:37:43Z

I cannot reproduce this if I switch the podman run as follows:

-cid=$(./bin/podman run -d ... sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')
+cid=$(./bin/podman run -d ... sh -c 'sleep 5;echo DONE')

(even if I keep the podman exec)

edsantiago · 2021-06-15T16:02:47Z

I cannot reproduce if I change sleep 0.05 to sleep 2. I can reproduce with sleep 1.

edsantiago · 2021-06-15T18:54:33Z

Followup from this morning's conversation: every so often, the reproducer will hang. I just got one of those; it was 30 seconds, and my CPU was pegged during that period:

$ ./foo.sh | ts
Jun 15 12:49:23 0
Jun 15 12:49:23 DONE
Jun 15 12:49:23 e51322e2b563e2154b77a90d2d6300b15a07e7978dc22a9193d6efd96b74132a
Jun 15 12:49:23 4b0c79103a429bb544df1a70a7a9c32fe674438555504ac0425a3a56d542098d
Jun 15 12:49:24 0
Jun 15 12:49:24 DONE
Jun 15 12:49:24 6424c6c49f9f61af84695d557978ab69af4a5f7c7eb3314ca0cebcb36056c2aa
ERRO[0000] <nil>
       [ this is where it hung ]
Jun 15 12:49:52 0
Jun 15 12:49:52 DONE

I don't have the granularity to know exactly where it hung: it could've been podman run, or podman logs, or podman exec, or podman wait.

edsantiago · 2021-06-15T19:02:22Z

The hang is in exec. strace -p is useless, showing futex(0x28709f0, FUTEX_WAIT_PRIVATE, 0, NULL, a pause, then ) = ?.

edsantiago · 2021-06-16T20:25:13Z

Filed #10701 for the exec hang.

vrothberg · 2021-06-21T12:06:35Z

I'll take a look. I recently touch the code and may have caused this regression.

vrothberg · 2021-06-21T12:38:03Z

Opened #10742 to clean up the ugly error log and fix the race condition.

Commit 84b55ee attempted to fix a race waiting for the container died event. Previously, Podman slept for duration of the polling frequence which I considerred to be a mistake. As it turns out, I was mistaken since the file logger will, in fact, NOT read until EOF and then stop logging but stop logging immediately _after_ it woke up. [NO TESTS NEEDED] as the race condition cannot be hit reliably. Fixes: containers#10675 Signed-off-by: Valentin Rothberg <[email protected]>

edsantiago added flakes Flakes from Continuous Integration rootless labels Jun 14, 2021

edsantiago changed the title ~~podman exec: container fails to exit~~ podman logs -f: misses final container output Jun 14, 2021

vrothberg self-assigned this Jun 21, 2021

vrothberg mentioned this issue Jun 21, 2021

logs: k8s-file: restore poll sleep #10742

Merged

openshift-merge-robot closed this as completed in #10742 Jun 21, 2021

This was referenced Jun 21, 2021

[v3.2] logs: k8s-file: restore poll sleep #10743

Closed

[v3.2] logs: k8s-file: restore poll sleep #10744

Merged

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

podman logs -f: misses final container output #10675

podman logs -f: misses final container output #10675

edsantiago commented Jun 14, 2021

edsantiago commented Jun 14, 2021 •

edited

Loading

edsantiago commented Jun 14, 2021

edsantiago commented Jun 15, 2021

Luap99 commented Jun 15, 2021

edsantiago commented Jun 15, 2021

Luap99 commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 16, 2021

vrothberg commented Jun 21, 2021

vrothberg commented Jun 21, 2021

podman logs -f: misses final container output #10675

podman logs -f: misses final container output #10675

Comments

edsantiago commented Jun 14, 2021

sys: podman kill - test signal handling in containers

edsantiago commented Jun 14, 2021 • edited Loading

edsantiago commented Jun 14, 2021

edsantiago commented Jun 15, 2021

Luap99 commented Jun 15, 2021

edsantiago commented Jun 15, 2021

Luap99 commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 15, 2021

edsantiago commented Jun 16, 2021

vrothberg commented Jun 21, 2021

vrothberg commented Jun 21, 2021

edsantiago commented Jun 14, 2021 •

edited

Loading