podman logs -f: misses final container output #10675

Closed

edsantiago opened this issue Jun 14, 2021 · 14 comments · Fixed by #10742
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR, rootless

Comments

@edsantiago (Member)

This one smells a little like #10003 but it's podman-local, not -remote.

$ podman run -d quay.io/libpod/testimage:20210427 sh -c 'for i in 1 2 3 4 5 6 8 10 12 13 14 15 16 20 21 22 23 24 25 26 64; do trap "echo got: $i" $i; done;echo READY;while ! test -e /stop; do sleep 0.05; done;echo DONE'
$ podman kill -s X $cid
[lots more podman-kills]
$ podman exec $cid touch /stop
...container fails to echo DONE...

(The above is not a real reproducer, just an overview of the problem)

Failing test: sys: podman kill - test signal handling in containers

edsantiago added the flakes (Flakes from Continuous Integration) and rootless labels Jun 14, 2021
@edsantiago (Member Author) commented Jun 14, 2021

Here's a reproducer (copy-paste into foo.sh):

#!/bin/bash

while :;do
    cid=$(./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')

    ./bin/podman logs -f $cid >| /tmp/foo.log &
    ./bin/podman exec $cid touch /stop
    ./bin/podman wait $cid
    grep DONE /tmp/foo.log || exit 1

    ./bin/podman rm -a
done

What it looks like to me is a resurrection of an old bug: podman logs -f sees that the container has died, and it exits without emitting the final output from the container.

Note that when this script fails, podman logs -l (correctly) shows DONE as the last line, but /tmp/foo.log does not include DONE.

Connected to @vrothberg's recent work on events and logging?

[EDITED: simplified the reproducer. Failure has nothing to do with exec or signals]
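
To make that hypothesis concrete, here is a minimal, hypothetical Go sketch of the kind of follow loop that would show this behavior. This is not Podman's actual code; followLogs, the died channel, and the polling interval are all invented for illustration:

package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

// followLogs polls logPath and prints new lines. In this (buggy) sketch it
// returns the moment 'died' is closed, so output written between the last
// poll and the container's death is never printed -- the missing DONE line.
func followLogs(logPath string, died <-chan struct{}) error {
	f, err := os.Open(logPath)
	if err != nil {
		return err
	}
	defer f.Close()
	r := bufio.NewReader(f)

	for {
		select {
		case <-died:
			return nil // BUG: exits without reading the tail of the file
		default:
			for {
				line, err := r.ReadString('\n')
				if line != "" {
					fmt.Print(line)
				}
				if err != nil { // typically io.EOF: caught up for now
					break
				}
			}
			time.Sleep(100 * time.Millisecond) // polling interval
		}
	}
}

func main() {
	died := make(chan struct{})
	close(died) // pretend the container already exited
	_ = followLogs("/tmp/foo.log", died)
}

With a loop like this, any line the container writes between the follower's last poll and the arrival of the died event is silently dropped, which would match the missing DONE.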

edsantiago changed the title from "podman exec: container fails to exit" to "podman logs -f: misses final container output" Jun 14, 2021
@edsantiago (Member Author)

UPDATE: I edited the reproducer, simplifying it because signals and exec have nothing to do with the failure. It's just logs. But I'm also now seeing weird ERRO[0000] <nil> messages spit out at random:

$ sh -x ./foo.sh
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman exec 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb touch /stop
+ ./bin/podman logs -f 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ ./bin/podman wait 082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
ERRO[0000] <nil>
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
082f99f2206144e2169cca936b8ff632e684eb4bfd0c623b0016f4fdecca3fdb
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ ./bin/podman exec 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4 touch /stop
+ ./bin/podman logs -f 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
ERRO[0000] <nil>
+ ./bin/podman wait 9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
0
+ grep DONE /tmp/foo.log
DONE
+ ./bin/podman rm -a
9dbddfcfa4de3b39b155db659e7b9e132525bc92b9502838966895270c70a8d4
+ :
++ ./bin/podman run -d quay.io/libpod/testimage:20210610 sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE'
+ cid=52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
+ ./bin/podman exec 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37 touch /stop
+ ./bin/podman logs -f 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
^C+ ./bin/podman wait 52b6e82bc2aa97254faf525b8eab789d4a051970ec33417da4d77ab4b1da9a37
0
+ grep DONE /tmp/foo.log
+ exit 1

@edsantiago (Member Author)

Yeah, there is something broken in podman logs. That's where the red ERRO[0000] <nil> is coming from. Changing the above reproducer as follows:

-     ./bin/podman logs -f $cid >| /tmp/foo.log &
+     ./bin/podman logs -f $cid >| /tmp/foo.log 2>&1 &

...makes the ERRO message disappear from the terminal. Instead, it shows up in foo.log as:

time="2021-06-15T05:59:25-06:00" level=error msg="<nil>"

And, bizarrely, it takes many more iterations for the failure to happen (about 10-20, instead of 2-5).

@Luap99 (Member) commented Jun 15, 2021

@edsantiago Just to confirm: are you using the file log driver?

@edsantiago (Member Author)

I'm just using whatever the default is; I haven't touched anything. I assume the CI systems (whose failures are the motivation for filing this issue) also use the default.

On my system:

$ podman info --format json | gron | grep -i log
json.host.eventLogger = "journald";

Is eventLogger the same as log driver?

@Luap99 (Member) commented Jun 15, 2021

Yeah, the default is k8s-file; eventLogger is not the same as the log driver.
You can check the container with podman container inspect --format "{{.HostConfig.LogConfig.Type}}" ....

@edsantiago (Member Author)

$ ./bin/podman inspect -l --format '{{.HostConfig.LogConfig.Type}}'
k8s-file

@edsantiago (Member Author)

I cannot reproduce this if I change the podman run command as follows:

-cid=$(./bin/podman run -d ... sh -c 'while ! test -e /stop; do sleep 0.05; done;echo DONE')
+cid=$(./bin/podman run -d ... sh -c 'sleep 5;echo DONE')

(even if I keep the podman exec)

@edsantiago (Member Author)

I cannot reproduce it if I change sleep 0.05 to sleep 2; I can still reproduce with sleep 1.

@edsantiago (Member Author)

Followup from this morning's conversation: every so often, the reproducer hangs. I just hit one of those hangs; it lasted 30 seconds, and my CPU was pegged during that period:

$ ./foo.sh | ts
Jun 15 12:49:23 0
Jun 15 12:49:23 DONE
Jun 15 12:49:23 e51322e2b563e2154b77a90d2d6300b15a07e7978dc22a9193d6efd96b74132a
Jun 15 12:49:23 4b0c79103a429bb544df1a70a7a9c32fe674438555504ac0425a3a56d542098d
Jun 15 12:49:24 0
Jun 15 12:49:24 DONE
Jun 15 12:49:24 6424c6c49f9f61af84695d557978ab69af4a5f7c7eb3314ca0cebcb36056c2aa
ERRO[0000] <nil>
       [ this is where it hung ]
Jun 15 12:49:52 0
Jun 15 12:49:52 DONE

I don't have the granularity to know exactly where it hung: it could've been podman run, or podman logs, or podman exec, or podman wait.

@edsantiago (Member Author)

The hang is in exec. strace -p is useless: it shows futex(0x28709f0, FUTEX_WAIT_PRIVATE, 0, NULL, then a pause, then ) = ?.

@edsantiago (Member Author)

Filed #10701 for the exec hang.

vrothberg self-assigned this Jun 21, 2021
@vrothberg (Member)

I'll take a look. I recently touched the code and may have caused this regression.

@vrothberg (Member)

Opened #10742 to clean up the ugly error log and fix the race condition.

vrothberg added a commit to vrothberg/libpod that referenced this issue Jun 21, 2021
Commit 84b55ee attempted to fix a race waiting for the container
died event.  Previously, Podman slept for the duration of the polling
frequency, which I considered to be a mistake.  As it turns out, I was
mistaken, since the file logger will, in fact, NOT read until EOF and
then stop logging, but will stop logging immediately _after_ it wakes up.

[NO TESTS NEEDED] as the race condition cannot be hit reliably.

Fixes: containers#10675
Signed-off-by: Valentin Rothberg <[email protected]>
vrothberg added a commit to vrothberg/libpod that referenced this issue Jun 21, 2021
vrothberg added a commit to vrothberg/libpod that referenced this issue Jun 22, 2021
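
For illustration, here is a hedged Go sketch of the pattern the commit message describes: after the container-died event arrives, keep reading the log file until EOF before stopping. This is not the actual k8s-file log driver code; the names are made up, and only the handling of the died case differs from the buggy sketch earlier in the thread:

package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

// followLogs polls logPath and prints new lines. When 'died' fires it does
// one final drain to EOF before returning, so output the container wrote
// just before exiting (the DONE line) is still emitted.
func followLogs(logPath string, died <-chan struct{}) error {
	f, err := os.Open(logPath)
	if err != nil {
		return err
	}
	defer f.Close()
	r := bufio.NewReader(f)

	drain := func() {
		for {
			line, err := r.ReadString('\n')
			if line != "" {
				fmt.Print(line)
			}
			if err != nil { // typically io.EOF: caught up for now
				return
			}
		}
	}

	for {
		select {
		case <-died:
			drain() // FIX: read the remaining tail of the file before stopping
			return nil
		default:
			drain()
			time.Sleep(100 * time.Millisecond) // polling interval
		}
	}
}

func main() {
	died := make(chan struct{})
	close(died) // pretend the container already exited
	_ = followLogs("/tmp/foo.log", died)
}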
github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023