
run --events-backend=file sometimes means journal #23750

Closed
edsantiago opened this issue Aug 26, 2024 · 8 comments
Assignees: Luap99
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR, stale-issue

Comments

edsantiago (Member) commented Aug 26, 2024

Completely bizarre:

not ok 160 |090| events - labels included in container die in 1600ms
...
# # podman --events-backend=file run --rm --name=c-t160-kctzrzn9 --label=labelt160kctzrzn9=labelvalue-t160-kctzrzn9 mqfvh quay.io/libpod/testimage:20240123 sh -c exit 17
# [ rc=17 (expected) ]

# # podman --events-backend=file events --filter=container=c-t160-kctzrzn9 --filter=status=died --stream=false --format={{.Attributes.labelt160kctzrzn9}}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: podman-events output includes container label
# #| expected: = labelvalue-t160-kctzrzn9 mqfvh
# #|   actual:   ''
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This only fails in parallel tests. The test itself is super short; there is no possible way it could fail (obviously this is a lie).

Most curious finding: in every case I looked at (three out of the five flakes), I went to the journal, searched for t160, and the log line was in the journal. On successful test passes, the string t160 does not appear in the journal.

Aug 22 21:10:54 cirrus-task-4622351889858560 podman[139554]: 2024-08-22 21:10:54.971131464 -0500 CDT m=+0.256423311 container died 1a0173d68a5a56cfacba954f713f12d4ae71397f591ee8fe0a5b1a69891bd754 (image=quay.io/libpod/testimage:20240123, name=c-t160-qd8tfclf, created_at=2024-01-24T18:38:28Z, created_by=test/system/build-testimage, io.buildah.version=1.34.1-dev, labelt160qd8tfclf=labelvalue-t160-qd8tfclf RXZGE)
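
For anyone retracing that check, a minimal sketch (assuming the journal is readable by the test user and that podman logs under the syslog identifier "podman", as in the entry above):

# Search the journal for the test's unique marker; on a failing run the
# "container died" line shows up here rather than in the file backend.
journalctl --no-pager -t podman | grep t160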

Flake instances of "[sys] |090| events - labels included in container die":
sys(5) podman(5) rawhide(2) root(3) host(5) sqlite(3)
fedora-39(2) rootless(2) boltdb(2)
debian-13(1)
edsantiago added the flakes (Flakes from Continuous Integration) label Aug 26, 2024
Luap99 self-assigned this Aug 26, 2024
Luap99 (Member) commented Aug 26, 2024

I see the problem, fix should be simple.

Luap99 (Member) commented Aug 26, 2024

> I see the problem, fix should be simple.

Nvm, I take that back. I thought the flag was not being propagated correctly into the cleanup process, but trying this out locally I see the correct conmon args being set in both cases:
--exit-command-arg --events-backend --exit-command-arg file

If the args really are the problem, then my cleanup tracer from #23487 should log it. So I guess I should get that PR ready.
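
For reference, one way to eyeball those args on a live system (a sketch, assuming a test container is currently running and conmon is the container monitor in use):

# Dump the command line of running conmon processes and pull out the
# exit-command arguments that should carry the events backend into cleanup.
ps -ww -o args -C conmon | grep -oE -- '--exit-command-arg [^ ]+'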

Luap99 (Member) commented Aug 27, 2024

Ok, I think I know what is triggering this. The died event is created in syncContainer(), and syncContainer() is called basically every time we lock the container. This means that any other command that acts on all containers may notice the conmon exit file and create the died event based on it. So even something like podman (un-)mount, podman ps, or really any other command that touches this container will trigger this tiny race...

The conclusion: the test is not parallel safe. (And I fear that every test using --events-backend may not be parallel safe, at least all the ones that need the died event.)

Luap99 (Member) commented Aug 27, 2024

If you want to verify this yourself (assuming journald is the default logger):

terminal 1

while :; do bin/podman --events-backend file run --rm quay.io/libpod/testimage:20240123 true; done

terminal 2

bin/podman --events-backend file events

terminal 3

bin/podman events

terminal 4 (without unshare when running as root)

while :; do bin/podman unshare bin/podman mount; done

After just a few seconds I got the died event in terminal 3.
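
To see the split directly once the loop has been running for a while, both backends can be dumped side by side (a sketch reusing the filter syntax from the failing test):

# Events recorded by the file backend (where the test expects them):
bin/podman --events-backend file events --stream=false --filter=status=died

# Events that ended up in the default journald backend instead:
bin/podman events --stream=false --filter=status=died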

edsantiago (Member, Author) commented

When you say "is not parallel safe", do you mean "not currently but it needs to be fixed," or "will never be parallel safe and we cannot guarantee that --events-backend=file will actually ever work"?

Luap99 (Member) commented Aug 28, 2024

> will never be parallel safe and we cannot guarantee that --events-backend=file will actually ever work

This is expected behavior: you select the events backend per podman process, so that does not prevent another podman process with a different event logger from creating events for a container created by the original process.
So this is totally normal behavior given how podman works (I know it is most certainly surprising and not POLA).

If you want to make sure all tests see the same events, you must make sure all podman commands in the run use the same events backend.
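
One way to arrange that in a test environment, sketched here on the assumption that the test suite honors CONTAINERS_CONF, is to pin the backend globally rather than per invocation:

# Point every podman invocation in this shell at a throwaway containers.conf
# that fixes the events backend, so whichever process reaps a container logs
# the died event to the same place the tests read from.
cat > /tmp/events-test-containers.conf <<'EOF'
[engine]
events_logger = "file"
EOF
export CONTAINERS_CONF=/tmp/events-test-containers.conf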

edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 18, 2024
...or at least as much as possible. Some tests cannot
be run in parallel due to containers#23750: "--events-backend=file"
does not actually work the way a naïve user would intuit.
Stop/die events are asynchronous, and can be gathered
by *ANY OTHER* podman process running after it, and if
that process has the default events-backend=journal,
that's where the event will be logged. See containers#23987 for
further discussion.

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this issue Sep 23, 2024
Any test that uses --events-backend=file cannot be run in parallel
due to containers#23750. This seems to be a hard block, unfixable.

All other tests, enable ci:parallel.

And, bring in timing fixes containers#23600. Thanks, @Honny1!

Signed-off-by: Ed Santiago <[email protected]>

A friendly reminder that this issue had no activity for 30 days.

edsantiago closed this as not planned Oct 15, 2024
edsantiago (Member, Author) commented

Closing: this is just something we will have to live with.

stale-locking-app bot added the locked - please file new issue/PR label Jan 14, 2025
stale-locking-app bot locked as resolved and limited conversation to collaborators Jan 14, 2025