Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system test: skip some quadlet tests when use journald log driver #19236

Closed
wants to merge 1 commit into from

Conversation

sstosh
Copy link
Contributor

@sstosh sstosh commented Jul 14, 2023

Some quadlet tests can only be run when the log driver is set to journald in containers.conf.
Therefore, those tests will be skipped if the log driver is not set to journald.

Does this PR introduce a user-facing change?

None

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 14, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sstosh
Once this PR has been reviewed and has the lgtm label, please assign giuseppe for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sstosh
Copy link
Contributor Author

sstosh commented Jul 14, 2023

In my environment, flakes similar to #18132 occur other than quadlet - basic test.

Error log
not ok 12 quadlet - selinux disable
# (from function `is' in file helpers.bash, line 851,
#  in test file 252-quadlet.bats, line 481)
#   `is "$output" '.*STARTED CONTAINER.*'' failed
# [17:47:30.010618898] # podman rm -t 0 --all --force --ignore
# [17:47:30.057466351] # podman ps --all --external --format {{.ID}} {{.Names}}
# [17:47:30.102161303] # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# [17:47:30.143942688] quay.io/libpod/testimage:20221018 f5a99120db64
# # /usr/libexec/podman/quadlet  /run/systemd/system
#
# # Automatically generated by /usr/libexec/podman/quadlet
# #
# [X-Container]
# Image=quay.io/libpod/testimage:20221018
# SecurityLabelDisable=true
# Exec=sh -c "echo STARTED CONTAINER; top"
#
# [Unit]
# SourcePath=/tmp/podman_bats.7KEOwB/quadlet.PUIUXL/basic_DtnfVrg5P7.container
# RequiresMountsFor=%t/containers
#
# [Service]
# Environment=PODMAN_SYSTEMD_UNIT=%n
# KillMode=mixed
# ExecStop=/usr/bin/podman rm -f -i --cidfile=%t/%N.cid
# ExecStopPost=-/usr/bin/podman rm -f -i --cidfile=%t/%N.cid
# Delegate=yes
# Type=notify
# NotifyAccess=all
# SyslogIdentifier=%N
# ExecStart=/usr/bin/podman run --name=systemd-%N --cidfile=%t/%N.cid --replace --rm --cgroups=split --sdnotify=conmon -d --security-opt label:disable quay.io/libpod/testimage:20221018 sh -c "echo STARTED CONTAINER; top"
# # systemctl  start basic_DtnfVrg5P7.service
#
# # systemctl status basic_DtnfVrg5P7.service
# ● basic_DtnfVrg5P7.service
#    Loaded: loaded (/tmp/podman_bats.7KEOwB/quadlet.PUIUXL/basic_DtnfVrg5P7.container; static; vendor preset: disabled)
#    Active: active (running) since Thu 2023-07-13 17:47:30 JST; 17ms ago
#  Main PID: 231379 (conmon)
#     Tasks: 2 (limit: 11122)
#    Memory: 1.7M
#    CGroup: /system.slice/basic_DtnfVrg5P7.service
#            tqlibpod-payload-7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15
#            x mq231391 top
#            mqruntime
#              mq231379 /usr/bin/conmon --api-version 1 -c 7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15 -u 7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15 -r /root/runc/runc -b /var/lib/containers/storage/overlay-containers/7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15/userdata -p /run/containers/storage/overlay-containers/7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15/userdata/pidfile -n systemd-basic_DtnfVrg5P7 --exit-dir /run/libpod/exits --full-attach -l k8s-file:/var/lib/containers/storage/overlay-containers/7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15/userdata/oci-log --conmon-pidfile /run/containers/storage/overlay-containers/7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg cni --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --db-backend --exit-command-arg boltdb --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15
#
# Jul 13 17:47:30 rhel89-20230626 systemd[1]: Starting basic_DtnfVrg5P7.service...
# Jul 13 17:47:30 rhel89-20230626 systemd[1]: Started basic_DtnfVrg5P7.service.
# Jul 13 17:47:30 rhel89-20230626 basic_DtnfVrg5P7[231310]: 7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15
# # systemctl show --value --property=ActiveState basic_DtnfVrg5P7.service
# active
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: [no test name given]
# #| expected: '.*STARTED CONTAINER.*' (using expr)
# #|   actual: '-- Logs begin at Thu 2023-07-13 11:38:17 JST, end at Thu 2023-07-13 17:47:30 JST. --'
# #|         > 'Jul 13 17:47:30 test systemd[1]: Starting basic_DtnfVrg5P7.service...'
# #|         > 'Jul 13 17:47:30 test systemd[1]: Started basic_DtnfVrg5P7.service.'
# #|         > 'Jul 13 17:47:30 test basic_DtnfVrg5P7[231310]: 7a231720acea671a12a04e6d4471c1883333c43953f18f452f9c84efe1950c15'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# # [teardown]
# [17:47:30.944374156] # podman pod rm -t 0 --all --force --ignore
# [17:47:30.988063212] # podman rm -t 0 --all --force --ignore
# [17:47:31.023748137] # podman network prune --force
# [17:47:31.059659670] # podman volume rm -a -f

@sstosh sstosh force-pushed the system-quadlet-logdriver branch from 8e24fd8 to bdbb928 Compare July 14, 2023 04:11
Notify=yes
LogDriver=passthrough
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes sense. One test for passthrough is enough we should not duplicate the test for every single one.

Only using wait_for_output should be enough to address flakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If LogDriver=passthrough is removed , STARTED CONTAINER will not be written to journald, and the test will fail.
(Is this a bug?)

Error log
#
# Jul 14 00:18:40 test systemd[1]: Starting basic_4SD5D8ioUH.service...
# Jul 14 00:18:40 test basic_4SD5D8ioUH[262279]: 910435b96ecd7dfb4900b86e4b9f0d19e9d0b9e93ba1c17653a688077f0d18d2
# Jul 14 00:18:40 test systemd[1]: Started basic_4SD5D8ioUH.service.
# # systemctl show --value --property=ActiveState basic_4SD5D8ioUH.service
# active
# [00:18:40.646492693] # podman logs systemd-basic_4SD5D8ioUH
# [00:18:40.683215557] STARTED CONTAINER
# [00:18:40.691517200] # podman logs systemd-basic_4SD5D8ioUH
# [00:18:40.723299024] STARTED CONTAINER
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: podman logs systemd-basic_4SD5D8ioUH
# #| expected: '.*STARTED CONTAINER.*' (using expr)
# #|   actual: '-- Logs begin at Thu 2023-07-13 11:38:17 JST, end at Fri 2023-07-14 00:18:40 JST. --'
# #|         > 'Jul 14 00:18:40 test systemd[1]: Starting basic_4SD5D8ioUH.service...'
# #|         > 'Jul 14 00:18:40 test basic_4SD5D8ioUH[262279]: 910435b96ecd7dfb4900b86e4b9f0d19e9d0b9e93ba1c17653a688077f0d18d2'
# #|         > 'Jul 14 00:18:40 test systemd[1]: Started basic_4SD5D8ioUH.service.'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you run the tests? Is your default log driver not journald?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No time to review right now, but sleep inf is evil: sleep is noninterruptible, so podman stop takes an extra ten seconds. Did you confirm that this change does not affect test runtimes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you run the tests? Is your default log driver not journald?

The default log driver is k8s-file. (This is the initial value set in the containers.conf for RHEL8 series.)
When the log driver was changed to journald from k8s-file, the test was successful without setting LogDriver=paththrough.
Considering this, it seems necessary to either change the log driver to journald or skip it when it is set to k8s-file.

@ygalblum
Copy link
Contributor

@edsantiago PTAL. Not sure it's really needed because I think the root cause is in configuration of @sstosh's local environment. But, still worth checking

Some quadlet tests can only be run when the log driver is set to journald in containers.conf.
Therefore, those tests will be skipped if the log driver is not set to journald.

Signed-off-by: Toshiki Sonoda <[email protected]>
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under this PR, test runtime goes from 85s to 136s (root), 68s->105s (rootless). Please revert back to top.

Also, rootless volume path test is consistently failing on my f38 laptop; also failing on main so it doesn't look like your fault. Still investigating. Not your fault. Filed #19267 to fix.

@sstosh sstosh force-pushed the system-quadlet-logdriver branch from bdbb928 to b872ef3 Compare July 18, 2023 02:15
@sstosh sstosh changed the title system test: fix flake in quadlet tests system test: skip some quadlet tests when use journald log driver Jul 18, 2023
@sstosh
Copy link
Contributor Author

sstosh commented Jul 18, 2023

When podman uses k8s-file as the log driver, it seems that the test fails because STARTED CONTAINER is not written to journald.
Therefore, I have made it so that the test is skipped when using a log driver other than journald.
Also, I considered a method to overwrite the log driver to journald, like CONTAINERS_CONF_OVERRIDE, but I could not come up with a good implementation.

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't help thinking that this is the wrong approach.

The proximate problem is, tests are looking for a string in the journal. Taking a step back, though, is this testing the journal or is it testing if a container has started? Can there be another way to verify that a container has started, like touching a file, or SDNOTIFY, or using podman logs?

@sstosh that's a question for @containers/podman-maintainers, not for you, but I'm blocking right now because I think it's important to step back and understand this problem better. Also because there's a little too much duplication in your approach.

Thank you for identifying this problem and bringing it to our attention.

@edsantiago
Copy link
Member

Followup:

skip_if_journald_unavailable "quadlet isn't really usable without journal"

Is this true? Is quadlet meaningless on k8s-file or anything other than journald? Or is journal only needed for these tests? I suspect the statement is untrue, because apparently many of these tests are passing on @sstosh's system even without journald.

@Luap99
Copy link
Member

Luap99 commented Jul 18, 2023

The proximate problem is, tests are looking for a string in the journal. Taking a step back, though, is this testing the journal or is it testing if a container has started? Can there be another way to verify that a container has started, like touching a file, or SDNOTIFY, or using podman logs?

Yes there seems no reason to check a sting in the journal.
There are tests which check the log driver then we want to make sure that it works but for these 4 cases here I do not see why we could not just remove this. If a container is not started systemctl should exit non zero and thus fail the test so there is no value in this check IMO.

Followup:

skip_if_journald_unavailable "quadlet isn't really usable without journal"

Is this true? Is quadlet meaningless on k8s-file or anything other than journald? Or is journal only needed for these tests? I suspect the statement is untrue, because apparently many of these tests are passing on @sstosh's system even without journald.

Not really, the proper check would be systemd unavailable but I guess this would be the same as journald unavailable.

@ygalblum
Copy link
Contributor

Followup:

skip_if_journald_unavailable "quadlet isn't really usable without journal"

Is this true? Is quadlet meaningless on k8s-file or anything other than journald? Or is journal only needed for these tests? I suspect the statement is untrue, because apparently many of these tests are passing on @sstosh's system even without journald.

@edsantiago You added this line here: cf5df5b where you actually say that it is an overkill.

I'm guessing you added it because the tests rely on the journal, not because Quadlet does. Also, please keep in mind that Quadlet started off as an opinionated solution which forces passthrough log-driver. So, maybe some assumptions in the tests no longer hold.

I agree that it seems that in some of the case the journal is used to check that the container is running and not for anything else. For these cases, we surely need to find a better way

@edsantiago
Copy link
Member

Oops. Thank you for that. It was obviously one of my emergency las-minute RHEL fixes. I will make time today to find a better solution.

@sstosh
Copy link
Contributor Author

sstosh commented Jul 27, 2023

Close this PR because #19276 was merged.

@sstosh sstosh closed this Jul 27, 2023
@sstosh sstosh deleted the system-quadlet-logdriver branch July 27, 2023 05:14
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants