Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/system/252-quadlet.bats: fix flake #18144

Merged
merged 1 commit into from
Apr 11, 2023

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Apr 11, 2023

Wait for the expected logs to appear in the journal before using
journalctl. #18132 is likely flaking because journalctl does
not yet see the container's logs.

Fixes: #18132

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 11, 2023
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--wait waits for the unit to terminate according to the systemd docs.
This test wants a running unit to do its testing so this fix will not work and cause it to hang forever.
Also systemctl will already wait until we send READY=1 to the notify socket so we should have the log by then. Is it possible that the race is with journactl instead?

@vrothberg
Copy link
Member Author

--wait waits for the unit to terminate according to the systemd docs. This test wants a running unit to do its testing so this fix will not work and cause it to hang forever. Also systemctl will already wait until we send READY=1 to the notify socket so we should have the log by then. Is it possible that the race is with journactl instead?

Probably, needs more investigation (and me to exercise the test :^)).

Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in principle but I don't grok enough about quadlet to really approve. One minor nit inline. Thanks for addressing this so quickly.

@@ -150,14 +150,13 @@ EOF
run_quadlet "$quadlet_file"
service_setup $QUADLET_SERVICE_NAME

# check that we can read the logs from the container with podman logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was appropriate when this code ran below journalctl, but is less so now. A more helpful comment now would be something like "quadlet container emits a standard STARTED message once it's running". This is not worth re-pushing for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also more importantly wait_for_output uses regex to match. It is very important to match the full actual logs output here. This was added by me to make sure --log-driver passthrough works with podman logs. If we do not match 1 to 1 we could easily get warning or errors logged with logrus and not notice it or other output we do not want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! I failed to notice that the original assertion was ==. If that's how quadlet works, and it guarantees emitting STARTED CONTAINER and absolutely no other output, then it's important to keep the check that way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the PR accordingly but notice that relying on passthrough being the default log-driver may be risky as it can be changed in the future. An explicit test for it may be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL at the new version, should address everything. I also hardcoded the passthrough driver to make sure the test continues as intended.

@vrothberg vrothberg force-pushed the fix-18132 branch 2 times, most recently from bf373e4 to 57f5e09 Compare April 11, 2023 12:47
@@ -234,6 +234,7 @@ function wait_for_output {
local how_long=$PODMAN_TIMEOUT
local expect=
local cid=
local full_match=
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edsantiago please let me know what you think of these changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is undocumented in the comments below

@edsantiago
Copy link
Member

/hold

This is a huge change, much bigger than initially suggested; I will need time to review it carefully

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2023
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry... I can't approve the helpers changes. They're risky, undocumented, and actually misleading because they don't do what you think they do. Suggested alternative inline.

@@ -234,6 +234,7 @@ function wait_for_output {
local how_long=$PODMAN_TIMEOUT
local expect=
local cid=
local full_match=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is undocumented in the comments below

test/system/helpers.bash Outdated Show resolved Hide resolved
test/system/252-quadlet.bats Outdated Show resolved Hide resolved
Wait for the expected logs to appear in the journal before using
`journalctl`.  containers#18132 is likely flaking because `journalctl` does
not yet see the container's logs.

Also force the test to use the `passthrough` log driver to make sure
`podman logs` continues being tests.

Fixes: containers#18132
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member Author

And one more time, @edsantiago :)
Thanks for checking!

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

Much safer, thank you!

/lgtm
/hold

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2023
@edsantiago
Copy link
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2023
@openshift-merge-robot openshift-merge-robot merged commit 915beba into containers:main Apr 11, 2023
@vrothberg vrothberg deleted the fix-18132 branch April 11, 2023 15:06
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 2, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[sys] quadlet tests: do not see STARTED CONTAINER
4 participants