Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto update: return restart error #17796

Merged
merged 2 commits into from
Mar 27, 2023

Conversation

vrothberg
Copy link
Member

Return the error when restarting the unit failed during an update. The task is correctly marked to have failed but we really need to return the error to the user.

@edsantiago 🤞

[NO NEW TESTS NEEDED] - The flakes in #17607 will reveal errors.

Does this PR introduce a user-facing change?

Return errors from systemd when restarting a unit failed during an auto update. 

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 15, 2023
Return the error when restarting the unit failed during an update.
The task is correctly marked to have failed but we really need to
return the error to the user.

[NO NEW TESTS NEEDED] - The flakes in containers#17607 will reveal errors.

Signed-off-by: Valentin Rothberg <[email protected]>
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edsantiago
Copy link
Member

Well, here's the error message [source]:`

Error: restarting unit container-c_image_OBvhQIJzmV.service during update: expected "done" but received "failed"

@vrothberg
Copy link
Member Author

This is really poking in the dark. I added some more debug logs to the test in hope to get something useful.

Really no idea what's going on on Debian. It's always the same pattern that restarting the systemd units fails during auto update.

Why is it only flaking on Debian? @edsantiago have you seen these tests fail on Fedora as well but maybe at such a low frequency that it fell under the (flake) radar?

Add debug logs from systemctl and journalctl in hope to get more data on
the Debian flakes tracked in containers#17796.

Signed-off-by: Valentin Rothberg <[email protected]>
@edsantiago
Copy link
Member

There are only two Fedora flakes in auto-update, and both are the "1 vs 2" flake:

#|     FAIL: Number of images updated from registry.
#| expected: '2'
#|   actual: '1'

@edsantiago
Copy link
Member

Still happening, still debian. Is there anything in that log that can help you?

@vrothberg
Copy link
Member Author

Still happening, still debian. Is there anything in that log that can help you?

I'd need to clone myself at the moment to find time to look into it, sorry :( Will take a look as soon as I find time.

@vrothberg
Copy link
Member Author

@edsantiago are you cool to merge or shall I remove the echos in the tests?

@edsantiago
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 27, 2023
@vrothberg
Copy link
Member Author

Thanks! My current "feeling" is that after this change we will only see podman auto-update failures as we now return the error and exit non-zero. All other flavors of failures in the tests should disappear.

I will look into the dbus/systemd dependency we use and check whether a bug may linger there. Other than that, we may try to restart the service another time if the first attempt has failed. But that would be my last resort as it feels like giving up.

@openshift-merge-robot openshift-merge-robot merged commit 78f1ebb into containers:main Mar 27, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 4, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants