fix Pod removal after OS hard shutdown #15654

Contributor

@tyler92 tyler92 commented Sep 6, 2022

In case of a hard OS shutdown, containers may have a "removing"
state after a reboot, and an attempt to remove Pods with such
containers is unsuccessful:

error freeing lock for container ...: no such file or directory

[NO NEW TESTS NEEDED]

Signed-off-by: Mikhail Khachayants [email protected]

Does this PR introduce a user-facing change?

Fixed Pods removal error after OS reboot
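The "no such file or directory" text in the error above is how the operating system reports a missing path. As a rough illustration only (this is not libpod code, and the helper names `missingFileError`, `wrapFreeError`, `legacyCheck`, and `chainCheck` are made up for the example), the sketch below shows how Go classifies such an error. It also shows one caveat that matters when relying on `os.IsNotExist`: it does not look through errors wrapped with `%w`, whereas `errors.Is(err, fs.ErrNotExist)` does.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// missingFileError removes a file that was never created inside a fresh
// temporary directory, so the returned error is always a "not exist" error.
func missingFileError() error {
	dir, err := os.MkdirTemp("", "lock-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	return os.Remove(filepath.Join(dir, "lock"))
}

// wrapFreeError mimics a caller that adds context with %w (hypothetical message).
func wrapFreeError(err error) error {
	return fmt.Errorf("error freeing lock: %w", err)
}

// legacyCheck uses os.IsNotExist, which only inspects errors produced by the os package.
func legacyCheck(err error) bool { return os.IsNotExist(err) }

// chainCheck uses errors.Is, which walks the whole wrap chain.
func chainCheck(err error) bool { return errors.Is(err, fs.ErrNotExist) }

func main() {
	raw := missingFileError()
	wrapped := wrapFreeError(raw)

	fmt.Println(legacyCheck(raw))     // true
	fmt.Println(legacyCheck(wrapped)) // false: the %w wrapper hides the cause from os.IsNotExist
	fmt.Println(chainCheck(wrapped))  // true
}
```

Whether this caveat applies to a given call site depends on whether the error returned by the lock's `Free` method is wrapped before the check, which this thread does not show.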

Contributor Author

tyler92 commented Sep 6, 2022

Not sure about tests and release notes.

Member

rhatdan commented Sep 6, 2022

LGTM
@mheon PTAL

@@ -797,7 +797,7 @@ func (r *Runtime) removeContainer(ctx context.Context, c *Container, force, remo
 	}

 	// Deallocate the container's lock
-	if err := c.lock.Free(); err != nil {
+	if err := c.lock.Free(); err != nil && !os.IsNotExist(err) {
Member

Can we still warn on os.IsNotExist()? If this happens in a case other than a reboot, it's an indication something could be seriously wrong.

Contributor Author

Do you mean Warning in log?

Contributor Author

I've pushed a new variant.
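The pushed variant itself is not shown in this thread. As a sketch of what the reviewer's suggestion could look like (tolerate a missing lock but still log a warning), the example below uses a hypothetical `lock` interface, a hypothetical `stubLock` type, and the standard `log` package. None of these names are taken from libpod, and the project may use a different logger.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
)

// lock is a hypothetical stand-in for a container lock; it is not libpod's interface.
type lock interface {
	Free() error
}

// stubLock is a hypothetical lock whose Free returns a preset error.
type stubLock struct{ err error }

func (s stubLock) Free() error { return s.err }

var (
	okLock     = stubLock{}                               // frees cleanly
	goneLock   = stubLock{err: os.ErrNotExist}            // lock already missing, e.g. after a reboot
	brokenLock = stubLock{err: errors.New("I/O failure")} // unrelated failure
)

// freeLock releases the lock. A lock that no longer exists is logged as a
// warning and tolerated; any other failure is returned to the caller.
func freeLock(id string, l lock) error {
	err := l.Free()
	switch {
	case err == nil:
		return nil
	case os.IsNotExist(err):
		log.Printf("warning: lock for container %s does not exist: %v", id, err)
		return nil
	default:
		return fmt.Errorf("error freeing lock for container %s: %w", id, err)
	}
}

func main() {
	fmt.Println(freeLock("ctr-gone", goneLock))     // <nil>, after logging a warning
	fmt.Println(freeLock("ctr-broken", brokenLock)) // error freeing lock for container ctr-broken: I/O failure
}
```

Logging instead of silently ignoring the error keeps removal working after a hard shutdown while still leaving a trace if the lock goes missing for some other reason.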

Member

mheon commented Sep 6, 2022

I don't think we can test this. Release note would be appreciated, something simple about fixing a bug where pods could error on removal after reboot?

@tyler92 tyler92 force-pushed the fix-ctr-remove-after-power-off branch from 4f53fe9 to 9585147 Compare September 6, 2022 17:41
Contributor Author

tyler92 commented Sep 6, 2022

> Release note would be appreciated, something simple about fixing a bug where pods could error on removal after reboot?

I've added "Fixed Pods removal error after OS reboot". I'm not very familiar with the release notes style, so please suggest another text if it's not good enough.

Member

mheon commented Sep 6, 2022

Release note is good enough, gives me enough crumbs to figure out what's going on. Code change LGTM.

/approve
/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 6, 2022
Contributor

openshift-ci bot commented Sep 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon, tyler92

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2022
Member

rhatdan commented Sep 6, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 6, 2022
@openshift-merge-robot openshift-merge-robot merged commit ea3e7ef into containers:main Sep 6, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
release-note