
Add service ctr cleanup to PlayKubeDown #17821

Merged
merged 1 commit into containers:main on Mar 21, 2023

Conversation

umohnani8
Member

@umohnani8 umohnani8 commented Mar 16, 2023

Since we can't guarantee when the worker queue will come
and clean up the service container in the remote case when
podman kube play --wait is called, clean up the service container
at the end of PlayKubeDown() to ensure that it is removed right
after all the containers, pods, volumes, etc. are removed.

[NO NEW TESTS NEEDED]

Fixes #17803
Fixes #17820

Signed-off-by: Urvashi Mohnani [email protected]

Does this PR introduce a user-facing change?

None
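
Below is a minimal Go sketch of the teardown ordering the description refers to: remove pods and volumes first, then remove the service container last, before the call returns. The teardownReport type and the removePod/removeVolume/removeContainer helpers are hypothetical stand-ins, not podman's actual libpod calls.

```go
// Illustrative sketch only: hypothetical helpers stand in for libpod's real
// teardown calls, which this example does not use.
package main

import "fmt"

// teardownReport is a hypothetical stand-in for the report returned by the
// kube-play teardown path.
type teardownReport struct {
	RemovedPods    []string
	RemovedVolumes []string
}

// playKubeDown sketches the ordering described in the commit message:
// pods, volumes, and other resources are removed first, and the service
// container is removed last so that `podman kube play --wait` leaves
// nothing behind once it returns.
func playKubeDown(pods, volumes []string, serviceCtr string) (*teardownReport, error) {
	report := &teardownReport{}

	for _, pod := range pods {
		if err := removePod(pod); err != nil {
			return nil, err
		}
		report.RemovedPods = append(report.RemovedPods, pod)
	}

	for _, vol := range volumes {
		if err := removeVolume(vol); err != nil {
			return nil, err
		}
		report.RemovedVolumes = append(report.RemovedVolumes, vol)
	}

	// Remove the service container last, after everything that depends on it.
	// Doing it here, rather than leaving it to the server's worker queue,
	// means the removal has finished before the API response is sent.
	if serviceCtr != "" {
		if err := removeContainer(serviceCtr); err != nil {
			return nil, err
		}
	}
	return report, nil
}

// Hypothetical stubs so the sketch compiles; the real code calls into libpod.
func removePod(name string) error       { fmt.Println("removing pod", name); return nil }
func removeVolume(name string) error    { fmt.Println("removing volume", name); return nil }
func removeContainer(name string) error { fmt.Println("removing container", name); return nil }

func main() {
	if _, err := playKubeDown([]string{"mypod"}, []string{"myvol"}, "svc-ctr"); err != nil {
		fmt.Println("teardown failed:", err)
	}
}
```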

@openshift-ci openshift-ci bot added the release-note-none and approved labels Mar 16, 2023
@umohnani8
Member Author

@edsantiago I think this should fix the race. I will re-run the test a bunch of times in the f37 aarch64 root environment to verify.

@edsantiago
Member

This may be a stupid question, but what exactly is the purpose of --wait if it doesn't actually wait?

If --wait only guarantees that the containers are stopped, then maybe the fix is to remove -a from the podman ps -aq at the end? (Actually, removing the -q would be pretty helpful for future errors, too. --noheading would be better).

If --wait should guarantee that the containers are removed, then maybe these flakes are actually showing a real bug that needs to be fixed?

Either way, I think the documentation needs to be fixed to specify what --wait is intended to do.
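
As a rough illustration of the check being discussed (the real test is a shell script, not Go, and the output handling here is made up), listing containers with --noheading instead of -q keeps full rows for any leftovers while still making an empty result easy to assert on:

```go
// Illustrative only: shows the kind of post-teardown check being discussed —
// list all containers without the header line and fail if anything remains.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

func main() {
	// --noheading keeps full rows (useful for debugging leftovers) while
	// avoiding the header line, unlike -q which hides everything but IDs.
	out, err := exec.Command("podman", "ps", "-a", "--noheading").CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "podman ps failed: %v\n%s", err, out)
		os.Exit(1)
	}
	if leftovers := strings.TrimSpace(string(out)); leftovers != "" {
		fmt.Fprintf(os.Stderr, "containers still present after kube play --wait:\n%s\n", leftovers)
		os.Exit(1)
	}
	fmt.Println("no containers left behind")
}
```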

@umohnani8
Member Author

--wait is supposed to clean up the resources created once the pods have exited, so the -a is needed to verify that. I have tested this countless times and the resources have always been removed. I have no real way of reproducing this flake given that it is only happening in one environment. My guess is that remote might be a bit slow in this environment in finishing the removal by the time we do a ps -aq to verify it.
I can add some more debug output to the test to see what may be happening if this flake continues to happen after this patch.

@rhatdan
Member

rhatdan commented Mar 17, 2023

Seems most likely that the containers were marked for removal but not fully removed. Is the remote side waiting for the content to be removed? Is there a way to tell containers to be removed and return without waiting for them to be removed?

@Luap99
Member

Luap99 commented Mar 17, 2023

I think the problem is that the serviceContainer is removed via the worker queue. This is not a problem for local podman because it waits for all queue jobs to be completed before it exits. However, in the remote case the service will finish the API response, but there is no way of controlling which jobs have been done by the worker queue.

I think the client should do the equivalent of podman wait --condition removing serviceContainer in this case to ensure everything is cleaned up before it exits.

cc @vrothberg
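
A rough sketch of that client-side idea follows; the container name is made up, and shelling out to the CLI here is purely illustrative (the real remote client would go through the API bindings). The "removing" condition value is the one suggested in the comment above.

```go
// Sketch of the alternative described above: have the client block until the
// service container reaches the "removing" state before returning, so nothing
// is left behind when `podman kube play --wait` exits.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	serviceCtr := "my-service-ctr" // hypothetical service container name

	// Equivalent of `podman wait --condition removing <ctr>`. A real
	// implementation would treat a "no such container" error as the
	// container already having been removed.
	cmd := exec.Command("podman", "wait", "--condition", "removing", serviceCtr)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "wait failed:", err)
		os.Exit(1)
	}
	fmt.Println("service container cleanup observed; safe to exit")
}
```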

@vrothberg
Member

> I think the problem is that the serviceContainer is removed via the worker queue. This is not a problem for local podman because it waits for all queue jobs to be completed before it exits. However, in the remote case the service will finish the API response, but there is no way of controlling which jobs have been done by the worker queue.

Very thorough analysis, @Luap99!

> I think the client should do the equivalent of podman wait --condition removing serviceContainer in this case to ensure everything is cleaned up before it exits.

Alternatively, the service container could be removed in PlayKubeDown() in the backend. Note that it must be removed last and after all containers, networks, volumes, etc.

@umohnani8 umohnani8 changed the title from Fix wait test to avoid race to Add service ctr cleanup to PlayKubeDown on Mar 20, 2023
@umohnani8
Member Author

Thanks @vrothberg and @Luap99 - added service container cleanup to PlayKubeDown()

Member

@Luap99 Luap99 left a comment


LGTM

Member

@vrothberg vrothberg left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm label Mar 21, 2023
@openshift-ci
Contributor

openshift-ci bot commented Mar 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: umohnani8, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [umohnani8,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d8265f0 into containers:main Mar 21, 2023
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 5, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2023
Successfully merging this pull request may close these issues:
  • kube play --wait test: looks like a race