Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote run: fix error checks #7561

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Sep 8, 2020

As error types are not preserved on the client side (due to marshaling),
we cannot use errors.Cause(...) and friends but, unfortunately, have
to fall back to looking for substring the error messages.

Change the error checks in remote run to do substring matches and fix
issue #7340.

Fixes: #7340
Signed-off-by: Valentin Rothberg [email protected]

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 8, 2020
Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other the tini nit.

pkg/domain/infra/tunnel/containers.go Outdated Show resolved Hide resolved
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@QiWang19
Copy link
Contributor

QiWang19 commented Sep 8, 2020

LGTM

@rhatdan
Copy link
Member

rhatdan commented Sep 8, 2020

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2020
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 8, 2020
@vrothberg
Copy link
Member Author

@edsantiago PTAL

@edsantiago
Copy link
Member

@vrothberg did you try the reproducer in #7340? I'm still seeing it:

$ while :;do got=$(./bin/podman-remote run --rm -v testvol1:/etc/apk alpine_labels ls /etc/apk); test "$got" = "$expect" || echo "got: $got";done
Error: no container with name or ID 447c18f8966f912c98e113257a980351c94e47c0d4c5692cd83efc011058b6ef found: no such container
Error: no container with name or ID be5a8aa6cc0be4c6f5d7add69e748498ee29fd2c465e399a14ac678ed8d4a40f found: no such container
Error: no container with name or ID 354abc85b5046fcb82bed15099f4c0cfd9cf4a7004dd57cb787daf0dee75bb7e found: no such container

@vrothberg
Copy link
Member Author

@vrothberg did you try the reproducer in #7340? I'm still seeing it:

I ran it before for 20 minutes and didn't catch it anymore. I will recheck, maybe the last change screwed things up again. Thanks for checking.

/hold

@vrothberg
Copy link
Member Author

/hold

@TomSweeneyRedHat
Copy link
Member

and some possible test issues.

@vrothberg
Copy link
Member Author

Looks like there's more unpack. Both, tunnel and abi, are doing (if opts.Rm):

if err := ic.Libpod.RemoveContainer(ctx, ctr, false, true); err != nil {       
        if errors.Cause(err) == define.ErrNoSuchCtr ||                         
                errors.Cause(err) == define.ErrCtrRemoved {                    
                logrus.Errorf("Container %s does not exist: %v", ctr.ID(), err)
        } else {                                                               
                logrus.Errorf("Error removing container %s: %v", ctr.ID(), err)
        }                                                                      
}                                                                              

Both are suspect to a race with the podman container cleanup --rm process. However, the tunnel parts run into the additional issue that we're not preserving the error types (we're losing them over the wire).

@mheon, do you know if the seemingly redundant removal in ContainerRun serves a purpose I may not be aware of? Seeing the CI failures, there might be something going on with volumes. Are (libpod).RemoveContainer() and container cleanup --rm identical?

@mheon
Copy link
Member

mheon commented Sep 9, 2020

@vrothberg Basically, we have to be sure that the container is completely removed when the podman run process exits to prevent races - but we also need the cleanup process, because the podman run process can be killed, or the user can manually detach causing it to exit. So we need both. As you surmise, RemoveContainer() and cleanup --rm are the same thing - we're just invoking in duplicate to be certain the container is removed.

Generally speaking, the increasing priority of terminal-interactive tasks means that in the local case podman run will run before the cleanup process, so it will be able to call RemoveContainer() first. The cleanup process will fail, but it's not really logging its results anywhere, so that's perfectly fine. I imagine remote makes this considerably more complicated...

@vrothberg
Copy link
Member Author

Thanks, @mheon! Once #7574 is merged, I rebase and rework this PR.

As error types are not preserved on the client side (due to marshaling),
we cannot use `errors.Cause(...)` and friends but, unfortunately, have
to fall back to looking for substring the error messages.

Change the error checks in remote run to do substring matches and fix
issue containers#7340.

Fixes: containers#7340
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg vrothberg changed the title remote run fixes remote run: fix error checks Sep 11, 2020
@vrothberg
Copy link
Member Author

On top of the latest changes, it was sufficient to just fix the error checking.

@mheon @rhatdan PTanotherL

@rhatdan
Copy link
Member

rhatdan commented Sep 11, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 11, 2020
@vrothberg
Copy link
Member Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2020
@openshift-merge-robot openshift-merge-robot merged commit 8f0a53c into containers:master Sep 11, 2020
@vrothberg vrothberg deleted the fix-7340 branch September 11, 2020 15:05
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman-remote: run --rm: Error removing container
8 participants