

RFE: pull --retry #14359

Closed
edsantiago opened this issue May 25, 2022 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Member

[like #14048 but for pull]

This is causing multiple CI flakes per day:

# podman [options] pull registry.access.redhat.com/ubi8-init
Trying to pull registry.access.redhat.com/ubi8-init:latest...
Error: copying system image from manifest list: Source image rejected: Get "https://access.redhat.com/webassets/docker/content/sigstore/ubi8-init@sha256=f6afbab2349ef86bd4ac0d59ba4b9b5df3e176b4bdeaff643c3e6386a7414c24/signature-2": net/http: TLS handshake timeout

We have no control over broken registries or network hiccups.

Could we just implement podman pull --retry 3 with perhaps an exponential backoff?

@edsantiago edsantiago added the kind/feature Categorizes issue or PR as related to a new feature. label May 25, 2022
@vrothberg
Member

Podman already retries three times by default (see https://github.com/containers/common/blob/main/libimage/copier.go#L273), but it seems this specific error isn't caught by the retry logic.

@vrothberg
Member

@mtrmac AFAIKS, there's no public error type we could check for, so we may need to string compare. WDYT?

@rhatdan
Member

rhatdan commented May 25, 2022

This would certainly make sense to retry on this case.

@mtrmac
Collaborator

mtrmac commented May 25, 2022

@mtrmac AFAIKS, there's no public error type we could check for, so we may need to string compare.

What’s the exact type/value? Reading the code, I’d expect this to be an ordinary errors.Wrapf-ed HTTP error, testable via errors.As/errors.Is. (In particular, it should not be a signature.PolicyRequirementError.)

Going further, net/http.tlsHandshakeTimeoutError (if that’s what the underlying error is) is not public, but it should be possible to test for it via net.Error.Timeout(). Adding that to the retry logic would make sense; I think there’s at least one other report of missing retry logic that suggests testing that condition.

Now, the hard part about all of these reports is reproducing them enough to test the heuristics…

rhatdan added a commit to rhatdan/common that referenced this issue May 26, 2022
[NO NEW TESTS NEEDED] No idea how to cause this situation.

Fixes: containers/podman#14359

Signed-off-by: Daniel J Walsh <[email protected]>
@edsantiago
Member Author

It's not just TLS timeout, there's also internal error:

Error: copying system image from manifest list: Source image rejected: Get "https://access.redhat.com/webassets/docker/content/sigstore/ubi8-minimal@sha256=9a81cce19ae2a962269d4a7fecd38aec60b852118ad798a265c3f6c4be0df610/signature-3": remote error: tls: internal error

(seen here)

rhatdan added a commit to rhatdan/common that referenced this issue May 27, 2022
[NO NEW TESTS NEEDED] No idea how to cause this situation.

Fixes: containers/podman#14359

Signed-off-by: Daniel J Walsh <[email protected]>
@vrothberg
Member

NOTE: this is not fixed until c/common@main is vendored into Podman

@praiskup

Can we configure the number of retries and the delay somehow? It seems Mock/Copr users have problems with pull failures, and we could afford delays of even 30s (instead of the default 1s).

@vrothberg
Member

@praiskup, no, that is currently not configurable in Podman. Please open a new issue if you'd like such a feature.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Nov 27, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2023

5 participants