Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow podman pull to specify --retry and --retry-delay #21659

Merged
merged 1 commit into from
Feb 19, 2024

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Feb 14, 2024

Fixes: #19770

Does this PR introduce a user-facing change?

podman pull now supports --retry and --retry-delay.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 14, 2024
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Feb 14, 2024
@rhatdan rhatdan added 5.0 and removed kind/api-change Change to remote API; merits scrutiny labels Feb 14, 2024
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Some comments on tests

run_podman pull -q --retry 4 --retry-delay "10s" $IMAGE
run_podman 125 pull -q --retry 4 --retry-delay "foobar" $IMAGE
is "$output" 'Error: time: invalid duration "foobar"' "bad retry-delay"
run_podman rmi --all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, please no, that will force a subsequent re-fetch, which will then introduce flakes

@@ -370,6 +370,13 @@ EOF
run_podman --root $imstore/root rmi --all
}

@test "podman images with retry" {
run_podman pull -q --retry 4 --retry-delay "10s" $IMAGE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing much point to this. However, I think something like this could be made to work?

    (iptables -A OUTPUT -m tcp -p tcp --dport 443 -j REJECT; sleep 10; iptables -D FIXME-FIGURE-OUT-SOME-WAY-To-IDENTIFY-AND-REMOVE-THAT-LAST-RULE) &
    run_podman ... retry ... pull quay.io/thisdoes/not-exist:really
    assert "$output" =~ "Failed, retrying in .* pinging.*connection refused"
    # quay returns 403 for 404
    assert "$output" =~ "unauthorized: access .* not authorized"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good, please never ever block port 443. This is run on developer systems and even in CI it could cause all kinds of havoc, I would guess cirrus agent or whatever this thing is called communicates via some https API so you may end up blocking that as well.

If want an actual test for this then why not setup a local registry and try to pull from that. We can start/stop the registry container as we like to simulate pull errors which would then even work rootless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to add a registry and then pause the registry and do a pull, but the pull just hangs, so not sure if there is a good way to test. PTAL

@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Feb 15, 2024
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago
Copy link
Member

Sigh. This is why I tried so hard to do it with iptables.

INCOMPLETE SUGGESTION -- NEEDS CLEANUP:

diff --git a/test/system/010-images.bats b/test/system/010-images.bats
index c50d780c6..bb01e6d31 100644
--- a/test/system/010-images.bats
+++ b/test/system/010-images.bats
@@ -396,12 +396,20 @@ EOF
     run_podman rmi $image1
     run_podman pull -q --retry 4 --retry-delay "0s" --authfile=$authfile \
         --tls-verify=false $image1
+    assert "${output:0:12}" = "$PODMAN_TEST_IMAGE_ID" "First pull (before stopping registry)"
     run_podman rmi $image1
-    stop_registry
-    # pause_registry
-    run_podman 125 pull -q --retry 4 --retry-delay "1s" --authfile=$authfile \
+
+    # This actually STOPs the registry, so the port is unbound...
+    pause_registry
+    # ...then, in eight seconds, we start it again
+    (sleep 8; unpause_registry) &
+    run_podman 0+w pull -q --retry 4 --retry-delay "5s" --authfile=$authfile \
             --tls-verify=false $image1
-    # unpause_registry
+    assert "$output" =~ "Failed, retrying in 5s.*Error: initializing.* connection refused"
+    assert "${lines[-1]:0:12}" = "$PODMAN_TEST_IMAGE_ID" "sdfsdf"
+    unpause_registry
+
+    run_podman rmi $image1
 }
 
 @test "podman images with concurrent removal" {
diff --git a/test/system/helpers.registry.bash b/test/system/helpers.registry.bash
index a49371203..b6986e64c 100644
--- a/test/system/helpers.registry.bash
+++ b/test/system/helpers.registry.bash
@@ -122,13 +122,8 @@ function pause_registry() {
         return
     fi
 
-    # For manual debugging; user may request keeping the registry running
-    if [ -n "${PODMAN_TEST_KEEP_LOGIN_REGISTRY}" ]; then
-        skip "[leaving registry running by request]"
-    fi
-
     opts="--storage-driver vfs $(podman_isolation_opts ${PODMAN_LOGIN_WORKDIR})"
-    run_podman $opts pause registry
+    run_podman $opts stop registry
 }
 
 function unpause_registry() {
@@ -137,11 +132,6 @@ function unpause_registry() {
         return
     fi
 
-    # For manual debugging; user may request keeping the registry running
-    if [ -n "${PODMAN_TEST_KEEP_LOGIN_REGISTRY}" ]; then
-        skip "[leaving registry running by request]"
-    fi
-
     opts="--storage-driver vfs $(podman_isolation_opts ${PODMAN_LOGIN_WORKDIR})"
-    run_podman $opts unpause registry
+    run_podman $opts start registry
 }

The gist: use podman stop, not pause. And start. That unbinds the port.

However:

  • Please move this out of 010. Tests move from simple to complicated. This is complicated. Please run this after 150-login.bats. (I will be totally OK if you add it there).
  • Please add nice friendly descriptions to the assertions
  • It might be time to refactor $opts. I can do that as a followon.

Comment on lines 363 to 364
assert "$output" =~ "Failed, retrying in 5s.*Error: initializing.* connection refused"
assert "${lines[-1]:0:12}" = "$PODMAN_TEST_IMAGE_ID" "sdfsdf"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests LGTM, although now I deeply regret saving myself the ten seconds it would've taken to add proper descriptions here....

@rhatdan
Copy link
Member Author

rhatdan commented Feb 17, 2024

@containers/podman-maintainers PTAL

retryDelayFlagName := "retry-delay"
flags.String(retryDelayFlagName, cli.PullPushRetryDelay.String(), "delay between retries in case of pull failures")
_ = cmd.RegisterFlagCompletionFunc(retryDelayFlagName, completion.AutocompleteNone)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, does Docker have an equivalent?

@TomSweeneyRedHat
Copy link
Member

LGTM
but a hyperv test is kicking up it's heals.

@baude
Copy link
Member

baude commented Feb 18, 2024

this seems like it should be in containers common and not via cli ?

@rhatdan
Copy link
Member Author

rhatdan commented Feb 18, 2024

The CLI is currently in podman build and was asked for here.

@flouthoc
Copy link
Collaborator

Maybe in a different PR but I think podman manifest push/pull needs this as well since we recently added this to buildah here: containers/buildah#5315

Copy link
Collaborator

@flouthoc flouthoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Feb 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan rhatdan added the lgtm Indicates that a PR is ready to be merged. label Feb 19, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 80b1e95 into containers:main Feb 19, 2024
92 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label May 20, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators May 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
5.0 approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Way to specify number of image pull attempts and the pause between attempts
6 participants