Implement a retry on failing uploads / podman push #14048

Closed
xrow opened this issue Apr 28, 2022 · 10 comments
Labels
kind/feature, locked - please file new issue/PR, stale-issue

Comments

@xrow

xrow commented Apr 28, 2022

/kind feature

Description

I would like the ability to retry on network errors during a push, either via a CLI parameter or through some automatic handling.

Steps to reproduce the issue:

  1. Have an internet connection with some packet loss

  2. podman push ...

Describe the results you received:

Getting image source signatures
Copying blob sha256:518a144c927e39441b520c6822058018e1ed7aa1c25f461dd226773399a7b580
Copying blob sha256:a4e267c20a44d1b461bff8bba4c5f9dc5878c9282586e1835c873490c27555c0
Copying blob sha256:53494403a146764387ecbbd1b0b99962d2414878c5a6ce6784aa95b631e07e16
Copying blob sha256:700a186f64039635bede8f29c3759e01d050dcaf11a67786a986836b266fb6ef
Copying blob sha256:9e6713d530bf59dd0ce8155e1a48372e2ad1773be06a8087deafeb5ad0fed586
Copying blob sha256:75bfbbe80458d9c3debe3469df4015a72d59c32002232f3efdf4f509f31bb09b
Copying blob sha256:2913c343f3819abe60586ded803cccfb1e0f776b7daec36092a73c36dae70a04
Copying blob sha256:acfababd775298c320d1d2cfa8333c7004ae7e8641346ad71015abaa5812ecf9
Copying blob sha256:54867274c4d15ca35da1c6b08f37c7b046fb569908aa9b84641a55c84f10f237
Copying blob sha256:48148327ed2cfbd80fb98ace50cceeccad2b26fed70ad5285e05f2655fe5bbcf
Error: writing blob: Patch "https://registry.gitlab.com/v2/xrow-public/repository/ibexa-experience/blobs/uploads/0ff84441-86eb-4e99-92c4-f43b507270df?_state=SFD3hLF07NgdrS-0gfZ3HUEcBqHa564iLIrHdOMu-Zx7Ik5hbWUiOiJ4cm93LXB1YmxpYy9yZXBvc2l0b3J5L2liZXhhLWV4cGVyaWVuY2UiLCJVVUlEIjoiMGZmODQ0NDEtODZlYi00ZTk5LTkyYzQtZjQzYjUwNzI3MGRmIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIyLTA0LTI4VDA3OjMyOjEzLjc5NTg3MjM0N1oifQ%3D%3D": write tcp 10.88.106.240:33858->35.227.35.254:443: use of closed network connection

Describe the results you expected:

Ability to retry on network errors via parameter.

Additional information you deem important (e.g. issue happens only occasionally):

As mentioned above, you need an unreliable connection to the server. You could try unplugging and replugging the network cable to simulate one.

Output of podman version:

[root@localhost helm-ezplatform]# podman version
Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.17.5

Built:      Fri Mar 18 12:51:13 2022
OS/Arch:    linux/amd64

Output of podman info --debug:

[root@localhost helm-ezplatform]# podman info --debug
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: 8ef5de138efb6f0aad657082cdea22cf037792cb'
  cpus: 1
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: journald
  hostname: localhost
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-75.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 129245184
  memTotal: 6087335936
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun-1.4.4-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.4
      commit: 6521fcc5806f20f6187eb933f9f45130c86da230
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-4.el9.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 73h 11m 50.5s (Approximately 3.04 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 30
    paused: 0
    running: 0
    stopped: 30
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 24
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1647604273
  BuiltTime: Fri Mar 18 12:51:13 2022
  GitCommit: ""
  GoVersion: go1.17.5
  OsArch: linux/amd64
  Version: 4.0.2

Package info (e.g. output of rpm -q podman or apt list podman):

[root@localhost helm-ezplatform]# rpm -q podman
podman-4.0.2-3.el9.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

VirtualBox

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 28, 2022
@mheon
Member

mheon commented Apr 28, 2022

@vrothberg PTAL. I know we've talked about this before but I can't recall what the result was.

@vrothberg
Member

Podman has three retries by default (also for pushing). I need to take a closer look at the exact error and at the retry semantics, but either the retries are not triggered by the code, or the connection is just too unstable.

Podman will retry the entire push operation though. If the image is sufficiently big, or the connection sufficiently flaky, we may have a hard time.

@mtrmac, did we ever consider lifting the retries to the granularity of blobs?

@mtrmac
Collaborator

mtrmac commented Apr 29, 2022

This particular error looks like net.ErrClosed, i.e. a logic bug of using a connection after Close() was called. It’s not immediately obvious to me whether that’s a bug in c/image or in the standard library, though it does look closer to the latter.

I think we wouldn’t retry on that in any case.

Does it always fail this way?


Per-blob retries were ~infeasible with the public c/image API. Now that we have c/image/internal/private, it should be quite easy for transports to opt in. (We still wouldn’t retry on this particular error.)

@xrow
Author

xrow commented May 2, 2022

@mtrmac I must admit it happens often. It always crashes the same way. The big question I have is whether this is a bug, or whether you recommend that I create some retry outside of podman.
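For readers wondering what a retry outside of podman might look like, here is a minimal sketch in POSIX shell. The helper name `retry_cmd`, its arguments, and the doubling delay are assumptions made for illustration; none of this is part of Podman, and it would only mask the underlying error rather than fix it.

```shell
#!/bin/sh
# retry_cmd MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of Podman): reruns COMMAND until it
# succeeds or MAX_ATTEMPTS is reached, doubling the delay after each failure.
# Note: POSIX sh has no local variables, so these names are global.
retry_cmd() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        status=0
        "$@" || status=$?          # capture the exit status without aborting
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (exit status $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit status $status); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Example call (the image name is a placeholder, not taken from this issue):
# retry_cmd 3 5 podman push registry.example.com/myimage:latest
```

Like Podman's built-in retries described above, this reruns the entire push, so a large image on a flaky link may still fail on every attempt.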

@mtrmac
Collaborator

mtrmac commented May 2, 2022

At this point I think it’s a bug that should be fixed rather than hidden by retries … OTOH I appreciate that doesn’t get you to a working pull behavior immediately.

@xrow
Author

xrow commented May 3, 2022

@mtrmac Thanks for your insights. In case you need anything additional from my end, let me know and just write here. Otherwise I will await your results.

@github-actions

github-actions bot commented Jun 3, 2022

A friendly reminder that this issue had no activity for 30 days.

@xrow
Author

xrow commented Jun 22, 2022

Hi,

please have a look. I am seeing a similar thing now. My podman is more or less the latest stable Docker container.

[37b68beb5edacb8e36f49d664b59ecae41cc074e] Pushing registry.gitlab.com/xrow-shared/helm-ezplatform/solr:37b68beb5edacb8e36f49d664b59ecae41cc074e
Getting image source signatures
Error: trying to reuse blob sha256:f9912315b112f05355b86f5594c9c0a2fc988036a875207b0cd69eaea628686a at destination: pinging container registry registry.gitlab.com: Get "https://registry.gitlab.com/v2/": net/http: TLS handshake timeout
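The two failures reported in this thread are different in kind: the earlier "use of closed network connection" error was described above as a logic bug that would not be retried, whereas a TLS handshake timeout is the sort of transient failure a retry could plausibly help with. A rough sketch of that distinction in shell follows. The function name `is_transient_push_error` and the choice of which messages count as transient are assumptions for illustration, and matching on error text is brittle; this is not how Podman or c/image classify errors.

```shell
#!/bin/sh
# is_transient_push_error MESSAGE
# Hypothetical classifier (not Podman's logic): exits 0 when MESSAGE looks
# like a transient network failure, 1 otherwise.
is_transient_push_error() {
    case $1 in
        # Treated as a bug rather than a flaky network, so never retried.
        *"use of closed network connection"*)
            return 1 ;;
        # Assumed-transient messages; extend or trim this list as needed.
        *"TLS handshake timeout"*|*"i/o timeout"*|*"connection reset by peer"*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

A wrapper script could capture the stderr of a failed push and consult this function before deciding whether another attempt is worthwhile.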

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Jul 25, 2022

I am going to close this issue, since there is a retry in Podman now. There are lots of discussions on improving this at containers/image.

#14616

@rhatdan rhatdan closed this as completed Jul 25, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
5 participants