
podman inspect: error getting graph driver info: stat/ENOENT #11392

Closed
edsantiago opened this issue Sep 1, 2021 · 3 comments · Fixed by #11476
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR

Comments

@edsantiago (Member)

New flake seen in f33 gating tests:

not ok 184 podman auto-update - label io.containers.autoupdate=image
# (from function `die' in file ./helpers.bash, line 431,
#  from function `_confirm_update' in file ./255-auto-update.bats, line 104,
#  in test file ./255-auto-update.bats, line 128)
#   `_confirm_update $cname $ori_image' failed with status 125
# # podman rm --all --force
# # podman ps --all --external --format {{.ID}} {{.Names}}
# # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# # podman pull quay.io/libpod/testimage:20210610
# Trying to pull quay.io/libpod/testimage:20210610...
# Getting image source signatures
# Copying blob sha256:9afcdfe780b4ea44cc52d22e3f93ccf212388a90370773571ce034a62e14174e
# Copying blob sha256:9afcdfe780b4ea44cc52d22e3f93ccf212388a90370773571ce034a62e14174e
# Copying config sha256:9f9ec7f2fdef9168f74e9d057f307955db14d782cff22ded51d277d74798cb2f
# Writing manifest to image destination
# Storing signatures
# 9f9ec7f2fdef9168f74e9d057f307955db14d782cff22ded51d277d74798cb2f
# # podman tag quay.io/libpod/testimage:20210610 quay.io/libpod/alpine:latest
# # podman run -d --name c_image_c0EAPOVtq2 --label io.containers.autoupdate=image quay.io/libpod/alpine:latest top -d 120
# 88d82c506bd8130b2eb9ba4ba6777acca668cad34eab05e70043eea47cde502b
# # podman generate systemd --new --files --name c_image_c0EAPOVtq2
# /run/systemd/system/container-c_image_c0EAPOVtq2.service
# # podman rm -f c_image_c0EAPOVtq2
# 88d82c506bd8130b2eb9ba4ba6777acca668cad34eab05e70043eea47cde502b
# ● container-c_image_c0EAPOVtq2.service - Podman container-c_image_c0EAPOVtq2.service
#      Loaded: loaded (/run/systemd/system/container-c_image_c0EAPOVtq2.service; disabled; vendor preset: disabled)
#      Active: active (running) since Tue 2021-08-31 22:27:20 UTC; 15ms ago
#        Docs: man:podman-generate-systemd(1)
#     Process: 161139 ExecStartPre=/bin/rm -f /run/container-c_image_c0EAPOVtq2.service.ctr-id (code=exited, status=0/SUCCESS)
#    Main PID: 161219 (conmon)
#       Tasks: 11 (limit: 4585)
#      Memory: 25.7M
#         CPU: 224ms
#      CGroup: /system.slice/container-c_image_c0EAPOVtq2.service
#              ├─161140 /usr/bin/podman run --cidfile=/run/container-c_image_c0EAPOVtq2.service.ctr-id --sdnotify=conmon --cgroups=no-conmon --rm --replace -d --name c_image_c0EAPOVtq2 --label io.containers.autoupdate=image quay.io/libpod/alpine:latest top -d 120
#              └─161219 /usr/bin/conmon --api-version 1 -c ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4 -u ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4/userdata -p /run/containers/storage/overlay-containers/ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4/userdata/pidfile -n c_image_c0EAPOVtq2 --exit-dir /run/libpod/exits --full-attach -s -l k8s-file:/var/lib/containers/storage/overlay-containers/ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4/userdata/oci-log --conmon-pidfile /run/containers/storage/overlay-containers/ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4
# 
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal systemd[1]: Starting Podman container-c_image_c0EAPOVtq2.service...
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal podman[161140]: 2021-08-31 22:27:20.643400321 +0000 UTC m=+0.069787450 container create ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4 (image=quay.io/libpod/alpine:latest, name=c_image_c0EAPOVtq2, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=container-c_image_c0EAPOVtq2.service, created_at=2021-06-10T18:55:36Z, created_by=test/system/build-testimage, io.buildah.version=1.21.0)
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal podman[161140]: 2021-08-31 22:27:20.617761729 +0000 UTC m=+0.044148872 image pull  quay.io/libpod/alpine:latest
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal podman[161140]: 2021-08-31 22:27:20.816040972 +0000 UTC m=+0.242428114 container init ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4 (image=quay.io/libpod/alpine:latest, name=c_image_c0EAPOVtq2, created_at=2021-06-10T18:55:36Z, created_by=test/system/build-testimage, io.buildah.version=1.21.0, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=container-c_image_c0EAPOVtq2.service)
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal systemd[1]: Started Podman container-c_image_c0EAPOVtq2.service.
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal podman[161140]: 2021-08-31 22:27:20.832669296 +0000 UTC m=+0.259056443 container start ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4 (image=quay.io/libpod/alpine:latest, name=c_image_c0EAPOVtq2, created_at=2021-06-10T18:55:36Z, created_by=test/system/build-testimage, io.buildah.version=1.21.0, io.containers.autoupdate=image, PODMAN_SYSTEMD_UNIT=container-c_image_c0EAPOVtq2.service)
# Aug 31 22:27:20 ip-172-31-17-49.us-east-2.compute.internal podman[161140]: ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4
# # podman inspect --format {{.Image}} c_image_c0EAPOVtq2
# 9f9ec7f2fdef9168f74e9d057f307955db14d782cff22ded51d277d74798cb2f
# # podman auto-update --dry-run --format {{.Unit}},{{.Image}},{{.Updated}},{{.Policy}}
# container-c_image_c0EAPOVtq2.service,quay.io/libpod/alpine:latest,pending,registry
# # podman auto-update --format {{.Unit}},{{.Image}},{{.Updated}},{{.Policy}}
# Trying to pull quay.io/libpod/alpine:latest...
# Getting image source signatures
# Copying blob sha256:9d16cba9fb961d1aafec9542f2bf7cb64acfc55245f9e4eb5abecd4cdc38d749
# Copying blob sha256:9d16cba9fb961d1aafec9542f2bf7cb64acfc55245f9e4eb5abecd4cdc38d749
# Copying config sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
# Writing manifest to image destination
# Storing signatures
# container-c_image_c0EAPOVtq2.service,quay.io/libpod/alpine:latest,true,registry
# # podman inspect --format {{.Image}} c_image_c0EAPOVtq2
# Error: error getting graph driver info "ec9e524beb3548ceea485e9e08def02d735d3a84884e2593f3ddef8c60a5d6a4": stat /var/lib/containers/storage/overlay/97f46bcc42af56b28259152676243358657e4f043a557ea98e93245606ffb281: no such file or directory
# [ rc=125 ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: podman inspect c_image_c0EAPOVtq2 failed unexpectedly
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are already two known errors that we deliberately ignore in this test:

while [[ $timeout -gt 0 ]]; do
    run_podman '?' inspect --format "{{.Image}}" $cname
    if [[ $status != 0 ]]; then
        if [[ $output =~ (no such object|does not exist in database): ]]; then
            # this is ok, it just means the container is being restarted
            :
        else
            die "podman inspect $cname failed unexpectedly"
        fi

The string "error getting graph driver info" does not appear anywhere in Cirrus CI logs; this is the only instance I've seen.

I would like to suggest that all these race conditions be fixed, period. If these errors happen in gating tests, they will happen to end users too, and those users will waste time worrying about them and tracking them down. That is rude. Can we please add mechanisms so that these errors do not happen?

@edsantiago added the flakes (Flakes from Continuous Integration) label Sep 1, 2021
@vrothberg changed the title from "podman auto-update: error getting graph driver info: stat/ENOENT" to "podman inspect: error getting graph driver info: stat/ENOENT" Sep 2, 2021
@vrothberg (Member)

Changing the title to podman inspect to prevent the issue from landing unconditionally on my table. The flake is in the auto-update tests, but the error happens during podman inspect.

@vrothberg (Member)

The issue is fairly easy to reproduce:

  • Terminal 1: while true; do ./bin/podman run --rm --replace --name test busybox sleep 0.2; done
  • Terminal 2: while true; do ./bin/podman inspect test > /dev/null; done

It took about 5 seconds to hit:

Error: error inspecting object: no such object: "test"
Error: error inspecting object: no such object: "test"
Error: container 2fce48237cd11f5e79fdee9c8816574b42c9f411b541cb80851d89834a44a98f does not exist in database: no such container
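
For readers who do not want to run the two-terminal loop, here is a minimal Go sketch of the same race (a toy model, not podman code; `store`, `lookup`, `inspect`, and every other name here are made up for illustration): one goroutine keeps creating and removing a record, the other performs the two-step lookup-then-inspect read, and the read eventually lands in the window where the record was just removed.

```go
// race_sketch.go — toy model of the lookup-then-inspect race (not podman code).
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errRemoved = errors.New("does not exist in database")

// store is a toy stand-in for the container database: name -> container ID.
type store struct {
	mu   sync.Mutex
	ctrs map[string]string
}

// lookup resolves a name to an ID, like the first step of `podman inspect`.
func (s *store) lookup(name string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.ctrs[name]; ok {
		return id, nil
	}
	return "", errRemoved
}

// inspect reads details for an ID, like the second step of `podman inspect`.
func (s *store) inspect(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, v := range s.ctrs {
		if v == id {
			return nil
		}
	}
	return errRemoved // container vanished between lookup and inspect
}

func main() {
	s := &store{ctrs: map[string]string{}}

	// "Terminal 1": keep replacing the container, like `podman run --rm --replace`.
	go func() {
		for i := 0; ; i++ {
			s.mu.Lock()
			s.ctrs["test"] = fmt.Sprintf("id-%d", i)
			s.mu.Unlock()
			time.Sleep(100 * time.Microsecond)
			s.mu.Lock()
			delete(s.ctrs, "test")
			s.mu.Unlock()
		}
	}()

	// "Terminal 2": keep inspecting until the two-step read hits the window.
	for i := 0; i < 1000000; i++ {
		id, err := s.lookup("test")
		if err != nil {
			continue // plain "no such object" — the test already tolerates this
		}
		if err := s.inspect(id); err != nil {
			fmt.Printf("hit the race after %d attempts: %v\n", i, err)
			return
		}
	}
	fmt.Println("did not hit the race this run; try again")
}
```

The point is that the second step can legitimately fail even though the first one succeeded; in podman the removal can presumably strike between the container lookup and the graph-driver query, which is why it surfaces as the stat/ENOENT error instead of one of the messages the test already tolerates.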

@vrothberg (Member)

That was surprisingly straightforward to fix -> #11476

vrothberg added a commit to vrothberg/libpod that referenced this issue Sep 8, 2021
Improve the error handling of `container inspect` to properly handle
when the container has been removed _between_ the lookup and the
inspect.  That will yield the correct "no such object" error message in
`inspect`.

[NO TESTS NEEDED] since I do not have a reliable and cheap
reproducer.  It's fixing a CI flake, so there's already an indicator.

Fixes: containers#11392
Signed-off-by: Valentin Rothberg <[email protected]>
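
As a rough illustration of the pattern the commit message describes (a sketch only; `errNoSuchCtr`, `inspectByName`, and the lookup/inspect callbacks are assumptions for illustration, not the actual libpod code): if the inspect step fails because the container was removed after the lookup, report that failure as the same "no such object" error a failed lookup produces, instead of surfacing the low-level graph-driver/storage error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchCtr stands in for podman's "no such container" error value;
// the name is an assumption for illustration, not the real libpod symbol.
var errNoSuchCtr = errors.New("no such container")

type container struct{ id string }
type inspectData struct{ Image string }

// inspectByName sketches the fixed flow: a "no such container" failure from
// the inspect step is reported as "no such object", exactly like a failed
// lookup, because the container may be removed between the two calls.
func inspectByName(name string,
	lookup func(string) (*container, error),
	inspect func(*container) (*inspectData, error)) (*inspectData, error) {

	ctr, err := lookup(name)
	if err != nil {
		return nil, fmt.Errorf("no such object: %q: %w", name, err)
	}

	data, err := inspect(ctr)
	if err != nil {
		if errors.Is(err, errNoSuchCtr) {
			// Removed between lookup and inspect: same user-facing error
			// as if the lookup itself had failed.
			return nil, fmt.Errorf("no such object: %q: %w", name, err)
		}
		return nil, err // any other error is still surfaced as-is
	}
	return data, nil
}

func main() {
	// Simulate the race: the lookup succeeds, then the container is gone.
	lookup := func(string) (*container, error) { return &container{id: "ec9e524b"}, nil }
	inspect := func(*container) (*inspectData, error) { return nil, errNoSuchCtr }

	if _, err := inspectByName("c_image_c0EAPOVtq2", lookup, inspect); err != nil {
		fmt.Println("Error:", err)
	}
}
```

Running this sketch prints `Error: no such object: "c_image_c0EAPOVtq2": no such container`, i.e. the kind of message the test loop above already tolerates, rather than the graph-driver stat/ENOENT error from the report.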
@github-actions bot added the locked - please file new issue/PR label Sep 21, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023