It's baaaack: podman images: Error: top layer [...] not found in layer tree #8148

Closed
edsantiago opened this issue Oct 26, 2020 · 27 comments
Labels: kind/bug

@edsantiago (Member)

Looks like a new variant of #7444, but I'm opening a new issue.

I seem to have gotten myself into a similar state again:

$ ./bin/podman images
Error: top layer 1f832d5208105d5dde3f814d391ff7b4ddb557fd3bdbcb79418906242772dc73 of image a7a37f74ff864eec55891b64ad360d07020827486e30a92ea81d16459645b26a not found in layer tree

$ ./bin/podman images -a
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x15054fa]

goroutine 1 [running]:
github.com/containers/podman/v2/pkg/domain/infra/abi.(*ImageEngine).List(0xc0006c64c0, 0x1dc02a0, 0xc00001c1b0, 0xc000226d01, 0x2ad3878, 0x0, 0x0, 0xc000512000, 0xc0004afc88, 0xa1a754, ...)
        pkg/domain/infra/abi/images_list.go:52 +0x43a
github.com/containers/podman/v2/cmd/podman/images.images(0x2a1d900, 0xc0006cb0b0, 0x0, 0x1, 0x0, 0x0)
        cmd/podman/images/list.go:98 +0x10f
github.com/spf13/cobra.(*Command).execute(0x2a1d900, 0xc0000ba170, 0x1, 0x1, 0x2a1d900, 0xc0000ba170)
        vendor/github.com/spf13/cobra/command.go:850 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0x2a2c000, 0xc0000cc010, 0x184bce0, 0x2ad3878)
        vendor/github.com/spf13/cobra/command.go:958 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
        vendor/github.com/spf13/cobra/command.go:895
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        vendor/github.com/spf13/cobra/command.go:888
main.Execute()
        cmd/podman/root.go:88 +0xec
main.main()
        cmd/podman/main.go:78 +0x18c

Smoking gun seems to be: running BATS tests while also watching Tech Talk (heavy bandwidth hog). Test was pulling a huge image (quay.io/libpod/fedora:31), got interrupted by timeout, everything exploded after that. Nothing works any more.

rootless, master @ 01f642f

@edsantiago added the kind/bug label Oct 26, 2020
@rhatdan (Member) commented Oct 27, 2020

Can you remove the bad image?

@edsantiago (Member, Author)

No:

$ podman rmi a7a3
Error: 1 error occurred:
        * top layer 1f832d5208105d5dde3f814d391ff7b4ddb557fd3bdbcb79418906242772dc73 of image a7a37f74ff864eec55891b64ad360d07020827486e30a92ea81d16459645b26a not found in layer tree
$ podman rmi -a -f
Error: 3 errors occurred:
        * top layer 1f832d5208105d5dde3f814d391ff7b4ddb557fd3bdbcb79418906242772dc73 of image a7a37f74ff864eec55891b64ad360d07020827486e30a92ea81d16459645b26a not found in layer tree
        * top layer 1f832d5208105d5dde3f814d391ff7b4ddb557fd3bdbcb79418906242772dc73 of image a7a37f74ff864eec55891b64ad360d07020827486e30a92ea81d16459645b26a not found in layer tree
        * unable to delete all images, check errors and re-run image removal if needed
$ podman rmi 1f8
Error: 1 error occurred:
        * unable to find a name and tag match for 1f8 in repotags: no such image

I assume that removing ~/.local/share/containers will fix it, but I kind of think that we should offer a more courteous solution to end users.

@rhatdan (Member) commented Oct 27, 2020

@nalind Can't we get this rm to work?

@edsantiago (Member, Author)

Uh-oh. I just had a flake in one of my PRs with exactly the same symptom: podman pull timed out, left the entire system in an unusable state.

@rhatdan (Member) commented Oct 28, 2020

It looks like you can edit $HOME/.local/share/containers/storage/overlay-images/images.json, remove the bad image from the JSON file, and get your images working again.
We need to get this to work with podman rmi IMAGEID.
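Something along these lines should do it (an untested sketch: it assumes jq is available and that images.json is the usual array of image records with an id field; back up the file first, and adjust the path for your storage driver, e.g. vfs-images instead of overlay-images):

# Untested sketch: drop the corrupted image record from images.json.
# The image ID below is the broken one from this thread; replace it with yours.
store=~/.local/share/containers/storage/overlay-images
cp "$store/images.json" "$store/images.json.bak"
jq 'map(select(.id != "a7a37f74ff864eec55891b64ad360d07020827486e30a92ea81d16459645b26a"))' \
   "$store/images.json.bak" > "$store/images.json"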

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan (Member) commented Dec 1, 2020

This is an important issue for us to fix, or at least we should have a simple way of cleaning this up other than destroying all containers.

@TriplEight

Just had to manually remove literally every "layer":".*" from ~/.local/share/containers/storage/vfs-images/images.json

@qhaas commented Dec 18, 2020

Seeing the following when trying to list images built with BUILDAH_FORMAT=docker set:

$ podman --version
podman version 2.0.5
$ buildah --version
buildah version 1.15.1 (image-spec 1.0.1-dev, runtime-spec 1.0.2-dev)
$ skopeo --version
skopeo version 1.1.1
$ podman images
Error: top layer 174f5685490326fc0a1c0f5570b8663732189b327007e47ff13d2ca59673db02 of image 8652b9f0cb4c0599575e5a003f5906876e10c1ceb2ab9fe1786712dac14a50cf not found in layer tree
$ podman system prune -af
Error: unable to get images to prune: top layer 174f5685490326fc0a1c0f5570b8663732189b327007e47ff13d2ca59673db02 of image 8652b9f0cb4c0599575e5a003f5906876e10c1ceb2ab9fe1786712dac14a50cf not found in layer tree
$ buildah rmi -a
error removing image "8652b9f0cb4c0599575e5a003f5906876e10c1ceb2ab9fe1786712dac14a50cf": unable to retrieve information about layer 174f5685490326fc0a1c0f5570b8663732189b327007e47ff13d2ca59673db02 from store: layer not known
...

Ended up just sudo rm -rf ~/.local/share/containers since I was getting rm: cannot remove ... Permission denied errors without sudo.

@rhatdan (Member) commented Dec 21, 2020

Could you give us the exact steps to recreate?

@qhaas commented Dec 21, 2020

Could you give us the exact steps to recreate?

My above issue with BUILDAH_FORMAT=docker appears to have been a fluke: after clean-slating the folders that podman creates in one's home directory and rebuilding my images, it did not recur. I also attempted to replicate it in a freshly spun-up VM, without success. Hopefully it doesn't manifest again...

@rhatdan (Member) commented Dec 22, 2020

OK, closing; reopen if it happens again.

@rhatdan closed this as completed Dec 22, 2020
@Snapstromegon

Hey, I just stumbled upon this:

I tried running the docker.io node-red container on a Raspberry Pi 4 with Ubuntu 20.10 and podman 2.0.6 installed from the default apt repos.

$ podman --version
podman version 2.0.6
$ buildah --version
buildah version 1.15.2 (image-spec 1.0.1, runtime-spec 1.0.2-dev)
$ podman images
Error: top layer 705c134cea6ac9d42812b58b50f4e2c01143c7e9e51067c073bf738b1eefbb3d of image fef71ae66fcf1571ab0d3e893ae853dc0e14ba63de05d0bfa44bb5765cade251 not found in layer tree

Things I did:

  1. podman pull nodered/node-red -> stuck on Storing signatures
  2. Ctrl + C
  3. Rerun the pull -> Error: error creating container storage: size for layer "705c134cea6ac9d42812b58b50f4e2c01143c7e9e51067c073bf738b1eefbb3d" is unknown, failing getSize()

I resolved this by doing the fix from above, but I'll leave this here for future reference.

@jharrington22

I too just ran across this error @nalind (hi!)

$ podman --version
podman version 2.1.1
$ podman images
Error: top layer 5d6d8687c4a028c69d16dcd730084d6996490fd41556dbdc065ebac533204f2a of image a78267678b7e6e849c7e960b09227b737a38d5073a5071b041a16bd4b609ef92 not found in layer tree

Not sure if this helps, but:

cat ~/.local/share/containers/storage/overlay-images/a78267678b7e6e849c7e960b09227b737a38d5073a5071b041a16bd4b609ef92/manifest 
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1994,
      "digest": "sha256:a78267678b7e6e849c7e960b09227b737a38d5073a5071b041a16bd4b609ef92"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 63670669,
         "digest": "sha256:f147208a1e03a819ae351de51e039b1d5ba3fb18b09fe213dd04324149cc71e6"
      }
   ]
}

and

ls -al ~/.local/share/containers/storage/overlay-images/a78267678b7e6e849c7e960b09227b737a38d5073a5071b041a16bd4b609ef92/
total 72
drwx------.  2 jamesh jamesh  4096 Feb 12 10:04  .
drwx------. 24 jamesh jamesh 49152 Feb 12 10:04  ..
-rw-------.  1 jamesh jamesh   529 Feb 12 10:04 '=bWFuaWZlc3Qtc2hhMjU2OjExMjE2ZWY1NDZiNWJiMDcyZjY2MmE4MTk0YmZmNzE5YTk1NDE2OWMyY2FjMzc5NDQ4OTM5MWFjNTAxNmM2NTU='
-rw-------.  1 jamesh jamesh  1201 Feb 12 10:04 '=bWFuaWZlc3Qtc2hhMjU2OjMxNjk3NjkwZTY0MWU4YjJlMzQwZWNhZmI0ZTg3OWYxODlmNGIyNDVkNGQwZjRlYjE3YWZhMjk3OTg4MmQ5ZGI='
-rw-------.  1 jamesh jamesh  1994 Feb 12 10:04 '=c2hhMjU2OmE3ODI2NzY3OGI3ZTZlODQ5YzdlOTYwYjA5MjI3YjczN2EzOGQ1MDczYTUwNzFiMDQxYTE2YmQ0YjYwOWVmOTI='
-rw-------.  1 jamesh jamesh     0 Feb 12 10:04 '=c2lnbmF0dXJlLTExMjE2ZWY1NDZiNWJiMDcyZjY2MmE4MTk0YmZmNzE5YTk1NDE2OWMyY2FjMzc5NDQ4OTM5MWFjNTAxNmM2NTU='
-rw-------.  1 jamesh jamesh   529 Feb 12 10:04  manifest

@jharrington22

I think I broke it with a Ctrl-C mid podman build, or by providing the podman build -f parameter pointing at a Dockerfile that doesn't exist. See the outputs below.

Note the make target is called make docker-build but it's running podman build.

$ IMAGE_REPOSITORY=jharrington22 make docker-build                                                                                 
GOOS=linux go build -o ./bin/main main.go                                                                                                                                                                                                     
# Build and tag images for quay.io                                                                                                                                                                                                            
/usr/bin/podman build . -f .Dockerfile -t quay.io/jharrington22/osd-cluster-ready:v0.1.30-6ceb073                                                                                                                                             
Error: error reading info about "/home/jamesh/.gvm/pkgsets/go1.13.6/global/src/github.com/iamkirkbater/osd-cluster-ready-job/.Dockerfile": stat /home/jamesh/.gvm/pkgsets/go1.13.6/global/src/github.com/iamkirkbater/osd-cluster-ready-job/.D
ockerfile: no such file or directory
make: *** [Makefile:39: docker-build] Error 125
$ IMAGE_REPOSITORY=jharrington22 make docker-build
GOOS=linux go build -o ./bin/main main.go
# Build and tag images for quay.io   
/usr/bin/podman build . -f Dockerfile -t quay.io/jharrington22/osd-cluster-ready:v0.1.30-6ceb073
STEP 1: FROM fedora:latest
Getting image source signatures
Copying blob f147208a1e03 done  
Copying config a78267678b done  
Writing manifest to image destination
Storing signatures
^Cmake: *** [Makefile:39: docker-build] Interrupt

$ IMAGE_REPOSITORY=jharrington22 make docker-build
GOOS=linux go build -o ./bin/main main.go
# Build and tag images for quay.io
/usr/bin/podman build . -f ./Dockerfile -t quay.io/jharrington22/osd-cluster-ready:v0.1.30-6ceb073
STEP 1: FROM fedora:latest
Getting image source signatures
Copying blob f147208a1e03 done  
Copying config a78267678b done  
Writing manifest to image destination
Storing signatures
STEP 2: RUN yum install --assumeyes     jq     wget
Error: error checking if cached image exists from a previous build: error getting history of base image "a78267678b7e6e849c7e960b09227b737a38d5073a5071b041a16bd4b609ef92": error creating new image from reference to image "a78267678b7e6e84
9c7e960b09227b737a38d5073a5071b041a16bd4b609ef92": size for layer "5d6d8687c4a028c69d16dcd730084d6996490fd41556dbdc065ebac533204f2a" is unknown, failing getSize()
make: *** [Makefile:39: docker-build] Error 125

@vrothberg (Member)

I'll take this one @nalind 👍

vrothberg added a commit to vrothberg/libpod that referenced this issue Feb 12, 2021
Internally, Podman constructs a tree of layers in containers/storage to
quickly compute relations among layers and hence images.  To compute the
tree, we intersect all local layers with all local images.  So far,
lookup errors have been fatal which has turned out to be a mistake since
it seems fairly easy to cause storage corruptions, for instance, when
killing builds.  In that case, a (partial) image may list a layer which
does not exist (anymore).  Since the errors were fatal, there was no
easy way to clean up and many commands were erroring out.

To improve usability, turn the fatal errors into warnings that guide the
user into resolving the issue.  In this case, a `podman system reset`
may be the appropriate way for now.

[NO TESTS NEEDED] because I have no reliable way to force it.

[1] containers#8148 (comment)

Signed-off-by: Valentin Rothberg <[email protected]>
@martinetd commented Mar 1, 2021

@vrothberg thanks for the patch.
As far as I can tell, #9346 got in podman 3.0.1, right? I still can't get anything to work with this (see below), what's the recommended way forward? I'd rather not reset everything if possible.

EDIT: went with removing the faulty image from overlay-images/images.json as per #8148 (comment)

$ podman --version
podman version 3.0.1

$ podman rmi 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 not found in layer. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
Error: 1 error occurred:
	* layer not found in layer tree: "2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2"

$ podman rmi 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2
Error: 1 error occurred:
	* unable to find a name and tag match for 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 in repotags: no such image

$ podman image list
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
Error: error retrieving size of image "3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626": you may need to remove the image to resolve the error: unable to determine size: error locating layer with ID "2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2": layer not known

(As a side note, I produced that by interrupting a podman pull command with ^C on 3.0.0-rc2, updated after seeing this issue -- if possible I'd very much like to see podman storage survive e.g. a power failure during a podman pull; we're planning some unattended updates on unreliable/slow devices and it'd be great if things could just work™ even if they restart at odd times. I can probably justify investing some time in that if help is required to produce breakages.)

@vrothberg (Member)

Thanks for the report, @martinetd!

I will take a look at the rmi error.

I'd very much like to see podman storage survive e.g. a power failure during a podman pull; we're planning some unattended updates on unreliable/slow devices and it'd be great if things could just work™ even if they restart at odd times. I can probably justify investing some time in that if help is required to produce breakages.

We have been talking about this internally and @giuseppe has plans to address that. @giuseppe, did you tackle the fsync issues already?

@vrothberg (Member)

Here's a PR to address the reported error in rmi - #9542

The error shouldn't be fatal. @martinetd, if possible, could you try out the PR and see if that fully resolves your issue?

@martinetd

The error shouldn't be fatal. @martinetd, if possible, could you try out the PR and see if that fully resolves your issue?

Thanks!
I have kept the old files around so I can try it easily; I'm running out of time for today though, so it will be tomorrow.

@giuseppe (Member) commented Mar 1, 2021

We have been talking about this internally and @giuseppe has plans to address that. @giuseppe, did you tackle the fsync issues

Adding support for an fsck mode is going to take some time.

vrothberg added a commit to vrothberg/libpod that referenced this issue Mar 1, 2021
The storage can easily be corrupted when a build or pull process (or any
process *writing* to the storage) has been killed.  The corruption
surfaces in Podman reporting that a given layer could not be found in
the layer tree.  Those errors must not be fatal but only logged, such
that the image removal may continue.  Otherwise, a user may be unable to
remove an image.

[NO TESTS NEEDED] as I do not yet have a reliable way to cause such a
storage corruption.

Reported-in: containers#8148 (comment)
Signed-off-by: Valentin Rothberg <[email protected]>
@martinetd

Adding support for an fsck mode is going to take some time.

I'm not so much concerned about an fsck mode as about atomic updates, if that's possible at all.
Writing the data first, syncing it, writing the metadata to a separate file, and then renaming it over the old file would ensure atomicity, for example. There are probably other ways of doing it.
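For instance, something like this (just a rough shell illustration of the pattern; the paths, file names, and the generate-new-images-json step are made up, not how containers/storage actually lays out its metadata):

# Illustration only: atomically replace a metadata file.
store=/tmp/podmanroot/overlay-images          # made-up path
tmp=$(mktemp "$store/images.json.XXXXXX")     # temp file on the same filesystem

# 1. the layer data itself is written and synced first (omitted here)
# 2. write the new metadata to the temp file and flush it to disk
generate-new-images-json > "$tmp"             # placeholder for whatever produces the JSON
sync "$tmp"                                   # coreutils sync(1) on a single file
# 3. rename(2) it over the old file: readers see either the old or the new
#    metadata, never a half-written one
mv -f "$tmp" "$store/images.json"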

In this case the error seems to be more of an ordering problem: the metadata (the JSON image descriptions) is written before the image data? So there's a window during which, if the pull operation is interrupted, Bad Things Happen™.

I'd like to fix these if possible, so the metadata always points either to the old image or to the new image -- there will potentially be dangling files until an fsck command is run, but that feels less important to me (especially since pulling the same image again will use the same names, so ultimately in my case the dangling files would mostly fix themselves).

I'm not familiar with the data structures, so I might just be saying something stupid that doesn't make sense for the overlayfs driver, but I feel it should be possible with what I've seen of it so far.
As for the how, I guess that means either auditing the code or just chaos-monkey-interrupting a lot of pulls at different timings until it stops breaking. I can make time for the latter if that helps :)
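Something like this loop is what I have in mind for the timing part (untested sketch; the storage root, registry, and image name are placeholders):

# Untested sketch: interrupt pulls after varying delays and check the store.
root=/tmp/podman-chaos-root        # throwaway storage root (placeholder)
image=localhost:5000/a:2           # any image in a local registry (placeholder)

for delay in 0.1 0.2 0.5 1 2; do
    podman --root "$root" rmi "$image" >/dev/null 2>&1 || true
    podman --root "$root" pull "$image" &
    pid=$!
    sleep "$delay"
    kill -INT "$pid" 2>/dev/null   # simulate the ^C
    wait "$pid" 2>/dev/null
    # if the store got corrupted, listing images errors out here
    if ! podman --root "$root" images >/dev/null 2>&1; then
        echo "store broke after interrupting at ${delay}s"
    fi
done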

@giuseppe (Member) commented Mar 1, 2021

In this case the error seems to be more of an ordering problem: the metadata (the JSON image descriptions) is written before the image data? So there's a window during which, if the pull operation is interrupted, Bad Things Happen™.

without a "fsck mode" we would need to have a "do a full syncfs before writing the metadata mode". I am fine with that as long as it is not the default, as a syncfs can be very expensive.

Causing such corruptions is unfortunately very easy. Locally I could reproduce it just by forcefully powering off a VM a few seconds after the image pull completed.

@martinetd

A full fs sync should never be needed if you're the one writing the files; package managers don't use it and have been around long enough that I tend to trust them :)

What follows is longer than I had initially planned, but it's a subject I care about in general so please bear with me a few minutes...

In this case there seem to be multiple distinct problems:

  • Since I produced the error with a simple ^C, we're not even talking sync-level. ^C will never take back any data that the program asked to be written and Linux accepted, so there's a pure ordering issue. It could be linear ordering, or the data and metadata are just written by two different processes without locking.
    For the linear case it's easy: just count the number of write() syscalls and kill podman at each interval - for example start under gdb, catch write/pwrite, "continue n", and quit (see the rough gdb sketch after this list).
    If multiple processes are involved it gets a bit messy: you'd have to stop at each combination of write counts of each process, so that doesn't scale well, but I guess it could be automated as well.

  • As you're describing, sync issues, so that if you just power down after a pull you don't end up in a corrupt state -- that definitely has a performance cost, so I'm totally fine with it being an option.
    As I said, it should be possible to be slightly more subtle than a full fs sync. I've had a look at what dnf (Fedora) and dpkg (Debian) do, and they're quite different:
    dnf is happy just writing files under another name and renaming them at the end of the write. I'm not sure if the behaviour depends on what filesystem is used. (The rpmdb does get some fdatasync from sqlite.)
    dpkg does quite a bit more: it calls sync_file_range() with SYNC_FILE_RANGE_WRITE on every file just after writing it under a temporary name (which triggers a flush of dirty pages to disk but does not wait for them, so it shouldn't impact performance too much), then when everything has been written it goes through all the files of the package again, opens them and calls fsync this time, and renames them to their final names after all the syncs are done.

That's definitely more work than a full syncfs call, but it also doesn't disturb other workloads, and now that I've seen dnf doesn't even bother with fsync, I'm not sure it's needed at all in our case because of the next problem:

  • The last problem is commands failing if part of the data is incoherent. I think if all commands handle this gracefully there is actually no need for sync at all; a fresh start with an incomplete image should just handle it as if the image were missing and pull again, and pull itself should be able to fix any missing layer.
    If that works, the UX is perfectly fine without any sync call: either you get the old metadata and the files are all still there, or you get the new metadata and, if something is missing, you can pull again.
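And for completeness, the gdb idea from the first bullet could look roughly like this (untested sketch; syscall catchpoints stop on both entry and return so the counts are approximate, and the storage root and image name are again placeholders):

# Untested sketch: stop podman after roughly N write()/pwrite64() syscalls,
# kill it, and check whether the metadata survived.
root=/tmp/podman-gdb-root          # throwaway storage root (placeholder)
image=localhost:5000/a:2           # placeholder image in a local registry

for n in $(seq 1 200); do
    podman --root "$root" rmi "$image" >/dev/null 2>&1 || true
    gdb -batch -quiet \
        -ex 'catch syscall write pwrite64' \
        -ex 'run' \
        -ex "continue $n" \
        -ex 'kill' \
        --args podman --root "$root" pull "$image" >/dev/null 2>&1
    if ! podman --root "$root" images >/dev/null 2>&1; then
        echo "metadata broke after syscall stop #$n"
    fi
done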

Thanks!

@giuseppe (Member) commented Mar 1, 2021

Full fs sync command should never be needed if you're writing things yourselves, package managers are not using these and have been around for long enough that I tend to trust them :)

package managers usually call fsync on each file. That is waaaay more expensive than a syncfs.

* Since I produced the error with a simple ^C, we're not even talking sync-level. ^C will never take back any data that the program asked to be written and linux accepted, so there's a pure ordering issue. It could be linear ordering or that the data and metadata are just written by two different processes without locking.
  For the linear case it's easy, just count the number of write() syscalls and kill podman at each interval - for example start under gdb, catch write/pwrite, "continue n", and quit.t

I wasn't aware of such an issue. If it happens without a powerdown, then it seems like something we should fix now, without any sync involved.

Do you have a reproducer for it?

* As you're describing, sync issues, so if you just powerdown after a pull you don't end up in a corrupt state -- that definitely has a performance cost so I'm totally fine with it being an option.
  As I said, it should be possible to be slightly more subtle than full fs sync. I've had a look at what dnf (fedora) and dpkg (debian) do, and they're quite different:
  dnf is happy just writing files with another name and renaming the file at the end of the write. I'm not sure if the behaviour depends on what filesystem is used. (the rpmdb does get some fdatasync from sqlite)
  dpkg does quite a bit more: they call `sync_file_range()` with `SYNC_FILE_RANGE_WRITE` on every file just after writing them with a temporary name (which triggers a flush of dirty pages to disc but does not wait for them, so shouldn't impact performances too much), then when they're done writing everything go over through all the files of the package again, opens them and calls `fsync` this time and renames them to the final name after all the syncs have been done.

this is way slower than a single syncfs.

Long term, I'd like containers/storage to have the equivalent of ostreedev/ostree#49.

As you can see, a syncfs is usually much faster than calling fsync on each file (in fact both yum and dpkg perform quite badly in this area).

@martinetd commented Mar 2, 2021

Here's a PR to address the reported error in rmi - #9542

I now get a different error:

$ ./bin/podman rmi localhost:5000/my_image:latest
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 not found in layer. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 of image 3a6990719716f31429cd9230f5a7d64e141172266ed63dbe2196f6a54ba0c626 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Layer 2ccbedb453910fd68db20a7ce815e12e1aabf8e1caebc957b653ab06c81115b2 not found in layer. The storage may be corrupted, consider running `podman system reset`. 
Error: 1 error occurred:
	* layer not known

I reproduced it with minimal data; please try with the attached tar (GitHub won't let me attach a tar, but it's small enough, so I base64-encoded it as the lazy way out):
(EDIT: added a note after this block; it actually almost works and I hadn't noticed it at first)

$ cd /tmp && rm -rf podmanroot # (need root dir to be /tmp/podmanroot as path is in bolt_state.db, and removing that file fixes the problem)
$ base64 -d << EOF | tar xvJ
/Td6WFoAAATm1rRGAgAhARYAAAB0L+Wj4i//Db9dADgbyLEGGBhrWfvNiq8E/CZX4m6z/eCSYo4y
CuVunZXSW3ak4IT/VrRHXK6zWxFPZQWG+wKAeH4rI8LaScgMYzxBVXsw+wPt8XI4bbCRywLSys8M
40QyyAxXBIyDU+YmajNtQycdu3QFCqX1cvYk3JyYH3+Kpw2Pmoep+3Afs45BgWWT4ZML9+Sf2NNj
hVwnDIXXLrnWzXO675seQJfq+HpfMrao82GOJMtRF2LoBXpoAgVxIBuiPnwVCTljcOXMM+n8IWZ+
YSGkMW1tr0eRY4jZW2dXxO8jZ8dNM08MQDDdfOmRb8w5peug2JyiExpve5Yur2voXf2lcATCtU5B
h6Q15vqfQuACVcSJOMFzjsHc95GkYAU3lp6kwH9u6ZcdkXKwXfkv/p6N9+mgqN6kt3dYQGWj+Mb0
suA1pVPEOt2rQwtPX8+sJt47Nj9QTMKMhutbO1FfJIQV92HnwUyqnTd28QZ+VYI8De5zZljwyphr
GkL1Ss206s3B3FxwkuauzeRFQSFk9QrsTqKu5okvFGJbqJZMZ2YJZF79ifdWJydNsl/cn842rHUv
hCWFa11sXOLSKxeJl3qpBinHV9vBCmA6NEwVZAGZZ5utA2UsxxagvmsUo+TQIuLhA16JfXvFdzXV
F9p8ATTNZunhoSUYK2I2ThBVf+73BBxgu2U8xrdX9LJHrnFvX/biWtXwHUibMIrUYfSJ5rOLBkyM
FDDu23auY2z40J+LcE7EturqFaPOxgbEw6mOOnns2ygGnk7xK6wek5jKlHQeu0F4McUPlg7hAvO7
cXOJwCsPuWsnoPQbTNE5aAC4LE6rNfTpxdc8Eq9H3A0cv5mJe6t4PRiraYAVV4WRZsMlHArkiVD6
ulSFVg3Zb4Jhc/xOy+7m64nMCA1SHn9ED3yYpkNNQCtOLV7QYwJ5goWAb4YyEXn5FTE66NhHcBb4
BV/qIOoRJI4/9s63J3UWD/LeRknnFdiEU1qsHM1p5eLiqynDTomZMipdEY/PPxOohcmtjz97Zn3a
Ku76WAZcJnBuNeo3dnUEy3/8nRDMJM7eblOICJYlufhl5uZjKvdqZH6+DoG+7pMN0dBhTom2wIYT
O6MH1DZ7y/g5FwASp+Ayq5Xx8nGn7Dj6p1fPl+21Ky4bySGubbJqkB7eguS3aAApInhBEH9rBWJO
+l4AyylxZF+GvCA9Cvpqq6LVpih+CZafIEkgSNmxRXXrNZW3OjQTukdPI4HSLI5iWtf4RaKzeuco
FuM1jZY+Exzi9R40Xh2L/0Lfs0vh0bcIi+qiOgi0ibkBoS6oeoX+vS/Fss3SdncHXHXKw+BScXk6
dxm415RxCv44Bq99nOuVIQm7CgMNFN6YrX1S+e1WNoRCy3Oi1jOcs65ev0Owxbji2gQCl4oUP80X
9caqUN9i7ve4Pv2FMMBiDYLpwR7/hsCiU75jgtF7P5A1D/s94VI6QBgp87WwyWpYmQEEH4jLEf05
WIQASlZXslhy/6vztj9/9rXedwrYFc4u/9Om39ZgPEp2LX3qiV27wplgPC7lH4Qin3cwce/zRmRP
4i356rmGv63u8y/MQqWWOk04s3ihPBbSmRUwjeHJu+7NN9TXu4xPq2GIMGCVCQmJwRC66Ep9JVZ4
mnQ9UqgJq6Tl8Klidtij5+CZCZ7wQjasT6pNlMgggQ9zd8iO7tGW6BfM+5EKmK1a/GT03iGt/VKZ
ZZR2j6x459khaetYgoixrTDkmhQA+3QO4oRG0pJ9X4Iu7UGoVxrmYjkBQZHIM35uD7JbI0vjfcuB
ntybzwCO8+lZ6mUQ7kQNt05VWfboxQsZh7mxtnOeniHepTZxm9MhuryFg1Fp6x2sSmLnnzz0r07N
2SqFx/daGRiuK05gLuQkdiUsI6HeuY8tqAcGUnlxnGPeTSQxJSPaLVeTHkRPFC+noTsJQM0gwpL9
0n0iJ5f6aswMS9g9PrQxFw+36/7Ntw0GK14gagBfH0l7GFm92iaSAUCoiJXkN7/nlbO7h6lg/pIx
VnxxHB0wjdRgnlFYQ6r8Z/QJPzB7523mZOYO5okETL6nk2555B/KBlJunskTB4FnJ80UVW4AY6wX
ceqpZwF6UOoVCPM/0JJ1S8YdwSLjQl+aYqHTaCaS6cfOSHJS2GQlLQCtqRNzrS4sE+EEEKJIXz9q
oWTy27rvRS4TKMVQ/dAerDeSThxgrlEV79YN/lDDdNRaVMn+xKZtA01K83R1CbT6VN84ab8d5VG+
G8MX0XOb8CZ3Ip7ckAC4d4gkGODEjokrPbgGJmkH/ZJN5anA1omQxwE5BQ+DTwZnekN3CoWSMmWx
EOuBOUH+V8RPZ306K6Zo/svcgw8ubr0aIRNWjgjzzLWDqX9UV5nd1TYLKDilIbajoCDyZX1i/m5Y
VfUKt/2pWLnydShdY+UuhZnJeyfSS/1/21rwA1dtSW/XfvGtWtQ1dWdTnmL/NIZm8tr4OrkFMfSa
CpMpUdJyBJhrZCAxnA8IBe2eZaBOHC079axul6g7sKV83dGtwxwsaqKAAiKymMlIhFP8x+hXuNkl
Rzzm0NbcpXZ/ibYNxFStlZCMByVJ5WQO0wSy3ekgk7CfQbBy6i6pFZwLGjYj7C3JEMyAY1Fy5zDz
Dg91JSJHsaHoUGk0rcW4NEbubOcRaEt343nCGXxpbNs95bIo1b+2tiRHWYPHrm1RaZKCSFY8da2V
MO3O5jmFPtWZn0r4CA3r8QD8sapQK1Ph2Sa3vkPFzvWnkODEFvQmTrXRIIcIsewzFYufNzZgg+Ei
AmHVnz3+8WvGW6mxdW+Ht6hgz4n7lJ3yfvFe9r7gUzWdUU5f9qH5ctGetRuEKBkCt260v381NjAE
7Oa6pI6wxLaVArhFkBVbeyZbfVBL0C6RI7NOFXxHbIugjVqappuxcwJ8jw9C9ABHKDWu6ENkeVA8
XRnLWZUDMRrLVYI3pD5wmfLoqi9i4w/G79fW7E3Ds9KqpMi1mRW5JhrMfk27eG09VDd5zqPfOqTY
psQP+dFzkRckp8c9oitOldaPn5htoD6YFzLljrth0IzftkykdKiGIwdhjCg2t5cFmqPTG5rlbqlD
Q8H9yau6Ba53HOzdG5gXx6v1T+51+thU7mSatfFS1nq312vX5D2VYiw3uVexcG2HQeu1KUrmHA/x
+SAj6gDRFPsH/No4IXfQTKcs5OzbQ0JKU9oS5pL1XGESSJftWNT4d/hdOYt3qB9Abp6bfnB1TO43
UDYxcJiIshifwslTn/nNbcmVwKrU2CrI5bYXl/cv61oCCk0uIsUhEzE6QoNJhl9zE/u4OFXq9cAI
l+5N7HEWWlrOP47PhIe1N3rwi7BdleZtDbZ2rSfx3WIAHOSFV7WVHa4GmsThHSiFN4CfS7iuPW2Q
Mkk2qJYjF6qg8uz8pBJS3AqEeTtZ9vS3trvRFbWkQ/Opwsr6Ynbq5cvSoyAxu+sXRzVzy9p+YUs8
38qpfLILyIwGfbrtw8pW1IMhgdoHDv8dYs2z757qIITohX0B/YoMx64uDVbZxPdthYBAd05+7Fkn
MT3NJl2jwRgK+1yFEC6QTHlx+QOtW825Gv1/vGAOLJ0fLM/mZL2lb6OgGs4CPkqIHLwABfq2KSh0
StsMl3iANGSb9Pm77BwgJTaLszjXsiAtRme3ArbLxlsKDHZvIrKr6+NOzcQHBLQJgyNfve4KzGK+
ioWLZpFYTWz+mLHi52Jj36/yC/VVXKGxRTlFNoOwoRkKKLInDLkmQZ8mhOM8ezq4TVYJ2zjwsbE0
J3IegQzew0e6vSeZP51JIkOj9zPVx/8uFRRIPWTQgHrbeir9rXrUT3lHDUqHJstC2DETkumlqkS6
GUXc+dHKAIGAM8LMh6RnvHrnOXCgB+IZAzdrGYq17U04kqqSpAQs5BgC2MzhUPePseMVkIr2ONhZ
rIhrOi46uMM8E06YFGWLuZzT4lk4H2ti4a6N3+oInoimfh3npreLCn4NXvDEZsgXXX9/B+YJI1t+
Vr8FbFV1GEo9S6Igib5XYvLnmtwgiMQk+TSgFwZ8ncTYThLnuzStKhNiNH5rd8yRcVneULcX5NPT
0VEDt9nZva0Pk54CJhMSijmNCqEzT8OmqNNH9ya1rNzY3Xs3O3GkGqoj5rB/h4UHugOGtQ/1Zghy
uczBpQIGPg3z8H4V7oakX8krT/XGDNy0nly4mon9dyBWYg+aqBs7Dy9B1FzIZ2Wteuhn1Myvhut+
PvIi0qbpGGqVlfm9A2UDTHAgd1Xk2AAU9Q2YbsqbS25crJB7bLmZ3ZzdTr0Gn7nl9m4khIBFkTmU
7ZEBwHUbBF1tx1ucv5V0teu2JOadN619Y3aTXZpKj32CryIb2qD51XYJbMViFTHJB1peTwLtvevt
l3LN94X6eu4m5A6LHe5RYsc3G0x8iRbhT5K9PmdnPe6aaAIDhiiExkD88O7VUsd/fY2vTn9uP8Wy
cqhgLNHKajXWfjKyNZaWPyFBWyP1vpVtr53ZdK5Fi30qqy9A6l9PVRjuYW3gxJiDOq/m0Cq0QacF
2C202vTiT5qnDGPtTmmxr9zj6KVbL9KBOqinzX7AKtda6e8YbYgymSf8nA5SXqW0m1OnKdUAFc0Z
zgAuz9y1xGhsWKOtKE2EkTZ9NaFaSlVO/euU+FT6DZEm3iR2DvFSTjry+nvsPmjvfEJHOdn7YdXX
2b4+ZmZ09JoHzelh131+wAAAnnGqc/qOBVYAAdsbgOAIADFy/iCxxGf7AgAAAAAEWVo=
EOF

$ podman --root /tmp/podmanroot image list
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
Error: error retrieving size of image "e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2": you may need to remove the image to resolve the error: unable to determine size: error locating layer with ID "ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f": layer not known
$ podman --root /tmp/podmanroot rmi ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f
Error: 1 error occurred:
	* unable to find a name and tag match for ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f in repotags: no such image
$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f not found in layer. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
Error: 1 error occurred:
	* layer not found in layer tree: "ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f"

(ugh, it's another error now... Well, I guess that's still worth looking at)
EDIT: ah, sorry for that one -- I forgot to run the pulled version.
The new version's image list works! rmi errors, image list fixes things, then things work again.
It looks like rmi twice in a row also fixes things, so it's just the first invocation that fails? I'll let you decide what you want to do about the UX on this one; a fresh pull works without rmi first, so I'm happy enough. Thanks!

I wasn't aware of such issue. If it happens without a powerdown, then it seems like something we should fix now without any sync involved.

Do you have a reproducer for it?

I just hit ^C during a podman pull, just reproduced on 3.0.1 after some (no output skipped, just added new lines before prompt for readability ; localhost:5000 is a local registry from docker.io/library/registry:2 ):

09:31:24 0 /tmp$ dd if=/dev/zero of=a bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000305732 s, 13.4 MB/s

09:31:31 0 /tmp$ rm -rf podmanroot/

09:31:36 0 /tmp$ tar c a | podman --root /tmp/podmanroot import a localhost/a:1
Getting image source signatures
Copying blob ad7facb2586f done  
Copying config cc743244d8 done  
Writing manifest to image destination
Storing signatures
cc743244d88568703770ec970094b6f5147bb0856fd514bd02ccedfe069bc841

09:31:39 0 /tmp$ podman --root /tmp/podmanroot image list
REPOSITORY   TAG     IMAGE ID      CREATED             SIZE
localhost/a  1       cc743244d885  About a minute ago  5.2 kB

09:31:41 0 /tmp$ buildah --root /tmp/podmanroot from a:1
mkdir /run/containers/storage: permission denied
WARN failed to shutdown storage: "mkdir /run/containers/storage: permission denied" 
ERRO exit status 125                              

09:32:13 125 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot from a:1
a-working-container

09:32:22 0 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot copy a-working-container a b
a054bd56fc68a40664234a4302137c54d394ab245268b7b88543db46782d3e32

09:32:53 0 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot ls
CONTAINER ID  BUILDER  IMAGE ID     IMAGE NAME                       CONTAINER NAME
a98b6b144dc1     *     cc743244d885 localhost/a:1                    a-working-container

09:32:57 0 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot diff 
unknown command "diff" for "buildah"

09:33:03 125 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot commit a-working-container a:2
Getting image source signatures
Copying blob ad7facb2586f skipped: already exists  
Copying blob 08484ec0bcd4 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:33:12 0 /tmp$ buildah --root /tmp/podmanroot --runroot /tmp/podmanrunroot rm a-working-container
a98b6b144dc1a15a54a37d4fdefbf28f36850f03714167a76f7e61e6a2d21d09
(failed reverse-i-search)`image-list': podman image sign --sign-by [email protected] oci-archive:my_^Cage-1.1.tar

09:33:28 130 /tmp$ podman --root /tmp/podmanroot image list
REPOSITORY   TAG     IMAGE ID      CREATED         SIZE
localhost/a  2       e1b17d5af165  18 seconds ago  11.3 kB
localhost/a  1       cc743244d885  3 minutes ago   5.2 kB

09:33:29 0 /tmp$ podman --root /tmp/podmanroot push localhost:5000/a:1
Error: unable to find 'localhost:5000/a:1' in local storage: no such image

09:33:44 125 /tmp$ podman --root /tmp/podmanroot push localhost/a:1 localhost:5000/a:1Getting image source signatures
Copying blob ad7facb2586f done  
Copying config cc743244d8 done  
Writing manifest to image destination
Storing signatures

09:33:50 0 /tmp$ podman --root /tmp/podmanroot push localhost/a:2 localhost:5000/a:2
Getting image source signatures
Copying blob 08484ec0bcd4 done  
Copying blob ad7facb2586f skipped: already exists  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures

09:33:53 0 /tmp$ podman --root /tmp/podmanroot rmi localhost/a:2
Untagged: localhost/a:2
Deleted: e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:00 0 /tmp$ podman --root /tmp/podmanroot pull localhost:5000/a:2
Trying to pull localhost:5000/a:2...
Getting image source signatures
Copying blob fc87b077f4bb skipped: already exists  
Copying blob 353a677e0e56 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:05 0 /tmp$ podman --root /tmp/podmanroot image list
REPOSITORY        TAG     IMAGE ID      CREATED         SIZE
localhost:5000/a  2       e1b17d5af165  56 seconds ago  11.3 kB
localhost/a       1       cc743244d885  3 minutes ago   5.2 kB

09:34:07 0 /tmp$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
Untagged: localhost:5000/a:2
Deleted: e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:17 0 /tmp$ podman --root /tmp/podmanroot pull localhost:5000/a:2
Trying to pull localhost:5000/a:2...
Getting image source signatures
Copying blob fc87b077f4bb skipped: already exists  
Copying blob 353a677e0e56 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
^C
09:34:21 1 /tmp$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
Error: 1 error occurred:
	* unable to find 'localhost:5000/a:2' in local storage: no such image

09:34:22 1 /tmp$ podman --root /tmp/podmanroot image list
REPOSITORY   TAG     IMAGE ID      CREATED        SIZE
localhost/a  1       cc743244d885  4 minutes ago  5.2 kB

09:34:27 0 /tmp$ podman --root /tmp/podmanroot pull localhost:5000/a:2
Trying to pull localhost:5000/a:2...
Getting image source signatures
Copying blob fc87b077f4bb skipped: already exists  
Copying blob 353a677e0e56 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:31 0 /tmp$ podman --root /tmp/podmanroot image list
REPOSITORY        TAG     IMAGE ID      CREATED             SIZE
localhost:5000/a  2       e1b17d5af165  About a minute ago  11.3 kB
localhost/a       1       cc743244d885  4 minutes ago       5.2 kB

09:34:32 0 /tmp$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
Untagged: localhost:5000/a:2
Deleted: e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:38 0 /tmp$ podman --root /tmp/podmanroot pull localhost:5000/a:2
Trying to pull localhost:5000/a:2...
Getting image source signatures
Copying blob fc87b077f4bb skipped: already exists  
Copying blob 353a677e0e56 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
^C09:34:39 1 /tmp$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
Error: 1 error occurred:
	* unable to find 'localhost:5000/a:2' in local storage: no such image

09:34:42 1 /tmp$ podman --root /tmp/podmanroot pull localhost:5000/a:2
Trying to pull localhost:5000/a:2...
Getting image source signatures
Copying blob fc87b077f4bb skipped: already exists  
Copying blob 353a677e0e56 done  
Copying config e1b17d5af1 done  
Writing manifest to image destination
Storing signatures
e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2

09:34:44 0 /tmp$ ^C

09:34:44 130 /tmp$ podman --root /tmp/podmanroot rmi localhost:5000/a:2
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f not found in layer. The storage may be corrupted, consider running `podman system reset`. 
WARN[0000] Top layer ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f of image e1b17d5af16529f22664f6c8f7328cad67bf10e3cabe9f7e13d75f14b992d7a2 not found in layer tree. The storage may be corrupted, consider running `podman system reset`. 
Error: 1 error occurred:
	* layer not found in layer tree: "ca84fd7b90547cb7f2b5adf21c4d5270becc15b855548277feccb31732917e8f"

Interestingly, the last ^C shows up after the next prompt, with podman pull having had an exit status of 0, so I'm not sure if it was left over from the earlier run that didn't go well, rather than affecting the fresh pull...?

this is way slower than a single syncfs.

uh, not necessarily. Well, it all depends on the workload of your machine -- syncfs has a terrible cost on some workloads (from my previous life on HPC systems with way too much scratch data); syncing there is actively harmful to whatever else happens on the machine. While it might be slightly more costly for podman to do what dpkg does, I don't believe the overhead is that bad (does ostree do what dpkg does, triggering the writeback first and calling fsync when it's over? the initial sync should be almost free, and by the time you call the real one there should be nothing left to flush for most files, so that will be cheap as well), and you're not hanging a production machine for a couple of minutes every time it does a pull. I've seen that, and never want to see any syncfs on these servers ever again.
Well, admittedly the likelihood that such a filesystem gets shared with podman container images is pretty slim, but syncfs is a pretty terrible API as far as granularity goes...
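For what it's worth, the difference is easy to measure on any given box (rough sketch; the directory is a placeholder, the file set should be recreated between the two timings for a fair comparison, and the numbers will obviously depend on the filesystem and the rest of the workload):

# Rough sketch: per-file fsync vs one filesystem-wide sync.
dir=/var/tmp/sync-test             # placeholder directory on the filesystem of interest
mkdir -p "$dir"
for i in $(seq 1 500); do head -c 64K /dev/urandom > "$dir/f$i"; done

time find "$dir" -type f -exec sync {} +   # coreutils sync(1): fsync each file
# recreate the files before this second timing, otherwise there is nothing left to flush
time sync -f "$dir"                        # single syncfs(2) on that filesystem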

Anyway, let's look at the interrupted errors first.

@vrothberg (Member)

@martinetd, in case there are still some things left, could you open a new issue? It's not sustainable to tackle too many things at once in an already closed issue. A dedicated one helps to keep track of things and backport them if needed.

@github-actions locked this issue as resolved and limited conversation to collaborators Sep 22, 2023