Being-saved images create errors up the stack #595
For option 1, could we add a lock to the commit operation to avoid the issue you're concerned about?
I like option 2. Can we run some kind of gc from daemons like CRI-O periodically to remove the phantom records?
Could we write these files before the image is recorded? We'd still need to have some kind of GC if these files are written but the image wasn't recorded. Another option would be to use a staging area, similar to OSTree, so we can write these files to a temporary (but guessable) location first. When we look up the manifest, we'd then need to check the staging location as well as the final one.
The advantage is that at startup we could just clear out whatever is left over in the staging area.
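A minimal sketch of that staging-area idea, assuming a hypothetical `<root>/images/<id>/` and `<root>/staging/<id>/` layout (the paths and function names here are illustrative, not the actual containers/storage layout): look in the final location first, fall back to staging while the commit is in flight, and wipe the staging directory at startup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// readManifest checks the image's final location first, then falls back to
// the staging area while the image is still being committed.
func readManifest(root, imageID string) ([]byte, error) {
	final := filepath.Join(root, "images", imageID, "manifest")
	if data, err := os.ReadFile(final); err == nil {
		return data, nil
	}
	return os.ReadFile(filepath.Join(root, "staging", imageID, "manifest"))
}

// cleanStaging is the "at startup" part: anything left under staging/ belongs
// to a commit that never finished and can simply be removed.
func cleanStaging(root string) error {
	return os.RemoveAll(filepath.Join(root, "staging"))
}

func main() {
	// Use a throwaway root for the demonstration.
	root, err := os.MkdirTemp("", "staging-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Simulate an in-flight commit: only the staged copy exists yet.
	staged := filepath.Join(root, "staging", "deadbeef")
	_ = os.MkdirAll(staged, 0o700)
	_ = os.WriteFile(filepath.Join(staged, "manifest"), []byte("{}"), 0o600)

	if m, err := readManifest(root, "deadbeef"); err == nil {
		fmt.Printf("found staged manifest, %d bytes\n", len(m))
	}
	_ = cleanStaging(root) // what startup GC would do
}
```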
I wish we did not have this race window at all, but I guess that's a natural part of sharing data across processes. I am not yet entirely hooked on the proposals, so I want to throw in another idea.

The problem reminds me of the block-copy detection mechanism we have been working on in c/image for a while, mainly because we need to wait until the processing of data with a unique ID has finished. Since we have a unique ID, we can generate image-specific lockfiles ad hoc.

This makes me wish we could keep a lock for the image until we are finished committing, with others able to block on it. Our lockfile implementation allows for recursive locking, so we could wire that in without deadlocking ourselves. Garbage collection would work implicitly: let's assume process P1 commits image Img while process P2 looks it up; P2 would simply block on the image-specific lock until P1 is done.

The challenge would be that CreateImage had to return a handle to the lock (or we wire that in via the options). The handle would then unlock the image-specific lock:

```go
img, handle, err := storage.NewImage(....)
defer handle.Unlock()
```

I may have overlooked a detail, but it sounds sweet at the moment.
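A rough sketch of what an ad-hoc, image-specific lock could look like, here with a plain flock(2) per image ID. The lock-file layout and the use of golang.org/x/sys/unix are assumptions for illustration; a real implementation would presumably reuse the existing containers/storage lockfile package instead.

```go
package main

import (
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// lockImage takes an exclusive lock named after the image ID. A committing
// process (P1) would hold it until the image is complete; a reader (P2) that
// runs into a seemingly incomplete record blocks here until P1 finishes, or
// until P1 dies and the kernel releases the flock implicitly.
func lockImage(lockDir, imageID string) (*os.File, error) {
	if err := os.MkdirAll(lockDir, 0o700); err != nil {
		return nil, err
	}
	f, err := os.OpenFile(filepath.Join(lockDir, imageID+".lock"), os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// unlockImage releases the image-specific lock and closes the file.
func unlockImage(f *os.File) error {
	defer f.Close()
	return unix.Flock(int(f.Fd()), unix.LOCK_UN)
}

func main() {
	dir, err := os.MkdirTemp("", "imagelocks")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	handle, err := lockImage(dir, "deadbeef") // P1: start committing image "deadbeef"
	if err != nil {
		panic(err)
	}
	// ... write layers, manifest, config ...
	_ = unlockImage(handle) // image is complete; any blocked P2 can proceed
}
```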
Do I understand right that this is primarily a problem for “list all images” users (and users who guess the ID in advance and use it directly)? For users that refer to a single image by name, we could easily enough defer the name assignment until the image is complete.

Overall, I fairly strongly prefer “option 1”-like behavior, because leaving incomplete images around that are never automatically deleted “kind of sucks for everyone”.

As for the locking, is that so infeasible? It probably is with the current API, but a new API could make it workable.
If a new API is acceptable, I like this solution.
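For illustration only, a hypothetical shape such a new API could take (none of these names exist in containers/storage); the point is just that handing the manifest and config blobs to image creation would let the record be written as complete under a single lock.

```go
// Package apisketch is a hypothetical illustration, not the containers/storage API.
package apisketch

// ImageCreateOptions is an invented options struct: everything needed for a
// complete image record is supplied up front, so other readers never observe
// a record without its manifest or config blob.
type ImageCreateOptions struct {
	// BigData holds items such as the manifest and config blobs, keyed the
	// same way they would later be read back.
	BigData map[string][]byte
	// Names are only assigned once the record is complete.
	Names []string
}

// ImageCreator is the invented API surface: creation takes the options above
// and performs the whole write while holding the store's write lock.
type ImageCreator interface {
	CreateCompleteImage(id, topLayer string, options ImageCreateOptions) (imageID string, err error)
}
```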
@nalind @mtrmac @giuseppe @vrothberg @saschagrunert This issue seems to have been dropped; we seemed to be coming to a decent solution, but then everyone went off into their own priorities and this was never fixed.
As for implementing it, I suggest creating a card internally unless @saschagrunert has time to tackle it.
I think I can look into it in the next few weeks; if someone else is faster than me, feel free to assign this one :)
@nalind @saschagrunert Any progress on this?
Not until now, unfortunately.
@nalind @saschagrunert Any progress on this?
Checking: We're not sure, but Miloslav suggested I may be seeing this issue when building multiarch images (using buildah) in Cirrus-cron jobs. For whatever reason, it tends to happen once or twice a week on the 'hello' image build, like in this example. The error message reported is: …
Does this seem related, or do I need to go find another tree to bark at?
When pulling/importing images to named tags, we only add the tag after the image is “complete”. In the meantime, though, it can still be looked up by manifest digest (after that manifest is recorded into the store), or by image ID (even before any manifest is uploaded, and the image ID is deterministic for pulled images — not for built images). So, it is possible to: …
That might be applicable if there are two processes, one pulling the image and another referring to it (using a known top-level digest?) and finding the not-yet-complete image. One clearly problematic thing the …

All of this is somewhat a special case of the larger discussion — it would be easy to switch the two manifest writes (and we should do that), but similarly the image doesn’t yet have signatures recorded. In principle, all of this should be one locked write.

I don’t immediately see how this relates to the log, though — doesn’t this show the build completely succeeding, and then some larger, non-parallel operation failing? I can’t trivially see the failing command or even attribute it to a specific script. But overall this looks a bit as if the parallel build just produced a broken image, somehow… and this failure is not racing against anything else running concurrently, is it? I’m afraid I’m not familiar with the Buildah/Podman multi-arch-manifest code.
For instance, when building a multi-arch image and giving the …

This also strikes me as maybe related to containers/buildah#3710 and/or containers/buildah#3791.
The scripts are more or less a wrapper around …

No, the only thing running on the machine with any intentional concurrency is the …
(I don’t really know what I’m looking at.) My reading of the log is that …
Ahh, yes, comparing the output and my scripts, your analysis is correct. That error in the logs appears to originate from a …
Looking at the multi-platform build, apart from the almost-certainly-irrelevant containers/buildah#4079, the top-level manifest is being built synchronously after the individual instances, so that should not allow this kind of inconsistency. (OTOH I never read how Buildah/Podman deal with manifest list objects, and from what I’ve seen today I’m not sure how the failing instance lookup ever works, so I am certainly missing something important about the code base.) So I think it’s a different bug, or at least it’s likely enough to be different that it seems worthwhile discussing in a separate (initially Buildah) GitHub issue instead of here.
Currently, to commit an image in the image library's `containers-storage` transport, we create and populate layers, create the image record, and then save the non-layers under the directory that matches that image record's ID. The manifest and config blobs are among those non-layers.

This creates a small window where there's an image record without a manifest and/or config blob, and if another thread tries to read information about the image during that window, it will get unexpected errors, and those errors percolating up the stack is a problem.

Option 1: add an `incomplete` flag that we use for screening out values that we'd otherwise return in an `Images()` list, and clean them up when we get a write lock, as we do for incomplete layers. Won't work because we don't keep the lock during the commit operation, so things can get cleaned up from under us during the window.

Option 2: screen out values from the `Images()` list if they don't include a manifest. The commit logic doesn't need to be changed because it gets the ID when it creates the image record, and uses that ID to save the manifest for the image. Downsides: if we crash, the phantom image record could stay there forever, since it would be filtered for everyone who didn't know its ID. The screening would also need to check for a config blob, which requires knowledge that lives in containers/image (schema 1 images don't have them, and logic for determining what kind of manifest we have is in containers/image), which we can't use as a dependency.

Option 3: if we fail to read an image's manifest, spin and try again. Downside: hacky, and we can only guess how long we should spin before giving up and reporting an error.

Option 4: expose a way to manipulate `Flags` in the image record, add a flag that containers/image will set on works in progress, and teach everyone to skip over images that have the flag set. Downside: kind of sucks for everyone.

Option 5: ???
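For a rough sense of what the Option 2 screening would amount to, here is a caller-side sketch against the containers/storage Store interface. Treating a failed read of a "manifest" big-data item as "image still incomplete" is an assumption for illustration, not the library's own filtering; as the issue notes, schema 1 images and config blobs would need extra knowledge from containers/image.

```go
// Package imagescreen sketches caller-side filtering of not-yet-complete images.
package imagescreen

import "github.com/containers/storage"

// ImagesWithManifest lists images but skips records whose manifest big-data
// item cannot be read yet, i.e. images that are still being committed (or
// phantom records left behind by a crash).
func ImagesWithManifest(store storage.Store) ([]storage.Image, error) {
	all, err := store.Images()
	if err != nil {
		return nil, err
	}
	complete := make([]storage.Image, 0, len(all))
	for _, img := range all {
		// Assumption: a missing "manifest" item means the image is not
		// yet complete, so hide it from the caller.
		if _, err := store.ImageBigData(img.ID, "manifest"); err != nil {
			continue
		}
		complete = append(complete, img)
	}
	return complete, nil
}
```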