solver: fix possible race for provenance ResolveImageConfig #4157

Merged
merged 1 commit into from
Aug 17, 2023

jedevc
Member

@jedevc jedevc commented Aug 17, 2023

ResolveImageConfig can be called concurrently. For example, during conversion, dockerfile2llb loops through each stage and resolves the base image for that stage in its own goroutine.

If two calls to ResolveImageConfig finish at roughly the same time, both can attempt to modify the bridge's image records at once.

To fix this, we just need to use the bridge's mutex to prevent concurrent access here.
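
As a rough illustration of the pattern (not the actual BuildKit code), here is a minimal sketch of guarding the append with a mutex. The `bridge` type, its `mu` and `images` fields, the `resolve` stub, and the `resolveAll` helper are all hypothetical stand-ins for this example, and it uses `sync.WaitGroup` in place of errgroup to stay within the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// imageRecord is a hypothetical stand-in for the per-image data the bridge records.
type imageRecord struct {
	Ref    string
	Digest string
}

// bridge is a simplified, hypothetical stand-in for provenanceBridge.
type bridge struct {
	mu     sync.Mutex
	images []imageRecord
}

// resolve is a stub for the underlying resolver call; the digest is a placeholder.
func (b *bridge) resolve(ref string) (string, error) {
	return "sha256:placeholder-" + ref, nil
}

// ResolveImageConfig resolves ref, then records the result while holding the
// mutex, so concurrent callers never append to b.images at the same time.
func (b *bridge) ResolveImageConfig(ref string) (string, error) {
	dgst, err := b.resolve(ref)
	if err != nil {
		return "", err
	}

	b.mu.Lock()
	b.images = append(b.images, imageRecord{Ref: ref, Digest: dgst})
	b.mu.Unlock()

	return dgst, nil
}

// resolveAll resolves every ref in its own goroutine, loosely mirroring the
// per-stage loop described above, and returns the first error it finds.
func resolveAll(b *bridge, refs []string) error {
	var wg sync.WaitGroup
	errs := make([]error, len(refs))
	for i, ref := range refs {
		wg.Add(1)
		go func(i int, ref string) {
			defer wg.Done()
			_, errs[i] = b.ResolveImageConfig(ref)
		}(i, ref)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	b := &bridge{}
	if err := resolveAll(b, []string{"alpine", "busybox", "golang"}); err != nil {
		panic(err)
	}
	fmt.Println(len(b.images)) // prints 3
}
```

Without the lock, concurrent `append` calls on the shared slice are a data race: records can be lost or the slice header can be observed in an inconsistent state. The lock is held only around the append, so the (potentially slow) resolve step still runs in parallel.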

This should fix the following stack trace found in CI (https://github.com/moby/buildkit/actions/runs/5889475633/job/15972815280?pr=4041):

sandbox.go:144: goroutine 1079 [running]:
sandbox.go:144: github.com/moby/buildkit/solver/llbsolver.(*provenanceBridge).ResolveImageConfig(0xc000431e00, {0x1c2b040?, 0xc0008e5b30?}, {0xc00094ba00?, 0xc0003728f0?}, {0x0, 0xc0006cb580, {0x19ba868, 0x7}, {0xc0008f7500, ...}, ...})
sandbox.go:144: 	/src/solver/llbsolver/provenance.go:139 +0x1fb
sandbox.go:144: github.com/moby/buildkit/frontend/dockerfile/dockerfile2llb.toDispatchState.func3.1()
sandbox.go:144: 	/src/frontend/dockerfile/dockerfile2llb/convert.go:405 +0x5fe
sandbox.go:144: golang.org/x/sync/errgroup.(*Group).Go.func1()
sandbox.go:144: 	/src/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x64
sandbox.go:144: created by golang.org/x/sync/errgroup.(*Group).Go
sandbox.go:144: 	/src/vendor/golang.org/x/sync/errgroup/errgroup.go:72 +0xa5
--- FAIL: TestIntegration/TestNoCache/worker=oci-rootless/frontend=builtin (4.45s)

No other explanation for this failure makes sense - b cannot be nil at this point, since a call to b.llbBridge.ResolveImageConfig has just succeeded (also because that would be very strange).

Note: I haven't managed to reproduce this with the race detector, so I'm not 100% sure that this is what caused the CI failure. The code here is definitely not thread-safe, though, so at the very least this improves that.

Signed-off-by: Justin Chadwell <[email protected]>
@jedevc jedevc requested review from tonistiigi and crazy-max August 17, 2023 12:44
@jedevc
Member Author

jedevc commented Oct 3, 2023

@tonistiigi it looks like the race fixed here is causing panics downstream: docker/buildx#2064.

Any objections to doing a v0.12.3 release? Happy to help prep for it - we have a few useful fixes to pull in: https://github.com/moby/buildkit/issues?q=label%3Aneeds-cherry-pick%2Fv0.12+sort%3Aupdated-desc+is%3Aclosed.

@jedevc jedevc deleted the fix-provenance-bridge-race branch October 3, 2023 14:55
@thaJeztah
Member

+1 for backporting (based on the diff, which seems very small/focussed)

/cc @neersighted (we'll probably need a revendor in moby)
