
Clients can receive unwanted duplicate progress when opening a separate session #2817

Open
sipsma opened this issue Apr 20, 2022 · 2 comments


sipsma (Collaborator) commented Apr 20, 2022

Background: Dagger supports exporting results in a DAG at arbitrary points of execution, which is a bit different from Buildkit's underlying model, which currently only allows a single export configuration specified at the beginning of a Build request. Dagger works around this by opening separate sessions from the main Solve to export results.

The problem: The approach of separate sessions works perfectly fine for now, but does have a weird side effect in that Buildkit "replays" all progress events, including ExecOp logs, for every vertex in the LLB DAG referenced by the new export session. I believe I tracked this behavior down to these lines. I'd presume this exists so that if one client is doing a build and another totally separate client connects to Buildkit with an overlapping build, that new client gets all the progress so far backfilled rather than only getting the newest updates. That makes sense in general, but for Dagger's use case it results in the client receiving progress updates it already received. Currently, that means that progress logs get printed multiple times, which is very confusing for users.

Dagger wants to deduplicate these logs, but right now the only way I'm seeing would be to hash the relevant data in each structure, which could get extremely expensive for progress logs at a certain scale. EDIT: commented below with a better stopgap approach.

Other options that could work better:

  1. Buildkit sends each progress item with a unique ID that clients can use as a hash key to check whether they have already received that progress or not.
  2. Buildkit supports a solve request option that disables replaying of progress events from earlier in the build and only sends new ones. This would also save the network transfer of logs that will be ignored anyway.
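With option 1, client-side deduplication would become a trivial set lookup. A minimal sketch of what that could look like; the `ID` field and the `progressEvent` shape are hypothetical and not part of Buildkit's current API:

```go
package main

import "fmt"

// progressEvent is a hypothetical event shape that assumes Buildkit
// attached a unique ID to each progress item (option 1 above).
type progressEvent struct {
	ID  string
	Log string
}

// dedup drops any event whose ID has been seen before, so progress
// replayed by a second session is filtered out in O(1) per event,
// with no need to hash the event payload itself.
func dedup(events []progressEvent) []progressEvent {
	seen := map[string]bool{}
	var out []progressEvent
	for _, ev := range events {
		if seen[ev.ID] {
			continue
		}
		seen[ev.ID] = true
		out = append(out, ev)
	}
	return out
}

func main() {
	events := []progressEvent{
		{ID: "1", Log: "pulling image"},
		{ID: "2", Log: "running step"},
		{ID: "1", Log: "pulling image"}, // replayed by the export session
	}
	for _, ev := range dedup(events) {
		fmt.Println(ev.Log)
	}
}
```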

@tonistiigi Let me know if this makes sense or if you think there's another way to approach this. Support for Export in the Gateway API is obviously a better long-term solution, but I'm guessing that's a much bigger project and we'd prefer to have a simpler fix in the meantime.

sipsma (Collaborator, Author) commented Apr 20, 2022

On second thought, there is a stopgap solution where the client maintains state on each vertex digest received and doesn't print logs for vertices that have already completed. That's not as ideal as checking a unique ID or disabling the replay entirely, but it isn't as expensive as hashing all the data. I'd still like to see if one of the other solutions would work, but marking this as just an enhancement for now.
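That stopgap could look roughly like this. This is only a sketch, not Dagger's actual code; `vertexStatus` is a simplified stand-in for the relevant fields of Buildkit's progress stream:

```go
package main

import "fmt"

// vertexStatus mirrors the subset of a Buildkit progress event we care
// about here; the field names are illustrative, not Buildkit's actual API.
type vertexStatus struct {
	Digest    string
	Completed bool
	Log       string
}

// progressFilter tracks which vertex digests the client has already seen
// complete, so a replayed status stream from a new export session does
// not reprint logs for finished vertices.
type progressFilter struct {
	completed map[string]bool
}

func newProgressFilter() *progressFilter {
	return &progressFilter{completed: map[string]bool{}}
}

// Keep reports whether the event should be shown. Events for a vertex
// previously marked completed are suppressed as replays. Note the known
// limitation of this approach: duplicates arriving before the vertex
// completes are still shown.
func (f *progressFilter) Keep(ev vertexStatus) bool {
	if f.completed[ev.Digest] {
		return false
	}
	if ev.Completed {
		f.completed[ev.Digest] = true
	}
	return true
}

func main() {
	f := newProgressFilter()
	events := []vertexStatus{
		{Digest: "sha256:aaa", Log: "step 1 running"},
		{Digest: "sha256:aaa", Completed: true, Log: "step 1 done"},
		// Replayed on a second session: suppressed below.
		{Digest: "sha256:aaa", Log: "step 1 running"},
		{Digest: "sha256:aaa", Completed: true, Log: "step 1 done"},
	}
	for _, ev := range events {
		if f.Keep(ev) {
			fmt.Println(ev.Log)
		}
	}
}
```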

tonistiigi (Member) commented

This is what we do in buildx bake docker/buildx#977. It's far from ideal, but I don't have very good solutions in mind at the moment. We would somehow need to track that a vertex is already part of the session group, or only make one status request for multiple builds.
