Cache transfer #224
I didn't catch the logic behind … My first pass would be something like:

```go
// WorkerRef is the substitute for the current cache.ImmutableRef.
// The solver doesn't know about ImmutableRef/MutableRef. They are only managed
// by workers - later in a different binary.
type WorkerRef interface {
	Release(context.Context) error
	Transfer(context.Context) (Transfer, error)
	Worker() worker.Worker
	Checksum(selector string) (digest.Digest, error)
	// Metadata() // not sure if needed
}

type Worker interface {
	Pull(context.Context, Transfer) (WorkerRef, error)
	// ... other worker methods, ResolveOp() etc.
}

type Transfer interface {
	// This implies layer-tarball-based content sharing, which is not optimal
	// but probably a good starting point.
	ContentStore() content.Provider
	Target() Transferable
}

type Transferable struct {
	Digest digest.Digest
	Parent *Transferable
	// Metadata
	// cache records to replicate this data
}
```
How cache import would work with the previous example: the cache importer returns …
For CopyOp, IIUC we don't need to create layers and transfer them. For ExecOp, I agree it needs to transfer the entire cache, and your …
In the above proposal, I omitted …
Why do we need a registry?
So you want to optimize the case where only part of the data from the input is needed? This could be an optimization with extra parameters (see line 86 in 8c9d66f).
I want to avoid the case where the same ref is shared to multiple workers in a way that it loses its identity as an identical ref. If we just copy data between snapshotters and they then call diff afterwards, they can generate duplicate blobs for data that is built from the same definition. That creates problems for pushing it, cache export, and reusing it in subsequent runs.
Ah, didn't notice that. I'd propose not using similar terms for …
This is the cache import case, so the data is already in the registry. The optimization is to let the worker pull it directly instead of the manager (or a random worker) pulling it and then sharing it locally.
Could you be more specific?
Typically, the data would not already be in the registry, though?
Shouldn't the worker pull it directly from some other worker, which could be suggested by the master, using the …
There is no guarantee that apply+diff generates the same bytes, especially on different drivers or platforms, or when versions change. This is why Docker uses tar-split and containerd has a content store for keeping the duplicate. We can't even ignore the issues with gzip, especially for the cache import/export cases where all the refs can be pushed/pulled. This is a valid problem to optimize. The same problem appears on cache import/export as well: if the cache is needed for the input, it would need to pull in the full ref even if it just needs it for copying a subpath. I think we could even solve it for this case first (the only way I can see is to create a filtered intermediate ref) and then use the same solution for optimized data transfer between workers.
Yes, only talking about …
Yes, it only depends on how the …
Another thought: maybe we can reuse the filesync logic for this case. When the manager detects that it is optimal to only send partial data, it will just try to sync it to a worker. On the worker, this wouldn't become an actual ref that is equal to the original one, but only a one-time data source for a single op (they need to be read-only, like the content cache paths) that is discarded after the operation completes. The next time this process runs, it could try to find the most likely unused destination directory again, the same way the local sources work.
OK, and the first step would be adding the layer-based cache transfer.

gRPC changes:

```go
// returns local worker IDs that are likely to have cache for the cache key
func (*WorkerController) FindLocalWorkersWithCache(cacheKey string) (workerIDs []string)

func (*WorkerController) HasCache(workerID string, cacheKey string) bool

// Create an empty cache, and copy neededFiles from the existing cache to the new one.
// Used for optimizing CopyOp.
func (*WorkerController) TrimCache(workerID string, cacheKey string, neededFiles ...string) (newCacheKey string)

// Export the cache to the local content store, and return the digest of the
// "application/vnd.buildkit.cacheconfig.v0" blob.
func (*WorkerController) PopulateCacheLayers(workerID string, cacheKey string) (cacheConfigBlobDigest digest.Digest)
```

After that, we can consider adding the filesync-based optimization.

Solver changes: …
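To see how a solver might drive this surface, here is a small self-contained sketch: locate workers that already hold a cache key, and trim the cache before transferring when a CopyOp only needs a few files. `fakeController` and its in-memory map are hypothetical stand-ins for the proposed WorkerController; the `":trimmed"` key suffix is invented for the demo.

```go
package main

import "fmt"

// fakeController is a hypothetical in-memory stand-in for the proposed
// WorkerController gRPC surface; a real one would query actual workers.
type fakeController struct {
	caches map[string]map[string]bool // workerID -> set of cache keys
}

// FindLocalWorkersWithCache returns worker IDs that hold the cache key.
func (c *fakeController) FindLocalWorkersWithCache(cacheKey string) (ids []string) {
	for id, keys := range c.caches {
		if keys[cacheKey] {
			ids = append(ids, id)
		}
	}
	return ids
}

func (c *fakeController) HasCache(workerID, cacheKey string) bool {
	return c.caches[workerID][cacheKey]
}

// TrimCache models the CopyOp optimization: derive a smaller cache
// containing only neededFiles and register it under a new key.
func (c *fakeController) TrimCache(workerID, cacheKey string, neededFiles ...string) string {
	newKey := cacheKey + ":trimmed" // hypothetical key scheme
	c.caches[workerID][newKey] = true
	return newKey
}

func main() {
	ctl := &fakeController{caches: map[string]map[string]bool{
		"workerA": {"cache-1": true},
		"workerB": {},
	}}

	// The solver decides where to schedule an op that needs cache-1.
	candidates := ctl.FindLocalWorkersWithCache("cache-1")
	fmt.Println("candidates:", candidates) // [workerA]

	// For a CopyOp that only needs /etc/config, trim before transferring
	// so the full cache never leaves the source worker.
	if ctl.HasCache("workerA", "cache-1") {
		trimmed := ctl.TrimCache("workerA", "cache-1", "/etc/config")
		fmt.Println("trimmed key:", trimmed) // cache-1:trimmed
	}
}
```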
p.s. …
It should just implement WorkerRef and arrays instead of a single Ref. I think the implementation should use only manager data (sent in a similar way as #231). For …
I'm working on refactoring … I guess I can get … If we use the containerd content store gRPC service with … @tonistiigi What advantages do you see in using filesync rather than the content store for this case?
We could try that, but ideally there could be some other solution, as …
I don't think …
Hmm, would it (still) be better to use a registry instead for the first implementation?
@AkihiroSuda No, we should do the cache transfer between workers using the content store interface as described here. I was only referring to the overhead of using …
AkihiroSuda/filegrain#21 might be used as an alternative to …
Closing as we have Kubernetes driver in buildx now |
Still WIP, but opened tentatively for adding this to the 2017Q4 GitHub project: https://github.com/moby/buildkit/projects/1
Will update the issue description next week.