Cache transfer #224

Closed
AkihiroSuda opened this issue Dec 15, 2017 · 16 comments

@AkihiroSuda
Member

Still WIP, but opened tentatively for adding this to 2017Q4 github project: https://github.com/moby/buildkit/projects/1

Will update the issue description next week.

// Ref is a reference to cacheable objects.
type Ref interface {
        ID() string
        Release(context.Context) error
        Size(ctx context.Context) (int64, error)
        Metadata() *metadata.StorageItem // TODO: remove
}

type ImmutableRef interface {
        Ref
        Finalize(ctx context.Context) error // make sure the reference is flushed to the driver
        // TODO: ImmutableMeta
}

type LocalImmutableRef interface {
        ImmutableRef
        Mountable
        Extractable
        Parent() LocalImmutableRef
}

// RemoteImmutableRef is non-mountable.
// No implementation yet.
type RemoteImmutableRef interface {
        ImmutableRef
        Extractable
        // Pull copies the whole cache.
        Pull(ctx context.Context, something TBD) (LocalImmutableRef, error)
}

type MutableRef interface {
        Ref
        Mountable
        // TODO: MutableMeta
        Commit(context.Context) (LocalImmutableRef, error)
}

type Mountable interface {
        Mount(ctx context.Context, readonly bool) ([]mount.Mount, error)
}

type Extractable interface {
        // Extract copies files within the cache to the local filesystem.
        // toPath is a local filesystem path; srcPaths are paths within the cache.
        // Used for CopyOp.
        Extract(ctx context.Context, toPath string, srcPaths ...string) error
}
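
A minimal usage sketch, assuming only the interfaces above: copyFromRemote is a hypothetical helper, not part of the proposal, showing the CopyOp path where only the requested paths cross the network.

func copyFromRemote(ctx context.Context, src RemoteImmutableRef, toPath string, srcPaths ...string) error {
        // Extract fetches only the named files (MBs) instead of pulling the
        // entire cache (potentially GBs) to a local worker first.
        return src.Extract(ctx, toPath, srcPaths...)
}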
@tonistiigi
Member

tonistiigi commented Dec 15, 2017

I didn't catch the logic behind Extractable. In the multi-worker case, the solver (manager) should not be able to mount any data, so it wouldn't deal with cache.ImmutableRef.

My first pass would be something like:

// WorkerRef is the substitute for the current cache.ImmutableRef.
// The solver doesn't know about ImmutableRef/MutableRef; they are only managed
// by workers (later, in a different binary).

type WorkerRef interface {
	Release(context.Context) error
	Transfer(context.Context) (Transfer, error)
	Worker() worker.Worker
	Checksum(selector string) (digest.Digest, error)
	// Metadata() // not sure if needed
}

type Worker interface {
	Pull(context.Context, Transfer) (WorkerRef, error)
	// ... other worker methods. ResolveOp() etc
}

type Transfer interface {
	// This implies layer-tarball-based content sharing, which is not optimal but probably a good starting point
	ContentStore() content.Provider
	Target() Transferable
}

type Transferable struct {
	Digest digest.Digest
	Parent *Transferable
	// Metadata
	// cache records to replicate this data
}
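
A minimal sketch, assuming only the interfaces above, of how the solver could relocate a ref between workers; moveRef itself is illustrative, not an existing API.

func moveRef(ctx context.Context, src WorkerRef, dst Worker) (WorkerRef, error) {
        // Serialize the source ref into transferable form (layer blobs + metadata).
        t, err := src.Transfer(ctx)
        if err != nil {
                return nil, err
        }
        // The destination worker fetches the blobs from t.ContentStore().
        return dst.Pull(ctx, t)
}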

@tonistiigi
Member

How cache import would work with the previous example: the cache importer returns a WorkerRef, but it is not actually connected to any worker yet (Worker() would return nil). Now, when making a Transfer out of this ref, it would mark the needed layer blobs as transferable and provide a content.Provider that connects directly to the repository on a registry. In fact, I think making a content.Provider out of a registry repository is how pull should work in containerd by default (push already works like that).
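
A rough sketch of such a registry-backed provider, assuming containerd's remotes.Fetcher; registryProvider and toReaderAt are hypothetical (Fetch returns a streaming io.ReadCloser, so a real adapter would need buffering or HTTP range requests), and the ReaderAt signature follows current containerd, which takes a descriptor.

type registryProvider struct {
        fetcher remotes.Fetcher // resolved for the imported cache's image reference
}

func (p *registryProvider) ReaderAt(ctx context.Context, desc ocispec.Descriptor) (content.ReaderAt, error) {
        // Read the blob straight from the registry; nothing is staged locally.
        rc, err := p.fetcher.Fetch(ctx, desc)
        if err != nil {
                return nil, err
        }
        return toReaderAt(rc, desc.Size) // hypothetical stream-to-ReaderAt adapter
}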

@AkihiroSuda
Member Author

I didn't catch the logic behind Extractable.

For CopyOp, IIUC we don't need to create layers and transfer them,
so I proposed Extractable: files within a remote cache can be copied to the local worker without transferring the entire cache.
This should decrease the layer I/O and network I/O overhead from GBs to MBs.

For ExecOp, I agree it needs to transfer the entire cache, and your Transfer interface looks fine.

In the multi-worker case, the solver (manager) should not be able to mount any data, so it wouldn't deal with cache.ImmutableRef.

In the above proposal, I omitted Mountable from ImmutableRef.

Now, when making a Transfer out of this ref, it would mark the needed layer blobs as transferable and provide a content.Provider that connects directly to the repository on a registry.

Why do we need a registry?
It doubles the I/O overhead.
Can we just implement the containerd content gRPC API in buildkitd?

@tonistiigi
Member

For CopyOp,

So you want to optimize the case where only part of the data from an input is needed? This could be an optimization with extra parameters for Transfer, or I've been thinking that maybe we should have a FilterOp instead (for example, to avoid this hack: `if m.Readonly && ref != nil && m.Dest != pb.RootMount` // exclude read-only rootfs). Another way is to create this filtered intermediate ref on the fly before transfer and just temporarily assign it, instead of the input+selector combination, to the specific vertex.

I want to avoid the case where the same ref is shared with multiple workers in a way that it loses its identity as an identical ref. If we just copy data between snapshotters and then call diff afterwards, they can generate duplicate blobs for data that is built from the same definition. That creates problems for pushing it, for cache export, and for reusing it in subsequent runs.

In the above proposal, I omitted Mountable from ImmutableRef.

Ah, I didn't notice that. I'd propose not using similar terms for cache.ImmutableRef and WorkerRef, as they are very different concepts. You are right, though, that we need the Finalize() that I forgot in my example.

Why do we need a registry?

This is the cache import case, so the data is already in the registry. The optimization is to let the worker pull it directly instead of the manager (or a random worker) pulling it and then sharing it locally.

@AkihiroSuda
Member Author

If we just copy data between snapshotters and then call diff afterwards, they can generate duplicate blobs for data that is built from the same definition.

Could you be more specific?
IIUC, CopyOp with Extractable can produce identical blobs when the inputs are identical.
(gzip compression might be non-deterministic, but that is not specific to this topic.)

This is the cache import case, so the data is already in the registry.

Typically, the data would not already be in the registry, though?

The optimization is to let the worker pull it directly instead of the manager(or random worker) pulling it and then sharing locally.

Shouldn't the worker pull it directly from a specific other worker, which could be suggested by the master using the map[cacheKey][]workerID mapping?

@tonistiigi
Member

Could you be more specific?
IIUC, CopyOp with Extractable can produce identical blobs when the inputs are identical.
(gzip compression might be non-deterministic, but that is not specific to this topic.)

There is no guarantee that apply+diff generates the same bytes, especially across different drivers, platforms, or versions. This is why Docker uses tar-split and containerd keeps the original blobs in its content store. We can't even ignore the issues with gzip, especially for the cache import/export cases, where all the refs can be pushed/pulled.

This is a valid problem to optimize. The same problem appears in cache import/export: if the cache is needed as an input, the full ref has to be pulled even if it is only needed for copying a subpath. I think we could even solve it for this case first (the only way I can see is to create a filtered intermediate ref) and then use the same solution for optimized data transfer between workers.

Typically, the data would not already be in the registry, though?

Yes, I'm only talking about the buildctl build --import-cache docker.io/foo/bar case here.

Shouldn't the worker pull it directly from a specific other worker,

Yes; it only depends on how the Transfer interface is implemented.

@tonistiigi
Member

Another thought: maybe we can reuse the filesync logic for this case. When the manager detects that it is optimal to send only partial data, it would just sync that data to a worker. On the worker, this wouldn't become an actual ref equal to the original one, but only a one-time data source for a single op (it needs to be read-only, like the content cache paths) that is discarded after the operation completes. The next time this process runs, it could try to find the most likely unused destination directory again, the same way local sources work.
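
A sketch of that manager-side decision, reusing the WorkerRef/Worker shapes from earlier in this thread; syncFiles, sizeOf, and fullTransferThreshold are illustrative stubs, not existing BuildKit APIs.

func provideInput(ctx context.Context, ref WorkerRef, selector string, dst Worker) error {
        // Small subpath: filesync it into a throwaway read-only directory on the
        // destination worker; a one-time data source, never a real ref.
        if selector != "" && sizeOf(ref, selector) < fullTransferThreshold {
                return syncFiles(ctx, ref, selector, dst)
        }
        // Otherwise fall back to the full layer-based transfer.
        t, err := ref.Transfer(ctx)
        if err != nil {
                return err
        }
        _, err = dst.Pull(ctx, t)
        return err
}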

@AkihiroSuda
Member Author

OK, so the first step would be adding layer-based cache transfer.

gRPC changes

  • Implement the bare containerd content service API (but read-only)
    -- content stores of all local workers can be aggregated
    -- can be implemented easily right now

  • Add "WorkerController" service

// returns local worker IDs that are likely to have cache for the cache key
func (*WorkerController) FindLocalWorkersWithCache(cacheKey string) (workerIDs []string)

func (*WorkerController) HasCache(workerID string, cacheKey string) bool


// Create an empty cache, and copy neededFiles from the existing cache to the new one.
// Used for optimizing CopyOp
func (*WorkerController) TrimCache(workerID string, cacheKey string, neededFiles ...string) (newCacheKey string)


// Export the cache to the local content store, and return the digest of the "application/vnd.buildkit.cacheconfig.v0" blob.
func (*WorkerController) PopulateCacheLayers(workerID string, cacheKey string) (cacheConfigBlobDigest digest.Digest)
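
A hypothetical end-to-end flow gluing the calls above together; pullViaContentStore is illustrative, not a proposed API.

func transferCache(ctx context.Context, wc *WorkerController, cacheKey, dstWorkerID string) error {
        // Locate a worker that already holds the cache.
        workers := wc.FindLocalWorkersWithCache(cacheKey)
        if len(workers) == 0 {
                return fmt.Errorf("no local worker has cache for %s", cacheKey)
        }
        // Export the layers plus the cacheconfig blob into the aggregated
        // (read-only) content store.
        configDigest := wc.PopulateCacheLayers(workers[0], cacheKey)
        // The destination worker pulls via the containerd content gRPC API.
        return pullViaContentStore(ctx, dstWorkerID, configDigest)
}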

After that, we can consider adding filesync-based optimization.

Solver changes

  • The first step would be to determine the worker using an annotation-based constraint, i.e. if an OpMetadata.WorkerConstraint.Annotation has com.example.userspecific.foo==bar, the corresponding op would be executed on a worker that was launched with buildkitd --worker-annotation=com.example.userspecific.foo=bar (a matching sketch follows this list)

  • Then we can work on a more clever algorithm; ref: Cluster management (membership & cache query) #231
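
A minimal matching sketch for the annotation-based constraint; the type and field names here are illustrative, not the actual pb definitions.

type WorkerConstraint struct {
        Annotations map[string]string // e.g. com.example.userspecific.foo: bar
}

// satisfiedBy reports whether a worker launched with the given
// --worker-annotation key=value pairs may execute the op.
func (c WorkerConstraint) satisfiedBy(workerAnnotations map[string]string) bool {
        for k, v := range c.Annotations {
                if workerAnnotations[k] != v {
                        return false
                }
        }
        return true
}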

@AkihiroSuda AkihiroSuda changed the title from "[WIP] define non-mountable RemoteImmutableRef interface for distributed mode" to "Cache transfer" on Dec 19, 2017
@AkihiroSuda
Member Author

p.s. TrimCache should correspond to the "filtered intermediate ref" above

@tonistiigi
Member

Add "WorkerController" service

It should just implement the existing `type InstructionCache interface` with small modifications where needed, e.g. returning WorkerRef and arrays instead of a single Ref. I think the implementation should use only manager data (sent in a similar way as in #231).
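
A hypothetical shape for that modification, with lookups returning WorkerRefs, possibly several (one per worker holding the cache), instead of a single local Ref; the method names mirror the existing InstructionCache, but the signatures are illustrative.

type InstructionCache interface {
        Probe(ctx context.Context, key digest.Digest) (bool, error)
        // Lookup returns every known WorkerRef for the key so the solver can
        // pick the cheapest worker to transfer from.
        Lookup(ctx context.Context, key digest.Digest) ([]WorkerRef, error)
        Set(key digest.Digest, refs []WorkerRef) error
}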

For TrimCache, we'll need to look more closely when implementing it.

@AkihiroSuda
Member Author

I'm working on refactoring cacheimport (WIP):
AkihiroSuda@b7ed478

I guess I can get buildctl transfer-cache <cache-identifier> <source-worker-id> <dest-worker-id> working for local workers by the end of the week, and soon we will be able to imitate distributed mode.

If we use the containerd content store gRPC service with session.Attachable(), enabling this for real remote workers could be implemented easily, but maybe we want to use filesync instead, as suggested above.

@tonistiigi What advantages do you see in using filesync rather than content store for this case?

@tonistiigi
Member

If we use the containerd content store gRPC service with session.Attachable(), enabling this for real remote workers could be implemented easily, but maybe we want to use filesync instead, as suggested above.

We could try that, but ideally there would be some other solution, as the session has overhead, especially on top of gRPC, because the gRPC memory allocator is so bad. If we modify the gRPC API to include the identifier for a specific content store (the transfer object), we can avoid at least 50% of the allocations.

@tonistiigi What advantages do you see in using filesync rather than content store for this case?

I don't think filesync needs to be a priority atm. We can always optimize it later if needed.

@AkihiroSuda
Member Author

We could try that, but ideally there would be some other solution, as the session has overhead, especially on top of gRPC, because the gRPC memory allocator is so bad.

Hmm, would it (still) be better to use a registry instead for the first implementation?
One of my concerns was the dependency issue, but I think it is OK for now.
(If it is really problematic, we could embed a registry in the manager.)

@tonistiigi
Member

tonistiigi commented Feb 9, 2018

@AkihiroSuda No, we should do the cache transfer between workers using the content store interface, as described here. I was only referring to the overhead of using session.Attachable on the wire, as that does gRPC-over-gRPC. Actually, if session.Attachable simplifies things, then we should use it. We can always switch to raw gRPC if benchmarks show it is a bottleneck.

@AkihiroSuda
Member Author

AkihiroSuda/filegrain#21 might be used as an alternative to TrimCache

@AkihiroSuda
Member Author

Closing as we have the Kubernetes driver in buildx now
