Cache transfer #224

Closed
AkihiroSuda opened this issue Dec 15, 2017 · 16 comments

@AkihiroSuda
Member

Still WIP, but opened tentatively for adding this to 2017Q4 github project: https://github.com/moby/buildkit/projects/1

Will update the issue description next week.

// Ref is a reference to cacheable objects.
type Ref interface {
        ID() string
        Release(context.Context) error
        Size(ctx context.Context) (int64, error)
        Metadata() *metadata.StorageItem // TODO: remove
}

type ImmutableRef interface {
        Ref
        Finalize(ctx context.Context) error // make sure the reference is flushed to the driver
        // TODO: ImmutableMeta
}

type LocalImmutableRef interface {
        ImmutableRef
        Mountable
        Extractable
        Parent() LocalImmutableRef
}

// RemoteImmutableRef is non-mountable.
// No implementation yet.
type RemoteImmutableRef interface {
        ImmutableRef
        Extractable
        // Pull copies the whole cache.
        Pull(ctx context.Context, something TBD) (LocalImmutableRef, error)
}

type MutableRef interface {
        Ref
        Mountable
        // TODO: MutableMeta
        Commit(context.Context) (LocalImmutableRef, error)
}

type Mountable interface {
        Mount(ctx context.Context, readonly bool) ([]mount.Mount, error)
}

type Extractable interface {
        // Extract copies files within the cache to the local filesystem.
        // toPath is a local filesystem path; srcPaths are paths within the cache.
        // Used for CopyOp.
        Extract(ctx context.Context, toPath string, srcPaths ...string) error
}
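
A minimal usage sketch, assuming only the interfaces above: copyFromRemote is a hypothetical helper, not part of the proposal, showing the CopyOp path where only the requested paths cross the network.

func copyFromRemote(ctx context.Context, src RemoteImmutableRef, toPath string, srcPaths ...string) error {
        // Extract fetches only the named files (MBs) instead of pulling the
        // entire cache (potentially GBs) to a local worker first.
        return src.Extract(ctx, toPath, srcPaths...)
}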
@tonistiigi
Member

tonistiigi commented Dec 15, 2017

I didn't catch the logic behind Extractable. In the multi-worker case, the solver (manager) should not be able to mount any data, so it wouldn't deal with cache.ImmutableRef.

My first pass would be something like:

// WorkerRef is the substitute for the current cache.ImmutableRef.
// The solver doesn't know about ImmutableRef/MutableRef; they are only managed
// by workers (later, in a different binary).

type WorkerRef interface {
	Release(context.Context) error
	Transfer(context.Context) (Transfer, error)
	Worker() worker.Worker
	Checksum(selector string) (digest.Digest, error)
	// Metadata() // not sure if needed
}

type Worker interface {
	Pull(context.Context, Transfer) (WorkerRef, error)
	// ... other worker methods. ResolveOp() etc
}

type Transfer interface {
	// This implies layer-tarball-based content sharing, which is not optimal but probably a good starting point
	ContentStore() content.Provider
	Target() Transferable
}

type Transferable struct {
	Digest digest.Digest
	Parent *Transferable
	// Metadata
	// cache records to replicate this data
}
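
A minimal sketch, assuming only the interfaces above, of how the solver could relocate a ref between workers; moveRef itself is illustrative, not an existing API.

func moveRef(ctx context.Context, src WorkerRef, dst Worker) (WorkerRef, error) {
        // Serialize the source ref into transferable form (layer blobs + metadata).
        t, err := src.Transfer(ctx)
        if err != nil {
                return nil, err
        }
        // The destination worker fetches the blobs from t.ContentStore().
        return dst.Pull(ctx, t)
}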

@tonistiigi
Member

How cache import would work with the previous example: the cache importer returns a WorkerRef, but it is not actually connected to any worker yet (Worker() would return nil). Now, when making a Transfer out of this ref, it would mark the needed layer blobs as transferable and provide a content.Provider that connects directly to the repository on a registry. In fact, I think making a content.Provider out of a registry repository is how pull should work in containerd by default (push already works like that).
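
A rough sketch of such a registry-backed provider, assuming containerd's remotes.Fetcher; registryProvider and toReaderAt are hypothetical (Fetch returns a streaming io.ReadCloser, so a real adapter would need buffering or HTTP range requests), and the ReaderAt signature follows current containerd, which takes a descriptor.

type registryProvider struct {
        fetcher remotes.Fetcher // resolved for the imported cache's image reference
}

func (p *registryProvider) ReaderAt(ctx context.Context, desc ocispec.Descriptor) (content.ReaderAt, error) {
        // Read the blob straight from the registry; nothing is staged locally.
        rc, err := p.fetcher.Fetch(ctx, desc)
        if err != nil {
                return nil, err
        }
        return toReaderAt(rc, desc.Size) // hypothetical stream-to-ReaderAt adapter
}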

@AkihiroSuda
Member Author

I didn't catch the logic behind Extractable.

For CopyOp, IIUC we don't need to create layers and transfer them,
so I proposed Extractable: files within a remote cache can be copied to the local worker without transferring the entire cache.
This should decrease the layer I/O and network I/O overhead from GBs to MBs.

For ExecOp, I agree it needs to transfer the entire cache, and your Transfer interface looks fine.

In the multi-worker case, the solver (manager) should not be able to mount any data, so it wouldn't deal with cache.ImmutableRef.

In the above proposal, I omitted Mountable from ImmutableRef.

Now, when making a Transfer out of this ref, it would mark the needed layer blobs as transferable and provide a content.Provider that connects directly to the repository on a registry.

Why do we need a registry?
It doubles the I/O overhead.
Can we just implement the containerd content gRPC API in buildkitd?

@tonistiigi
Member

For CopyOp,

So you want to optimize the case where only part of the data from an input is needed? This could be an optimization with extra parameters for Transfer, or I've been thinking that maybe we should have a FilterOp instead (for example, to avoid this hack: `if m.Readonly && ref != nil && m.Dest != pb.RootMount` // exclude read-only rootfs). Another way is to create this filtered intermediate ref on the fly before transfer and just temporarily assign it, instead of the input+selector combination, to the specific vertex.

I want to avoid the case where the same ref is shared with multiple workers in a way that it loses its identity as an identical ref. If we just copy data between snapshotters and then call diff afterwards, they can generate duplicate blobs for data that is built from the same definition. That creates problems for pushing it, for cache export, and for reusing it in subsequent runs.

In the above proposal, I omitted Mountable from ImmutableRef.

Ah, I didn't notice that. I'd propose not using similar terms for cache.ImmutableRef and WorkerRef, as they are very different concepts. You are right, though, that we need the Finalize() that I forgot in my example.

Why do we need a registry?

This is the cache import case, so the data is already in the registry. The optimization is to let the worker pull it directly instead of the manager (or a random worker) pulling it and then sharing it locally.

@AkihiroSuda
Member Author

If we just copy data between snapshotters and then call diff afterwards, they can generate duplicate blobs for data that is built from the same definition.

Could you be more specific?
IIUC, CopyOp with Extractable can produce identical blobs when the inputs are identical.
(gzip compression might be non-deterministic, but that is not specific to this topic.)

This is the cache import case, so the data is already in the registry.

Typically, the data would not already be in the registry, though?

The optimization is to let the worker pull it directly instead of the manager(or random worker) pulling it and then sharing locally.

Shouldn't the worker pull it directly from a specific other worker, which could be suggested by the master using the map[cacheKey][]workerID mapping?

@tonistiigi
Member

Could you be more specific?
IIUC, CopyOp with Extractable can produce identical blobs when the inputs are identical.
(gzip compression might be non-deterministic, but that is not specific to this topic.)

There is no guarantee that apply+diff generates the same bytes, especially across different drivers, platforms, or versions. This is why Docker uses tar-split and containerd keeps the original blobs in its content store. We can't even ignore the issues with gzip, especially for the cache import/export cases, where all the refs can be pushed/pulled.

This is a valid problem to optimize. The same problem appears in cache import/export: if the cache is needed as an input, the full ref has to be pulled even if it is only needed for copying a subpath. I think we could even solve it for this case first (the only way I can see is to create a filtered intermediate ref) and then use the same solution for optimized data transfer between workers.

Typically, the data would not already be in the registry, though?

Yes, I'm only talking about the buildctl build --import-cache docker.io/foo/bar case here.

Shouldn't the worker pull it directly from a specific other worker,

Yes; it only depends on how the Transfer interface is implemented.

@tonistiigi
Member

Another thought: maybe we can reuse the filesync logic for this case. When the manager detects that it is optimal to send only partial data, it would just sync that data to a worker. On the worker, this wouldn't become an actual ref equal to the original one, but only a one-time data source for a single op (it needs to be read-only, like the content cache paths) that is discarded after the operation completes. The next time this process runs, it could try to find the most likely unused destination directory again, the same way local sources work.
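
A sketch of that manager-side decision, reusing the WorkerRef/Worker shapes from earlier in this thread; syncFiles, sizeOf, and fullTransferThreshold are illustrative stubs, not existing BuildKit APIs.

func provideInput(ctx context.Context, ref WorkerRef, selector string, dst Worker) error {
        // Small subpath: filesync it into a throwaway read-only directory on the
        // destination worker; a one-time data source, never a real ref.
        if selector != "" && sizeOf(ref, selector) < fullTransferThreshold {
                return syncFiles(ctx, ref, selector, dst)
        }
        // Otherwise fall back to the full layer-based transfer.
        t, err := ref.Transfer(ctx)
        if err != nil {
                return err
        }
        _, err = dst.Pull(ctx, t)
        return err
}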

@AkihiroSuda
Member Author

OK, so the first step would be adding layer-based cache transfer.

gRPC changes

  • Implement the bare containerd content service API (but read-only)
    -- content stores of all local workers can be aggregated
    -- can be implemented easily right now

  • Add "WorkerController" service

// returns local worker IDs that are likely to have cache for the cache key
func (*WorkerController) FindLocalWorkersWithCache(cacheKey string) (workerIDs []string)

func (*WorkerController) HasCache(workerID string, cacheKey string) bool


// Create an empty cache, and copy neededFiles from the existing cache to the new one.
// Used for optimizing CopyOp
func (*WorkerController) TrimCache(workerID string, cacheKey string, neededFiles ...string) (newCacheKey string)


// Export the cache to the local content store, and return the digest of the "application/vnd.buildkit.cacheconfig.v0" blob.
func (*WorkerController) PopulateCacheLayers(workerID string, cacheKey string) (cacheConfigBlobDigest digest.Digest)
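
A hypothetical end-to-end flow gluing the calls above together; pullViaContentStore is illustrative, not a proposed API.

func transferCache(ctx context.Context, wc *WorkerController, cacheKey, dstWorkerID string) error {
        // Locate a worker that already holds the cache.
        workers := wc.FindLocalWorkersWithCache(cacheKey)
        if len(workers) == 0 {
                return fmt.Errorf("no local worker has cache for %s", cacheKey)
        }
        // Export the layers plus the cacheconfig blob into the aggregated
        // (read-only) content store.
        configDigest := wc.PopulateCacheLayers(workers[0], cacheKey)
        // The destination worker pulls via the containerd content gRPC API.
        return pullViaContentStore(ctx, dstWorkerID, configDigest)
}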

After that, we can consider adding filesync-based optimization.

Solver changes

  • The first step would be to determine the worker using an annotation-based constraint, i.e. if an OpMetadata.WorkerConstraint.Annotation has com.example.userspecific.foo==bar, the corresponding op would be executed on a worker that was launched with buildkitd --worker-annotation=com.example.userspecific.foo=bar (a matching sketch follows this list)

  • Then we can work on a more clever algorithm; ref: Cluster management (membership & cache query) #231
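
A minimal matching sketch for the annotation-based constraint; the type and field names here are illustrative, not the actual pb definitions.

type WorkerConstraint struct {
        Annotations map[string]string // e.g. com.example.userspecific.foo: bar
}

// satisfiedBy reports whether a worker launched with the given
// --worker-annotation key=value pairs may execute the op.
func (c WorkerConstraint) satisfiedBy(workerAnnotations map[string]string) bool {
        for k, v := range c.Annotations {
                if workerAnnotations[k] != v {
                        return false
                }
        }
        return true
}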

@AkihiroSuda AkihiroSuda changed the title from "[WIP] define non-mountable RemoteImmutableRef interface for distributed mode" to "Cache transfer" on Dec 19, 2017
@AkihiroSuda
Member Author

p.s. TrimCache should correspond to the "filtered intermediate ref" above

@tonistiigi
Member

Add "WorkerController" service

It should just implement the existing `type InstructionCache interface` with small modifications where needed, e.g. returning WorkerRef and arrays instead of a single Ref. I think the implementation should use only manager data (sent in a similar way as in #231).
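
A hypothetical shape for that modification, with lookups returning WorkerRefs, possibly several (one per worker holding the cache), instead of a single local Ref; the method names mirror the existing InstructionCache, but the signatures are illustrative.

type InstructionCache interface {
        Probe(ctx context.Context, key digest.Digest) (bool, error)
        // Lookup returns every known WorkerRef for the key so the solver can
        // pick the cheapest worker to transfer from.
        Lookup(ctx context.Context, key digest.Digest) ([]WorkerRef, error)
        Set(key digest.Digest, refs []WorkerRef) error
}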

For TrimCache, we'll need to look more closely when implementing it.

@AkihiroSuda
Member Author

I'm working on refactoring cacheimport (WIP):
AkihiroSuda@b7ed478

I guess I can get buildctl transfer-cache <cache-identifier> <source-worker-id> <dest-worker-id> working for local workers by the end of the week, and soon we will be able to imitate distributed mode.

If we use the containerd content store gRPC service with session.Attachable(), enabling this for real remote workers could be implemented easily, but maybe we want to use filesync instead, as suggested above.

@tonistiigi What advantages do you see in using filesync rather than content store for this case?

@tonistiigi
Member

If we use the containerd content store gRPC service with session.Attachable(), enabling this for real remote workers could be implemented easily, but maybe we want to use filesync instead, as suggested above.

We could try that, but ideally there would be some other solution, as the session has overhead, especially on top of gRPC, because the gRPC memory allocator is so bad. If we modify the gRPC API to include the identifier for a specific content store (the transfer object), we can avoid at least 50% of the allocations.

@tonistiigi What advantages do you see in using filesync rather than content store for this case?

I don't think filesync needs to be a priority atm. We can always optimize it later if needed.

@AkihiroSuda
Member Author

We could try that, but ideally there would be some other solution, as the session has overhead, especially on top of gRPC, because the gRPC memory allocator is so bad.

Hmm, would it (still) be better to use a registry instead for the first implementation?
One of my concerns was the dependency issue, but I think it is OK for now.
(If it is really problematic, we could embed a registry in the manager.)

@tonistiigi
Member

tonistiigi commented Feb 9, 2018

@AkihiroSuda No, we should do the cache transfer between workers using the content store interface, as described here. I was only referring to the overhead of using session.Attachable on the wire, as that does gRPC-over-gRPC. Actually, if session.Attachable simplifies things, then we should use it. We can always switch to raw gRPC if benchmarks show it is a bottleneck.

@AkihiroSuda
Member Author

AkihiroSuda/filegrain#21 might be used as an alternative to TrimCache

@AkihiroSuda
Member Author

Closing as we have the Kubernetes driver in buildx now
