-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multi-worker daemon #176
multi-worker daemon #176
Conversation
c7af1d2
to
83e81c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the flags, my suggestion would be --containerd-worker=(true,false,auto)
, --oci-worker=(true,false,auto)
. --containerd-worker-addr=unix:///containerd.sock
, --oci-worker-runtime=runc
, --oci-worker-snapshotter=overlay
. Default would be auto
on both cases and would detect if socket or runtime binary is present. Ideally could also specify multiple containerd/runc workers and assign them names but this could be a follow up if there isn't a good idea atm(maybe just make all these flags slices).
control/control.go
Outdated
@@ -63,7 +45,9 @@ func (c *Controller) Register(server *grpc.Server) error { | |||
} | |||
|
|||
func (c *Controller) DiskUsage(ctx context.Context, r *controlapi.DiskUsageRequest) (*controlapi.DiskUsageResponse, error) { | |||
du, err := c.opt.CacheManager.DiskUsage(ctx, client.DiskUsageInfo{ | |||
// FIXME mw0 | |||
mw0 := c.opt.MetaWorkers[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probably what the TODO
is for but it should iterate over the workers and add worker ID to the DiskUsageInfo
(can be done is separate PR).
control/control.go
Outdated
@@ -101,8 +85,10 @@ func (c *Controller) Solve(ctx context.Context, req *controlapi.SolveRequest) (* | |||
|
|||
var expi exporter.ExporterInstance | |||
var err error | |||
// FIXME mw0 | |||
mw0 := c.opt.MetaWorkers[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually tricky, as the exporter should come from the worker that has the returned reference. We may need to delay this so that the solver loads this.
control/control.go
Outdated
@@ -202,7 +188,9 @@ func (c *Controller) Session(stream controlapi.Control_SessionServer) error { | |||
logrus.Debugf("session started") | |||
conn, opts := grpchijack.Hijack(stream) | |||
defer conn.Close() | |||
err := c.opt.SessionManager.HandleConn(stream.Context(), conn, opts) | |||
// FIXME mw0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SessionManager
probably needs to remain in the manager and we may need to do some redirect to the actual worker per vertex. Even if we could figure out the worker that needs to load local files, for pull credentials multiple workers would need to use the same session.
Another idea would be to add some logic in the client so that every session request already goes to a specific worker.
control/control_standalone_test.go
Outdated
"golang.org/x/net/context" | ||
) | ||
|
||
func TestControl(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this test now? Would be good to keep something that makes sure that the different components work together.
metaworker/metaworker.go
Outdated
Content: pd.ContentStore, | ||
Snapshotter: pd.Snapshotter, | ||
MetadataStore: md, | ||
func NewMetaWorker(opt MetaWorkerOpt) (*MetaWorker, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call metaWorker
WorkerController
and add methods like Add(name, worker),Get(constraint)
to it that the init would call and that could be later called when attaching remote workers.
Instead of MetaWorkerOpt
, runc.New(cliopts)
could return a worker interface that the Add()
accepts. To keep this shared code there could be a util pkg with a function that sets up the default configuration by converting a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What name do you suggest for the type (X
) that will be used for WorkerController.Add(name string, worker X) error
and WorkerController.Get(constraint Constraint) (X, error)
?
It will be a structure composed of a "bare" worker, a snapshotter, and some other stuffs.
(as in the MetaWorker
structure in the current WIP PR, but without sessionmgr and so on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let me know whether it is the right approach to create X
instances for each of snapshotters. (containerd-overlay
, containerd-btrfs
, containerd-blahblahblah
instance rather than the single containerd
instance)
I guess this model simplifies the design for local workers, but on second thought this might result in creating annoying number of remote worker instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the correct way that that X
is called a Worker
(current /worker could become /executor) and has methods for implementing
Lines 67 to 76 in 3e4200c
type Op interface { | |
// CacheKey returns a persistent cache key for operation. | |
CacheKey(context.Context) (digest.Digest, error) | |
// ContentMask returns a partial cache checksum with content paths to the | |
// inputs. User can combine the content checksum of these paths to get a valid | |
// content based cache key. | |
ContentMask(context.Context) (digest.Digest, [][]string, error) | |
// Run runs an operation and returns the output references. | |
Run(ctx context.Context, inputs []Reference) (outputs []Reference, err error) | |
} |
pb.Op
the solver/manager may provide and then internals like snapshotters are not exposed at all? If this is correct then maybe call it OpWorker
first and then name switches in separate PR.
For the second question, my understanding is that containerd will have introspection API that can provide us these values. I think we should support all that containerd supports but not sure if we want to do this automatically/by default. Even if we show them all we should only enable one automatically as otherwise having more snapshotters will make builds slower(still have the same CPU but need to copy data between drivers). I think it would also be fine to just add the default one and if users want more they can enable them with cli flags.
solver/pb/ops.proto
Outdated
|
||
// WorkerConstraint is WIP. Will be changed. | ||
message WorkerConstraint { | ||
string worker_name = 1; // e.g. "runc-overlay", "containerd-overlay", "containerd-btrfs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is WIP so maybe already planned, but I don't think worker name should be in here. We could have a worker_type/worker_brand
(in addition to arch, os, labels etc). If you have 20 workers they would all have different names and adding/removing nodes would change them so this is not something that the client should know.
hack/dockerfiles/test.Dockerfile
Outdated
FROM alpine AS buildkit-buildd | ||
COPY --from=buildd /usr/bin/buildd /usr/bin/ | ||
COPY --from=buildctl /usr/bin/buildctl /usr/bin/ | ||
COPY --from=runc /usr/bin/runc /usr/bin/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be combined with the standalone stage(or make a common shared stage).
We could do it so that first you can only use the default one, after that you can choose worker but whole build must match a specific worker, then implement cache and make it faster. |
e3e3115
to
a8f93a9
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [ ] introduce the new "worker" concept moby#176 (comment) - [ ] fix up CLI - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) Implementation notes: - A new structure called "metaworker" holds a worker instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
a8f93a9
to
20237fe
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [ ] introduce the new "worker" concept moby#176 (comment) - [ ] fix up CLI - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) Implementation notes: - A new structure called "metaworker" holds a worker instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
20237fe
to
88732f1
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [ ] introduce the new "worker" concept moby#176 (comment) - [ ] fix up CLI - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) Implementation notes: - A new structure called "metaworker" holds a worker instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
88732f1
to
c30f516
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [ ] introduce the new "worker" concept moby#176 (comment) - [ ] fix up CLI - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) Implementation notes: - A new structure called "metaworker" holds a worker instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
c30f516
to
2eaaaf7
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [ ] fix up tests - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
2eaaaf7
to
6f1d8dd
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [WIP] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [ ] fix up tests - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... However, only the default one is used, unless `containerd-worker-multiple-snapshotter` is specified. - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
6f1d8dd
to
949001d
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [WIP] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [ ] fix up tests - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... However, only the default one is used, unless `containerd-worker-multiple-snapshotter` is specified. - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
949001d
to
f366cf4
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [WIP] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [ ] fix up tests - [ ] allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... However, only the default one is used, unless `containerd-worker-multiple-snapshotter` is specified. - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
f366cf4
to
6331de6
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... However, only the default one is used, unless `containerd-worker-multiple-snapshotter` is specified. - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
6331de6
to
f054853
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - For containerd, we have separate workers for each of the available snapshotters: containerd-overlay, containerd-btrfs, ... However, only the default one is used, unless `containerd-worker-multiple-snapshotter` is specified. - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
I think the PR is now mergeable, PTAL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just readability comments.
cmd/buildd/main.go
Outdated
var ( | ||
appFlags []cli.Flag | ||
// key: priority (+: less preferred, -: more preferred) | ||
workerCtors = make(map[int]func(c *cli.Context, common *worker.CommonOpt, root string) ([]*worker.Worker, error), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to workerInitializers
The part that sets these up doesn't read too well(including the *Ctor
naming). I'd define struct initializer { fn func(), priority int}
and call registerInitializer
with it. Should also simplify the priority sorting code. Another way would be to define ociWorker
and containerdWorker
pointers in here and set them in corresponding tagged files. Then you could just define priorityList := []..{ociWorker, containerdWorker}
here and skip if it was nil in the setup phase.
cmd/buildd/main.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
commonOpt := &worker.CommonOpt{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to InitOpt
, add root
to it or don't define it at all(SessionManager
will not probably be shared in the future). Don't define this in worker
package as a worker shouldn't know if some options were shared while some specific implementations were created. Just set to regular options before NewWorker
.
cmd/buildd/main.go
Outdated
commonOpt := &worker.CommonOpt{ | ||
SessionManager: sessionManager, | ||
} | ||
wc, err := workerController(c, commonOpt, root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newWorkerController
cmd/buildd/main_containerd.go
Outdated
|
||
return control.NewContainerd(root, socket) | ||
func skipContainerd(socket string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validContainerdSocket()
cmd/buildd/main_containerd.go
Outdated
return false | ||
} | ||
socketPath := strings.TrimPrefix(socket, "unix://") | ||
if _, err := os.Stat(socketPath); os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should try to actually dial and call inspection API. You can just leave TODO
comment atm.
worker/worker.go
Outdated
Name string | ||
MetadataStore *metadata.Store | ||
Executor executor.Executor | ||
BaseSnapshotter ctdsnapshot.Snapshotter // not blobmapping one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up: just require blobmappingSnapshotter
solver/source.go
Outdated
@@ -21,7 +22,8 @@ type sourceOp struct { | |||
src source.SourceInstance | |||
} | |||
|
|||
func newSourceOp(_ Vertex, op *pb.Op_Source, sm *source.Manager) (Op, error) { | |||
func newSourceOp(_ Vertex, op *pb.Op_Source, w *worker.Worker) (Op, error) { | |||
sm := w.SourceManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe avoid accidental dependencies. If the worker becomes interface this can depend on a smaller interface.
solver/solver.go
Outdated
} | ||
|
||
func NewLLBSolver(opt LLBOpt) *Solver { | ||
var s *Solver | ||
s = New(func(v Vertex) (Op, error) { | ||
w, err := DetermineVertexWorker(opt.WorkerController, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up: worker can't be determined here, both constraints and the workers of the inputs are needed.
solver/solver.go
Outdated
Worker worker.Worker | ||
InstructionCache InstructionCache | ||
ImageSource source.Source | ||
WorkerController *worker.Controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow-up: I guess this isn't LLBOpt
anymore as the controller is not LLB-specific. Only the workers it manages are.
solver/exec.go
Outdated
@@ -22,15 +23,17 @@ const execCacheType = "buildkit.exec.v0" | |||
type execOp struct { | |||
op *pb.ExecOp | |||
cm cache.Manager | |||
w worker.Worker | |||
exe executor.Executor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use exec
as exe
is commonly used for executable
f054853
to
65b73aa
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
Updated PR. I'll open PRs to support mixing up multiple snapshotters (should be useful for docker/for-linux#72) and support transferring caches across workers. |
65b73aa
to
e182411
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
e182411
to
8899776
Compare
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
8899776
to
c3aa849
Compare
CI green now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Follow-ups before multi-worker support:
|
- [X] put multiples workers in a single binary ("-tags containerd standalone") - [X] add worker selector to LLB vertex metadata - [X] s/worker/executor/g - [X] introduce the new "worker" concept moby/buildkit#176 (comment) - [X] fix up CLI - [X] fix up tests - allow using multiples workers (requires inter-vertex cache copier, HUGE!) --> will be separate PR Implementation notes: - "Workers" are renamed to "executors" now - The new "worker" instance holds an "executor" instance and its related stuffs such as the snapshotter - The default worker is "runc-overlay" Signed-off-by: Akihiro Suda <[email protected]>
Implementation notes:
related stuffs such as the snapshotter
Signed-off-by: Akihiro Suda [email protected]
Update #161, #41