Support custom jobs #156
Seems reasonable.
(I doubt it matters, but I'm slightly disappointed that serialising a custom type requires writing it to a temporary area first and then copying it to the final message, instead of writing it in place. I guess the alternative is registering user-provided conversion functions and using an open type or something.)
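One possible reading of the "open type plus registered conversion functions" alternative, as a minimal sketch using only the OCaml standard library. Everything here (`payload`, `register_encoder`, `write`, the `Solve` constructor) is hypothetical, and a `Buffer.t` stands in for the Cap'n Proto message, so this only shows the shape of the idea rather than how OCluster would do it.

```ocaml
(* Extensible (open) type: users add their own payload constructors. *)
type payload = ..

(* Registered encoders write straight into the destination buffer. Each one
   returns false when it does not recognise the constructor it is given. *)
let encoders : (Buffer.t -> payload -> bool) list ref = ref []

let register_encoder f = encoders := f :: !encoders

(* Try each encoder in turn; fail if none of them handles the payload. *)
let write buf p =
  if not (List.exists (fun enc -> enc buf p) !encoders) then
    invalid_arg "no encoder registered for this payload"

(* A user-defined payload (hypothetical) and its encoder. *)
type payload += Solve of string list

let () =
  register_encoder (fun buf -> function
    | Solve pkgs -> Buffer.add_string buf (String.concat "," pkgs); true
    | _ -> false)
```

With this shape the user's encoder writes directly into the destination, so no temporary copy is needed, at the cost of a global registry.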
Thanks for the review. I also hit that snag and opted for the more wasteful approach of copying (mainly because I hadn't even considered open types so my use of conversion functions didn't work!).
This PR now also adds support for submitting the custom jobs using the ocurrent ocluster plugin. In order to build a "solver worker" it would be great to make the cluster worker library publicly available. Here I've just added it to the ocluster package with |
Thanks! Needs rebasing, with the new `level` attribute added.
Not sure about exposing the worker - if we want people to build on our worker then we need to turn it into a proper library. Most workers won't want to be testing for the docker daemon, reporting docker metrics, etc. Might be better to drop that bit for now.
0fb3ebd to 32b319c
Thanks for the review @talex5 (and sorry for the delay)! The PR is rebased, the worker is no longer public, custom jobs require a |
Looks good, except I think we need to do something about the OCurrent cache key.
ocurrent-plugin/current_ocluster.ml
Outdated
@@ -26,6 +26,10 @@ type docker_build = {
  push_target : target option;
} [@@deriving to_yojson]

type custom = Cluster_api.Custom.t

let custom_to_yojson c = `String (Cluster_api.Custom.kind c)
I don't think this will work. When the payload changes, we'll need a new database key for it.
So I think we need some kind of `Custom.key` function to extract the important parts. We only need that on the sending side, so it might make sense to split `action` into two types, one for sending and one for receiving. I had a go at that here (just the splitting; not getting the key):
Also, this cache key is only needed for the OCurrent plugin, so we might want to make it optional for other users of OCluster.
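To illustrate why the kind alone is not a sufficient cache key, here is a minimal sketch using only the OCaml standard library. The `custom` record, `key_of_kind` and `key_of_custom` are hypothetical stand-ins, not the `Cluster_api.Custom` API, and the payload is modelled as a plain string rather than a Cap'n Proto pointer.

```ocaml
(* Hypothetical stand-in for a custom job: a kind tag plus an opaque payload.
   The real Cluster_api.Custom.t wraps a Cap'n Proto pointer instead. *)
type custom = { kind : string; payload : string }

(* Key built from the kind only, mirroring custom_to_yojson in the diff:
   two jobs of the same kind collide even when their payloads differ. *)
let key_of_kind c = c.kind

(* Key that also covers the payload, so a changed payload gives a new key. *)
let key_of_custom c =
  Printf.sprintf "%s-%s" c.kind (Digest.to_hex (Digest.string c.payload))

let () =
  let a = { kind = "solve"; payload = "ocaml.4.14" } in
  let b = { kind = "solve"; payload = "ocaml.5.0" } in
  assert (key_of_kind a = key_of_kind b);      (* same key: stale cache hit *)
  assert (key_of_custom a <> key_of_custom b)  (* distinct keys *)
```

Digesting the whole payload is the bluntest choice; a real `Custom.key` might extract only the parts that affect the result.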
Ah good point, that makes sense. I'll see if I can extend your commit with the key part. Thanks :)) !
Thinking about it more, perhaps custom key generation should be part of the `Current_ocluster` plugin's configuration?
Do we need OCurrent support for custom jobs at the moment? If not, that could just be taken out and added later in a separate PR.
> Thinking about it more, perhaps custom key generation should be part of the Current_ocluster plugin's configuration?
How would this function get to the `Key` module to change how custom jobs are digested? Would we provide a functorised version that takes a custom key-generating function, with the current implementation being the default one provided, or am I missing something?
> Do we need OCurrent support for custom jobs at the moment? If not, that could just be taken out and added later in a separate PR.
It would definitely be useful to have OCurrent support for custom jobs: in the context of the solver service, we'd want to be able to submit jobs in a pipeline for doing the solves. But as you said, maybe it is easier to do this in a follow-up PR.
> How would this function get to the Key module to change how custom jobs are digested? Would we provide a functorised version ...
That sounds reasonable to me. If that turned out to be awkward, another option would be adding the function to the `Key.t` when the key is created (the plugin configuration is available then). A third option is having some kind of registry of handlers for custom jobs.
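The functorised option could look roughly like the sketch below. It uses only the standard library; `CUSTOM_KEY`, `Make_key`, `Default_key`, `First_line_key` and the simplified `action` type are all hypothetical and much smaller than the plugin's real `Key` module.

```ocaml
(* Hypothetical signature for user-provided key generation. *)
module type CUSTOM_KEY = sig
  val key : kind:string -> payload:string -> string
end

(* Default: digest the kind together with the whole payload. *)
module Default_key : CUSTOM_KEY = struct
  let key ~kind ~payload =
    Digest.to_hex (Digest.string (kind ^ "\000" ^ payload))
end

(* Hypothetical functorised key module, reduced to two action cases. *)
module Make_key (C : CUSTOM_KEY) = struct
  type action =
    | Docker_build of string                        (* dockerfile contents *)
    | Custom of { kind : string; payload : string }

  let digest = function
    | Docker_build dockerfile ->
      "docker-" ^ Digest.to_hex (Digest.string dockerfile)
    | Custom { kind; payload } -> "custom-" ^ C.key ~kind ~payload
end

(* The default instantiation keeps today's behaviour for existing users. *)
module Key = Make_key (Default_key)

(* A user who only cares about the first line of the payload. *)
module First_line_key : CUSTOM_KEY = struct
  let key ~kind ~payload =
    kind ^ ":" ^ List.hd (String.split_on_char '\n' payload)
end

module My_key = Make_key (First_line_key)
```

Passing the function when the `Key.t` is created would avoid the functor entirely, at the cost of storing a closure inside the key value.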
> in the context of the solver service we'd want to be able to submit jobs in a pipeline for doing the solves
In the case of the existing CIs, there is a pre-existing "analysis" build step that submits jobs to the solver internally, so that wouldn't need to use the plugin directly. In fact, even the existing cluster build jobs don't use the `Current_ocluster` builder directly, but instead define their own builder using the low-level `Current_ocluster.Connection` API.
Avoids the need to build the payload twice, and allows adding other operations in future (e.g. get-cache-key) that are only needed in one direction.
Thanks @talex5 for the pointers, I didn't know about the |
Looks good - thanks!
CHANGES:
- Update OBuilder to pull in Windows prereqs (@MisterDA ocurrent/ocluster#196, reviewed by @tmcgilchrist)
- Worker pool capacity metric (@mtelvers ocurrent/ocluster#195, reviewed by @talex5)
- Update Ocluster to be macOS capable (@patricoferris ocurrent/ocluster#152, reviewed by @tmcgilchrist)
- Use rsync-hardlink from OBuilder rsync store (@MisterDA ocurrent/ocluster#189, reviewed by @tmcgilchrist)
- Various Dune 3 fixes (@MisterDA ocurrent/ocluster#187, reviewed by @talex5)
- Switch from Debian to Ubuntu for Worker Builds (@mtelvers ocurrent/ocluster#185, reviewed by @dra27)
- Update to Prometheus 1.2 (@mtelvers ocurrent/ocluster#183, reviewed by @MisterDA)
- Default to GitLab merge-requests refspecs (@MisterDA ocurrent/ocluster#180, reviewed by @tmcgilchrist)
- GitLab: allow fetching merge-requests from remote origin (@MisterDA ocurrent/ocluster#175, reviewed by @tmcgilchrist)
- Updated Dockerfile* and .dockerignore (@mtelvers ocurrent/ocluster#178, reviewed by @tmcgilchrist and @MisterDA)
- Upgrade lwt to 5.5.0 (@maiste ocurrent/ocluster#171 ocurrent/ocluster#181, reviewed by @dra27)
- Added --terse option to ocluster-admin-show and added new command ocluster-admin-exec (@mtelvers ocurrent/ocluster#165, reviewed by @tmcgilchrist and @dra27)
- Support custom jobs, adds custom job specifications to the cluster API. (@patricoferris ocurrent/ocluster#156, reviewed by @talex5)
- Continuing the joy of submodule URL changes (@dra27 ocurrent/ocluster#164)
- Update ocaml/opam-repository SHA includes tar-unix.2.0.1 (@mtelvers ocurrent/ocluster#161, reviewed by @dra27)
- Run git submodule update when resetting the git clone (@MisterDA ocurrent/ocluster#163, reviewed by @tmcgilchrist)
- Cmdliner.1.1.0 support (@MisterDA ocurrent/ocluster#160)
- Explicitly set confirmation levels to allow for manually triggered jobs. (@tmcgilchrist ocurrent/ocluster#159, reviewed by @TheLortex)
- Merge Obuilder with rsync store (@MisterDA ocurrent/ocluster#155, reviewed by @talex5)
- Sqlite3 remove usage of "FALSE" for compatibility with older versions (@art-w ocurrent/ocluster#150, reviewed by @talex5)
- Revert adopting GNU tar format (@dra27 ocurrent/ocluster#148)
- Update obuilder to latest (@mtelvers ocurrent/ocluster#147)
- Fix deprecations in Fmt 0.8.10 (@tmcgilchrist ocurrent/ocluster#145)
- Lwt_unix.yield was deprecated in favor of Lwt.pause (@MisterDA ocurrent/ocluster#142, reviewed by @dra27)
- Remove runc build from Dockerfile.worker (@talex5 ocurrent/ocluster#140)
- Add Windows support (@MisterDA ocurrent/ocluster#128, reviewed by @dra27 and @talex5)
- Admin client: fix ref-counting on progress display (@talex5 ocurrent/ocluster#138)
- Add --verbose to README examples (@talex5 ocurrent/ocluster#137)
- Use --connect for the worker capability too (@talex5 ocurrent/ocluster#136)
- Make free-space check work on Windows (@talex5 ocurrent/ocluster#134)
- Support Fmt.cli and Logs.cli (@MisterDA ocurrent/ocluster#133, reviewed by @talex5)
- Windows support prerequisites (@talex5 and @MisterDA ocurrent/ocluster#132)
- Update to capnp-rpc 1.2 and fix connection handling (@talex5 ocurrent/ocluster#131)
- Improve reporting of connection errors (@talex5 ocurrent/ocluster#130)
- Depend on more current_* modules to build examples (@MisterDA ocurrent/ocluster#129, reviewed by @talex5)
- Add ca-certificates to Dockerfile (@talex5 ocurrent/ocluster#127)
- API to wait for a worker to drain, useful for scripts that need to wait for a worker to stop before continuing. (@talex5 ocurrent/ocluster#126)
- Allow pausing/unpausing/forgetting unconnected workers (@talex5 ocurrent/ocluster#125)
- Fix ref-leak when rejecting duplicate workers (@talex5 ocurrent/ocluster#124)
- Improve handling of a worker's active state (@talex5 ocurrent/ocluster#123)
- Report better error on duplicate worker registration (@talex5 ocurrent/ocluster#122)
- Switch pool tests to expect tests (@talex5 ocurrent/ocluster#121)
- Add timestamps to OBuilder logs (@talex5 ocurrent/ocluster#120)
- Fix opam constraint on digestif (@kit-ty-kate ocurrent/ocluster#118, reviewed by @talex5)
- Add support for secrets. Secrets are a way to transmit sensitive key-value pairs to workers, without having them displayed in any log files. (@TheLortex ocurrent/ocluster#116, reviewed by @talex5)
- Add optional label to build_obuilder (@TheLortex ocurrent/ocluster#113, reviewed by @talex5)
This PR adds custom job specifications to the cluster API using the Cap'n Proto `AnyPointer`. A user can then convert their payload to a pointer and have their worker handle their custom cases. The `kind` field is meant to allow distinguishing between multiple custom job specifications (I don't know if there's a better way to do that).

The `simple_custom` test pretends the `OBuilder` job specification doesn't exist and creates a custom job where the payload is an `OBuilder` struct with a spec field.

The PR also exposes the `default_build` in the cluster-worker library. The idea there is that if #151 is merged and released then people can extend the worker to custom jobs whilst still supporting docker and obuilder ones? It might be nice to then expose some of the logic currently in `bin/worker` too. See this branch of solver-service for an example of using the `cluster_worker` libraries.
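As a rough illustration of how a worker might use the `kind` field to pick a handler, here is a minimal sketch using only the standard library. The `custom` record, `register` and `run` are hypothetical names; the payload is a plain string standing in for the Cap'n Proto pointer, and the real worker API in this PR may look quite different.

```ocaml
(* Hypothetical stand-in for a received custom job. *)
type custom = { kind : string; payload : string }

(* Worker-side table of handlers, keyed by the job's kind. *)
let handlers : (string, string -> (string, string) result) Hashtbl.t =
  Hashtbl.create 8

let register kind handler = Hashtbl.replace handlers kind handler

(* Dispatch on the kind; unknown kinds are rejected rather than guessed at. *)
let run job =
  match Hashtbl.find_opt handlers job.kind with
  | Some handle -> handle job.payload
  | None -> Error (Printf.sprintf "Unsupported custom job kind %S" job.kind)

let () =
  register "echo" (fun payload -> Ok payload);
  register "length" (fun payload -> Ok (string_of_int (String.length payload)))
```

Each handler is responsible for decoding its own payload, which is what lets the cluster API stay agnostic about the payload's schema.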