Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support custom jobs #156

Merged
merged 7 commits into from
Apr 21, 2022
Merged

Support custom jobs #156

merged 7 commits into from
Apr 21, 2022

Conversation

patricoferris
Copy link
Contributor

This PR adds custom job specifications to the cluster API using the Cap'n'p AnyPointer. A user can then convert their payload to a pointer and have their worker handle their custom cases. The kind field was meant to allow distinguishing between multiple custom job specifications (I don't know if there's a better way to do that).

The simple_custom test is meant to pretend the OBuilder job specification doesn't exist and creates a custom job where the payload is an OBuilder struct with a spec field.

The PR also exposes the default_build in the cluster-worker library. The idea there is that if #151 is merged and released then people can extend the worker to custom jobs whilst still supporting docker and obuilder ones? It might be nice to then expose some of the logic currently in bin/worker too. See this branch of solver-service for an example of using the cluster_worker libraries.

Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

(I doubt it matters, but I'm slightly disappointed that serialising a custom type requires doing it first to a temporary area and they copying it to the final message, instead of writing it in place. I guess the alternative is registering user-provided conversion functions and using an open type or something.)

@patricoferris
Copy link
Contributor Author

Thanks for the review. I also hit that snag and opted for the more wasteful approach of copying (mainly because I hadn't even considered open types so my use of conversion functions didn't work!).

@patricoferris patricoferris mentioned this pull request Jan 31, 2022
@patricoferris
Copy link
Contributor Author

This PR now also adds support for submitting the custom jobs using the ocurrent ocluster plugin. In order to build a "solver worker" it would be great to make the cluster worker library publicly available. Here I've just added it to the ocluster package with ocluster.worker but that's probably not desirable (it was just to unblock solver-worker) in which case it will need #151 probably.

Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Needs rebasing, with the new level attribute added.

Not sure about exposing the worker - if we want people to build on our worker then we need to turn it into a proper library. Most workers won't want to be testing for the docker daemon, reporting docker metrics, etc. Might be better to drop that bit for now.

@patricoferris
Copy link
Contributor Author

Thanks for the review @talex5 (and sorry for the delay)!

The PR is rebased, the worker is no longer public, custom jobs require a cache_hint and the Custom API for Current_ocluster now also has a test.

Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, except I think we need to do something about the OCurrent cache key.

@@ -26,6 +26,10 @@ type docker_build = {
push_target : target option;
} [@@deriving to_yojson]

type custom = Cluster_api.Custom.t

let custom_to_yojson c = `String (Cluster_api.Custom.kind c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work. When the payload changes, we'll need a new database key for it.

So I think we need some kind of Custom.key function to extract the important parts. We only need that on the sending side, so it might make sense to split action into two types, one for sending and one for receiving. I had a go at that here (just the splitting; not getting the key):

talex5@4aa07b2

Also, this cache key is only needed for the OCurrent plugin, so we might want to make it optional for other users of OCluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, that makes sense. I'll see if I can extend your commit with the key part. Thanks :)) !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more, perhaps custom key generation should be part of the Current_ocluster plugin's configuration?

Do we need OCurrent support for custom jobs at the moment? If not, that could just be taken out and added later in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more, perhaps custom key generation should be part of the Current_ocluster plugin's configuration?

How would this function get to the Key module to change how custom jobs are digested? Would we provide a functorised version that needs a custom key generating function with the current implementation being the default one provided, or perhaps I'm missing something ?

Do we need OCurrent support for custom jobs at the moment? If not, that could just be taken out and added later in a separate PR.

It would definitely be useful to have OCurrent support for custom jobs, in the context of the solver service we'd want to be able to submit jobs in a pipeline for doing the solves, but as you said maybe it is easier to do this in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this function get to the Key module to change how custom jobs are digested? Would we provide a functorised version ...

That sounds reasonable to me. If that turned out to be awkward, another option would be adding the function to the Key.t when the key is created (the plugin configuration is available then). A third option is having some kind of registry of handlers for custom jobs.

in the context of the solver service we'd want to be able to submit jobs in a pipeline for doing the solves

In the case of the existing CIs, there is a pre-existing "analysis" build step that submits jobs to the solver internally, so that wouldn't need to use the plugin directly. In fact, even the existing cluster build jobs don't use the Current_cluster builder directly, but instead define their own builder using the low-level Current_ocluster.Connection API.

talex5 and others added 2 commits April 11, 2022 11:26
Avoids the need to build the payload twice, and allows adding other
operations in future (e.g. get-cache-key) that are only needed in one
direction.
@patricoferris
Copy link
Contributor Author

Thanks @talex5 for the pointers, I didn't know about the Connection API! I think all of the custom parts have been removed from Current_ocluster and the tests now show how to use the underlying Connection.t to submit custom jobs (I think...)

@talex5 talex5 merged commit 4cb48ea into ocurrent:master Apr 21, 2022
Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - thanks!

tmcgilchrist added a commit to tmcgilchrist/opam-repository that referenced this pull request Nov 9, 2022
CHANGES:

- Update OBuilder to pull in Windows prereqs (@MisterDA ocurrent/ocluster#196, reviewed by @tmcgilchrist)
- Worker pool capacity metric (@mtelvers ocurrent/ocluster#195, reviewed by @talex5)
- Update Ocluster to be macOS capable (@patricoferris ocurrent/ocluster#152, reviewed by @tmcgilchrist)
- Use rsync-hardlink from OBuilder rsync store (@MisterDA ocurrent/ocluster#189, reviewed by @tmcgilchrist)
- Various Dune 3 fixes (@MisterDA ocurrent/ocluster#187, reviewed by @talex5)
- Switch from Debian to Ubuntu for Worker Builds (@mtelvers ocurrent/ocluster#185, reviewed by @dra27)
- Update to Prometheus 1.2 (@mtelvers ocurrent/ocluster#183, reviewed by @MisterDA)
- Default to GitLab merge-requests refspecs (@MisterDA ocurrent/ocluster#180, reviewed by @tmcgilchrist)
- GitLab: allow fetching merge-requests from remote origin (@MisterDA ocurrent/ocluster#175, reviewed by @tmcgilchrist)
- Updated Dockerfile* and .dockerignore (@mtelvers ocurrent/ocluster#178, reviewed by @tmcgilchrist and @MisterDA)
- Upgrade lwt to 5.5.0 (@maiste ocurrent/ocluster#171 ocurrent/ocluster#181, reviewed by @dra27)
- Added --terse option to ocluster-admin-show and added new command ocluster-admin-exec (@mtelvers ocurrent/ocluster#165, reviewed by @tmcgilchrist and @dra27)
- Support custom jobs, adds custom job specifications to the cluster API. (@patricoferris ocurrent/ocluster#156, reviewed by @talex5)
- Continuing the joy of submodule URL changes (@dra27 ocurrent/ocluster#164)
- Update ocaml/opam-repository SHA includes tar-unix.2.0.1 (@mtelvers ocurrent/ocluster#161, reviewed by @dra27)
- Run git submodule update when resetting the git clone (@MisterDA ocurrent/ocluster#163, reviewed by @tmcgilchrist)
- Cmdliner.1.1.0 support (@MisterDA ocurrent/ocluster#160)
- Explicitly set confirmation levels to allow for manually triggered jobs. (@tmcgilchrist ocurrent/ocluster#159, reviewed by @TheLortex)
- Merge Obuilder with rsync store (@MisterDA ocurrent/ocluster#155, reviewed by @talex5)
- Sqlite3 remove usage of "FALSE" for compatibility with older versions (@art-w ocurrent/ocluster#150, reviewed by @talex5)
- Revert adopting GNU tar format (@dra27 ocurrent/ocluster#148)
- Update obuilder to latest (@mtelvers ocurrent/ocluster#147)
- Fix deprecations in Fmt 0.8.10 (@tmcgilchrist ocurrent/ocluster#145)
- Lwt_unix.yield was deprecated in favor of Lwt.pause (@MisterDA ocurrent/ocluster#142, reviewed by @dra27)
- Remove runc build from Dockerfile.worker (@talex5 ocurrent/ocluster#140)
- Add Windows support (@MisterDA ocurrent/ocluster#128, reviewed by @dra27 and @talex5)
- Admin client: fix ref-counting on progress display (@talex5 ocurrent/ocluster#138)
- Add --verbose to README examples (@talex5 ocurrent/ocluster#137)
- Use --connect for the worker capability too (@talex5 ocurrent/ocluster#136)
- Make free-space check work on Windows (@talex5 ocurrent/ocluster#134)
- Support Fmt.cli and Logs.cli (@MisterDA ocurrent/ocluster#133, reviewed by @talex5)
- Windows support prerequisites (@talex5 and @MisterDA ocurrent/ocluster#132)
- Update to capnp-rpc 1.2 and fix connection handling (@talex5 ocurrent/ocluster#131)
- Improve reporting of connection errors (@talex5 ocurrent/ocluster#130)
- Depend on more current_* modules to build examples (@MisterDA ocurrent/ocluster#129, reviewed by @talex5)
- Add ca-certificates to Dockerfile (@talex5 ocurrent/ocluster#127)
- API to wait for a worker to drain, useful for scripts that need to wait for a worker to stop before continuing. (@talex5 ocurrent/ocluster#126)
- Allow pausing/unpausing/forgetting unconnected workers (@talex5 ocurrent/ocluster#125)
- Fix ref-leak when rejecting duplicate workers (@talex5 ocurrent/ocluster#124)
- Improve handling of a worker's active state (@talex5 ocurrent/ocluster#123)
- Report better error on duplicate worker registration (@talex5 ocurrent/ocluster#122)
- Switch pool tests to expect tests (@talex5 ocurrent/ocluster#121)
- Add timestamps to OBuilder logs (@talex5 ocurrent/ocluster#120)
- Fix opam constraint on digestif (@kit-ty-kate ocurrent/ocluster#118, reviewed by @talex5)
- Add support for secrets. Secrets are a way to transmit sensitive key-value pairs to workers, without having them displayed in any log files. (@TheLortex ocurrent/ocluster#116, reviewed by @talex5)
- Add optional label to build_obuilder (@TheLortex ocurrent/ocluster#113, reviewed by @talex5)
dinosaure pushed a commit to tmcgilchrist/opam-repository that referenced this pull request Dec 8, 2022
CHANGES:

- Update OBuilder to pull in Windows prereqs (@MisterDA ocurrent/ocluster#196, reviewed by @tmcgilchrist)
- Worker pool capacity metric (@mtelvers ocurrent/ocluster#195, reviewed by @talex5)
- Update Ocluster to be macOS capable (@patricoferris ocurrent/ocluster#152, reviewed by @tmcgilchrist)
- Use rsync-hardlink from OBuilder rsync store (@MisterDA ocurrent/ocluster#189, reviewed by @tmcgilchrist)
- Various Dune 3 fixes (@MisterDA ocurrent/ocluster#187, reviewed by @talex5)
- Switch from Debian to Ubuntu for Worker Builds (@mtelvers ocurrent/ocluster#185, reviewed by @dra27)
- Update to Prometheus 1.2 (@mtelvers ocurrent/ocluster#183, reviewed by @MisterDA)
- Default to GitLab merge-requests refspecs (@MisterDA ocurrent/ocluster#180, reviewed by @tmcgilchrist)
- GitLab: allow fetching merge-requests from remote origin (@MisterDA ocurrent/ocluster#175, reviewed by @tmcgilchrist)
- Updated Dockerfile* and .dockerignore (@mtelvers ocurrent/ocluster#178, reviewed by @tmcgilchrist and @MisterDA)
- Upgrade lwt to 5.5.0 (@maiste ocurrent/ocluster#171 ocurrent/ocluster#181, reviewed by @dra27)
- Added --terse option to ocluster-admin-show and added new command ocluster-admin-exec (@mtelvers ocurrent/ocluster#165, reviewed by @tmcgilchrist and @dra27)
- Support custom jobs, adds custom job specifications to the cluster API. (@patricoferris ocurrent/ocluster#156, reviewed by @talex5)
- Continuing the joy of submodule URL changes (@dra27 ocurrent/ocluster#164)
- Update ocaml/opam-repository SHA includes tar-unix.2.0.1 (@mtelvers ocurrent/ocluster#161, reviewed by @dra27)
- Run git submodule update when resetting the git clone (@MisterDA ocurrent/ocluster#163, reviewed by @tmcgilchrist)
- Cmdliner.1.1.0 support (@MisterDA ocurrent/ocluster#160)
- Explicitly set confirmation levels to allow for manually triggered jobs. (@tmcgilchrist ocurrent/ocluster#159, reviewed by @TheLortex)
- Merge Obuilder with rsync store (@MisterDA ocurrent/ocluster#155, reviewed by @talex5)
- Sqlite3 remove usage of "FALSE" for compatibility with older versions (@art-w ocurrent/ocluster#150, reviewed by @talex5)
- Revert adopting GNU tar format (@dra27 ocurrent/ocluster#148)
- Update obuilder to latest (@mtelvers ocurrent/ocluster#147)
- Fix deprecations in Fmt 0.8.10 (@tmcgilchrist ocurrent/ocluster#145)
- Lwt_unix.yield was deprecated in favor of Lwt.pause (@MisterDA ocurrent/ocluster#142, reviewed by @dra27)
- Remove runc build from Dockerfile.worker (@talex5 ocurrent/ocluster#140)
- Add Windows support (@MisterDA ocurrent/ocluster#128, reviewed by @dra27 and @talex5)
- Admin client: fix ref-counting on progress display (@talex5 ocurrent/ocluster#138)
- Add --verbose to README examples (@talex5 ocurrent/ocluster#137)
- Use --connect for the worker capability too (@talex5 ocurrent/ocluster#136)
- Make free-space check work on Windows (@talex5 ocurrent/ocluster#134)
- Support Fmt.cli and Logs.cli (@MisterDA ocurrent/ocluster#133, reviewed by @talex5)
- Windows support prerequisites (@talex5 and @MisterDA ocurrent/ocluster#132)
- Update to capnp-rpc 1.2 and fix connection handling (@talex5 ocurrent/ocluster#131)
- Improve reporting of connection errors (@talex5 ocurrent/ocluster#130)
- Depend on more current_* modules to build examples (@MisterDA ocurrent/ocluster#129, reviewed by @talex5)
- Add ca-certificates to Dockerfile (@talex5 ocurrent/ocluster#127)
- API to wait for a worker to drain, useful for scripts that need to wait for a worker to stop before continuing. (@talex5 ocurrent/ocluster#126)
- Allow pausing/unpausing/forgetting unconnected workers (@talex5 ocurrent/ocluster#125)
- Fix ref-leak when rejecting duplicate workers (@talex5 ocurrent/ocluster#124)
- Improve handling of a worker's active state (@talex5 ocurrent/ocluster#123)
- Report better error on duplicate worker registration (@talex5 ocurrent/ocluster#122)
- Switch pool tests to expect tests (@talex5 ocurrent/ocluster#121)
- Add timestamps to OBuilder logs (@talex5 ocurrent/ocluster#120)
- Fix opam constraint on digestif (@kit-ty-kate ocurrent/ocluster#118, reviewed by @talex5)
- Add support for secrets. Secrets are a way to transmit sensitive key-value pairs to workers, without having them displayed in any log files. (@TheLortex ocurrent/ocluster#116, reviewed by @talex5)
- Add optional label to build_obuilder (@TheLortex ocurrent/ocluster#113, reviewed by @talex5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants