Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making the orchestrator work with arbitrary CLIs #174

Closed
nbren12 opened this issue Mar 6, 2020 · 1 comment · Fixed by #176
Closed

Making the orchestrator work with arbitrary CLIs #174

nbren12 opened this issue Mar 6, 2020 · 1 comment · Fixed by #176

Comments

@nbren12
Copy link
Contributor

nbren12 commented Mar 6, 2020

After considering how to configure a baseline simulation, I realized a maintainability issue with the current strategy of the orchestrator not working with general command line interfaces.

To submit a prognostic baseline simulation without an ML model, I would like to use the same script as for the prognostic runs. Perhaps the simplest way to do this is to make "model_url" an optional argument.

Unfortunately, I don't think the orchestrator "from" capability will work with "extra_args". It would be nice to do something like this:

extra_args:
--sklearn-model-url: {from: train_sklearn_model}

Of course, I could write another wrapper which calls orchestrate_submit_job.py, but adding wrapper layers causes a significant maintainability problem. Any changes to the prognostic run CLI would require editing all the wrappers calling it. I also think the indirection with e.g. bash wrappers calling python scripts which submit K8s jobs is making the code pretty complicated and hard to follow.

To sum up, I think we should move towards having 1 command line interface per workflow, and the orchestrator should be expanded to work with most command line interfaces. I don't think this should be too hard. For simplicity, I think it is probably still OK to assume that the output is the last positional argument. I also think "extra_args" could be merged with "input" since IIRC the only real difference is the ordering of the generate command line arguments.

cc @frodre @brianhenn

@brianhenn
Copy link
Contributor

brianhenn pushed a commit that referenced this issue Mar 19, 2020
Adds a bunch of edits to the workflow orchestrator to redesign its syntax and argument handling to be more generalizable and enhance the orchestrator maintainability. This includes:

general handling of arguments for both inputs and other types of arguments, with ability to specify from/location, optional arguments, and the ability to pass multiple argument values available for all arguments. This resolves #174 .
changed the argument passing syntax to reflect the merging of inputs/extra arguments; arguments are now passed in *positional_args + output_location + **{optional_arg: arg_value} order.
updated all of the current workflow step submission scripts to work with the new syntax
removed all of the intermediate orchestrator_submit.sh scripts as they are no longer needed by the orchestrator (all orchestrator calls are now directly to python), while maintaining any single-step shell script functionality in individual steps.
added the ability to launch Dataflow jobs from the orchestrator without an intermediate script using the --runner argument
These changes are included in this PR because they (among many other things) allow for the prognostic run to be called with the ML model path argument as an optional input from the model training step, to support running the ML and baseline prognostic runs together under the orchestrator.

Also introduces some changes to allow running baseline prognostic simulations with ML included.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants