Making the orchestrator work with arbitrary CLIs #174

nbren12 · 2020-03-06T02:13:00Z

After considering how to configure a baseline simulation, I realized a maintainability issue with the current strategy of the orchestrator not working with general command line interfaces.

To submit a prognostic baseline simulation without an ML model, I would like to use the same script as for the prognostic runs. Perhaps the simplest way to do this is to make "model_url" an optional argument.

Unfortunately, I don't think the orchestrator "from" capability will work with "extra_args". It would be nice to do something like this:

extra_args:
--sklearn-model-url: {from: train_sklearn_model}

Of course, I could write another wrapper which calls orchestrate_submit_job.py, but adding wrapper layers causes a significant maintainability problem. Any changes to the prognostic run CLI would require editing all the wrappers calling it. I also think the indirection with e.g. bash wrappers calling python scripts which submit K8s jobs is making the code pretty complicated and hard to follow.

To sum up, I think we should move towards having 1 command line interface per workflow, and the orchestrator should be expanded to work with most command line interfaces. I don't think this should be too hard. For simplicity, I think it is probably still OK to assume that the output is the last positional argument. I also think "extra_args" could be merged with "input" since IIRC the only real difference is the ordering of the generate command line arguments.

cc @frodre @brianhenn

The text was updated successfully, but these errors were encountered:

brianhenn · 2020-03-09T22:58:01Z

Make extra/optional args handle from/location syntax

Adds a bunch of edits to the workflow orchestrator to redesign its syntax and argument handling to be more generalizable and enhance the orchestrator maintainability. This includes: general handling of arguments for both inputs and other types of arguments, with ability to specify from/location, optional arguments, and the ability to pass multiple argument values available for all arguments. This resolves #174 . changed the argument passing syntax to reflect the merging of inputs/extra arguments; arguments are now passed in *positional_args + output_location + **{optional_arg: arg_value} order. updated all of the current workflow step submission scripts to work with the new syntax removed all of the intermediate orchestrator_submit.sh scripts as they are no longer needed by the orchestrator (all orchestrator calls are now directly to python), while maintaining any single-step shell script functionality in individual steps. added the ability to launch Dataflow jobs from the orchestrator without an intermediate script using the --runner argument These changes are included in this PR because they (among many other things) allow for the prognostic run to be called with the ML model path argument as an optional input from the model training step, to support running the ML and baseline prognostic runs together under the orchestrator. Also introduces some changes to allow running baseline prognostic simulations with ML included.

nbren12 mentioned this issue Mar 16, 2020

Baseline simulations and allPhysics configuration #176

Merged

brianhenn closed this as completed in #176 Mar 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Making the orchestrator work with arbitrary CLIs #174

Making the orchestrator work with arbitrary CLIs #174

nbren12 commented Mar 6, 2020 •

edited

Loading

brianhenn commented Mar 9, 2020

Making the orchestrator work with arbitrary CLIs #174

Making the orchestrator work with arbitrary CLIs #174

Comments

nbren12 commented Mar 6, 2020 • edited Loading

brianhenn commented Mar 9, 2020

nbren12 commented Mar 6, 2020 •

edited

Loading