yadage: allow passing initdir configuration #309

mvidalgarcia · 2020-03-23T15:49:25Z

reana.yaml could be like:

version: 0.6.0
[...]
workflow:
  type: yadage
  file: workflow.yaml
  options: 
    initdir: ./myinitdir
[...]

diegodelemos · 2020-03-25T14:54:29Z

reana_workflow_controller/workflow_run_manager.py

@@ -186,6 +187,8 @@ def _workflow_engine_command(self, overwrite_input_parameters=None,
                        self.workflow.get_specification()).encode()),
                    workflow_file=self.workflow.reana_specification.get(
                        'workflow').get('file'),
+                    workflow_initdir=self.workflow.reana_specification.get(


In R-W-E-CWL we are passing --operational-options. Could we do the same here? The number of customisable options users might ask for will likely grow, and if that happens we would always need to come to this file to update it with glue code for the new option. On the other hand, if we send everything, options are added in reana.yaml by the user, and only changes in the specific workflow engine are needed to parse them. Let me know what you think :)

P.S. Similar pattern to the one described in reanahub/reana#277

While elaborating this answer and re-reading it I have one question. Is the new options field in reana.yaml workflow engine specific? I think the answer is yes. Then I would rename along the lines of:

engine_options: for workflow engine specific options

options: (or better name for a "general options" field) which would have common options across engines, ideally one day CACHE=off|on.

I totally agree about your first comment, I'll look into it.
WRT the second comment, I'm not sure, for now, it is true that is engine specific but it could have more general options as the CACHE you mentioned. We could parse the options and grab specific keywords to handle them properly and send the rest to the workflow engine. But we could have different fields for this as well...

Checking how the operational parameters are passed, I see that they shall be passed as part of inputs.options in reana.yaml (BTW this is missing in the docs I think). Should we include here the options from workflow.options, merge both in one, ...? This goes in line with Diego's second comment discussion.
We can discuss this in the next standup cc/ @reanahub/developers

I will just illustrate what we are discussing to make it easier for me to follow:

Right now what works (not saying that is how it should be):

Including options in reana.yaml (this was undocumented and missing from the reana.yaml JSONSchema as pointed out by @mvidalgarcia)

version: 0.3.0 inputs: files: - code/helloworld.py - data/names.txt parameters: helloworld: code/helloworld.py inputfile: data/names.txt outputfile: results/greetings.txt sleeptime: 0 + options: + CACHE: off + TARGET: gendata workflow: type: serial specification: steps: - environment: 'python:2.7-slim' commands: - python "${helloworld}" --inputfile "${inputfile}" --outputfile "${outputfile}" --sleeptime ${sleeptime} outputs: files: - results/greetings.txt

Including options when invoking the CLI (documented in the REANA CLI reference)

$ reana-client start -o TARGET=gendata -o CACHE=off

Now we have the following with global: allow operational options in reana.yaml reana-client#400, because we didn't remember we were having the options field in reana.yaml (was missing from the JSONSChema)

On the reana.yaml

version: 0.3.0 inputs: files: - code/helloworld.py - data/names.txt parameters: helloworld: code/helloworld.py inputfile: data/names.txt outputfile: results/greetings.txt sleeptime: 0 workflow: type: serial + options: + CACHE: off + TARGET: gendata specification: steps: - environment: 'python:2.7-slim' commands: - python "${helloworld}" --inputfile "${inputfile}" --outputfile "${outputfile}" --sleeptime ${sleeptime} outputs: files: - results/greetings.txt

The question is whether the options belong to:

(1) the inputs of a workflow (inputs)

(2) configuration for the workflow (workflow.options).

(2) Seems more natural to me since inputs is a word loaded with a greater meaning (remember the four questions) for the analysis and seems that everything we add to the workflow options is not directly affecting the science behind it. In the future I don't see someone seraching for inputs: CACHE off, because probably this was just a transient parameter passed to avoid a caching issue and make the build pass. However a search inputs: events 20000 would have direct impact in the search results from the physics point of view.

1. We have to distinguish operational options being in reana.yaml and operational options passed via CLI.

Consider the following pseudo example:

inputs: parameters: myparam: myparamvalue options: myoption: myoptionvalue

and users running the workflow as:

$ reana-client run -p myparam=42.1 -p mynewparam=42.2 -o myoption=42.3 -o mynewoption=42.4

That is we have to be ready to manage (1) params as they are coming from reana.yaml, (2) params as they are coming from CLI, (3) options as they are coming from reana.yaml (once you will introduce this feature! we don't have it yet) and (4) options as they are coming from CLI.

Some complexities: same variables may be present only in reana.yaml some only in CLI, some in both. We have to handle both their existence and which value overrides which value. (E.g. the variable values from no.3 and no.4, if they are present, will override variable values from no.1 and no.2.)

Ideally, we should store all of them separately in REANA DB identically to what user has passed. I.e. the expansion should not really happen on the reana-client side, and the "source-of-truth" storage should be what the user has passed. Only after that we can think of merging options into one (i.e. overriding ones in reana.yaml by ones passed via CLI).

2. In addition, we have a complication that some options are generic, and some options are workflow-engine specific. As @diegodelemos mentions, we could store them in two different places; however since we already define workflow.type property, we can also for simplicity use only one options property, and the client will validate and will reply?

Example:

inputs: parameters: myparam: myparamvalue options: myoption: myoptionvalue toplevel: foobar workflow: type: cwl file: myworkflow.cwl

would reply *"Sorry, you have specified both toplevel option and cwl workflow engine, which does not support it."

It'll be easier for users to specify it in one place... We should take the trouble of properly validating which feature is supported by which engine. (See the old musings about feature matrix for each engine; we have started to have it in docs, we should also make it programmer-friendly e.g. in r-commons so that r-client could take advantage of that knowledge.)

3. If this sounds overwhelming, we could simplify usage scenario by not allowing to specify -o and -p in CLI, reducing the number of cases mentioned above from four to two. I'm not sure it would be a good idea to always ask people to edit reana.yaml for even simplest of "quick runs", though.

Another alternative could be to allow only -o and -p and "generate" reana.yaml based on that.

Currently we are a bit in the mixture of both approaches, hence the additional complexity... but it should not be that hard to manage either, so I guess with a few rules about what to store as the source-of-truth and about what-overrides-what, it can work nicely!

P.S. See also some past musings on this "science parameters" + "global REANA options" +"workflow-specific options" topic e.g. here:

parameters vs operational_parameters reana-server#105

cwl: support for partial workflow execution reana-workflow-engine-cwl#69

that everything we add to the workflow options is not directly affecting the science behind it.

This is true for operational options like CACHE=off, but for workflow engine options like toplevel the science would not even run properly... So these latter ones are more like inputs actually.

lukasheinrich · 2020-03-25T15:16:10Z

yadage can read in an overall coniguration file instead of command line parameters

e.g.

$> cat yadage.yml
dataarg: workdir
initdata:
  nevents: 100
workflow: madgraph_delphes.yml
toplevel: from-github/phenochain
visualize: true
verbosity: INFO

which can be executed with

yadage-run -f yadage.yml

so probably one could pass in the same format, and R-W-E-Y, would only set e.g. workdir

reana_workflow_controller/workflow_run_manager.py

closes reanahub/reana-client#396

diegodelemos reviewed Mar 25, 2020

View reviewed changes

tiborsimko mentioned this pull request Mar 27, 2020

Operational options reanahub/reana-workflow-engine-yadage#93

Open

mvidalgarcia force-pushed the rc/396-initdir-configurable branch from c57e759 to 0f86284 Compare April 2, 2020 17:04

diegodelemos reviewed Apr 3, 2020

View reviewed changes

reana_workflow_controller/workflow_run_manager.py Outdated Show resolved Hide resolved

mvidalgarcia force-pushed the rc/396-initdir-configurable branch from 0f86284 to 8244b5d Compare April 3, 2020 14:42

yadage: allow passing operational options

b08d617

closes reanahub/reana-client#396

mvidalgarcia force-pushed the rc/396-initdir-configurable branch from 8244b5d to b08d617 Compare April 3, 2020 16:58

diegodelemos approved these changes Apr 7, 2020

View reviewed changes

diegodelemos merged commit b08d617 into reanahub:master Apr 7, 2020

mvidalgarcia deleted the rc/396-initdir-configurable branch April 7, 2020 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

yadage: allow passing initdir configuration #309

yadage: allow passing initdir configuration #309

mvidalgarcia commented Mar 23, 2020 •

edited

Loading

diegodelemos Mar 25, 2020

diegodelemos Mar 25, 2020

mvidalgarcia Mar 25, 2020

mvidalgarcia Mar 25, 2020

diegodelemos Mar 27, 2020

tiborsimko Mar 27, 2020

tiborsimko Mar 27, 2020

lukasheinrich commented Mar 25, 2020 •

edited

Loading

yadage: allow passing initdir configuration #309

yadage: allow passing initdir configuration #309

Conversation

mvidalgarcia commented Mar 23, 2020 • edited Loading

diegodelemos Mar 25, 2020

Choose a reason for hiding this comment

diegodelemos Mar 25, 2020

Choose a reason for hiding this comment

mvidalgarcia Mar 25, 2020

Choose a reason for hiding this comment

mvidalgarcia Mar 25, 2020

Choose a reason for hiding this comment

diegodelemos Mar 27, 2020

Choose a reason for hiding this comment

tiborsimko Mar 27, 2020

Choose a reason for hiding this comment

tiborsimko Mar 27, 2020

Choose a reason for hiding this comment

lukasheinrich commented Mar 25, 2020 • edited Loading

mvidalgarcia commented Mar 23, 2020 •

edited

Loading

lukasheinrich commented Mar 25, 2020 •

edited

Loading