-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
yadage: allow passing initdir configuration #309
yadage: allow passing initdir configuration #309
Conversation
@@ -186,6 +187,8 @@ def _workflow_engine_command(self, overwrite_input_parameters=None, | |||
self.workflow.get_specification()).encode()), | |||
workflow_file=self.workflow.reana_specification.get( | |||
'workflow').get('file'), | |||
workflow_initdir=self.workflow.reana_specification.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In R-W-E-CWL we are passing --operational-options
. Could we do the same here? The number of customisable options users might ask for will likely grow, and if that happens we would always need to come to this file to update it with glue code for the new option. On the other hand, if we send everything, options are added in reana.yaml
by the user, and only changes in the specific workflow engine are needed to parse them. Let me know what you think :)
P.S. Similar pattern to the one described in reanahub/reana#277
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While elaborating this answer and re-reading it I have one question. Is the new options
field in reana.yaml
workflow engine specific? I think the answer is yes. Then I would rename along the lines of:
engine_options
: for workflow engine specific optionsoptions
: (or better name for a "general options" field) which would have common options across engines, ideally one dayCACHE=off|on
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree about your first comment, I'll look into it.
WRT the second comment, I'm not sure, for now, it is true that is engine specific but it could have more general options as the CACHE
you mentioned. We could parse the options and grab specific keywords to handle them properly and send the rest to the workflow engine. But we could have different fields for this as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking how the operational parameters are passed, I see that they shall be passed as part of inputs.options
in reana.yaml
(BTW this is missing in the docs I think). Should we include here the options from workflow.options
, merge both in one, ...? This goes in line with Diego's second comment discussion.
We can discuss this in the next standup cc/ @reanahub/developers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will just illustrate what we are discussing to make it easier for me to follow:
-
Right now what works (not saying that is how it should be):
- Including
options
inreana.yaml
(this was undocumented and missing from thereana.yaml
JSONSchema as pointed out by @mvidalgarcia)
version: 0.3.0 inputs: files: - code/helloworld.py - data/names.txt parameters: helloworld: code/helloworld.py inputfile: data/names.txt outputfile: results/greetings.txt sleeptime: 0 + options: + CACHE: off + TARGET: gendata workflow: type: serial specification: steps: - environment: 'python:2.7-slim' commands: - python "${helloworld}" --inputfile "${inputfile}" --outputfile "${outputfile}" --sleeptime ${sleeptime} outputs: files: - results/greetings.txt
- Including options when invoking the CLI (documented in the REANA CLI reference)
$ reana-client start -o TARGET=gendata -o CACHE=off
- Including
-
Now we have the following with global: allow operational options in reana.yaml reana-client#400, because we didn't remember we were having the
options
field inreana.yaml
(was missing from the JSONSChema)- On the
reana.yaml
version: 0.3.0 inputs: files: - code/helloworld.py - data/names.txt parameters: helloworld: code/helloworld.py inputfile: data/names.txt outputfile: results/greetings.txt sleeptime: 0 workflow: type: serial + options: + CACHE: off + TARGET: gendata specification: steps: - environment: 'python:2.7-slim' commands: - python "${helloworld}" --inputfile "${inputfile}" --outputfile "${outputfile}" --sleeptime ${sleeptime} outputs: files: - results/greetings.txt
- On the
The question is whether the options belong to:
- (1) the inputs of a workflow (
inputs
) - (2) configuration for the workflow (
workflow.options
).
(2) Seems more natural to me since inputs
is a word loaded with a greater meaning (remember the four questions) for the analysis and seems that everything we add to the workflow options is not directly affecting the science behind it. In the future I don't see someone seraching for inputs: CACHE off
, because probably this was just a transient parameter passed to avoid a caching issue and make the build pass. However a search inputs: events 20000
would have direct impact in the search results from the physics point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. We have to distinguish operational options being in reana.yaml
and operational options passed via CLI.
Consider the following pseudo example:
inputs:
parameters:
myparam: myparamvalue
options:
myoption: myoptionvalue
and users running the workflow as:
$ reana-client run -p myparam=42.1 -p mynewparam=42.2 -o myoption=42.3 -o mynewoption=42.4
That is we have to be ready to manage (1) params as they are coming from reana.yaml, (2) params as they are coming from CLI, (3) options as they are coming from reana.yaml (once you will introduce this feature! we don't have it yet) and (4) options as they are coming from CLI.
Some complexities: same variables may be present only in reana.yaml
some only in CLI, some in both. We have to handle both their existence and which value overrides which value. (E.g. the variable values from no.3 and no.4, if they are present, will override variable values from no.1 and no.2.)
Ideally, we should store all of them separately in REANA DB identically to what user has passed. I.e. the expansion should not really happen on the reana-client
side, and the "source-of-truth" storage should be what the user has passed. Only after that we can think of merging options into one (i.e. overriding ones in reana.yaml by ones passed via CLI).
2. In addition, we have a complication that some options are generic, and some options are workflow-engine specific. As @diegodelemos mentions, we could store them in two different places; however since we already define workflow.type
property, we can also for simplicity use only one options
property, and the client will validate and will reply?
Example:
inputs:
parameters:
myparam: myparamvalue
options:
myoption: myoptionvalue
toplevel: foobar
workflow:
type: cwl
file: myworkflow.cwl
would reply *"Sorry, you have specified both toplevel
option and cwl
workflow engine, which does not support it."
It'll be easier for users to specify it in one place... We should take the trouble of properly validating which feature is supported by which engine. (See the old musings about feature matrix for each engine; we have started to have it in docs, we should also make it programmer-friendly e.g. in r-commons
so that r-client
could take advantage of that knowledge.)
3. If this sounds overwhelming, we could simplify usage scenario by not allowing to specify -o
and -p
in CLI, reducing the number of cases mentioned above from four to two. I'm not sure it would be a good idea to always ask people to edit reana.yaml
for even simplest of "quick runs", though.
Another alternative could be to allow only -o
and -p
and "generate" reana.yaml
based on that.
Currently we are a bit in the mixture of both approaches, hence the additional complexity... but it should not be that hard to manage either, so I guess with a few rules about what to store as the source-of-truth and about what-overrides-what, it can work nicely!
P.S. See also some past musings on this "science parameters" + "global REANA options" +"workflow-specific options" topic e.g. here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that everything we add to the workflow options is not directly affecting the science behind it.
This is true for operational options like CACHE=off, but for workflow engine options like toplevel
the science would not even run properly... So these latter ones are more like inputs actually.
yadage can read in an overall coniguration file instead of command line parameters e.g.
which can be executed with
so probably one could pass in the same format, and R-W-E-Y, would only set e.g. |
c57e759
to
0f86284
Compare
0f86284
to
8244b5d
Compare
8244b5d
to
b08d617
Compare
closes reanahub/reana-client#396
reana.yaml could be like: