Nextflow parameter description scheme #866

sven1103 · 2018-09-19T14:37:41Z

TL;DR

A naming scheme to enable meta-data annotation for workflow parameters.

Details

Usually, workflow-specific execution parameters for the single processes are defined in the params scope, a DSL-feature Nextflow provides to access parameter variables from the workflow script during wf execution.

Currently, there is no naming scheme / convention / language feature for annotating parameters with description text, mandatory/optional flags or similar.

This could be useful though for upstream applications in order to build graphical user interfaces and configure a workflow correctly before execution in a dynamic, user-friendly way.

I am very happy for any input here and design suggestions :)

Best, Sven

The text was updated successfully, but these errors were encountered:

pditommaso · 2018-09-19T14:41:33Z

What kind of scheme/metadata would be useful for your use case ?

ewels · 2018-09-19T15:49:25Z

Suggestions:

Name (allow spaces, stuff that shouldn't be in the variable name)
Description
Type
- bool
- numeric
  - max
  - min
- str
  - max length

I think we spoke about this previously and pointed to some HTML spec or something, but that's probably overkill for now.

Phil

sven1103 · 2018-09-19T16:25:49Z

default value

sven1103 · 2018-09-19T16:28:35Z

I think mandatory/optional is not necessary, as on definition of a parameter, a default value should always be present.

ewels · 2018-09-19T16:29:28Z

Yes, and we already have default values?

sven1103 · 2018-09-19T16:30:49Z

implicit in the params definition yes, but not in the descriptions section

sven1103 · 2018-09-19T16:32:20Z

I mean, a parser could just check the following parameter definition after a description section, maybe this is sufficient?

sven1103 · 2018-09-19T16:41:23Z

For simplicity, i would suggest to have normal multi-line comment strings as in Groovy and extend it by tags, such as:

/**
The number of iterations of the algorithm

@name: Algo iterations
@type:  int
@range: {0,10(,1)} // including step size for range?
*/
params.iterations = 10

/**
Sequence type

@name: Nucleotide sequence type
@type: enum
@range: ["dna", "rna"]
*/
params.type = "dna"

maybe sth. in that direction?

pditommaso · 2018-09-20T07:48:55Z

Let's focus on metadata fields before, I see the following:

name: (*) the parameter name, it should a valid identifier (no blanks, etc)
label: human readable name
help/usage: param description showed in the help text
type: (*) string, text (multiline string) integer, decimal, file (path), url, upload, choice, range
min: (value for numeric, length for string)
max: (value for numeric, length for string)
defvalue: (*) default value
pattern: regex validation rule

pditommaso · 2018-09-20T07:49:54Z

I'm adding @ypriverol in this thread because he's interested in this topic.

pditommaso · 2018-09-20T07:57:38Z

Related to #88, #144.

sven1103 · 2018-09-20T07:59:57Z

👍 looks good to me

ewels · 2018-09-20T13:19:38Z

type: (*) string, text (multiline string) integer, decimal, file (path), url, upload, choice, range

Maybe distinguish between a choice that can have multiple selected or just one? eg. radio and checkbox in HTML forms.

Love the idea of having a generic regex validation available 👍

msmootsgi · 2018-09-20T19:37:35Z

It sounds like you want to define your parameters with json schema or something similar. That would open up a world of possibilities.

pditommaso · 2018-09-21T07:26:14Z

Maybe distinguish between a choice that can have multiple selected or just one? eg. radio and checkbox in HTML forms.

Yes, tho thinking more I'm realising we should distinguish between data type and data visualisation. The data type could be: string, integer, decimal, boolean, mem unit. Then we can have an attribute to define how the field is rendered and another for validation. To recap:

name: (*) the parameter name, it should a valid identifier (no blanks, etc)
label: human readable name
usage: param description showed in the help text
type: (*) string, integer, decimal, boolean, mem unit
render: file (local path), url, upload, range, check-box, radio-button, list-box, drop-down
choices: possible values for check-box, radio-button, list-box, drop-down
defvalue: (*) default value
pattern: regex validation rule
arity: 1 or many

It sounds like you want to define your parameters with json schema or something similar. That would open up a world of possibilities.

That may be useful!

pditommaso · 2018-09-21T07:28:24Z

For the syntax, the annotation style proposed by Sven looks too verbose IMO. What about what describe here #144 (comment) ? (let's continue the discussion in this thread)

ewels · 2018-09-21T09:43:16Z

I think @sven1103's suggestion was just a quick one that wouldn't affect nextflow execution so that we could get started. But yes, it would be good to have a proper syntax.

Personally, I would really like to keep the current syntax as it is, both for backwards compatability and also because it's so simple and intuitive. Then have an alternative method for adding to params where you can add in additional fields if you want to. params.add would be my favourite from your comment. Because name and default value are mandatory they could probably be the first two arguments and the rest optional? So any of these would be valid:

params.simple = 'is best'
params.add( 'foo', 1 )
params.add( 'bar', 20,
    label: 'Bar',
    usage: 'This is the bar variable which should be set',
    type: 'integer',
    render: 'range',
    min: 0, max: 100
)

msmootsgi · 2018-09-21T16:14:57Z

I should have elaborated a bit more on my suggestion. What I'm proposing is the option for users to specify a json schema that defines the necessary and allowable params needed within a nextflow pipeline. I'm imagining that this schema would live in a file (e.g. param_schema.json) that lives alongside the pipeline's main.nf file. This file would be optional, so that if didn't exist nextflow would behave as it does now. If it did exist, it would do multiple things for us:

First, it would allow a UI to be inferred from the schema. Just parse the schema and based on the type and constraints display the proper widget. This would also provide descriptions of the different params needed. I believe this would address this ticket.
To be clear, "UI" could mean a web-based UI with javascript+HTML or a command line app like nextflow itself. It would be straightforward to infer a command line parser from a schema and solve Auto-generate usage from params in pipeline #144. This might even point a way towards solving GNU command line interface #556 (maybe have a base schema that gets extended and can't be overridden?).
Second, it would allow nextflow to automatically validate the input parameters based on the schema with something like this. This could happen behind the scenes automatically if param_schema.json existed and produce nice error messages.
In the same way that you might infer a UI, you could also infer a web-service. You could use a library like connexion that builds a web API based on a swagger spec. Since swagger specifies API parameters bases on json schema, I believe you could dynamically generate an API just by referencing param_schema.json.
Finally, I believe a schema like this would help integrate with WES, something that I think would be useful for Dockstore integration and broader adoption of nextflow.

What I particularly like about this idea is that most of the pieces needed to implement this already exist AND it shouldn't impact existing pipeline execution at all. It would be a strictly optional feature that users could add on when they're ready. From an implementation perspective, I think this would be easy to implement in small pieces, i.e. first validate input against a schema, then infer a command line, and then on to the rest of it as needed/desired.

Does that make sense? Sorry for the wall of text. This ticket just happened to appear and dovetail nicely in my brain with stuff we've been doing internally (schemas, parameter validation, nextflow web API) and stuff I've been learning about related to WES and Dockstore.

sven1103 · 2018-09-22T08:08:30Z

Hey all,
a lot of great suggestions here, I`ll try to address them properly.

For the syntax, the annotation style proposed by Sven looks too verbose IMO. What about what describe here #144 (comment) ? (let's continue the discussion in this thread)

I agree, maybe this was to early to note down. Indeed, what @ewels said is right, I wanted the annotation not to interfere with the existing DSL.

@pditommaso #144 looks very consice as part of the DSL, I like it very much. However, I agree with @msmootsgi that this is harder to parse for third party applications in order to build i.e. UIs, a JSON would be much more straight forward here.

But: why can't we have the syntax as proposed in #144 and have an option to output the parameters meta-data as params.json? So one could use the meta-data inside the runtime code, but one could also ship JSON-like parameter descriptor files for third party apps with the pipeline?

ewels · 2018-09-22T08:42:33Z

But: why can't we have the syntax as proposed in #144 and have an option to output the parameters meta-data as params.json? So one could use the meta-data inside the runtime code, but one could also ship JSON-like parameter descriptor files for third party apps with the pipeline?

This would be cool, but it feels a bit like double-documentation to have it in two places. Could maybe have support for both, and the JSON file is just parsed and ends up being equivalent to the nextflow syntax?

I really like @msmootsgi's JSON suggestion, would make life a lot simpler to use this stuff if we don't have to launch nextflow to pull out the spec (eg. on web servers).

pditommaso · 2018-09-22T10:19:46Z

I think we are mixing two topics here: parameters representation and validation. Any parameters can already be specified as JSON (or even YAML) using the -params-file option.

The same mechanism would implicitly be able to support an extended DSL for params representation discussed above as per the equivalent json/object representation.

Then a schema validation could be optionally applied independently the formats how the params are provided.

ewels · 2018-09-22T10:58:49Z

Parameters and spec for parameters aren't quite the same thing though, right? That JSON / YAML file is fine for submitting the options once given, but the above is more about describing what options are required. It's more of a core file for the pipeline script, rather than something a user writes to run the pipeline.

sven1103 · 2018-09-22T11:55:47Z

That JSON / YAML file is fine for submitting the options once given, but the above is more about describing what options are required

But why can't it be both? I mean once the DSL of Nextflow is extended (' params.add(...)' ), the JSON just needs to be parsed properly with the new meta-data fields we just discussed.

I agree that the validation would be sth nice to have in addition, but not necessary for the first implementation of this feature request (though really nice ;)).

ewels · 2018-09-22T18:39:14Z

But why can't it be both?

Because one is written by the pipeline developer, and one is written by the pipeline user.. At least, that's what -params-file does if I understand it correctly.

sven1103 · 2018-09-23T12:41:05Z

Because one is written by the pipeline developer, and one is written by the pipeline user

Why? With -params-file you just give NF the variables then available in the params scope. As developer you could just provide your pipeline with this file (plus metadata), instead of declaring them in the main.nf. Of course if you do not pass it to Nextflow, you would still need to need some default values for the params though.

sven1103 · 2018-09-23T12:42:14Z

That is why I suggested some annotations in the main.nf directly, so there is only one single place to look at (as developer and user and parser)

pditommaso · 2018-09-24T09:22:13Z

The proposal to have a separate json file has the advantage to decouple the parameters definition form the NF files, therefore could be easily used by third party applications, I agree on that.

On the other hand has the disadvantage to be a separate json file about which I generally I would like to avoid.

However as Phil and Mike mentioning, this would only be require by workflow developer(s) not by the end users (tho may be frequently the same person), I think it could be worth to explore this solution.

I would like to see this in practice to better evaluate pros & cons. Above all it would be interesting how it would be modelled the json scheme for the metadata proposed above. @msmootsgi could you provide an example?

msmootsgi · 2018-09-24T22:59:05Z

Ok, I've attached a tgz file containing an example schema params_schema.json, a failing input file (example1.json), and a VERY primitive "groovy" (cough, java, cough) script that validates the input against the schema. Does this help?

pipeline_params.tar.gz

sven1103 · 2019-11-26T06:38:12Z

The approach nf-core is currently pursuing is not pipeline-specific, which makes it great for applications to scan nf-core pipeline parameters and build UIs on it, or verify parameter input for a given pipeline.

I disagree - every pipeline has a schema which is pipeline specific. It has to be specific to be useful. If it was totally generic then the tools wouldn't know what the different pipeline params were called.

I think we start to mix up things. When I talk of schema, i mean this: https://nf-co.re/parameter-schema/0.2.0dev/parameters.schema.json. And this is NOT pipeline-specific.

Every pipeline has an instance of this schema, building the parameter representation of each pipeline in JSON.

sven1103 · 2019-11-26T07:10:05Z

@mes5k I think we just have two different approaches. We have one schema at the top level of nf-core, that defines how parameter representation of nf-core pipelines have to look like. And then we have the instances in JSON for every pipeline, that describe parameter objects that are specific to the pipeline.

Your approach is to have a JSON schema for every pipeline and an additional extension with a separate file to describe form elements using JSON Forms specification.

What I like with your approach, that you have the definition of the UI elements decoupled from the core information: the parameters. This looks very clean to me.

On the other hand you have to define two JSON documents for every pipeline, whereas our approach requires only one.

ewels · 2019-11-26T08:59:53Z

Regarding the UI schema, it seems pretty basic - I'm not sure that it really contains any information not already present in the parameters schema? So I was thinking the web server could auto-generate this file pretty easily if it's required for the forms.

mes5k · 2019-11-26T17:22:07Z

The UI Schema does contain a small amount of information. You can see the options for layout and customization here: https://jsonforms.io/docs/uischema. I just use it for controlling layout (i.e. tabs instead of a loooong vertical list). I think it would be an awesome idea to auto-generate a uischema if the repo doesn't provide one.

ewels · 2019-11-26T22:50:28Z

Sounds good! After chatting to @sven1103 a bit more today he drafted a new schema for the hlatyping pipeline which is more stand-alone like your example above. I played with this a bit more and have just created a PR - see nf-core/hlatyping#69

Thoughts / feedback welcome! If everyone is happy I can roll out the changes for this to the nf-core template and command line wizard tool.

Regarding JSONForms - I'm not in a rush to play with this (and the required uischema) to be honest. The main place I can see us running a parameters builder for now is on the nf-core website, and this is built in PHP and not react. So I'm mostly likely to implement this using either a PHP library, or more probably a javascript-only library. I've just had a super fast play with a couple of js libraries (brutusin/json-forms and others) and the hlatyping schema seemed to generate a form and validate inputs perfectly as-is, without any other files or anything. Happy days! 🎉

I'm sure that tweaking will be necessary for a website deployment, but I think that can be up to the implementation and doesn't need to be bundled with the pipeline.

mes5k · 2019-11-28T17:36:47Z

That's excellent and very much illustrates the power of using plain JSON Schema! Very glad to hear that it's working for you.

ewels · 2020-03-20T13:39:55Z

I've made quite a bit of progress on this in the past few days. We now have a web UI for building / editing schema: https://nf-co.re/json_schema_build

This is designed to work with a new helper tool nf-core schema build which is now available in the dev branch of nf-core/tools and will go out in the new release (see docs).

I'm now working to remove the older parameter.settings.json stuff that we had before and build some new tools that use the schema to build parameters to launch the pipeline.

Feedback welcome!

mes5k · 2020-03-20T15:50:45Z

Very nice @ewels! Is there a way to add array types?

drpatelh · 2020-03-20T17:03:36Z

You mean like this @mes5k ?
https://github.com/drpatelh/nf-core-atacseq/blob/f579b54e6451248c97fe9c8eebf70318cbc55fa5/nextflow_schema.json#L434-L441

ewels · 2020-03-20T22:55:35Z

No, array is a JSON Schema variable type: https://json-schema.org/understanding-json-schema/reference/array.html#array

I skipped it because you can’t (to my knowledge) give arrays on the Nextflow command line for a parameter. So I hoped it was a safe-ish assumption that no-one would be using them. I made similar simplifying assumptions in other places, like not allowing nested groups.

I guess we can come back to some of these simplifications in the future if it’s a big problem..

mes5k · 2020-03-22T15:51:54Z

Ah, makes sense. My approach has to always been to pass a params file on the command line rather than individual param arguments. Our params were always complicated enough that it was much easier to deal with them in a file than on the command line.

That being said, setting params on the command line is very convenient so what you've done makes good sense. I think having relatively simple parameters for standardized pipelines like nf-core will probably make the pipelines more accessible to people and is a good goal.

ewels · 2020-03-23T08:59:49Z

Yup! And the nice thing about using JSON Schema is that it should be pretty easy to have partial support for things like this. For example, maybe not supporting it in the builder tool, but supporting it in nf-core launch and the future web parameter builder. We can of course also add more features over time.

I'm hoping that these new tools will make more and more people run nextflow using params files. In my mind it's better for reproducibility, and will hopefully be more user friendly now - as a side effect it may open up opportunities for more complex configuration. But I'm not in a rush to complicate things for now 😄

stale · 2020-08-20T09:01:47Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ewels · 2020-08-20T09:38:45Z

This is now fully implemented within @nf_core - other Nextflow workflows are welcome (and encouraged!) to use our tools and standards. Future core Nextflow support would be brilliant - we will start building Nextflow functions that use the schema for various things soon.

Thank you for the help, work and advice everyone!

Phil

ewels · 2020-08-20T12:17:47Z

ps. If anyone else fancies trying to use the NF-core schema stuff on a non-nf-core pipeline, that would be great (eg. nf-core schema build . etc). It should all work, but you never know what assumptions I missed until you actually try it..

stale · 2021-01-17T12:22:26Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ewels · 2021-01-17T13:27:17Z

I think that this issue can be closed now. Standard is set and in use within nf-core and starting to be used elsewhere.

Core support by Nextflow would be amazing ✨ but probably deserves its own issue. @pditommaso - let me know if this is something that you'd ever consider and I'd be happy to write something up. We need to write up the spec somewhere anyway so this would be a good kick up the bum 😅

pditommaso · 2021-01-18T10:22:48Z

We are working these days with the schema and evaluating impact related to NF and Tower. I agree to close this and open a new one if we decide to add the support for it in the NF core (without - 😅 )

pditommaso added kind/feature priority labels Sep 20, 2018

ewels mentioned this issue Nov 26, 2019

Stand alone parameters schema nf-core/hlatyping#69

Closed

This was referenced Mar 2, 2020

Refactor nf-core launch for new schema nf-core/tools#574

Closed

Update template with new JSON Schema format nf-core/tools#576

Closed

Remove unused JSON meta-Schema nf-core/website#321

Closed

combiz mentioned this issue Mar 5, 2020

Make params-builder that uses JSON Schema nf-core/website#322

Closed

mes5k mentioned this issue Apr 27, 2020

Auto-generate usage from params in pipeline #144

Closed

stale bot added the stale label Aug 20, 2020

stale bot removed the stale label Aug 20, 2020

stale bot added the stale label Jan 17, 2021

stale bot removed the stale label Jan 17, 2021

pditommaso closed this as completed Jan 18, 2021

drpatelh mentioned this issue Nov 8, 2021

Native NF validation via JSON parameter schema #2434

Closed

Nextflow parameter description scheme #866

Nextflow parameter description scheme #866

Comments

sven1103 commented Sep 19, 2018

TL;DR

Details

pditommaso commented Sep 19, 2018

ewels commented Sep 19, 2018

sven1103 commented Sep 19, 2018 • edited Loading

sven1103 commented Sep 19, 2018

ewels commented Sep 19, 2018

sven1103 commented Sep 19, 2018

sven1103 commented Sep 19, 2018

sven1103 commented Sep 19, 2018

pditommaso commented Sep 20, 2018 • edited Loading

pditommaso commented Sep 20, 2018

pditommaso commented Sep 20, 2018

sven1103 commented Sep 20, 2018

ewels commented Sep 20, 2018

msmootsgi commented Sep 20, 2018

pditommaso commented Sep 21, 2018

pditommaso commented Sep 21, 2018 • edited Loading

ewels commented Sep 21, 2018

msmootsgi commented Sep 21, 2018

sven1103 commented Sep 22, 2018 • edited Loading

ewels commented Sep 22, 2018

pditommaso commented Sep 22, 2018

ewels commented Sep 22, 2018

sven1103 commented Sep 22, 2018

ewels commented Sep 22, 2018

sven1103 commented Sep 23, 2018

sven1103 commented Sep 23, 2018

pditommaso commented Sep 24, 2018

msmootsgi commented Sep 24, 2018

sven1103 commented Nov 26, 2019

sven1103 commented Nov 26, 2019

ewels commented Nov 26, 2019

mes5k commented Nov 26, 2019

ewels commented Nov 26, 2019

mes5k commented Nov 28, 2019

ewels commented Mar 20, 2020

mes5k commented Mar 20, 2020

drpatelh commented Mar 20, 2020

ewels commented Mar 20, 2020

mes5k commented Mar 22, 2020

ewels commented Mar 23, 2020

stale bot commented Aug 20, 2020

ewels commented Aug 20, 2020

ewels commented Aug 20, 2020

stale bot commented Jan 17, 2021

ewels commented Jan 17, 2021

pditommaso commented Jan 18, 2021

sven1103 commented Sep 19, 2018 •

edited

Loading

pditommaso commented Sep 20, 2018 •

edited

Loading

pditommaso commented Sep 21, 2018 •

edited

Loading

sven1103 commented Sep 22, 2018 •

edited

Loading