Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nextflow parameter description scheme #866

Closed
sven1103 opened this issue Sep 19, 2018 · 80 comments
Closed

Nextflow parameter description scheme #866

sven1103 opened this issue Sep 19, 2018 · 80 comments
Labels

Comments

@sven1103
Copy link
Contributor

TL;DR

A naming scheme to enable meta-data annotation for workflow parameters.

Details

Usually, workflow-specific execution parameters for the single processes are defined in the params scope, a DSL-feature Nextflow provides to access parameter variables from the workflow script during wf execution.

Currently, there is no naming scheme / convention / language feature for annotating parameters with description text, mandatory/optional flags or similar.

This could be useful though for upstream applications in order to build graphical user interfaces and configure a workflow correctly before execution in a dynamic, user-friendly way.

I am very happy for any input here and design suggestions :)

Best, Sven

@pditommaso
Copy link
Member

What kind of scheme/metadata would be useful for your use case ?

@ewels
Copy link
Member

ewels commented Sep 19, 2018

Suggestions:

  • Name (allow spaces, stuff that shouldn't be in the variable name)
  • Description
  • Type
    • bool
    • numeric
      • max
      • min
    • str
      • max length

I think we spoke about this previously and pointed to some HTML spec or something, but that's probably overkill for now.

Phil

@sven1103
Copy link
Contributor Author

sven1103 commented Sep 19, 2018

  • default value

@sven1103
Copy link
Contributor Author

I think mandatory/optional is not necessary, as on definition of a parameter, a default value should always be present.

@ewels
Copy link
Member

ewels commented Sep 19, 2018

Yes, and we already have default values?

@sven1103
Copy link
Contributor Author

implicit in the params definition yes, but not in the descriptions section

@sven1103
Copy link
Contributor Author

I mean, a parser could just check the following parameter definition after a description section, maybe this is sufficient?

@sven1103
Copy link
Contributor Author

For simplicity, i would suggest to have normal multi-line comment strings as in Groovy and extend it by tags, such as:

/**
The number of iterations of the algorithm

@name: Algo iterations
@type:  int
@range: {0,10(,1)} // including step size for range?
*/
params.iterations = 10

/**
Sequence type

@name: Nucleotide sequence type
@type: enum
@range: ["dna", "rna"]
*/
params.type = "dna"

maybe sth. in that direction?

@pditommaso
Copy link
Member

pditommaso commented Sep 20, 2018

Let's focus on metadata fields before, I see the following:

  • name: (*) the parameter name, it should a valid identifier (no blanks, etc)
  • label: human readable name
  • help/usage: param description showed in the help text
  • type: (*) string, text (multiline string) integer, decimal, file (path), url, upload, choice, range
  • min: (value for numeric, length for string)
  • max: (value for numeric, length for string)
  • defvalue: (*) default value
  • pattern: regex validation rule

@pditommaso
Copy link
Member

I'm adding @ypriverol in this thread because he's interested in this topic.

@pditommaso
Copy link
Member

Related to #88, #144.

@sven1103
Copy link
Contributor Author

👍 looks good to me

@ewels
Copy link
Member

ewels commented Sep 20, 2018

type: (*) string, text (multiline string) integer, decimal, file (path), url, upload, choice, range

Maybe distinguish between a choice that can have multiple selected or just one? eg. radio and checkbox in HTML forms.

Love the idea of having a generic regex validation available 👍

@msmootsgi
Copy link
Contributor

It sounds like you want to define your parameters with json schema or something similar. That would open up a world of possibilities.

@pditommaso
Copy link
Member

Maybe distinguish between a choice that can have multiple selected or just one? eg. radio and checkbox in HTML forms.

Yes, tho thinking more I'm realising we should distinguish between data type and data visualisation. The data type could be: string, integer, decimal, boolean, mem unit. Then we can have an attribute to define how the field is rendered and another for validation. To recap:

  • name: (*) the parameter name, it should a valid identifier (no blanks, etc)
  • label: human readable name
  • usage: param description showed in the help text
  • type: (*) string, integer, decimal, boolean, mem unit
  • render: file (local path), url, upload, range, check-box, radio-button, list-box, drop-down
  • choices: possible values for check-box, radio-button, list-box, drop-down
  • defvalue: (*) default value
  • pattern: regex validation rule
  • arity: 1 or many

It sounds like you want to define your parameters with json schema or something similar. That would open up a world of possibilities.

That may be useful!

@pditommaso
Copy link
Member

pditommaso commented Sep 21, 2018

For the syntax, the annotation style proposed by Sven looks too verbose IMO. What about what describe here #144 (comment) ? (let's continue the discussion in this thread)

@ewels
Copy link
Member

ewels commented Sep 21, 2018

I think @sven1103's suggestion was just a quick one that wouldn't affect nextflow execution so that we could get started. But yes, it would be good to have a proper syntax.

Personally, I would really like to keep the current syntax as it is, both for backwards compatability and also because it's so simple and intuitive. Then have an alternative method for adding to params where you can add in additional fields if you want to. params.add would be my favourite from your comment. Because name and default value are mandatory they could probably be the first two arguments and the rest optional? So any of these would be valid:

params.simple = 'is best'
params.add( 'foo', 1 )
params.add( 'bar', 20,
    label: 'Bar',
    usage: 'This is the bar variable which should be set',
    type: 'integer',
    render: 'range',
    min: 0, max: 100
)    

@msmootsgi
Copy link
Contributor

I should have elaborated a bit more on my suggestion. What I'm proposing is the option for users to specify a json schema that defines the necessary and allowable params needed within a nextflow pipeline. I'm imagining that this schema would live in a file (e.g. param_schema.json) that lives alongside the pipeline's main.nf file. This file would be optional, so that if didn't exist nextflow would behave as it does now. If it did exist, it would do multiple things for us:

  • First, it would allow a UI to be inferred from the schema. Just parse the schema and based on the type and constraints display the proper widget. This would also provide descriptions of the different params needed. I believe this would address this ticket.
  • To be clear, "UI" could mean a web-based UI with javascript+HTML or a command line app like nextflow itself. It would be straightforward to infer a command line parser from a schema and solve Auto-generate usage from params in pipeline #144. This might even point a way towards solving GNU command line interface #556 (maybe have a base schema that gets extended and can't be overridden?).
  • Second, it would allow nextflow to automatically validate the input parameters based on the schema with something like this. This could happen behind the scenes automatically if param_schema.json existed and produce nice error messages.
  • In the same way that you might infer a UI, you could also infer a web-service. You could use a library like connexion that builds a web API based on a swagger spec. Since swagger specifies API parameters bases on json schema, I believe you could dynamically generate an API just by referencing param_schema.json.
  • Finally, I believe a schema like this would help integrate with WES, something that I think would be useful for Dockstore integration and broader adoption of nextflow.

What I particularly like about this idea is that most of the pieces needed to implement this already exist AND it shouldn't impact existing pipeline execution at all. It would be a strictly optional feature that users could add on when they're ready. From an implementation perspective, I think this would be easy to implement in small pieces, i.e. first validate input against a schema, then infer a command line, and then on to the rest of it as needed/desired.

Does that make sense? Sorry for the wall of text. This ticket just happened to appear and dovetail nicely in my brain with stuff we've been doing internally (schemas, parameter validation, nextflow web API) and stuff I've been learning about related to WES and Dockstore.

@sven1103
Copy link
Contributor Author

sven1103 commented Sep 22, 2018

Hey all,
a lot of great suggestions here, I`ll try to address them properly.

For the syntax, the annotation style proposed by Sven looks too verbose IMO. What about what describe here #144 (comment) ? (let's continue the discussion in this thread)

I agree, maybe this was to early to note down. Indeed, what @ewels said is right, I wanted the annotation not to interfere with the existing DSL.

@pditommaso #144 looks very consice as part of the DSL, I like it very much. However, I agree with @msmootsgi that this is harder to parse for third party applications in order to build i.e. UIs, a JSON would be much more straight forward here.

But: why can't we have the syntax as proposed in #144 and have an option to output the parameters meta-data as params.json? So one could use the meta-data inside the runtime code, but one could also ship JSON-like parameter descriptor files for third party apps with the pipeline?

@ewels
Copy link
Member

ewels commented Sep 22, 2018

But: why can't we have the syntax as proposed in #144 and have an option to output the parameters meta-data as params.json? So one could use the meta-data inside the runtime code, but one could also ship JSON-like parameter descriptor files for third party apps with the pipeline?

This would be cool, but it feels a bit like double-documentation to have it in two places. Could maybe have support for both, and the JSON file is just parsed and ends up being equivalent to the nextflow syntax?

I really like @msmootsgi's JSON suggestion, would make life a lot simpler to use this stuff if we don't have to launch nextflow to pull out the spec (eg. on web servers).

@pditommaso
Copy link
Member

I think we are mixing two topics here: parameters representation and validation. Any parameters can already be specified as JSON (or even YAML) using the -params-file option.

The same mechanism would implicitly be able to support an extended DSL for params representation discussed above as per the equivalent json/object representation.

Then a schema validation could be optionally applied independently the formats how the params are provided.

@ewels
Copy link
Member

ewels commented Sep 22, 2018

Parameters and spec for parameters aren't quite the same thing though, right? That JSON / YAML file is fine for submitting the options once given, but the above is more about describing what options are required. It's more of a core file for the pipeline script, rather than something a user writes to run the pipeline.

@sven1103
Copy link
Contributor Author

That JSON / YAML file is fine for submitting the options once given, but the above is more about describing what options are required

But why can't it be both? I mean once the DSL of Nextflow is extended (' params.add(...)' ), the JSON just needs to be parsed properly with the new meta-data fields we just discussed.

I agree that the validation would be sth nice to have in addition, but not necessary for the first implementation of this feature request (though really nice ;)).

@ewels
Copy link
Member

ewels commented Sep 22, 2018

But why can't it be both?

Because one is written by the pipeline developer, and one is written by the pipeline user.. At least, that's what -params-file does if I understand it correctly.

@sven1103
Copy link
Contributor Author

Because one is written by the pipeline developer, and one is written by the pipeline user

Why? With -params-file you just give NF the variables then available in the params scope. As developer you could just provide your pipeline with this file (plus metadata), instead of declaring them in the main.nf. Of course if you do not pass it to Nextflow, you would still need to need some default values for the params though.

@sven1103
Copy link
Contributor Author

That is why I suggested some annotations in the main.nf directly, so there is only one single place to look at (as developer and user and parser)

@pditommaso
Copy link
Member

The proposal to have a separate json file has the advantage to decouple the parameters definition form the NF files, therefore could be easily used by third party applications, I agree on that.

On the other hand has the disadvantage to be a separate json file about which I generally I would like to avoid.

However as Phil and Mike mentioning, this would only be require by workflow developer(s) not by the end users (tho may be frequently the same person), I think it could be worth to explore this solution.

I would like to see this in practice to better evaluate pros & cons. Above all it would be interesting how it would be modelled the json scheme for the metadata proposed above. @msmootsgi could you provide an example?

@msmootsgi
Copy link
Contributor

Ok, I've attached a tgz file containing an example schema params_schema.json, a failing input file (example1.json), and a VERY primitive "groovy" (cough, java, cough) script that validates the input against the schema. Does this help?

pipeline_params.tar.gz

@sven1103
Copy link
Contributor Author

The approach nf-core is currently pursuing is not pipeline-specific, which makes it great for applications to scan nf-core pipeline parameters and build UIs on it, or verify parameter input for a given pipeline.

I disagree - every pipeline has a schema which is pipeline specific. It has to be specific to be useful. If it was totally generic then the tools wouldn't know what the different pipeline params were called.

I think we start to mix up things. When I talk of schema, i mean this: https://nf-co.re/parameter-schema/0.2.0dev/parameters.schema.json. And this is NOT pipeline-specific.

Every pipeline has an instance of this schema, building the parameter representation of each pipeline in JSON.

@sven1103
Copy link
Contributor Author

@mes5k I think we just have two different approaches. We have one schema at the top level of nf-core, that defines how parameter representation of nf-core pipelines have to look like. And then we have the instances in JSON for every pipeline, that describe parameter objects that are specific to the pipeline.

Your approach is to have a JSON schema for every pipeline and an additional extension with a separate file to describe form elements using JSON Forms specification.

What I like with your approach, that you have the definition of the UI elements decoupled from the core information: the parameters. This looks very clean to me.

On the other hand you have to define two JSON documents for every pipeline, whereas our approach requires only one.

@ewels
Copy link
Member

ewels commented Nov 26, 2019

Regarding the UI schema, it seems pretty basic - I'm not sure that it really contains any information not already present in the parameters schema? So I was thinking the web server could auto-generate this file pretty easily if it's required for the forms.

@mes5k
Copy link
Contributor

mes5k commented Nov 26, 2019

The UI Schema does contain a small amount of information. You can see the options for layout and customization here: https://jsonforms.io/docs/uischema. I just use it for controlling layout (i.e. tabs instead of a loooong vertical list). I think it would be an awesome idea to auto-generate a uischema if the repo doesn't provide one.

@ewels
Copy link
Member

ewels commented Nov 26, 2019

Sounds good! After chatting to @sven1103 a bit more today he drafted a new schema for the hlatyping pipeline which is more stand-alone like your example above. I played with this a bit more and have just created a PR - see nf-core/hlatyping#69

Thoughts / feedback welcome! If everyone is happy I can roll out the changes for this to the nf-core template and command line wizard tool.


Regarding JSONForms - I'm not in a rush to play with this (and the required uischema) to be honest. The main place I can see us running a parameters builder for now is on the nf-core website, and this is built in PHP and not react. So I'm mostly likely to implement this using either a PHP library, or more probably a javascript-only library. I've just had a super fast play with a couple of js libraries (brutusin/json-forms and others) and the hlatyping schema seemed to generate a form and validate inputs perfectly as-is, without any other files or anything. Happy days! 🎉

image

I'm sure that tweaking will be necessary for a website deployment, but I think that can be up to the implementation and doesn't need to be bundled with the pipeline.

@mes5k
Copy link
Contributor

mes5k commented Nov 28, 2019

That's excellent and very much illustrates the power of using plain JSON Schema! Very glad to hear that it's working for you.

@ewels
Copy link
Member

ewels commented Mar 20, 2020

I've made quite a bit of progress on this in the past few days. We now have a web UI for building / editing schema: https://nf-co.re/json_schema_build

This is designed to work with a new helper tool nf-core schema build which is now available in the dev branch of nf-core/tools and will go out in the new release (see docs).

I'm now working to remove the older parameter.settings.json stuff that we had before and build some new tools that use the schema to build parameters to launch the pipeline.

Feedback welcome!

@mes5k
Copy link
Contributor

mes5k commented Mar 20, 2020

Very nice @ewels! Is there a way to add array types?

@ewels
Copy link
Member

ewels commented Mar 20, 2020

No, array is a JSON Schema variable type: https://json-schema.org/understanding-json-schema/reference/array.html#array

I skipped it because you can’t (to my knowledge) give arrays on the Nextflow command line for a parameter. So I hoped it was a safe-ish assumption that no-one would be using them. I made similar simplifying assumptions in other places, like not allowing nested groups.

I guess we can come back to some of these simplifications in the future if it’s a big problem..

@mes5k
Copy link
Contributor

mes5k commented Mar 22, 2020

Ah, makes sense. My approach has to always been to pass a params file on the command line rather than individual param arguments. Our params were always complicated enough that it was much easier to deal with them in a file than on the command line.

That being said, setting params on the command line is very convenient so what you've done makes good sense. I think having relatively simple parameters for standardized pipelines like nf-core will probably make the pipelines more accessible to people and is a good goal.

@ewels
Copy link
Member

ewels commented Mar 23, 2020

Yup! And the nice thing about using JSON Schema is that it should be pretty easy to have partial support for things like this. For example, maybe not supporting it in the builder tool, but supporting it in nf-core launch and the future web parameter builder. We can of course also add more features over time.

I'm hoping that these new tools will make more and more people run nextflow using params files. In my mind it's better for reproducibility, and will hopefully be more user friendly now - as a side effect it may open up opportunities for more complex configuration. But I'm not in a rush to complicate things for now 😄

@stale
Copy link

stale bot commented Aug 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 20, 2020
@ewels
Copy link
Member

ewels commented Aug 20, 2020

This is now fully implemented within @nf_core - other Nextflow workflows are welcome (and encouraged!) to use our tools and standards. Future core Nextflow support would be brilliant - we will start building Nextflow functions that use the schema for various things soon.

Thank you for the help, work and advice everyone!

Phil

@stale stale bot removed the stale label Aug 20, 2020
@ewels
Copy link
Member

ewels commented Aug 20, 2020

ps. If anyone else fancies trying to use the NF-core schema stuff on a non-nf-core pipeline, that would be great (eg. nf-core schema build . etc). It should all work, but you never know what assumptions I missed until you actually try it..

@stale
Copy link

stale bot commented Jan 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 17, 2021
@ewels
Copy link
Member

ewels commented Jan 17, 2021

I think that this issue can be closed now. Standard is set and in use within nf-core and starting to be used elsewhere.

Core support by Nextflow would be amazing ✨ but probably deserves its own issue. @pditommaso - let me know if this is something that you'd ever consider and I'd be happy to write something up. We need to write up the spec somewhere anyway so this would be a good kick up the bum 😅

@stale stale bot removed the stale label Jan 17, 2021
@pditommaso
Copy link
Member

We are working these days with the schema and evaluating impact related to NF and Tower. I agree to close this and open a new one if we decide to add the support for it in the NF core (without - 😅 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants