-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nextflow parameter description scheme #866
Comments
What kind of scheme/metadata would be useful for your use case ? |
Suggestions:
I think we spoke about this previously and pointed to some HTML spec or something, but that's probably overkill for now. Phil |
|
I think mandatory/optional is not necessary, as on definition of a parameter, a default value should always be present. |
Yes, and we already have default values? |
implicit in the params definition yes, but not in the descriptions section |
I mean, a parser could just check the following parameter definition after a description section, maybe this is sufficient? |
For simplicity, i would suggest to have normal multi-line comment strings as in Groovy and extend it by tags, such as: /**
The number of iterations of the algorithm
@name: Algo iterations
@type: int
@range: {0,10(,1)} // including step size for range?
*/
params.iterations = 10
/**
Sequence type
@name: Nucleotide sequence type
@type: enum
@range: ["dna", "rna"]
*/
params.type = "dna"
maybe sth. in that direction? |
Let's focus on metadata fields before, I see the following:
|
I'm adding @ypriverol in this thread because he's interested in this topic. |
👍 looks good to me |
Maybe distinguish between a choice that can have multiple selected or just one? eg. Love the idea of having a generic regex validation available 👍 |
It sounds like you want to define your parameters with json schema or something similar. That would open up a world of possibilities. |
Yes, tho thinking more I'm realising we should distinguish between data type and data visualisation. The data type could be: string, integer, decimal, boolean, mem unit. Then we can have an attribute to define how the field is rendered and another for validation. To recap:
That may be useful! |
For the syntax, the annotation style proposed by Sven looks too verbose IMO. What about what describe here #144 (comment) ? (let's continue the discussion in this thread) |
I think @sven1103's suggestion was just a quick one that wouldn't affect nextflow execution so that we could get started. But yes, it would be good to have a proper syntax. Personally, I would really like to keep the current syntax as it is, both for backwards compatability and also because it's so simple and intuitive. Then have an alternative method for adding to params.simple = 'is best'
params.add( 'foo', 1 )
params.add( 'bar', 20,
label: 'Bar',
usage: 'This is the bar variable which should be set',
type: 'integer',
render: 'range',
min: 0, max: 100
) |
I should have elaborated a bit more on my suggestion. What I'm proposing is the option for users to specify a json schema that defines the necessary and allowable params needed within a nextflow pipeline. I'm imagining that this schema would live in a file (e.g.
What I particularly like about this idea is that most of the pieces needed to implement this already exist AND it shouldn't impact existing pipeline execution at all. It would be a strictly optional feature that users could add on when they're ready. From an implementation perspective, I think this would be easy to implement in small pieces, i.e. first validate input against a schema, then infer a command line, and then on to the rest of it as needed/desired. Does that make sense? Sorry for the wall of text. This ticket just happened to appear and dovetail nicely in my brain with stuff we've been doing internally (schemas, parameter validation, nextflow web API) and stuff I've been learning about related to WES and Dockstore. |
Hey all,
I agree, maybe this was to early to note down. Indeed, what @ewels said is right, I wanted the annotation not to interfere with the existing DSL. @pditommaso #144 looks very consice as part of the DSL, I like it very much. However, I agree with @msmootsgi that this is harder to parse for third party applications in order to build i.e. UIs, a JSON would be much more straight forward here. But: why can't we have the syntax as proposed in #144 and have an option to output the parameters meta-data as |
This would be cool, but it feels a bit like double-documentation to have it in two places. Could maybe have support for both, and the JSON file is just parsed and ends up being equivalent to the nextflow syntax? I really like @msmootsgi's JSON suggestion, would make life a lot simpler to use this stuff if we don't have to launch nextflow to pull out the spec (eg. on web servers). |
I think we are mixing two topics here: parameters representation and validation. Any parameters can already be specified as JSON (or even YAML) using the The same mechanism would implicitly be able to support an extended DSL for params representation discussed above as per the equivalent json/object representation. Then a schema validation could be optionally applied independently the formats how the params are provided. |
Parameters and spec for parameters aren't quite the same thing though, right? That JSON / YAML file is fine for submitting the options once given, but the above is more about describing what options are required. It's more of a core file for the pipeline script, rather than something a user writes to run the pipeline. |
But why can't it be both? I mean once the DSL of Nextflow is extended (' params.add(...)' ), the JSON just needs to be parsed properly with the new meta-data fields we just discussed. I agree that the validation would be sth nice to have in addition, but not necessary for the first implementation of this feature request (though really nice ;)). |
Because one is written by the pipeline developer, and one is written by the pipeline user.. At least, that's what |
Why? With |
That is why I suggested some annotations in the |
The proposal to have a separate json file has the advantage to decouple the parameters definition form the NF files, therefore could be easily used by third party applications, I agree on that. On the other hand has the disadvantage to be a separate json file about which I generally I would like to avoid. However as Phil and Mike mentioning, this would only be require by workflow developer(s) not by the end users (tho may be frequently the same person), I think it could be worth to explore this solution. I would like to see this in practice to better evaluate pros & cons. Above all it would be interesting how it would be modelled the json scheme for the metadata proposed above. @msmootsgi could you provide an example? |
Ok, I've attached a tgz file containing an example schema |
I think we start to mix up things. When I talk of schema, i mean this: https://nf-co.re/parameter-schema/0.2.0dev/parameters.schema.json. And this is NOT pipeline-specific. Every pipeline has an instance of this schema, building the parameter representation of each pipeline in JSON. |
@mes5k I think we just have two different approaches. We have one schema at the top level of nf-core, that defines how parameter representation of nf-core pipelines have to look like. And then we have the instances in JSON for every pipeline, that describe parameter objects that are specific to the pipeline. Your approach is to have a JSON schema for every pipeline and an additional extension with a separate file to describe form elements using JSON Forms specification. What I like with your approach, that you have the definition of the UI elements decoupled from the core information: the parameters. This looks very clean to me. On the other hand you have to define two JSON documents for every pipeline, whereas our approach requires only one. |
Regarding the UI schema, it seems pretty basic - I'm not sure that it really contains any information not already present in the parameters schema? So I was thinking the web server could auto-generate this file pretty easily if it's required for the forms. |
The UI Schema does contain a small amount of information. You can see the options for layout and customization here: https://jsonforms.io/docs/uischema. I just use it for controlling layout (i.e. tabs instead of a loooong vertical list). I think it would be an awesome idea to auto-generate a uischema if the repo doesn't provide one. |
Sounds good! After chatting to @sven1103 a bit more today he drafted a new schema for the hlatyping pipeline which is more stand-alone like your example above. I played with this a bit more and have just created a PR - see nf-core/hlatyping#69 Thoughts / feedback welcome! If everyone is happy I can roll out the changes for this to the nf-core template and command line wizard tool. Regarding JSONForms - I'm not in a rush to play with this (and the required uischema) to be honest. The main place I can see us running a parameters builder for now is on the nf-core website, and this is built in PHP and not react. So I'm mostly likely to implement this using either a PHP library, or more probably a javascript-only library. I've just had a super fast play with a couple of js libraries (brutusin/json-forms and others) and the hlatyping schema seemed to generate a form and validate inputs perfectly as-is, without any other files or anything. Happy days! 🎉 I'm sure that tweaking will be necessary for a website deployment, but I think that can be up to the implementation and doesn't need to be bundled with the pipeline. |
That's excellent and very much illustrates the power of using plain JSON Schema! Very glad to hear that it's working for you. |
I've made quite a bit of progress on this in the past few days. We now have a web UI for building / editing schema: https://nf-co.re/json_schema_build This is designed to work with a new helper tool I'm now working to remove the older Feedback welcome! |
Very nice @ewels! Is there a way to add array types? |
No, I skipped it because you can’t (to my knowledge) give arrays on the Nextflow command line for a parameter. So I hoped it was a safe-ish assumption that no-one would be using them. I made similar simplifying assumptions in other places, like not allowing nested groups. I guess we can come back to some of these simplifications in the future if it’s a big problem.. |
Ah, makes sense. My approach has to always been to pass a params file on the command line rather than individual param arguments. Our params were always complicated enough that it was much easier to deal with them in a file than on the command line. That being said, setting params on the command line is very convenient so what you've done makes good sense. I think having relatively simple parameters for standardized pipelines like nf-core will probably make the pipelines more accessible to people and is a good goal. |
Yup! And the nice thing about using JSON Schema is that it should be pretty easy to have partial support for things like this. For example, maybe not supporting it in the builder tool, but supporting it in I'm hoping that these new tools will make more and more people run nextflow using params files. In my mind it's better for reproducibility, and will hopefully be more user friendly now - as a side effect it may open up opportunities for more complex configuration. But I'm not in a rush to complicate things for now 😄 |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This is now fully implemented within @nf_core - other Nextflow workflows are welcome (and encouraged!) to use our tools and standards. Future core Nextflow support would be brilliant - we will start building Nextflow functions that use the schema for various things soon. Thank you for the help, work and advice everyone! Phil |
ps. If anyone else fancies trying to use the NF-core schema stuff on a non-nf-core pipeline, that would be great (eg. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I think that this issue can be closed now. Standard is set and in use within nf-core and starting to be used elsewhere. Core support by Nextflow would be amazing ✨ but probably deserves its own issue. @pditommaso - let me know if this is something that you'd ever consider and I'd be happy to write something up. We need to write up the spec somewhere anyway so this would be a good kick up the bum 😅 |
We are working these days with the schema and evaluating impact related to NF and Tower. I agree to close this and open a new one if we decide to add the support for it in the NF core (without |
TL;DR
A naming scheme to enable meta-data annotation for workflow parameters.
Details
Usually, workflow-specific execution parameters for the single processes are defined in the
params
scope, a DSL-feature Nextflow provides to access parameter variables from the workflow script during wf execution.Currently, there is no naming scheme / convention / language feature for annotating parameters with description text, mandatory/optional flags or similar.
This could be useful though for upstream applications in order to build graphical user interfaces and configure a workflow correctly before execution in a dynamic, user-friendly way.
I am very happy for any input here and design suggestions :)
Best, Sven
The text was updated successfully, but these errors were encountered: