Syntax enhancement aka DSL-2 #984
Comments
Fantastic stuff, Paolo! I've tried it out and played with having set-based inputs and outputs and it works nicely so far. I also note that this will make unit testing individual processes far easier! My opinions on the points you raise:
Hello! I tried this new feature and it looks amazing, thank you! Coming to your points:
where the task specific parameters can be defined at execution time, similarly to what could be done with
Great stuff indeed! In regards to point 3: I also think that namespacing will be invaluable. I really like Python's semantics in this regard (…).
Conversion from a monolithic script to a slim main.nf with imported processes is perfect! Barriers were minimal. I would second not having access to params without passing them explicitly, but I would need some way of accessing them, since many of my processes have a conditional that executes a different variant of a script depending on a param. If it were possible to use …
First, this looks awesome. I'm working with a few people to build some pretty complex NF stuff, and this type of thing should make our lives much, much easier. 🎉 As for the RFC:
And then when we use it:
Or just update config values individually:
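The code snippets from this comment were not preserved in this copy of the thread. As a hedged illustration of the second idea, settings for a single process can be overridden individually in the Nextflow config file using a process selector; the process name here is hypothetical:

```nextflow
// nextflow.config: tune one process without touching its definition
process {
    withName: 'index' {
        cpus   = 4
        memory = '8 GB'
    }
}
```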
Thanks a lot for all your comments! I've uploaded another snapshot introducing some refinements and suggestions you provided:
Main changes:
Then in
These are the main points. In the next iteration I will try to extend the module concept to allow the definition of custom functions that can be imported both in the config and in the script context.
Thanks for the update @pditommaso. To clarify on the injection of modules: if you wanted to inject params that have been passed as arguments to the nextflow run command, would you do something like the below to have default values that could be overridden by args on the nextflow run command line, and then passed on to the module?
Yes, exactly like that; you can even do …
Though, both ways are the only thing that I don't like in this approach.
Of course you release this feature right after I can't use … I think this feature looks great. Reading through this, it seems like it only lets you separate and reuse the definition of single processes, but it doesn't have a way of collecting or aggregating multiple processes into a single entity (like a subworkflow). Is that right? Have you given any thought to that, or is that still future work? Regardless, I think this is awesome and I'll continue to wish I was using …
@mes5k Ah-ah, you have to come back to NF!
This approach is extremely flexible, and the idea is to use a similar mechanism also for sub-workflows.
Awesome! So happy to hear that you're working on this. Will definitely make the job of selling Nextflow internally easier!
Uploaded … However, I'm still not happy; I'll try experimenting with the ability to define subworkflows.
@pditommaso does this feature relate to #238 and also #777, #844? I guess, yes.
It makes sense to allow running a target process or module of a very large script separately, like a portion of the work. Just look at the definition of targeting for the Terraform tool: it makes it possible to uniquely refer to each module, or to any resource or data source within any provider context, by its fully qualified item name. So, examples of a CLI for NF could be written as:
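The CLI examples from this comment were lost in extraction. For context, Terraform's targeting, which the comment cites as a model, addresses a single module or resource by its fully qualified name (the module and resource names below are made up):

```shell
# Plan/apply only one module, or one resource inside it
terraform plan  -target=module.network
terraform apply -target=module.network.aws_subnet.private
```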
Besides introducing the modules feature to extract common code into a separate file, I hope it will lead to the implementation of the features described above, because they are useful and desired.
#238 yes, the others are out of the scope of this enhancement.
@pditommaso let's assume that the feature is done and can be released as experimental.
That's the plan.
It is worth adding a version designation to the NF script, to help the end user identify the version and produce clear error descriptions. For example:
where M stands for milestone.
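The example itself was not preserved here. For reference, the mechanism that eventually shipped in the edge releases is a feature-flag line at the top of the script rather than a milestone suffix:

```nextflow
// first line of a DSL2 script in the edge releases
nextflow.preview.dsl=2
```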
Ok, just uploaded … Then invoke it as a function, i.e.:

The output of the last invoked process (…). In the main script an anonymous workflow can be defined, which is supposed to be the application entry point and is therefore implicitly executed, e.g.:

Bonus (big one): within a workflow scope the same channel can be used as input in different processes (finally!)
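A sketch of such an anonymous entry-point workflow under the draft syntax; the process and channel names are hypothetical:

```nextflow
// main.nf: the unnamed workflow is implicitly executed as the entry point
workflow {
    reads = Channel.fromPath('reads/*.fastq')
    bar(foo(reads))   // compose the imported processes
}
```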
Hi @pditommaso I've started experimenting and I'm having a hard time getting something working. I'm getting this error:
With this code: https://github.com/mes5k/new_school_nf. Can you point me in the right direction?
The processes can only be defined in the module script (to keep compatibility with existing code). In the main script there must be a …

or …
Awesome, thanks! My first example is now working. My next experiment was to see if I could import an entire workflow. I can't tell from your comments whether that's something that's supported, or whether I've just got a mistake in my code.
Is it possible to assign module process outputs to a variable, so that you can do something like the following?

modules.nf

main.nf
Yes, but it's not necessary. The process can be accessed as a variable to retrieve the output value, i.e.
You can define the workflow logic as a sub-workflow, then invoke it, i.e.:
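The snippet was not preserved; roughly, a named sub-workflow and its function-style invocation might look like this under the draft syntax (all names are hypothetical):

```nextflow
// define the logic once as a named workflow…
workflow analysis {
    samples = Channel.fromPath(params.samples)
    bar(foo(samples))
}

// …then invoke it like a function
analysis()
```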
OK, cool, thanks. Also, you mentioned that you can reuse a channel. Can you therefore do …
@rspreafico-vir You need to clone and build the master branch, or use the …
This works great with …
Is there any current plan for when this might be officially released? |
So far, this: https://www.nextflow.io/docs/edge/dsl2.html
@pditommaso Is this available on the 19.04.1 release? |
Nope, kindly see a few comments up. Requires …
Thanks @rspreafico-vir. I am on the point of submitting something to nf-core and would dearly love it to be using DSL-2!
@aunderwo Finally! I remember talking to you about this at the NF conference last year 😄 Really looking forward to this functionality being added to …
Should we re-open this issue? Very excited about the new release!
@arnaudbore just blogged about that https://www.nextflow.io/blog/2019/one-more-step-towards-modules.html |
Thank you Paolo! This makes our research a lot easier! |
This commit implements a major enhancement for the Nextflow DSL that provides support for:
- module libraries and process inclusion
- ability to use an output channel multiple times as input
- implicit process output variable
- pipe-style process and operator composition
Hi @pditommaso, just a question. Do you think that the …
Just a +1 that I've been hunting through the documentation for this feature. |
The
Regarding the …
Hi, a couple of questions. Can sub-workflows have outputs? Can we specify which channels are output, specifically? If not, is it a planned feature? Perhaps supporting a syntax similar to processes (e.g. input & output declarations) would be useful?
Thanks for your suggestion @pditommaso, the …
@rspreafico-vir nice to read that. @JonathanCSmith actually that was the first implementation I tried, but I was not convinced because the semantics are different. I'll open a separate issue to discuss it.
Isn't the channel output by the last process in a sub-workflow the implicit output of that sub-workflow? @rspreafico-vir the problem I had in the past with …
@mes5k You are right about …
I have not tried the behavior with local execution, just AWS Batch with S3. The other catch is that …

For now I am fixing the two catches above by first checking that the output folder does not exist, and then by creating it myself before calling …

The other way around would be to create a process that takes as input all the pipeline outputs (generated from different processes), with the intention of consolidating all files in a single subfolder of the work folder. However, this does not work in S3, as symlinks to input files are only maintained on the EC2 instance, but not in S3. I have not tried specifying that inputs should be copied rather than symlinked by this process with …
I ended up 'faking' a process that would ensure my desired channel is the last output of a sub-workflow which should work as a stopgap for now. I have, however, encountered another issue (which may just be a result of my becoming accustomed to DSL2). I have created a basic process with a file as an input.
I have called the process successfully with a channel containing a list of files using the syntax:
whereas:
did not work. In addition, I attempted to reuse the "customProcess" but I received the following error:
Pseudocode for the subworkflow is as follows:
Is this expected? For now I can overcome it by duplicating the process, but personally I would expect processes to be reusable.
Hello, Paolo, I have a question.
If that's so, how do you rewrite the Channel.choice() example code shown in the reference document?
@pachiras interesting point, please report as a separate issue and let's continue the discussion there. |
I'm locking this thread. Please open a new issue for DSL2 problems or general discussion. Thanks!
This is a request for comments for the implementation of the modules feature for Nextflow.
This feature allows the definition of NF processes in the main script or in a separate library file, which can then be invoked, one or multiple times, like any other routine, passing the requested input channels as arguments.
Process definition
The syntax for the definition of a process is nearly identical to the usual one; it only requires the use of `processDef` instead of `process`, and the omission of the `from`/`into` declarations. For example:

The semantics and supported features remain identical to the current process. See a complete example here.
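The inline example was lost in extraction. A minimal sketch of the draft syntax with a made-up process: note `processDef` in place of `process`, and input/output declarations without `from`/`into`:

```nextflow
processDef index {
    input:
    file transcriptome

    output:
    file 'index_dir'

    script:
    """
    salmon index -t $transcriptome -i index_dir
    """
}
```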
Process invocation
Once a process is defined, it can be invoked like any other function in the pipeline script. For example:

Since `index` defines an output channel, its return value can be assigned to a channel variable that can be used as usual, e.g.:

If the process produces two (or more) output channels, the multiple-assignment syntax can be used to get a reference to each of them.
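A sketch of both forms, reusing the hypothetical `index` process above and assuming a second process `quantify` with two output channels:

```nextflow
transcriptome = Channel.fromPath('transcriptome.fa')

// single output: the return value is the output channel
index_ch = index(transcriptome)

// two outputs: multiple assignment gives a reference to each channel
(quant_ch, log_ch) = quantify(index_ch)
```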
Process composition
The result of a process invocation can be passed to another process like any other function, eg:
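For instance, with the hypothetical processes above:

```nextflow
// the output of index feeds quantify directly
quantify(index(Channel.fromPath('transcriptome.fa')))
```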
Process chaining
Processes can also be invoked as custom operators. For example, a process `foo` taking one input channel can be invoked as:

and, when taking two channels, as:

This allows the chaining of built-in operators and processes together, e.g.:
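The original snippets did not survive extraction; a sketch of the operator-style invocation, with hypothetical processes `foo` and `bar`:

```nextflow
// one input channel: pipe it into the process
reads_ch | foo

// chain built-in operators and processes together
Channel.fromPath('data/*.txt') | foo | map { it.name } | bar
```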
See the complete script here.
Library file
A library is just a NF script containing one or more `processDef` declarations. The library can then be imported using the `importLibrary` statement, e.g.:

Relative paths are resolved against the project `baseDir` variable.

Test it
You can try the current implementation using version `19.0.0.modules-draft2-SNAPSHOT`, e.g.:

Open points
1. When a process is defined in a library file, should it be possible to access the `params` values? Currently it's possible, but I don't think this is a good idea, because it makes the library depend on the script params, making it very fragile.
2. How to pass parameters to a process defined in a library file, e.g. memory and cpus settings? It could be done using the config file as usual, but I expect there could be the need to parametrise the process definition and specify the parameters at invocation time.
3. Should a namespace be used when defining the processes in a library? What if two or more processes have the same name in different library files?
4. One or many processes per library file? Currently any number of processes can be defined; I'm starting to think it would be better to allow the definition of only one process per file. This would simplify reuse across different pipelines and the import into tools such as Dockstore, and it would make the dependencies of the pipeline more intelligible.
5. Remote library files? I'm not sure it's a good idea to be able to import remotely hosted files, e.g. `http://somewhere/script.nf`; remote paths tend to change over time.
6. Should a version number be associated with the process definition? How to use or enforce it?
7. How to test process components? Ideally it should be possible to include the required container in the process definition and unit test each process independently.
8. How to chain a process returning multiple channels?