Add an option for multiple samples #15

Closed
fmalmeida opened this issue Sep 15, 2021 · 9 comments · Fixed by #23
Assignees
Labels
enhancement New feature or request

Comments

@fmalmeida
Owner

Add an option to facilitate and organize the execution of the pipeline for multiple samples. Maybe create something using YAML syntax, as is done in bacannot.

@fmalmeida fmalmeida added the enhancement New feature or request label Sep 15, 2021
@fmalmeida fmalmeida self-assigned this Sep 15, 2021
@fmalmeida
Owner Author

Implementation under development in branch issue-15.

@fmalmeida
Owner Author

fmalmeida commented Sep 25, 2021

The implementation is already underway, using something similar to the bacannot samplesheet.

The implementation will go step by step, and execution tests must be performed at each step. Work on a workflow will start only once the previous one has been fully developed and tested.

Checklist:

short-reads only

  • Implement YAML samplesheet for short-reads only workflows
  • execute assembly with paired end reads
  • execute assembly with single end reads
  • execute assembly with both paired and single end reads

long reads only

  • Implement YAML samplesheet for long-reads only workflows
  • execute assembly with nanopore reads, without nanopolish
  • execute assembly with nanopore reads, with nanopolish
  • execute assembly with pacbio reads, without gcpp
  • execute assembly with pacbio reads, with gcpp

Think about the medaka model. Should it be defined for each sample in the YAML, or should it remain an "outside-YAML" general parameter that sets the value for all samples?
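
One possible middle ground for the medaka question, sketched in Python purely as an illustration: a per-sample value in the samplesheet wins, and the general parameter is the fallback. The `medaka_model` key, the sample fields and the placeholder model names are all assumptions made for this sketch, not the pipeline's actual schema.

```python
from typing import Any, Mapping, Optional


def resolve_medaka_model(
    sample: Mapping[str, Any], global_params: Mapping[str, Any]
) -> Optional[str]:
    """Return the sample's own model if set, otherwise the general one."""
    per_sample = sample.get("medaka_model")  # hypothetical per-sample key
    if per_sample:
        return per_sample
    return global_params.get("medaka_model")  # hypothetical general parameter


# Entries as they might look after parsing a YAML samplesheet (hypothetical layout).
samples = [
    {"id": "sample_1", "nanopore": "sample_1.fastq", "medaka_model": "model_A"},
    {"id": "sample_2", "nanopore": "sample_2.fastq"},
]
global_params = {"medaka_model": "model_default"}

resolved = {s["id"]: resolve_medaka_model(s, global_params) for s in samples}
print(resolved)
```

With this approach, samples that do not set a model keep the current behaviour, so existing single-value runs would not need to change.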

Hybrid assembly

strategy 1

  • Implement YAML samplesheet for hybrid workflows, in strategy 1.
  • execute strategy one assembly with paired end reads + nanopore
  • execute strategy one assembly with single end reads + nanopore
  • execute strategy one assembly with both paired and single end reads + nanopore
  • execute strategy one assembly with paired end reads + pacbio
  • execute strategy one assembly with single end reads + pacbio
  • execute strategy one assembly with both paired and single end reads + pacbio

strategy 2

  • Implement YAML samplesheet for hybrid workflows, in strategy 2.
  • execute strategy two assembly with paired end reads + nanopore
  • execute strategy two assembly with single end reads + nanopore
  • execute strategy two assembly with both paired and single end reads + nanopore
  • execute strategy two assembly with paired end reads + pacbio
  • execute strategy two assembly with single end reads + pacbio
  • execute strategy two assembly with both paired and single end reads + pacbio
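
The checklist above implies that each samplesheet entry has to be routed to a short-reads only, long-reads only or hybrid workflow depending on which reads it lists. A minimal Python sketch of that routing follows; the keys `paired`, `single`, `nanopore` and `pacbio` are hypothetical names chosen for illustration, and the real samplesheet may use different ones.

```python
from typing import Any, Mapping

# Hypothetical samplesheet keys, grouped by read type.
SHORT_KEYS = ("paired", "single")
LONG_KEYS = ("nanopore", "pacbio")


def classify_sample(entry: Mapping[str, Any]) -> str:
    """Pick a workflow type from the read fields present in one entry."""
    has_short = any(entry.get(key) for key in SHORT_KEYS)
    has_long = any(entry.get(key) for key in LONG_KEYS)
    if has_short and has_long:
        return "hybrid"
    if has_short:
        return "short-reads only"
    if has_long:
        return "long-reads only"
    raise ValueError(f"sample {entry.get('id')!r} lists no reads")


entries = [
    {"id": "s1", "paired": ["s1_R1.fastq", "s1_R2.fastq"]},
    {"id": "s2", "pacbio": "s2.fastq"},
    {"id": "s3", "single": "s3.fastq", "nanopore": "s3_ont.fastq"},
]
workflows = {e["id"]: classify_sample(e) for e in entries}
print(workflows)
```

Choosing between hybrid strategy 1 and 2 is not covered here; it would need an extra field or general parameter.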

@fmalmeida
Owner Author

Almost finished!

Some changes were required in the way the channels are called and created. Now everything seems to be properly implemented.

However, I will have to execute all the tests again to make sure everything is ok and it can be released.

While the tests run, I will start working on the new documentation.

@fmalmeida
Owner Author

All seems to be working.

Now finish the documentation for release.

@fmalmeida fmalmeida mentioned this issue Oct 22, 2021
@fmalmeida
Owner Author

fmalmeida commented Oct 23, 2021

Little update on tests:

  • The use of gcpp in the samplesheet did not work. Check how it differs from the single-sample workflow, where it is working, and try to create the input tuple in the same way.

@fmalmeida fmalmeida pinned this issue Oct 27, 2021
@fmalmeida
Owner Author

Found an issue in how nanopolish was being called! It is now solved, with proper multi-threading, and the corrected module will be available in v2.4, together with this multi-sample workflow release.

@fmalmeida fmalmeida unpinned this issue Oct 29, 2021
@fmalmeida fmalmeida pinned this issue Oct 29, 2021
@fmalmeida
Owner Author

Working on documentation for release!

@fmalmeida
Owner Author

This implementation is causing major changes in the pipeline. It will therefore be delayed a little longer, since we are trying to decide how best to split the parameters between the YAML samplesheet and the options outside it, so that the result is as unconfusing as possible.

When finished, it will trigger a major release, v3.0.

@fmalmeida fmalmeida linked a pull request Nov 4, 2021 that will close this issue
@fmalmeida
Owner Author

This implementation has now been made available.
