Implement effective masking for BBDuk-based viral screening and move FASTP downstream #129

Open
willbradshaw opened this issue Dec 17, 2024 · 0 comments
Labels:
done: Issues that have been addressed in dev branch, but not reflected in master branch
enhancement: New feature or request
priority_1
time&cost: Changes to improve the pipeline's runtime and computational cost

Comments

@willbradshaw
Contributor

A major contributor to the cost of the pipeline is the need to run FASTP on all raw reads prior to initial screening for viral status with BBDuk. This has historically been necessary because otherwise BBDuk detects many false positives due to contamination of the reference genomes with adapter & low-entropy sequences. We now have a new implementation of the screening step that uses more effective masking of the references to avoid this problem; implement it in the pipeline and move FASTP downstream to dramatically reduce the computational cost of read cleaning.

(NB: In addition to moving FASTP downstream of BBDuk in the viral subworkflow, we will also need to move it downstream of read subsetting in the PROFILE subworkflow.)
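To illustrate what masking the reference genomes means, here is a minimal Python sketch. It assumes two simple techniques: sliding-window Shannon-entropy masking for low-complexity regions, and exact-match masking of known adapter sequences. Masked bases become `N`, so they cannot produce k-mer hits against raw, uncleaned reads. The function names, window size, and entropy threshold are hypothetical. This is not the pipeline's new screening implementation, which may rely on BBTools' own masking options instead.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the base composition of seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_entropy(seq: str, window: int = 20, threshold: float = 1.0) -> str:
    """Hypothetical helper: replace every base covered by a window whose
    entropy falls below `threshold` with 'N'. Sequences shorter than
    `window` are returned unchanged (upper-cased)."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "N"
    return "".join(masked)


def mask_adapters(seq: str, adapters: list[str]) -> str:
    """Hypothetical helper: replace every exact (possibly overlapping)
    occurrence of each adapter sequence with 'N'."""
    seq = seq.upper()
    masked = list(seq)
    for adapter in adapters:
        a = adapter.upper()
        start = seq.find(a)
        while start != -1:
            for i in range(start, start + len(a)):
                masked[i] = "N"
            start = seq.find(a, start + 1)
    return "".join(masked)
```

In this sketch, adapter masking would run first and entropy masking second, and the masked reference would then serve as the screening database. Real references would also need reverse-complement adapter matches and tolerance for mismatches. The example adapter used in testing, `AGATCGGAAGAGC`, is only a commonly seen Illumina adapter prefix chosen for illustration.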

@willbradshaw willbradshaw added enhancement New feature or request priority_1 labels Dec 17, 2024
@willbradshaw willbradshaw added the time&cost Changes to improve the pipeline's runtime and computational cost label Dec 17, 2024
@harmonbhasin harmonbhasin added the done Issues that have been addressed in dev branch, but not reflected in master branch label Jan 22, 2025
@harmonbhasin harmonbhasin self-assigned this Jan 22, 2025