Implement effective masking for BBDuk-based viral screening and move FASTP downstream #129
Labels
done
Issues that have been addressed in dev branch, but not reflected in master branch
enhancement
New feature or request
priority_1
time&cost
Changes to improve the pipeline's runtime and computational cost
A major contributor to the pipeline's cost is the need to run FASTP on all raw reads before the initial BBDuk screen for viral status. This has historically been necessary because, without it, BBDuk reports many false positives caused by adapter and low-entropy sequences contaminating the reference genomes. We now have a new implementation of the screening step that masks the references more effectively and avoids this problem. Implement it in the pipeline and move FASTP downstream of BBDuk, so that read cleaning runs only on reads that pass the screen and its computational cost drops dramatically.
(NB: In addition to moving FASTP downstream of BBDuk in the viral subworkflow, we will also need to move it downstream of read subsetting in the PROFILE subworkflow.)
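For illustration only, here is a minimal Python sketch of what masking adapter and low-entropy regions in a reference sequence could look like. It is not the pipeline's actual masking implementation. The function names (`window_entropy`, `mask_reference`), the window size, the k-mer length, and the entropy threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def window_entropy(window: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the k-mer distribution inside a window."""
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_reference(
    seq: str,
    adapters: list[str],
    window: int = 20,          # hypothetical window size
    k: int = 3,                # hypothetical k-mer length
    min_entropy: float = 2.0,  # hypothetical threshold, in bits
) -> str:
    """Replace adapter matches and low-entropy windows in seq with 'N'."""
    seq = seq.upper()
    masked = list(seq)

    # Mask every exact occurrence of each adapter sequence.
    for adapter in adapters:
        adapter = adapter.upper()
        if not adapter:
            continue
        start = seq.find(adapter)
        while start != -1:
            masked[start:start + len(adapter)] = "N" * len(adapter)
            start = seq.find(adapter, start + 1)

    # Mask every full-length window whose k-mer entropy is below the threshold.
    for i in range(max(len(seq) - window + 1, 0)):
        if window_entropy(seq[i:i + window], k) < min_entropy:
            masked[i:i + window] = "N" * window

    return "".join(masked)
```

With references masked along these lines, reads made up of adapter or low-complexity sequence have nothing in the reference left to match, which is why the screen would no longer depend on cleaning the raw reads first.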