downsampling of excessive reads #50

dpark01 · 2024-11-19T15:44:05Z

Newer datasets often have excessive sequencing coverage, sometimes extremely uneven depending on the sequencing method, so we need to add two read downsampling steps:

Add an alignment-free, k-mer-based downsampling step (bbnorm) directly before SPAdes/de novo assembly. Currently we randomly downsample to 10M reads, but if species are unevenly represented, bbnorm may help recover more of the underrepresented genomes in metagenomic data. Do not use this downsampled output for anything else; it is only pre-SPAdes processing (much like our trimmomatic step).
Add an alignment-based downsampler for coverage flattening (rasusa) directly before consensus genome generation/polishing/making a new FASTA (i.e., right before the GATK steps). Also use this downsampled BAM for lofreq. Do not emit this downsampled BAM as a workflow-level output to the user, and do not use it for coverage plots or metrics (use all the reads for those).
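
To make the two ideas above concrete, here is a toy Python sketch. It is not how bbnorm or rasusa are implemented: `kmer_normalize` is a simplified digital-normalization-style filter, and `cap_coverage` is a simple greedy depth cap. Both function names, the parameters (`k`, `target`, `max_depth`), and the in-memory data layout are hypothetical, chosen only to illustrate why k-mer-based and coverage-based downsampling preserve underrepresented content better than random subsampling.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_normalize(reads, k=5, target=2):
    """Alignment-free downsampling (toy version, not bbnorm itself).

    A read is kept only if the median count of its k-mers, among reads
    kept so far, is still below `target`. Abundant genomes saturate
    quickly; rare genomes keep all of their reads.
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        if median(counts[km] for km in ks) < target:
            kept.append(read)
            counts.update(ks)
    return kept


def cap_coverage(alignments, ref_len, max_depth):
    """Alignment-based coverage flattening (toy version, not rasusa itself).

    `alignments` is a list of (start, end) half-open intervals on one
    reference. An alignment is kept if any position it covers is still
    below `max_depth`, so depth is flattened approximately, not exactly.
    """
    depth = [0] * ref_len
    kept = []
    for start, end in alignments:
        if any(d < max_depth for d in depth[start:end]):
            kept.append((start, end))
            for i in range(start, end):
                depth[i] += 1
    return kept


if __name__ == "__main__":
    abundant, rare = "ACGTTGCAAGCT", "GGATCCTAGGAT"
    reads = [abundant] * 100 + [rare] * 2
    print(len(kmer_normalize(reads)), "reads kept after k-mer normalization")

    alns = [(0, 10)] * 50 + [(10, 20)] * 2
    print(len(cap_coverage(alns, ref_len=20, max_depth=3)), "alignments kept")
```

In the workflow itself, the output of the first step would feed only the assembler, and the output of the second only the consensus/polishing and variant-calling steps; coverage plots and metrics would still be computed from the full read set.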

dpark01 added the bug label on Nov 19, 2024