downsampling of excessive reads #50

Open
dpark01 opened this issue Nov 19, 2024 · 0 comments
Labels
bug Something isn't working

Comments

dpark01 commented Nov 19, 2024

Newer datasets often have excessive sequencing coverage, sometimes extremely uneven depending on the sequencing method, so we need to add some read downsampling steps:

  1. Add alignment-free, k-mer-based downsampling (bbnorm) directly before SPAdes/de novo assembly. Currently we randomly downsample to 10M reads, but if species are unevenly represented, bbnorm may help recover more underrepresented genomes in metagenomic data. Use this downsampled output only as pre-SPAdes processing (much like our trimmomatic step) and for nothing else.
  2. Add alignment-based downsampling for coverage flattening (rasusa) directly before consensus genome generation/polishing/writing a new fasta (i.e., right before the GATK steps). Also use this downsampled bam for lofreq. Do not emit the downsampled bam as a workflow-level output to the user, and do not use it for coverage plots or metrics (use all the reads for those).
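
To make the intent of the two steps concrete, here is a toy Python sketch of both ideas. It is an illustration only and does not reproduce the actual algorithms in bbnorm or rasusa; the function names (`kmer_normalize`, `cap_coverage`) and parameters (`k`, `target`, `max_depth`, `seed`) are hypothetical and chosen for this example.

```python
import random
from collections import Counter
from statistics import median


def kmer_normalize(reads, k=5, target=3):
    """Toy alignment-free normalization (NOT bbnorm's real algorithm).

    A read is kept only while the median count of its distinct k-mers,
    among reads kept so far, is below `target`. Abundant sequences
    saturate quickly; rare ones pass through untouched.
    """
    counts = Counter()
    kept = []
    for read in reads:
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        if not kmers:
            continue  # read shorter than k
        if median(counts[km] for km in kmers) < target:
            kept.append(read)
            counts.update(kmers)  # each distinct k-mer counted once per read
    return kept


def cap_coverage(alignments, ref_len, max_depth, seed=0):
    """Toy alignment-based coverage flattening (NOT rasusa's real algorithm).

    `alignments` is a list of half-open (start, end) intervals on a
    reference of length `ref_len`. Reads are visited in random order and
    kept only if every position they span is still below `max_depth`,
    so depth never exceeds the cap. Returns sorted indices of kept reads.
    """
    rng = random.Random(seed)
    order = list(range(len(alignments)))
    rng.shuffle(order)
    depth = [0] * ref_len
    kept = []
    for idx in order:
        start, end = alignments[idx]
        if all(depth[pos] < max_depth for pos in range(start, end)):
            kept.append(idx)
            for pos in range(start, end):
                depth[pos] += 1
    return sorted(kept)


# Step 1: one abundant and one rare "genome" in a metagenomic sample.
abundant, rare = "ACGTACGTAC", "TTGGCCAATT"
normalized = kmer_normalize([abundant] * 100 + [rare] * 2, k=5, target=3)

# Step 2: 50 reads piled on the first half of a reference, 2 on the rest.
alns = [(0, 10)] * 50 + [(10, 20)] * 2
kept_idx = cap_coverage(alns, ref_len=20, max_depth=5)
```

In this toy input, random downsampling would discard rare reads at the same rate as abundant ones, whereas the k-mer filter keeps all of the rare reads and trims only the abundant ones. Likewise, the coverage cap thins only the over-covered region. In the real workflow, the step 1 output would feed only the assembler, and the step 2 output would feed only consensus calling and lofreq.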
dpark01 added the bug label Nov 19, 2024