vgp assembly: flowcharts for meryl and purge_dups #4863

abueg · 2024-03-25T19:08:56Z

hello! 👋🏼

added some new graphics for the meryl section (trying to clarify how we're working on a collection for batch jobs -> merging the outputs), and the purge_dups section

Delphine-L

The graphics look great! I suggested different formulations in a couple of places; I think they could improve readability.

Delphine-L · 2024-04-01T19:10:10Z

topics/assembly/tutorials/vgp_genome_assembly/tutorial.md

@@ -324,6 +324,10 @@ Meryl will allow us to generate the *k*-mer profile by decomposing the sequencin
 >
 {: .comment}

+In order to do genome profile analysis, first we need the *k*-mer spectrum of the raw reads, which should hopefully contain information about the genome that you sequenced. These *k*-mers are used to build a histogram (the *k*-mer spectrum), and then the GenomeScope model that fits that data will help infer genome characteristics. To count *k*-mers, we first count them on the separate FASTA files, before merging the counts and generating a histogram based on that. This is a way of parallelizing our work. 


Suggestion: To identify the key metrics of the genome (its profile), we start by generating a histogram of the k-mer distribution in the raw reads (the k-mer spectrum). GenomeScope then fits a model to this spectrum, which allows us to estimate the genome characteristics. We work in parallel on each set of raw reads, creating a database of k-mer counts for each set, then merge all the databases by adding up the counts of each k-mer to build the histogram.

incorporated this wording with some edits, does this sound ok? let me know if not !!
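
The count-then-merge idea discussed in this thread can be illustrated with a minimal Python sketch. This is not Meryl and does not reflect its implementation: the function names (`count_kmers`, `merge_counts`, `kmer_histogram`) and the toy reads are hypothetical, and the sketch ignores details such as canonical (strand-independent) k-mers. It only shows that counting each batch separately and then adding the counts gives the data needed for the histogram.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in one batch of reads (one 'database' per input file)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def merge_counts(databases):
    """Merge per-batch databases by adding the counts of identical k-mers."""
    merged = Counter()
    for db in databases:
        merged.update(db)  # Counter.update adds counts rather than replacing them
    return merged


def kmer_histogram(merged):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(merged.values())


# Toy input (hypothetical): two batches standing in for two read files.
batches = [["ACGTAC", "CGTACG"], ["ACGTAC"]]
databases = [count_kmers(batch, k=3) for batch in batches]  # independent, so parallelizable
merged = merge_counts(databases)
histogram = kmer_histogram(merged)
```

Because each call to `count_kmers` depends only on its own batch, the per-batch step can run as separate jobs, which is the parallelization the tutorial text describes.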

Delphine-L · 2024-04-01T19:16:52Z

topics/assembly/tutorials/vgp_genome_assembly/tutorial.md


 ![Post-processing with purge_dups](../../images/vgp_assembly/purge_dupspipeline.png "Purge_dups pipeline. Adapted from github.com/dfguan/purge_dups. Purge_dups is integrated in a multi-step pipeline consisting in three main substages. Red indicates the steps which require to use Minimap2.")

+The way purging is incorporated in the VGP pipeline, first the **primary assembly** is purged, resulting in a clean (purged) primary assembly and a set of contigs that were *removed* from those contigs. These will often contain haplotigs representing alternate alleles. We would like to keep that in the alternate assembly, so the next step is adding (concatenating) this file to the original alternate assembly. This file then undergoes purging as well, to remove any junk or overlaps.


Suggestion: Purging may be used in the VGP pipeline when false duplications are suspected (Figure 1). In such cases, we start by purging the primary assembly, resulting in a clean (purged) primary assembly and a set of contigs that were removed from it. These removed contigs will often contain haplotigs representing alternate alleles. We would like to keep them in the alternate assembly, so the next step is adding (concatenating) this file to the original alternate assembly. To make sure we don't introduce redundancies into the alternate assembly that way, we purge it as well to remove any junk or overlaps.

incorporated! does this look OK?
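
The order of operations described in this thread can be sketched in Python as follows. This is only an illustration of the data flow: `purge_both_assemblies` and `toy_purge` are hypothetical names, and the toy purge (dropping exact duplicate sequences) is a stand-in that does not represent how purge_dups decides what to remove.

```python
def purge_both_assemblies(primary, alternate, purge):
    """Purge the primary, move removed contigs to the alternate, then purge the alternate.

    `purge` is any function returning (kept_contigs, removed_contigs).
    """
    purged_primary, removed = purge(primary)
    # Keep the removed contigs (often haplotigs) by concatenating them
    # onto the original alternate assembly.
    combined_alternate = alternate + removed
    # Purge the combined alternate so the concatenation adds no redundancy.
    purged_alternate, _ = purge(combined_alternate)
    return purged_primary, purged_alternate


def toy_purge(contigs):
    """Hypothetical stand-in: keep the first copy of each sequence, remove exact repeats."""
    kept, removed, seen = [], [], set()
    for contig in contigs:
        if contig in seen:
            removed.append(contig)
        else:
            seen.add(contig)
            kept.append(contig)
    return kept, removed


primary = ["AAAA", "CCCC", "AAAA"]
alternate = ["GGGG", "GGGG"]
purged_primary, purged_alternate = purge_both_assemblies(primary, alternate, toy_purge)
```

Passing the purge step in as a function keeps the sketch focused on the ordering (primary first, then concatenate, then alternate) rather than on the purging criteria.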

topics/assembly/tutorials/vgp_genome_assembly/tutorial.md

clarifying text in kmer counting parallelization alt. text, thank you hexylena! Co-authored-by: Helena <[email protected]>

shiltemann · 2024-05-21T11:20:31Z

Thanks @abueg! and thanks for the review @Delphine-L !

vgp assembly: flowcharts for meryl and purge_dups

e8c22d0

abueg requested a review from a team as a code owner March 25, 2024 19:08

github-actions bot added the assembly label Mar 25, 2024

Delphine-L previously approved these changes Apr 1, 2024

incorporating delphine's suggestions

9391226

abueg dismissed Delphine-L’s stale review via 9391226 April 1, 2024 19:32

Delphine-L previously approved these changes Apr 1, 2024

hexylena reviewed Apr 3, 2024

Update topics/assembly/tutorials/vgp_genome_assembly/tutorial.md

8ad6dce

abueg dismissed Delphine-L’s stale review via 8ad6dce April 3, 2024 15:43

shiltemann approved these changes May 21, 2024

shiltemann merged commit 408fd00 into galaxyproject:main May 21, 2024
3 checks passed
