make complete examples #17

KamilSJaron · 2018-09-27T14:59:56Z

We need some pilot examples that go from raw reads to a smudgeplot. We should cover both big, complicated genomes and small, simple ones; I think three examples will be perfect.

Good candidates for demonstration of power:

rainbow trout genome: it was sequenced with 454, which makes it even more interesting to try. Reads
sweet potato: hexaploid; reads
something small???

KamilSJaron · 2018-10-01T07:09:55Z

octoploid strawberry: there are two sequencing projects. One has its own nice webpage, publication and reads. The second one has somewhat more difficult data, since the coverage is not so high (2–7× per octoploid chromosome), but thanks to octoploidy one may still be able to see something (like an AAAABBBB cluster): pub
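To make the expected position of such a cluster concrete: smudgeplot places each heterozygous k-mer pair by its summed coverage (CovA + CovB) and its relative minor coverage (CovB / (CovA + CovB)), so a genome structure string maps directly to a position on the plot. The following is a small sketch of that arithmetic; `smudge_position` is a hypothetical helper written for this illustration, not part of smudgeplot.

```shell
# Hypothetical helper (not part of smudgeplot): map a genome structure
# string such as AAB or AAAABBBB to the expected smudge position.
smudge_position() {
  # Count the A and B letters in the structure string
  a=$(printf '%s' "$1" | tr -cd 'A' | wc -c)
  b=$(printf '%s' "$1" | tr -cd 'B' | wc -c)
  awk -v a="$a" -v b="$b" 'BEGIN {
    if (a + b == 0) { print "no A/B letters given"; exit 1 }
    minor = (a < b) ? a : b
    # Total coverage in multiples of 1n, then minor / total
    printf "%dn\t%.2f\n", a + b, minor / (a + b)
  }'
}

# Print structure, total coverage and minor fraction for a few cases
for s in AB AAB AABB AAAABBBB; do
  printf '%s\t' "$s"
  smudge_position "$s"
done
```

Under this arithmetic, AAAABBBB sits at 8n total coverage with a minor fraction of 0.50: the same fraction as AB and AABB, but at a higher coverage sum.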

KamilSJaron · 2018-10-03T10:12:13Z

I was advised to run it on more classical species:

classical diploid models (Arabidopsis, fly etc)
tetraploid Xenopus laevis (SAMN04518361) and diploid Xenopus tropicalis or maybe Xenopus borealis (SAMN04518076) (paper)

KamilSJaron · 2018-10-04T15:00:47Z

Strawberry tutorial is done here: https://github.com/tbenavi1/smudgeplot/wiki/strawberry-tutorial

GenomeScope for determination of L and U:

These are strawberry smudges:

rotifergirl · 2019-01-08T16:36:11Z

Hi Kamil,

I just wanted to check something with my data, so I re-ran the strawberry example using jellyfish (no problems, nearly identical output), and then I downsampled the Fragaria iinumae reads (I took a random 10% of the paired reads, using seqtk). Then I reran the jellyfish and smudgeplot pipeline, and the results certainly reflect what I have seen with my data now too.

It seems to me that a lower coverage results in:

slightly worse resolution of smudges (I got the warning "detecting two smudges at the same positions, not enough data for this number of bins lowering number of bins to 35" with the downsampled reads, but not with the full read library)
loss of some smudges, see attached image

Obviously this is not a fault of the program, but users should be aware of its coverage-based limitations.

For my data, my 1n estimates range from 17 to 27.
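The downsampling step described above can be reproduced roughly as follows. This is a sketch, assuming seqtk is installed; the file names, the output prefix, and the `downsample_pair` wrapper are placeholders, not the exact commands used for the test above. Passing the same seed (`-s`) for both files is what keeps the read pairs in sync.

```shell
# Hypothetical wrapper around `seqtk sample` for paired reads.
# $1, $2: paired FASTQ files; $3: fraction of reads to keep;
# $4: output prefix; $5: random seed (defaults to 100, an arbitrary choice)
downsample_pair() {
  r1="$1"; r2="$2"; frac="$3"; out="$4"; seed="${5:-100}"
  # The same seed must be used for both files so that mates stay paired
  seqtk sample -s "$seed" "$r1" "$frac" > "${out}_1.fastq"
  seqtk sample -s "$seed" "$r2" "$frac" > "${out}_2.fastq"
}

# Example call (placeholder file names), keeping a random 10% of the pairs:
# downsample_pair reads_1.fastq reads_2.fastq 0.1 sub
```

The subsampled files can then go through the same k-mer counting and smudgeplot steps as the full library, which makes the effect of coverage directly comparable.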

KamilSJaron · 2019-01-08T17:09:00Z

Hey Julie, the need for coverage is a good point. Thanks for the feedback!!!

I had trouble phrasing it nicely because it's not that simple. The coverage needed for a nice smudgeplot depends on the quality of sequencing (i.e. coverage variance): the smaller the variance, the less coverage is needed to make nice smudges.
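One way to picture this is a toy model (an assumption for illustration only, not how smudgeplot works internally): treat the coverage of a k-mer present in c copies as having mean c·n and variance d·c·n, where n is the 1n coverage and d is a variance-to-mean ratio (d = 1 would be Poisson-like sequencing). The separation between the 1n and 2n peaks, measured in units of their combined standard deviations, is then n / (sqrt(d·n) + sqrt(2·d·n)). `peak_resolution` is a hypothetical helper that evaluates this expression.

```shell
# Toy model, not smudgeplot's internals.
# $1: 1n k-mer coverage (n); $2: variance-to-mean ratio (d, 1 = Poisson-like)
peak_resolution() {
  awk -v n="$1" -v d="$2" 'BEGIN {
    sd1 = sqrt(d * n)       # spread of the 1n peak
    sd2 = sqrt(d * 2 * n)   # spread of the 2n peak
    # Distance between the peak centres divided by the summed spreads
    printf "%.2f\n", n / (sd1 + sd2)
  }'
}

peak_resolution 20 1   # clean library at 20x
peak_resolution 20 2   # noisier library at the same coverage
peak_resolution 40 2   # noisier library with doubled coverage
```

Under this model, doubling the variance-to-mean ratio requires doubling the coverage to recover the same peak separation, which is one way to read "smaller variance, less coverage needed".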

I just returned from PopGroup and people are super interested in smudgeplots. I really need to work out better documentation in general. Any input is welcome.

KamilSJaron · 2019-04-30T15:09:11Z

The labeling of octoploid strawberry is corrected in 01ddc2e:

KamilSJaron · 2019-05-18T13:14:12Z

Mountain peanut is quite a nice real-life example of a species that was thought to be hexaploid, but whose k-mer data rather support tetraploidy; discussed in #36.

KamilSJaron · 2021-06-01T10:45:16Z

Rainbow trout analyses are now available in #88.

OyukaKh · 2023-08-07T09:02:10Z

Hi, I see that the issue was closed long ago.
But I need some recommendations regarding my smudgeplots, because they do not make sense and I am lost at the moment. So I thought it might be helpful to write to you all. Please help me!

I have 120 bp single-end RADseq data (>30x coverage) for a number of Salix species, for which I want to determine the ploidy level using Smudgeplot. I ran the following commands:

kmc -k21 -t16 -m64 -ci1 -cs10000 "${file}.fastq" "${file}_kmcdb" tmp
kmc_tools transform "${file}_kmcdb" histogram "${file}_kmcdb_k21.hist" -cx10000
L=$(smudgeplot.py cutoff "${file}_kmcdb_k21.hist" L) # Determines automatically; L should be like 20 - 200
U=$(smudgeplot.py cutoff "${file}_kmcdb_k21.hist" U) # U should be like 500 - 3000
kmc_tools transform "${file}_kmcdb" -ci"$L" -cx"$U" reduce "${file}_kmcdb_L${L}_U${U}"
kmc_dump "${file}_kmcdb_L${L}_U${U}" "${file}_kmcdb_L${L}_U${U}_coverages.tsv" "${file}_kmcdb_L${L}_U${U}_pairs.tsv" > "${file}_kmcdb_L${L}_U${U}_familysizes.tsv"
kmc_tools transform "${file}_kmcdb" -ci"$L" -cx"$U" dump -s "${file}_kmcdb_L${L}_U${U}.dump"
smudgeplot.py hetkmers -o "${file}_kmcdb_L${L}_U${U}" < "${file}_kmcdb_L${L}_U${U}.dump"
smudgeplot.py plot "${file}_kmcdb_L${L}_U${U}_coverages.tsv" -o "${file}_kmcdb_L${L}_U${U}_smudgeplot" -t "${file}" -q 0.99

and then got these results for Salix retusa, individuals NW17_076_L10_U520 (41x coverage) and T2221 (46x):

According to our flow cytometry, these individuals should be octoploid, but the smudgeplot suggests triploid or diploid.

I did not set constant values for L and U, because that gave me an error:
"detecting two smudges at the same positions, not enough data for this number of bins lowering number of bins to 35
detecting two smudges at the same positions, not enough data for this number of bins lowering number of bins to 30 ...".

When I checked GenomeScope, for both individuals, it failed to converge:

I have more than 100 individuals that do not fit my expected ploidy level.
So my question is, what am I possibly doing wrong?
What could be adjusted to get more precise estimation?

OyukaKh · 2023-08-07T09:03:44Z

Here are the Smudgeplots for S_retusa_T2221 (second example):

KamilSJaron added the "enhancement" (New feature or request) label Sep 27, 2018

BlueBerrySun mentioned this issue May 30, 2021

Is the kcov of genomescope more informative than the estimated haploid coverage of smudgeplot? #88

Closed

KamilSJaron closed this as completed Mar 13, 2023

KamilSJaron mentioned this issue Aug 7, 2023

Salix RAD-data smudgeplots #122

Closed
