-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using smudgeplot with very high ploidy levels #66
Comments
Hi Cyril, I am glad to see you work on freaky stuff now :-) (223e6 * 300) / 45e6 = 1486.667 (here 45e6 is the genome size). That's a psycho high coverage! It should be divided by ploidy to get 1n cov estimate, and I totally see why you thought of giving smudgpelot a try. We did not really thought of 45-ploids when we designed smudgeplot and dependent on the difergence of haplotypes it is / is not expected to work. If there is too much diversity between the ploidies, no kmer pairs will be found as unique (there will be always more than a pair) and indeed in your smudgeplot you don't have that many kmers, maybe there is one problem. Another thing I noticed is that you have U=1000. Use something MUCH higher. 3000 or 5000. Almost all the kmers are on the upper edge, you got to spread them in the plane. Let it go as high as it needs to go. Also L = 135 might be too high if your you have really a 45 ploid (then you would expect 1n coverage to be 1486 / 45 ~= 33x; here 45 is the guessed ploidy). Would you mind posting here the histogram too? I would be quite curious to see how it looks like... |
Cool, we are getting somewhere. We see the most of the kmer pairs still up, but it does look much nicer than before (it's not railing the top edge). The genomic smudges should always look like smudges if they are too much a different shape they are most likely an artefact of some sorts. I also hardcoded ylim to be 10x the 1n estimate. Which is not unfortunatelly messing with your plot. Would you be comfortable adjust that bit of code in your version (l71 in the smudgeplot_plot.R script)? It's rather specialised usage, I might add you a flag "--expect-extreme-ploidies", haha. The histogram is kind of messy. The truncation due to a flag you specified to KMC (they have the argument that specifies the threshold on the counter). I usually use genomescope to visualize kmerspectra even I don't really care about the model, but maybe in your case you got to write your own R script to give it a reasonable limits for x/y axis. If the smudgeplot runs fast, I would recommend trying a few more L values, because it's really hard to say from the kmer spectra what is a reasonable L value - I would try 50 and 100 at least. |
As you suggested, I tried removing the hardcoded threshold (in PR #68) but it doesn't seem to be much better. I noticed using the Here is the histogram, visualized in R with L=100 and U=5000: I'm not sure the histogram is very helpful... if I understand correctly, the smudges are too close and appear as a single large smudge ? The Thanks a lot for your help 😄 |
Hi Kamil,
Thanks for the clear documentation and awesome tool.
I have troubles understanding my smudgeplot. I have used follwing command to generate it
smudgeplot complained that there were two smudges the same positions and decreased nbins to 12
and it look like this:
Out of curiosity, I fixed
-nbins
to 30 to see how it looks, but its basically the same.The organism in question is an amoeba (Acanthamoeba castellanii). It is a weird beast, it reproduces asexually (but there is some evidence of meiosis). There are also evidence of reproduction by amitosis, which would make it very homozygous. The exact number of chromosomes per haploid genome is in the range of 30-40, and the ploidy has been estimated to approximately 25. We know the haploid genome size is ~45Mb.
What I fed to smudgeplot is an illumina shotgun library of 223M 2x150bp reads.
Smudgeplot seems to think there are many chromosomes (~500) with a low ploidy level (3-4), but it is likely the opposite. Is it even possible to use smudgeplot with such high levels of ploidies ?
The text was updated successfully, but these errors were encountered: