Dear authors, thanks for the great tool you developed!
I need some advice regarding my smudgeplots, because they do not make sense and I am lost at the moment.
So I thought it might be helpful to write to you. Please help me!
I have 120 bp single-end RADseq data (>30x coverage) for hundreds of Salix species, whose ploidy level I want to determine using Smudgeplot. I ran the following commands:
kmc -k21 -t16 -m64 -ci1 -cs10000 "${file}.fastq" "${file}_kmcdb" tmp
kmc_tools transform "${file}_kmcdb" histogram "${file}_kmcdb_k21.hist" -cx10000
L=$(smudgeplot.py cutoff "${file}_kmcdb_k21.hist" L) # Determines automatically; L should be like 20 - 200
U=$(smudgeplot.py cutoff "${file}_kmcdb_k21.hist" U) # U should be like 500 - 3000
kmc_tools transform "${file}_kmcdb" -ci"$L" -cx"$U" reduce "${file}_kmcdb_L${L}_U${U}"
kmc_dump "${file}_kmcdb_L${L}_U${U}" "${file}_kmcdb_L${L}_U${U}_coverages.tsv" "${file}_kmcdb_L${L}_U${U}_pairs.tsv" > "${file}_kmcdb_L${L}_U${U}_familysizes.tsv"
kmc_tools transform "${file}_kmcdb" -ci"$L" -cx"$U" dump -s "${file}_kmcdb_L${L}_U${U}.dump"
smudgeplot.py hetkmers -o "${file}_kmcdb_L${L}_U${U}" < "${file}_kmcdb_L${L}_U${U}.dump"
smudgeplot.py plot "${file}_kmcdb_L${L}_U${U}_coverages.tsv" -o "${file}_kmcdb_L${L}_U${U}_smudgeplot" -t "${file}" -q 0.99
I then got this result for Salix retusa (e.g., NW17_076_L10_U520 (41x coverage) and T2221 (46x)):
These individuals should be octoploid according to our flow cytometry, but Smudgeplot suggests triploid or diploid.
I did not set constant values for L and U, because doing so produced an error:
"detecting two smudges at the same positions, not enough data for this number of bins lowering number of bins to 35
detecting two smudges at the same positions, not enough data for this number of bins lowering number of bins to 30 ...".
So I let it estimate them automatically, and L then varies from 10 to 60.
When I tried GenomeScope, it failed to converge for both individuals:
Here are the .hist files that I loaded to GenomeScope: Hist_files.zip
I have more than 100 individuals that do not fit my expected ploidy levels.
So my questions are:
What am I possibly doing wrong?
What could be adjusted to get a more precise estimate? (Are the L and U values crucial?)
Thanks!
I am sorry, but neither smudgeplot nor GenomeScope is designed to work on RAD data. This should have been on the wiki; I have now added a section about it to the FAQ.
This species/individuals should be an octoploid according to our flow cytometry, but the Smudgeplot suggests triploid or diploid.
I honestly don't think smudgeplot is a good technique here. If you have a reference, you could look at the markers mapped to the reference and the coverage ratios of the alleles you see. A tool that fits models to these ratios is called nQuire.
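To make the allele coverage ratio idea a bit more concrete, here is a minimal Python sketch. It is not nQuire's implementation and does not fit any model; it only bins the alternate-allele fraction at heterozygous biallelic sites so you can eyeball where the peaks are. The input format (a list of `(ref_count, alt_count)` pairs), the function names, and the `min_depth` and `n_bins` defaults are all assumptions made for illustration; you would need to extract the counts from your own mapped reads.

```python
from collections import Counter


def allele_fraction_histogram(sites, min_depth=20, n_bins=20):
    """Bin the alternate-allele fraction of heterozygous biallelic sites.

    sites: iterable of (ref_count, alt_count) read counts per site
           (hypothetical input format, assumed for this sketch).
    Returns a list of n_bins counts covering fractions from 0 to 1.
    """
    counts = Counter()
    for ref, alt in sites:
        depth = ref + alt
        # Skip shallow sites and sites where only one allele was observed.
        if depth < min_depth or ref == 0 or alt == 0:
            continue
        fraction = alt / depth
        counts[min(int(fraction * n_bins), n_bins - 1)] += 1
    return [counts[b] for b in range(n_bins)]


def expected_fractions(ploidy):
    """Allele fractions expected at heterozygous sites for a given ploidy."""
    return [i / ploidy for i in range(1, ploidy)]


if __name__ == "__main__":
    # Toy counts only, not real data.
    toy_sites = [(20, 20), (30, 10), (10, 30), (29, 11), (40, 0), (3, 2)]
    n_bins = 4
    hist = allele_fraction_histogram(toy_sites, n_bins=n_bins)
    for b, n in enumerate(hist):
        print(f"{b / n_bins:.2f}-{(b + 1) / n_bins:.2f}\t{n}")
    print("tetraploid expectation:", expected_fractions(4))
```

A single peak near 1/2 is what a diploid would show, peaks near 1/3 and 2/3 point to a triploid, and peaks near 1/4, 1/2 and 3/4 to a tetraploid. For an octoploid the expected fractions are multiples of 1/8, which sit close together, so they may be hard to separate at moderate per-site depth.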