Hello,
Thank you in advance for any input on this issue. I am assembling a diploid plant species, and my assembly is twice the estimated genome size and highly fragmented, with a small N50 and a high duplication rate. I have run numerous hifiasm assemblies (30+) and have not previously come across this issue, so I am struggling to figure out what is going on with this species. Please note that the results are the same with hifiasm versions 0.19.9-r616, 0.20.0-r639, and 0.23.0-r691. I have also tried adjusting the homozygous read coverage, adjusting -s, varying the HiFi read-length cutoff (all reads, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb), varying the k-mer size, and running with and without Hi-C data. The estimated genome size is 1.1 Gb via flow cytometry. This species has also been assembled twice, confirming this estimate.
Example assembly statistics (all are quite similar to this), hifiasm v 0.23.0:
The kmer plot is not something I have seen before either:
I have also run numerous contamination checks (Kraken), GenomeScope, PanDepth (HiFi reads vs. the published assembly), Smudgeplot, etc., and can share any of those results if they would be helpful.
GenomeScope of HiFi reads, k-mer size of 51:
Thank you for any help/tips anyone may be able to provide.
Based on the information you provided and the three images, it appears that your HiFi data contain a significant number of low-quality reads. To validate this hypothesis, you should rerun GenomeScope using k-mer frequencies from Illumina short reads. If the hypothesis is correct, you can filter your HiFi reads with Filtlong, using the short reads as a reference. Finally, you can reassemble the filtered reads with hifiasm.
From my experience with plants, this is likely a polyploid. An assembly at double the expected genome size, with BUSCO at 5% single-copy and 94% duplicated, points that way: you have a tetraploid and it's being output as a diploid. Most likely an autotetraploid, as you can't see multiple peaks in the GenomeScope plot. But that is probably also because the coverage per haplotype is low for GenomeScope (and Smudgeplot), and that peak of 'errors' is really true unique k-mers. These tools are ideally run on high-coverage Illumina data.
Try --n-hap 4 then run busco
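A minimal sketch of how that rerun could be scripted, assuming hifiasm is on the PATH. The function name, read file name, output prefix, and thread count are hypothetical placeholders; only `-o`, `-t`, `--n-hap`, and `--hom-cov` are hifiasm options. The function just builds the command and does not run anything.

```python
import shlex


def hifiasm_command(reads, prefix, n_hap=4, threads=32, hom_cov=None):
    """Build a hifiasm command line as a list of arguments (nothing is executed)."""
    cmd = ["hifiasm", "-o", prefix, "-t", str(threads), "--n-hap", str(n_hap)]
    if hom_cov is not None:
        # Only set --hom-cov when you have a value you trust; otherwise
        # hifiasm estimates it from the reads.
        cmd += ["--hom-cov", str(hom_cov)]
    cmd.append(reads)
    return cmd


# Placeholder file names, for illustration only.
cmd = hifiasm_command("hifi_reads.fastq.gz", "asm_nhap4")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; BUSCO can then be rerun on the new contigs to see whether the duplicated fraction changes.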
I could be wrong; if --hom-cov is incorrect (too low), hifiasm outputs too much of the same sequence. Check how you calculated that value, given that GenomeScope could be giving unreliable results here.
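One rough way to sanity-check that value without relying on GenomeScope is to divide the total HiFi yield by the flow-cytometry genome size. The sketch below assumes the 1.1 Gb figure is the collapsed size (one copy per locus) and uses a made-up read yield; the function names are hypothetical, and the result is only a ballpark to compare against whatever was passed to --hom-cov.

```python
def expected_hom_cov(total_hifi_bases, collapsed_genome_size):
    """Approximate homozygous coverage: total read bases / collapsed genome size."""
    if collapsed_genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_hifi_bases / collapsed_genome_size


def per_haplotype_cov(hom_cov, n_hap):
    """Approximate coverage of each haplotype if reads are spread evenly over n_hap copies."""
    if n_hap <= 0:
        raise ValueError("n_hap must be positive")
    return hom_cov / n_hap


# Assumption: 33 Gb of HiFi data (made-up number) and the 1.1 Gb estimate from the post.
hom_cov = expected_hom_cov(33e9, 1.1e9)
print(f"expected homozygous coverage: ~{hom_cov:.0f}x")
print(f"per haplotype if diploid:     ~{per_haplotype_cov(hom_cov, 2):.1f}x")
print(f"per haplotype if tetraploid:  ~{per_haplotype_cov(hom_cov, 4):.1f}x")
```

If the value given to --hom-cov is far below this estimate, it may have been taken from a heterozygous or lower-ploidy peak rather than the homozygous one.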
Alternatively, a disaster has happened in the lab and leaves of two specimens/species have been mixed together.