
Using hifiasm on a fern gametophyte #718

Open
guoshanf opened this issue Nov 2, 2024 · 2 comments

Comments

@guoshanf

guoshanf commented Nov 2, 2024

Hello!
Thank you for providing such excellent software!
I am currently working on a study involving a fern. Due to an oversight, I sent its haploid gametophyte for sequencing, so the data cover only one set of chromosomes. I did not notice this at the time and assembled the genome with hifiasm using default settings. Flow cytometry had previously given an estimated genome size of approximately 1.8 Gb, but the resulting assembly is 3.6 Gb. Is this because hifiasm assumes a diploid genome by default and assembles two sets of chromosomes? If so, is there a way to make hifiasm assemble just one haploid set? If not, what could explain the discrepancy between the assembly size and the estimated genome size?
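One way to quantify the discrepancy is to total the contig lengths from hifiasm's GFA output and divide by the flow cytometry estimate. The following is a minimal Python sketch, assuming the primary contigs are in a file such as `asm.bp.p_ctg.gfa` (the `asm` prefix is hypothetical) and that each `S` line carries either the sequence or an `LN:i:` length tag. With the numbers above, 3.6 Gb / 1.8 Gb gives a ratio of 2.0, which is what one would expect if much of the genome were represented twice, although the ratio alone does not prove that.

```python
import gzip


def contig_lengths_from_gfa(path):
    """Yield (name, length) for every segment (S) line in a GFA file.

    Uses the sequence field when present and falls back to the LN:i:
    tag when the sequence is stored as '*'. Gzipped input is accepted.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("S\t"):
                continue
            fields = line.rstrip("\n").split("\t")
            name, seq = fields[1], fields[2]
            if seq != "*":
                yield name, len(seq)
                continue
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    yield name, int(tag[5:])
                    break


def n50(lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def size_report(lengths, expected_size):
    """Summarise assembly size relative to an independent genome size estimate."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "ratio_to_expected": total / expected_size,
    }
```

For example, `size_report([n for _, n in contig_lengths_from_gfa("asm.bp.p_ctg.gfa")], 1.8e9)` would report the total length, N50, and the ratio to the 1.8 Gb estimate for the hypothetical file name above.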
Thanks in advance for your help!

@chhylp123
Owner

You could consider using purge_dups: https://github.com/dfguan/purge_dups
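purge_dups identifies duplicated contigs using read depth together with self-alignment of the assembly. The idea behind the depth part can be sketched as follows: with a haploid sample, a region assembled twice splits its reads between the two copies, so each copy tends to show roughly half the expected depth. This minimal Python sketch covers only that depth signal, not purge_dups' actual algorithm; the per-contig mean depths are assumed to come from some separate read-mapping step, and the `low` and `dup` thresholds are hypothetical, not purge_dups defaults.

```python
def expected_depth(total_read_bases, genome_size):
    """Expected haploid depth: sequenced bases divided by estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def classify_contigs(contig_depths, expected, low=0.25, dup=0.75):
    """Label contigs by mean depth relative to the expected depth.

    contig_depths: mapping of contig name -> mean read depth (assumed input).
    low, dup: hypothetical fractions of the expected depth used as cutoffs.
    """
    labels = {}
    for name, depth in contig_depths.items():
        ratio = depth / expected
        if ratio < low:
            labels[name] = "junk"                 # far too little support
        elif ratio < dup:
            labels[name] = "candidate_duplicate"  # about half depth: possibly assembled twice
        else:
            labels[name] = "keep"                 # depth consistent with a single copy
    return labels
```

A depth-only rule like this cannot tell a genuine duplicate from a low-coverage region, which is why purge_dups also aligns the assembly against itself before removing anything.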

@guoshanf
Author

guoshanf commented Nov 5, 2024

Thank you very much for your advice!

Previously, I thought the issue was sequencing depth, so I added more HiFi data, bringing the total to 222G of FASTQ data. I then ran three separate assemblies: one with the first sequencing run, one with the second, and one with both runs combined. The assemblies were 3.6 Gb, 4.0 Gb, and 4.6 Gb, respectively.

I first evaluated the combined assembly with BUSCO: completeness was 96.2% with the viridiplantae dataset but only 83.1% with the embryophyta dataset. I then processed it with purge_haplotigs and obtained a 4.2 Gb purged assembly. Both sequencing runs were performed on the haploid gametophyte.

Since the three assemblies differ in size, I am now unsure whether my approach is correct, and whether raw HiFi data from a haploid sample can be assembled directly with hifiasm. At this point, the flow cytometry estimate does not seem to be a useful reference either. Do you think it is necessary to resequence the diploid sporophyte? Should I increase the sequencing depth further? Or should I switch to a different assembler?
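BUSCO's overall completeness (C) is split into complete single-copy (S) and complete duplicated (D) genes. For a haploid sample, a large D relative to C would support the idea that the assembly contains two copies of much of the genome. The thread reports only the C values, so this minimal Python sketch simply shows how to pull S and D out of a BUSCO short-summary string. The example string in the test is hypothetical and uses the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout of BUSCO short summaries.

```python
import re

# Matches the one-line BUSCO summary, e.g. C:..%[S:..%,D:..%],F:..%,M:..%,n:..
_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_line(text):
    """Extract C, S, D, F, M percentages and the lineage size n from BUSCO output."""
    match = _BUSCO_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary string found")
    result = {key: float(match.group(key)) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(match.group("n"))
    return result


def duplicated_share(summary):
    """Fraction of complete BUSCOs that were found in more than one copy."""
    return summary["D"] / summary["C"] if summary["C"] else 0.0
```

Running this on the short-summary file from each of the three assemblies would show whether the duplicated share grows along with assembly size.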

Thank you in advance for your generous assistance!
