v1.7 - Segmentation fault, bogart failed #844
Whoops. It's an easy fix, but DO NOT 'git update' your code. The on-disk data structures have changed since you started this assembly. Instead, edit src/bogart/AS_BAT_BestOverlapGraph.C and delete line 416, the middle line ("writeLog(...)") below:
Recompile and then restart canu.
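Before recompiling, it is worth confirming that line 416 of src/bogart/AS_BAT_BestOverlapGraph.C really is the writeLog(...) call, because a different checkout could have shifted the line numbers. A minimal Python sketch of a check-then-delete step follows; the helper name delete_line_if_matches and the '.orig' backup file are illustrative choices for this sketch, not anything provided by canu.

```python
from pathlib import Path


def delete_line_if_matches(path: str, lineno: int, expected: str) -> str:
    """Delete 1-based line `lineno` from `path`, but only if it contains `expected`.

    The untouched file is first copied next to the original with an '.orig'
    suffix (a convention chosen for this sketch). Returns the deleted line.
    """
    src = Path(path)
    original = src.read_text(encoding="utf-8")
    lines = original.split("\n")
    idx = lineno - 1
    if not 0 <= idx < len(lines):
        raise IndexError(f"{path} has only {len(lines)} lines")
    if expected not in lines[idx]:
        # Refuse to edit if the line is not the one we were told to remove.
        raise ValueError(f"line {lineno} does not contain {expected!r}: {lines[idx]!r}")
    # Keep a backup so the edit is easy to undo.
    src.with_name(src.name + ".orig").write_text(original, encoding="utf-8")
    removed = lines.pop(idx)
    src.write_text("\n".join(lines), encoding="utf-8")
    return removed
```

Run from the top of the canu source tree, the call would be delete_line_if_matches("src/bogart/AS_BAT_BestOverlapGraph.C", 416, "writeLog("); if the check fails, inspect the file by hand rather than deleting a different line.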
Thank you for your fast answer. However canu fails again.
and the log file:
Thank you again for your help!
That one is a bit harder, and I'm distressed it's still failing. I thought I fixed it (see #718 and #546 for other crashes). Any chance you can upload unitigging/.gkpStore and unitigging/.ovlStore (unitigging/4-unitigger would be helpful, but not strictly necessary) so I can debug? I also just noticed you've only got 11x of reads - is that 11x of raw uncorrected reads, or 11x of corrected reads? How were these corrected? Some hints are under 'low coverage' in http://canu.readthedocs.io/en/latest/faq.html. Looking at the overlap report (search for "Overlap store 'unitigging/HEcanuCorrLo.ovlStore' contains") there isn't much of anything to assemble here. Worse, it's also reporting:
so most of your reads aren't getting used at all. I think all you've got here are the repeats. :-( I'd still be interested in debugging the crash, if you're able to upload the data.
Thank you again for responding so fast. To answer your other questions: yes, I have only 11x PacBio coverage of the genome, but I also have 120x of Illumina reads. In this assembly, I used lordec to correct the PacBio reads with the Illumina data and then assembled them with canu. Separately, I am also trying to assemble the lordec-corrected PacBio reads by declaring them as uncorrected. I know the assembly will not be great, but it will be used to apply for funding to get more data.
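To put these coverage figures in perspective: fold coverage is simply total read bases divided by genome size. A small Python sketch of that arithmetic, using the genomeSize=1.3g from the canu command in this issue; the function names are illustrative only.

```python
# Fold coverage = total read bases / genome size.
GENOME_SIZE_BP = 1.3e9  # genomeSize=1.3g, as passed to canu in this issue


def bases_for_coverage(coverage: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Total read bases needed to reach the given fold coverage."""
    return coverage * genome_size_bp


def coverage_from_bases(total_bases: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Fold coverage provided by the given number of read bases."""
    return total_bases / genome_size_bp


for label, cov in [("PacBio (lordec-corrected)", 11), ("Illumina", 120)]:
    print(f"{label}: {cov}x is about {bases_for_coverage(cov) / 1e9:.1f} Gbp of sequence")
```

For a 1.3 Gbp genome, 11x works out to roughly 14.3 Gbp of reads and 120x to roughly 156 Gbp.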
If I'm understanding correctly, you have 11x of lordec corrected reads, and are running two assemblies with those reads, one using -pacbio-corrected (which crashed) and one pretending they're raw reads using -pacbio-raw. Great! You can also try an assembly without trimming the lordec reads - "-assemble -pacbio-corrected reads.fasta". It could result in a better assembly as Canu will trim (or more likely, completely ignore) reads that have only overlaps on the ends. Yes, the gkpStore (read info) and ovlStore (overlaps) are all I need to run the unitigger (bogart) over here. With that, I can poke around in the gory details and find the problem. I probably won't be able to do anything until Wednesday.
Hi. The crashed assembly was actually started the way you now suggest, with -assemble -pacbio-corrected lordec_corrected_reads.fasta.
Dear Brian
Can you please let me know what I should do next about my segmentation fault problem? Should I install a newer version of canu and rerun the entire analysis, or are you still debugging? Could you find the data I uploaded?
Thanks a lot
I'm finishing up the fix right now. The fix will be a pair of files in src/bogart/. Unfortunately, you can't easily upgrade to the latest version of Canu, since on-disk data changed. Once I give it a couple more tests I'll post the files here.
It seems possible to upgrade your on-disk data to the current version. Using your current binaries:
And then with the latest binaries:
This rewrites the overlaps from the old format into the new format. The output is a new ovlStore (creatively called 'new.ovlStore'). The gkpStore data format didn't change. I was debugging a similar crash to yours, and thought I had the problem fixed. But your example still fails. There's no point in moving to the tip code yet; 'bogart' is still the same. It might be possible to get around the problem by decreasing the allowed overlap error rate; decrease both the -tg and -eM values (by 0.1?) in unitigger.sh and run that script (./unitigger.sh 1). Your data seems very noisy; the *.001.filterOverlaps.thr000.num000.log reports
where the mean is usually around 0.01 or 0.02 and the final error is around 3% to 8%.
Well, I got it to run. But it didn't assemble. Only 968 contigs, totaling 15 Mbp, were output, along with about 15 Gbp of 'unassembled' pieces, most of which are singleton reads. No patch to the code yet; I'm still working out other issues. Here's a histogram of the error rates in overlaps. It may be truncated at the high end. It's also much higher than I'm comfortable assembling - any genome duplication(s) cannot be distinguished, repeats will get smashed together, etc.
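For anyone wanting a similar summary of their own overlaps, here is a minimal Python sketch that computes the mean, sample standard deviation, and a fixed-width histogram of overlap error rates. It assumes the per-overlap error rates have already been extracted as fractions (0.05 meaning 5%); how to dump them from the ovlStore is not shown, and the 1% bin width is an arbitrary choice. The function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def summarize_error_rates(error_rates, bin_width=0.01):
    """Return (mean, stdev, histogram) for overlap error rates given as fractions.

    The histogram maps each bin's lower edge to the number of overlaps in that bin.
    """
    if len(error_rates) < 2:
        raise ValueError("need at least two error rates to estimate a stdev")
    mean = statistics.mean(error_rates)
    stdev = statistics.stdev(error_rates)  # sample standard deviation
    # round() guards against float noise such as 0.03 / 0.01 == 2.9999999999999996.
    bins = Counter(math.floor(round(e / bin_width, 9)) for e in error_rates)
    histogram = {round(b * bin_width, 6): bins[b] for b in sorted(bins)}
    return mean, stdev, histogram


def print_histogram(histogram, width=50):
    """Print a simple text bar chart, one row per non-empty bin."""
    peak = max(histogram.values())
    for edge, count in histogram.items():
        bar = "#" * max(1, round(width * count / peak))
        print(f"{edge * 100:5.1f}% {count:8d} {bar}")
```

Typical use is mean, sd, hist = summarize_error_rates(rates) followed by print_histogram(hist). Only non-empty bins are listed, so gaps in the distribution show up as missing rows rather than zero-length bars.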
Dear Brian
Thanks for sharing the data! The algorithm that fails seems to be getting confused by the repetitiveness of this data. This could be caused by lordec homogenizing repeats, by the high divergence in overlaps, or it could just be a property of your genome. I thought I had a fix, but am now back to rethinking the whole algorithm.
That was ugly, but I think I (finally) got it fixed. Your data has been removed. It was on a disk that isn't backed up. |
Hi
I am running a canu (1.7) assembly of a plant genome (1.3 Gbp) based on 11x PacBio reads. Everything goes well until unitigging, when bogart fails with a segmentation fault.
I tried restarting, and then started the entire process fresh, but I get the same error.
I would greatly appreciate some help.
I am running canu on a Linux cluster. The canu command used: '/canu/Linux-amd64/bin/canu -assemble -d /PacBio/ useGrid=false correctedErrorRate=0.105 -pacbio-corrected /PacBio/all_corr_pacbio.fasta genomeSize=1.3g -p HEcanuCorrLo'
The contents of unitigging/4-unitigger/unitigger.err:
The contents of the log file: