Snap crashes during sorting step #164

Closed
matthdsm opened this issue Jan 30, 2023 · 15 comments

@matthdsm

error

Loading index from directory... 13s.  3,100,314,541 bases, seed size 24.
Aligning.
sorting...read header failed
SNAP exited with exit code 1 from line 1293 of file SNAPLib/SortedDataWriter.cpp

version:

conda snap-aligner=2.0.2 on Linux

command:

snap-aligner paired ./snapaligner 1/FD2200256_DNA080266_1.fastp.fastq.gz 2/FD2200256_DNA080266_2.fastp.fastq.gz -o FD2200256_DNA080266.bam -t 18 -so -b- -sm 20 -I -hc- -S id -sa -R '@RG\tID:220623_A00785_0492_AH5TWVDRX2.2.2\tCN:CMGG\tPU:220623_A00785_0492_AH5TWVDRX2.2.AGCGCCAC-AAGACATT\tPL:ILLUMINA\tLB:CNV_LI_2022_088\tSM:FD2200256_DNA080266'

Info

Sorry, nothing much else to go on 🤷🏻. This error occurred during the alignment of some shallow WGS data. I was able to work around it by downgrading to version 2.0.1.

Please let me know if I can provide more info.

Cheers
M

@bolosky
Contributor

bolosky commented Jan 30, 2023 via email

@matthdsm
Author

Hi,
This occurred with multiple low-coverage samples (about 37M reads) and only with v2.0.2.
We're using GRCh38 + decoy without alts, so I don't think the number of contigs is an issue?

Is there a verbose option I can try to get more output?

@bolosky
Contributor

bolosky commented Jan 31, 2023

This is weird, since nothing changed in that code path between 2.0.1 and 2.0.2. Or at least nothing obvious.

I created a new branch called issue164 with some instrumentation. Could you please build and run it and report the output? It should only be a few extra lines, but maybe it'll help me figure it out.

@matthdsm
Author

matthdsm commented Feb 2, 2023

Hi Bill,

The branch says it's up to date with master. Did you push the changes?

@matthdsm matthdsm closed this as completed Feb 2, 2023
@matthdsm matthdsm reopened this Feb 2, 2023
@bolosky
Contributor

bolosky commented Feb 2, 2023

Oops. Try it now.

@matthdsm
Author

matthdsm commented Feb 3, 2023

Was able to reproduce

Welcome to SNAP version 2.0.2.issue164.0.

Loading index from directory... 99s.  3,100,314,541 bases, seed size 24.
Aligning.
BAMFormat::writeHeader: headerActualSize 11913
sorting...read header failed, left 3909, headerSize 11913
AsyncFileDataReader (0x558509bc71b0) state:
        fileName FD2300072_DNA092919.bam.tmp, fileSize 11913, readOffset 12, endingOffset 11913
ReadBasedDataReader at 0x558509bc71b0 state:
        headerBufferSize 0, headerExtraSize 0, amountAdvancedThroughUnderlyingStoreByUs 0, nHeaderBuffersAllocated 0, hitEOFReadingHeader 0, bufferSize 8004
        nBuffers 2, headerBuffersOutstanding, 0, startedReadingHeader 0, extraBytes 0, overflowBytes 8000, nextBatchID 4, nextBufferForReader -1, nextBufferForConsumer 1, lastBufferForConsumer 0
SNAP exited with exit code 1 from line 1300 of file SNAPLib/SortedDataWriter.cpp
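
The numbers in these diagnostics line up in a suggestive way: `headerSize` minus `bufferSize` equals the reported `left` value. The short Python sketch below only checks that arithmetic on fragments copied from the log above. The helper name `field` is hypothetical, and reading the result as "one buffer's worth of header was consumed before the read gave up" is an interpretation of the log, not something the output states directly.

```python
import re

# Fragments copied from the diagnostic output above.
LOG = (
    "sorting...read header failed, left 3909, headerSize 11913\n"
    "hitEOFReadingHeader 0, bufferSize 8004\n"
)


def field(name: str, text: str) -> int:
    """Return the integer that follows `name` in the log text (hypothetical helper)."""
    match = re.search(rf"\b{re.escape(name)} (\d+)", text)
    if match is None:
        raise ValueError(f"{name} not found in log")
    return int(match.group(1))


header_size = field("headerSize", LOG)
buffer_size = field("bufferSize", LOG)
left = field("left", LOG)

# Bytes of header still unread if exactly one buffer had been consumed.
unread_after_one_buffer = header_size - buffer_size
print(unread_after_one_buffer, unread_after_one_buffer == left)
```

If that reading is right, the header (11913 bytes) was larger than a single read buffer (8004 bytes), and the code reading it back from the temporary file did not continue into a second buffer.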

@matthdsm
Author

matthdsm commented Feb 3, 2023

Omitting -so fixes the issue, but I think that was a bit obvious looking at the error log 😅

@bolosky
Contributor

bolosky commented Feb 4, 2023

I got it to repro. It would only happen with a header size in a certain range and with an input small enough that it's all in memory when the sort starts.

While none of the code in this pathway changed between 2.0.1 and 2.0.2, I did change the default read length, which was (incorrectly) being passed in as a parameter while reading the header from the sort intermediate file to write it to the final BAM, and that caused the problem.

I fixed it (at least it seems to work now).

Let me know if it works for you (still in the issue164 branch, version 2.0.3.issue164.7) and I'll do more testing before putting it in dev.
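
The general failure mode described here, a header that is longer than one read buffer, can be illustrated with a minimal Python sketch. This is not SNAP's code (SNAP is written in C++) and it does not reproduce the actual fix. The function names are hypothetical, and the sizes 11913 and 8004 are borrowed from the earlier diagnostic output purely as example values.

```python
from typing import Iterator, List


def batches(data: bytes, batch_size: int) -> Iterator[bytes]:
    """Yield `data` in consecutive chunks of at most `batch_size` bytes."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def read_header_single_batch(data: bytes, header_size: int, batch_size: int) -> bytes:
    """Fragile pattern: assume the whole header fits in the first batch."""
    first = next(batches(data, batch_size), b"")
    if len(first) < header_size:
        left = header_size - len(first)
        raise IOError(f"read header failed, left {left}, headerSize {header_size}")
    return first[:header_size]


def read_header_multi_batch(data: bytes, header_size: int, batch_size: int) -> bytes:
    """Robust pattern: keep pulling batches until header_size bytes are collected."""
    parts: List[bytes] = []
    remaining = header_size
    for chunk in batches(data, batch_size):
        take = chunk[:remaining]
        parts.append(take)
        remaining -= len(take)
        if remaining == 0:
            break
    if remaining:
        raise IOError(f"read header failed, left {remaining}, headerSize {header_size}")
    return b"".join(parts)


# Example values only: an 11913-byte header followed by some record bytes.
header = bytes(i % 251 for i in range(11913))
payload = header + b"\x00" * 100

try:
    read_header_single_batch(payload, len(header), 8004)
    single_batch_error = ""
except IOError as err:
    single_batch_error = str(err)

recovered = read_header_multi_batch(payload, len(header), 8004)
print(single_batch_error)
print(recovered == header)
```

With these example sizes the single-batch reader raises because the header spills into a second batch, while the looping reader recovers the full header. A header smaller than the batch size works with either function, which is consistent with the bug only appearing for headers in a certain size range.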

@matthdsm
Author

matthdsm commented Feb 6, 2023

Can confirm this fixed the issue (for me). Thanks a lot!

Welcome to SNAP version 2.0.2.issue164.7.

Loading index from directory... 119s.  3,100,314,541 bases, seed size 24.
Aligning.
BAMFormat::writeHeader: headerActualSize 11629
sorting...SortedDataFilterSupplier::mergeSort(): allocating new header reader
SortedDataFilterSupplier::mergeSort(): got 7862 bytes
SortedDataFilterSupplier::mergeSort(): advancing 7862 bytes
SortedDataFilterSupplier::mergeSort(): calling nextBatch()
SortedDataFilterSupplier::mergeSort(): got 3767 bytes
SortedDataFilterSupplier::mergeSort(): advancing 3767 bytes
sorted 14569942 reads in 18 blocks, 28 s
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns  %Pairs    Reads/s   Time in Aligner (s)
14,569,942     11,759,373 (80.71%)    905,082 (6.21%)        362,059 (2.48%)        1,543,428 (10.59%)     74.94%    423,939   34

@bolosky
Contributor

bolosky commented Feb 6, 2023

I put a slightly improved version of the fix with the instrumentation removed in the dev branch in 2.0.3.dev.1.

Let's leave this issue open until it makes it into master.

@matthdsm
Author

matthdsm commented Feb 7, 2023

Hi!
I can't find the branch you mentioned. I did however run a successful test with the latest changes in the issue164 branch.

Welcome to SNAP version 2.0.2.issue164.8.

Loading index from directory... 118s.  3,100,314,541 bases, seed size 24.
Aligning.

sorting...sorted 14569942 reads in 18 blocks, 34 s
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns  %Pairs    Reads/s   Time in Aligner (s)
14,569,942     11,759,373 (80.71%)    905,082 (6.21%)        362,059 (2.48%)        1,543,428 (10.59%)     74.94%    301,880   48  

Thanks for the quick fix! Eagerly awaiting the new release 😄

@bolosky
Contributor

bolosky commented Feb 7, 2023

It's called "dev." It's the branch for staging changes that will go into master.

Anyway, the only difference between what's in dev and issue164.8 is the version number and one fixed typo, so you've effectively tried the latest version.

@matthdsm
Author

matthdsm commented Feb 7, 2023

Ah my bad, I misread and was looking for a 2.0.3.dev.1 branch.

@bolosky
Contributor

bolosky commented Apr 2, 2023

This is in 2.0.3

@bolosky bolosky closed this as completed Apr 2, 2023
@matthdsm
Author

matthdsm commented Apr 3, 2023

Thank you!
