-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault for FastK, sometimes, when k isn't 40 #6
Comments
Hmm, I downloaded it and ran it as per your optioning and it ran fine.
I also
tried the different values of k and it still worked. Not what we wanted
given
I'd like to fix the problem :-) Must be hardware, OS, compiler specific.
So I guess the first thing is which machine, which OS, which compiler?
Things that might yield clues:
* Does it matter if it is a symlink to the data? (I doubt it, but
could you check).
* Does it run on any other data sets, i.e. is it just failing on this
specific data set?
On thing unusual about this one is that each scaffold is placed on a
single line,
implying very, very long lines. One quick experiment would be to cut
the lines
up and see if that makes a difference. The very long lines might be
catching a
bug in FastK's input buffering codes.
* Perhaps you could compile with -g option and run with a debugger and
send me
a trace -- might give me an idea of where to look given I can't
reproduce the segfault.
Thanks, Gene
…On 4/5/21, 11:19 PM, Bob Harris wrote:
synopsis: When running FastK with -k set 32 or less, I'm seeing segfaults.
Specifically, I've seen this with one particular input file (the blue
whale assembly) and k in {20,26,31,32}.
Details:
I was trying to count 31-mers in this VGP blue whale assembly:
https://s3.amazonaws.com/genomeark/species/Balaenoptera_musculus/mBalMus1/assembly_curated/mBalMus1.pri.cur.20200528.fasta.gz
My current directory had mBalMus1.pri.cur.20200528.fasta.gz as a
symlink to some other directory. Then I ran
|FastK -v -k31 mBalMus1.pri.cur.20200528.fasta.gz|
and got this output:
| Gzipped file mBalMus1.pri.cur.20200528.fasta.gz being temporarily uncompressed
Partitioning 1 .fasta file into 4 parts
Determining minimizer scheme & partition for mBalMus1.pri.cur.20200528
Estimate 2.375G 31-mers
Dividing data into 2 blocks
Using 5-minimizers with 1024 core prefixes
Phase 1: Partitioning K-mers into 8 Super-mer Files
There are 105 reads totalling 2,374,852,541 bps
Part: 31-mer super-mers ave. length
0: 1,151,752,026 74,926,939 15.4
1: 1,197,133,971 112,483,639 10.6
Sum: 2,348,885,997 187,410,578 12.5
Range 1,151,752,026 - 1,197,133,971 (3.86%)
Resources for phase: 1:27.610u 5.846s 1:01.816w 151.2%
Phase 2: Sorting & Counting K-mers in 2 blocks
Processing block 2: Sorting super-mers **Segmentation fault**
|
I'm using commit 7cebc7d
<7cebc7d>,
from a few hours ago.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINV6E6ZS6VLKLZYJHJ3THISPHANCNFSM42NPGOFQ>.
|
Yep, I was just about to try it on some different platforms. The one I was running on is some variant of linux. Compiler is no doubt gcc but might be an old version. I'll provide more specific details after I run on some other machines, and I'll try the things you mentioned. |
I have also run into this. Gene, do you have access to the sanger farm? If so, I can show you how to reproduce it there. directory /lustre/scratch118/malaria/team222/hh5/datasets/assembly/ilvanatal1 In this I'm trying to get profiles of just the het kmers. I created a fasta file from the output of Haplex and then made a ktab from that (had to remove 2 exits for short sequences). Now I am running the pacbio data through vs that het kmer ktab and get 0s except for positions where there is a het kmer and it is seg faulting on that. Kmer size is 31. |
Gol dang it. I am committed to fixing any problems. But I need to be able
to replicate them. FastK is 10,000 lines of fairly complex code and the bug
could be anywhere. First thing if possible is to give me a link to download
the data set and the exact call parameters and see if I can duplicate it
here.
Another thing to do since you are a savvy user is to compile with the -g
option
and send me a stack trace -- sometimes but not often that is enough to
detect the problem. I have run FastK with different k-mer sizes without a
problem (e.g. k=21 and 18 for Illumina data). So ... let's work it out.
-- Gene
…On 4/6/21, 9:37 PM, Haynes Heaton wrote:
I have also run into this. Am trying to different things to see what
the problem may be. Gene, do you have access to the sanger farm? If I
can show you how to reproduce it there.
directory
/lustre/scratch118/malaria/team222/hh5/datasets/assembly/ilvanatal1
binary happrof.sh
bsub command
bsub -o happrof.out -n8 -R"span[hosts=1] select[mem>28000]
rusage[mem=28000]" -M28000 ./happrof.sh
In this I'm trying to get profiles of just the het kmers. I created a
fasta file from the output of Haplex and then made a ktab from that
(had to remove 2 exits for short sequences). Now I am running the
pacbio data through vs that het kmer ktab and get 0s except for
positions where there is a het kmer and it is seg faulting on that.
Kmer size is 31.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINXNYJHNNFFNDRU5J5LTHNPGZANCNFSM42NPGOFQ>.
|
I'm still testing, but it looks like in my case the problem is that /tmp fills up. The solution, of course, is for me to use -P to redirect temp files to my own directory. I think the k setting is probably a red herring, at least for me. This explains why I didn't see the problem until I started using a whole genome, and why the failure occurs on some machines and not others. The machine it was failing on has only about 4G allocated to /tmp. |
Typically the biggest difference between my mac and most other
architectures is that the mac
zeros all memory before loading a program -- so sometimes undefined
variables are not
caught on the mac.
Try some of the ideas I sent, but please I'll feel bad if you spend a
lot of time trying things.
The best thing is for us to work together. I have an idea if nothing
else: we zoom, and you
share your screen and I debug with you as my partner. What time zone
are you in?
…-- Gene
On 4/6/21, 5:32 PM, Bob Harris wrote:
Yep, I was just about to try it on some different platforms. The one I
was running on is some variant of linux. Compiler is no doubt gcc but
might be an old version. I'll provide more specific details after I
run on some other machines, and I'll try the things you mentioned.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINXYE6LHW57LYVUE3ELTHMSQ7ANCNFSM42NPGOFQ>.
|
Super, I had just sent an e-mail about how we might debug but filling
temp would certainly be
a problem. My code should exit more gracefully than a crash through --
it should say it couldn't
create the file -- I'm going to review my code tomorrow morning and see
if I do that properly.
(Its already 10:30pm in German).
-- Gene
…On 4/6/21, 10:19 PM, Bob Harris wrote:
I'm still testing, but it looks like in my case the problem is that
/tmp fills up. The solution, of course, is for me to use -P to
redirect temp files to my own directory.
I think the k setting is probably a red herring, at least for me.
This explains why I didn't see the problem until I started using a
whole genome, and why the failure occurs on some machines and not
others. The machine it was failing on has only about 4G allocated to /tmp.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINSSCSARE4BS4SZH5UDTHNUD3ANCNFSM42NPGOFQ>.
|
Haynes, excuse my last whine about debugging this -- I see now you
suggested I get on the
farm to reproduce the problem. Alas I don't but I could try to get
said. One thing is could
you check /tmp or wherever you are putting the temp files (specificiable
with the -P option).
If it is not big enough, this will cause a crash as I do not check file
writes or reads, just file
opens.
-- Gene
…On 4/6/21, 9:37 PM, Haynes Heaton wrote:
I have also run into this. Am trying to different things to see what
the problem may be. Gene, do you have access to the sanger farm? If I
can show you how to reproduce it there.
directory
/lustre/scratch118/malaria/team222/hh5/datasets/assembly/ilvanatal1
binary happrof.sh
bsub command
bsub -o happrof.out -n8 -R"span[hosts=1] select[mem>28000]
rusage[mem=28000]" -M28000 ./happrof.sh
In this I'm trying to get profiles of just the het kmers. I created a
fasta file from the output of Haplex and then made a ktab from that
(had to remove 2 exits for short sequences). Now I am running the
pacbio data through vs that het kmer ktab and get 0s except for
positions where there is a het kmer and it is seg faulting on that.
Kmer size is 31.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINXNYJHNNFFNDRU5J5LTHNPGZANCNFSM42NPGOFQ>.
|
No problem at all. It is late there. I will run with a debugger and post here. I tried with -P temp file of my own and still crashes. We will figure it out. Don't feel the need to rush. Thanks for the help and the great tool. This is what you get for writing a good tool lol - users... Best, |
I ran a couple tests where I put /tmp on the hairy edge before running FastK. I saw two things. (I should mention that I was running with a file containing just the first 15 scaffolds of blue whale. Uncompressed this is 1.7G) (1) Unrelated to this thread, if there's not enough room to unzip the gzipped file, it looks like gzip reports "gzip: stdout: No space left on device" but FastK doesn't realize gzip failed. So FastK continues, processing a short version of the uncompressed genome. I didn't let this run to completion -- possibly this would end up producing a shortened result, and when running in a pipeline the failure wouldn't be noticed or immediately obvious. (2) Leaving enough room for a fully unzipped fasta, then while FastK was running I was repeatedly grabbed an ls-al /tmp. Before the segfault, it had written these files (plus the unzipped fasta)
It then removed the first four files (the mBalMus1.15scaffolds.0.* ones). Just before segfaulting, the final messages were
The four mBalMus1.15scaffolds.1.* files remained at that point. |
Backtrace isn't useful "No such file or directory" then just malloc stuff. There does seem to be a lot of thread churning. Like maybe 1 thread per read. Command is But I had a bit of progress. Seems to be a threading issue for me. I got a small example of just 20 reads on which it hits the seg fault but if I give it only 1 thread it completes. Minimal example attached with the fastq and the hets2.ktab file. With 19 reads it doesn't crash but with 20 reads it does crash. And there isn't anything weird about the 20th read. To make sure, I put the 20th read in a few file first and then 18 reads after it and that one finishes but if you put one more read on it fails again. Also going back to the full dataset with 1 thread it succeeds. And that only took 15min so maybe single threaded is fine for this purpose. Since it's this fast, this will not be blocking for me. And then just as follow up, it works as intended for finding the locations of het kmers. |
Hmmm, happy to take a look and see if it duplicates here if you want to
send it on. -- Gene
…On 4/7/21, 12:43 AM, Haynes Heaton wrote:
Seems to be a threading issue for me. I got a small example of just 20
reads on which it hits the seq fault but if I give it only 1 thread it
completes. Will upload tiny example in a bit.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINQHDNC2HHIIKHK2AHDTHOFCVANCNFSM42NPGOFQ>.
|
Yes, this is all consistent with running out of disk space in /tmp.
For now the fix is simply to make sure your target directory is big enough.
Longer term, I will "bite the bullet" and check all my file writes --
what's happening is
that I am writing to the target file(s) but not checking if those writes
are OK. When the
disk fills up the writes fail, but the program carries on and so
crashes. The fix is to
check every write and if one is bad, write a friendly "Your out of disk"
message and
exit.
Best, Gene
…On 4/6/21, 11:28 PM, Bob Harris wrote:
I ran a couple tests where I put /tmp on the hairy edge before running
FastK. I saw two things.
(I should mention that I was running with a file containing just the
first 15 scaffolds of blue whale. Uncompressed this is 1.7G)
(1) Unrelated to this thread, if there's not enough room to unzip the
gzipped file, it looks like gzip reports "gzip: stdout: No space left
on device" but FastK doesn't realize gzip failed. So FastK continues,
processing a short version of the uncompressed genome. I didn't let
this run to completion -- possibly this would end up producing a
shortened result, and when running in a pipeline the failure wouldn't
be noticed or immediately obvious.
(2) Leaving enough room for a fully unzipped fasta, then while FastK
was running I was repeatedly grabbed an ls-al /tmp. Before the
segfault, it had written these files (plus the unzipped fasta)
|-rwx------ 1 rsharris rsharris 36839424 Apr 6 17:08 mBalMus1.15scaffolds.0.T0
-rwx------ 1 rsharris rsharris 38023168 Apr 6 17:08 mBalMus1.15scaffolds.0.T1
-rwx------ 1 rsharris rsharris 39821312 Apr 6 17:08 mBalMus1.15scaffolds.0.T2
-rwx------ 1 rsharris rsharris 40181760 Apr 6 17:08 mBalMus1.15scaffolds.0.T3
-rwx------ 1 rsharris rsharris 54337536 Apr 6 17:08 mBalMus1.15scaffolds.1.T0
-rwx------ 1 rsharris rsharris 46800896 Apr 6 17:08 mBalMus1.15scaffolds.1.T1
-rwx------ 1 rsharris rsharris 49356800 Apr 6 17:08 mBalMus1.15scaffolds.1.T2
-rwx------ 1 rsharris rsharris 53616640 Apr 6 17:08 mBalMus1.15scaffolds.1.T3
|
It then removed the first four files (the mBalMus1.15scaffolds.0.*
ones). Just before segfaulting, the final messages were
|Phase 2: Sorting & Counting K-mers in 2 blocks
Processing block 1: Sorting weighted k-mersSegmentation fault
|
The four mBalMus1.15scaffolds.1.* files remained at that point.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINU7DWECTDOVP6XZ5J3THN4I5ANCNFSM42NPGOFQ>.
|
I included example data for my case. It's not a big deal because it runs fine single threaded, but probably something to fix eventually. Best, |
Haynes,
Sorry to get back to you so late on this. I cannot test your example
because you only including the
stub file "hets2.ktab" in your zip. A FastK table consists of several
files, a visible .ktab stub file, and then
several hidden files (of the form .hets.ktab.1, .hets.ktab.2 ...)
depending on the # of threads used in
its creationg. Could you rebundle the seg-fault example with these
files included?
Thanks, Gene
…On 4/7/21, 3:37 PM, Haynes Heaton wrote:
I included example data for my case. It's not a big deal because it
runs fine single threaded, but probably something to fix eventually.
Best,
Haynes
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINQRCCEXSFB26NB6VQDTHRNZZANCNFSM42NPGOFQ>.
|
Haynes,
A reminder to send me your crash data set again, this time with
those pesky hidden file :-)
…-- Gene
On 4/7/21, 3:37 PM, Haynes Heaton wrote:
I included example data for my case. It's not a big deal because it
runs fine single threaded, but probably something to fix eventually.
Best,
Haynes
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABUSINQRCCEXSFB26NB6VQDTHRNZZANCNFSM42NPGOFQ>.
|
I grabbed the new example and ran "FastK -M20 -k31 -p:hets2 -t1 -T8 test.fq". |
Could you run with -v so there is at least some idea what phase it was in when |
synopsis: When running FastK with -k set 32 or less, I'm seeing segfaults.
Specifically, I've seen this with one particular input file (the blue whale assembly) and k in {20,26,31,32}.
Details:
I was trying to count 31-mers in this VGP blue whale assembly:
https://s3.amazonaws.com/genomeark/species/Balaenoptera_musculus/mBalMus1/assembly_curated/mBalMus1.pri.cur.20200528.fasta.gz
My current directory had mBalMus1.pri.cur.20200528.fasta.gz as a symlink to some other directory. Then I ran
FastK -v -k31 mBalMus1.pri.cur.20200528.fasta.gz
and got this output:
I'm using commit 7cebc7d, from a few hours ago.
The text was updated successfully, but these errors were encountered: