No error generated, but duphold output is missing the last few lines #25
Did those exit with 0? Do you see "[duphold] finished" in the stderr of those jobs? Make sure you're using the latest duphold. I don't recall any relevant changes, but it's always good to have the most recent version.
I am getting "[duphold] finished" at the end of the run, and I gave it plenty of time. The most recent version (0.1.4) is being used. Below is the job output. The tabix command, which runs after duphold finishes, is what reports the error. These are the two command lines I am using:
The output is:
duphold works fine for the other 4,600 CRAMs.
Is the error intermittent, or does it occur every time for these same CRAM files?
I reran them and got the same issue. I will try increasing the memory available on the nodes and see if that helps.
I doubt it's the memory, but I can't think of anything else. Can you share the VCF and BAM for one of the failing samples?
And you are running all samples with
Yes, the same script is being used with the same VCF, which contains all the variants in the 5k samples. I cannot share these files, but I will see if I can create a smaller test CRAM and VCF that reproduce the issue. I reran one sample on a node with plenty of memory, writing directly to an uncompressed VCF. There is still an error, not at exactly the same spot but close, so, as you suggested, it is not a memory issue.
tail output/test.vcf | awk '{print NF "\t" $0}'
Instead of piping to bgzip, can you just use
That fixed it on the test sample! It must be some issue with closing the pipe. Thanks for pinpointing the issue so quickly. I checked a few of the samples that were not getting an error from tabix, and not all of them contained the last few variants either. Their output probably just happened to end on an EOL. So I will rerun all the samples.
tabix output/ADNI_002_S_4229.vcf.gz chr22:50799200 | tail | awk '{print NF "\t" $0}'
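The `awk '{print NF "\t" $0}'` checks above spot a cut-off record because it has fewer fields than a complete one. A minimal Python sketch of the same idea, applied to a whole file, is below. The function name `find_truncated_records` is hypothetical, and the sketch assumes a VCF (plain, or bgzip-compressed and read through Python's `gzip` module) whose data lines should all have as many tab-separated columns as the `#CHROM` header line.

```python
import gzip
from typing import List, Tuple


def find_truncated_records(path: str) -> Tuple[List[Tuple[int, int]], bool]:
    """Return (bad_lines, ends_with_newline) for a VCF file.

    bad_lines holds (line_number, field_count) for every data line whose
    tab-separated field count differs from that of the #CHROM header line.
    """
    # BGZF files are valid multi-member gzip, so gzip.open can read them.
    opener = gzip.open if path.endswith(".gz") else open
    expected = None
    bad_lines = []
    last_line = ""
    with opener(path, "rt") as fh:
        for lineno, line in enumerate(fh, start=1):
            last_line = line
            if line.startswith("##"):
                continue  # meta-information lines carry no columns
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                expected = len(fields)  # column count every record should have
                continue
            if expected is not None and len(fields) != expected:
                bad_lines.append((lineno, len(fields)))
    # A complete file normally ends with a newline after its last record.
    return bad_lines, last_line.endswith("\n")
```

As noted in the thread, output that happens to be cut off exactly at an end of line passes this check, so comparing the number of records against the input VCF is the more reliable test.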
I still don't know why or how this happened, but if you weren't already, always run any script that uses pipes with the
No error is triggered with that option. I took a look at the code, and I think the problem can be traced to https://github.com/brentp/hts-nim/blob/master/src/hts/vcf.nim
It probably just needs a flushFile(stdout) when v.fname == "-".
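The suggested fix rests on a general point: output held in a user-space buffer only reaches the pipe when it is flushed, so a process that exits without flushing drops the tail and can still exit 0. The Python sketch below illustrates that general failure mode only; it is not hts-nim's code and makes no claim about its internals. It runs a child that writes three VCF-like lines to a deliberately buffered stdout and then exits via `os._exit`, which skips normal cleanup, once without and once with an explicit flush. The names `CHILD` and `run_child` are hypothetical.

```python
import subprocess
import sys

# Child program: wrap fd 1 in a large explicit buffer so small writes stay in
# user space, write three records, optionally flush, then exit via os._exit,
# which skips the interpreter's normal flush-on-exit cleanup.
CHILD = """
import os, sys
out = os.fdopen(1, "w", buffering=65536, closefd=False)
for pos in (100, 200, 300):
    out.write("chr22\\t%d\\t.\\tA\\tG\\n" % pos)
if sys.argv[1] == "flush":
    out.flush()
os._exit(0)
"""


def run_child(mode: str) -> str:
    """Run the child with stdout connected to a pipe and return what arrived."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        text=True,
        check=True,  # both modes exit 0, so this never raises
    )
    return result.stdout


if __name__ == "__main__":
    print(repr(run_child("noflush")))  # '' : nothing reached the pipe
    print(repr(run_child("flush")))    # all three records arrive
```

Both runs exit with status 0, which matches the symptom reported here: truncated output and no error.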
Yes, that's it. Nicely spotted! I think it should just
Here's a binary with that fixed.
This is fixed in the latest release. Thanks for reporting and testing.
I see a similar (or the same) issue with release 0.2.1; changing from piping to bgzip to using
I don't see a way to fix this, as it already explicitly closes the file and flushes in the case of stdout. I did try adding an additional flush, if you want to try the attached binary. If this doesn't work, I'll probably add a check to ensure that stdout is not used.
Thanks for the reply; I'm seeing the same problem with the binary you posted. In the pipe case, the last VCF record is cut off. The original and dev binaries produce the same VCF in the pipe case, and the last record with
Piped to bgzip:
Piping into bgzip in the first place was a documentation issue for me; I didn't realize duphold could write compressed output directly until I found this PR, so perhaps that is what could be improved instead.
I am happy to accept a PR to improve the docs.
I am running duphold on about 5,000 CRAMs. About 120 of them finish, but the output VCF is incomplete: it ends prematurely near the end of chromosome 22, with only a few lines of the input VCF left to process.
The last complete VCF line written varies, but it is always near the end:
1 50796757
24 50797069
24 50797489
7 50797585
7 50797605
4 50797606
11 50798225
14 50798764
25 50798783
20 50799200
Here is an example of the end of one file:
Any suggestions?