gridss call Exception #629

Closed
vitty0513 opened this issue Jun 7, 2023 · 2 comments

Comments

@vitty0513

Hello,
I hit an exception when running:
gridss -s call --reference $ref -a assembly.bam --output ACB_gridss --jvmheap 60g --otherjvmheap 60g
--jar /home/banliping/WORKSPACE/dwt/software/gridss-2.13.2-gridss-jar-with-dependencies.jar
$dir/AH14_MarkDup.bam
$dir/AH25_MarkDup.bam
......

And the log:
INFO 2023-06-07 09:33:43 VcfTransformCommandLineProgram Annotated variants written to ./ACB_gridss.gridss.working/ACB_gridss.allocated.vcf
[Wed Jun 07 09:33:43 CST 2023] gridss.AnnotateVariants done. Elapsed time: 3,841.94 minutes.
Runtime.totalMemory()=60362326016
Wed Jun 7 09:33:52 CST 2023: Running AnnotateInsertedSequence ACB_gridss
INFO 2023-06-07 09:33:52 Defaults Found file for property samjdk.reference_fasta: /home/banliping/WORKSPACE/dwt/ACB/ref/Lachesis_assembly_changed.fa
INFO 2023-06-07 09:33:52 AnnotateInsertedSequence

********** NOTE: Picard's command line syntax is changing.


********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)


********** The command line looks like this in the new syntax:


********** AnnotateInsertedSequence -TMP_DIR . -WORKING_DIR . -REFERENCE_SEQUENCE /home/banliping/WORKSPACE/dwt/ACB/ref/Lachesis_assembly_changed.fa -WORKER_THREADS 8 -INPUT ./ACB_gridss.gridss.working/ACB_gridss.allocated.vcf -OUTPUT ACB_gridss


09:33:53.323 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/storage1/wukeliang/dwt/software/gridss-2.13.2-gridss-jar-with-dependencies.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jun 07 09:33:53 CST 2023] AnnotateInsertedSequence WORKER_THREADS=8 INPUT=./ACB_gridss.gridss.working/ACB_gridss.allocated.vcf OUTPUT=ACB_gridss WORKING_DIR=. TMP_DIR=[.] REFERENCE_SEQUENCE=/home/banliping/WORKSPACE/dwt/ACB/ref/Lachesis_assembly_changed.fa MIN_SEQUENCE_LENGTH=20 ALIGNER_COMMAND_LINE=[bwa, mem, -K, 10000000, -L, 0,0, -t, %3$d, %2$s, %1$s] ALIGNER_BATCH_SIZE=500000 ALIGNMENT=REPLACE IGNORE_DUPLICATES=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Jun 07 09:33:53 CST 2023] Executing as wukeliang@node-01 on Linux 3.10.0-693.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_212-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.13.2-gridss
INFO 2023-06-07 09:33:53 AnnotateInsertedSequence Annotating inserted sequences in ./ACB_gridss.gridss.working/ACB_gridss.allocated.vcf
INFO 2023-06-07 09:33:53 AnnotateInsertedSequence Using external process alignment
INFO 2023-06-07 09:33:57 ExternalProcessStreamingAligner Starting external aligner
INFO 2023-06-07 09:33:57 ExternalProcessStreamingAligner bwa mem -K 10000000 -L 0,0 -t 8 /home/banliping/WORKSPACE/dwt/ACB/ref/Lachesis_assembly_changed.fa -
[Wed Jun 07 09:33:57 CST 2023] gridss.AnnotateInsertedSequence done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=2058354688
Exception in thread "main" java.lang.IllegalArgumentException: Output format type is not set, or could not be inferred from the output path. If a path was used, does it have a valid VCF extension (.vcf, .vcf.gz, .bcf)?
at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:462)
at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:415)
at gridss.AnnotateInsertedSequence.saveVcf(AnnotateInsertedSequence.java:168)
at gridss.AnnotateInsertedSequence.doWork(AnnotateInsertedSequence.java:137)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at gridss.AnnotateInsertedSequence.main(AnnotateInsertedSequence.java:67)
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 82648 sequences (10000198 bp)...
[M::process] read 72218 sequences (10000240 bp)...
[M::mem_process_seqs] Processed 82648 reads in 139.468 CPU sec, 99.266 real sec
[M::mem_process_seqs] Processed 72218 reads in 135.928 CPU sec, 68.703 real sec

Now I only have ACB_gridss.allocated.vcf and its idx file. What can I do to solve this? Does it mean I should change --output ACB_gridss to ACB_gridss.vcf? And how can I continue from ACB_gridss.allocated.vcf? It took a long time to produce the allocated.vcf, and I don't want to start again from the beginning.

Many thanks!

@d-cameron
Member

> OUTPUT=ACB_gridss
> Output format type is not set, or could not be inferred from the output path. If a path was used, does it have a valid VCF extension (.vcf, .vcf.gz, .bcf)?
> Does it mean I should change --output ACB_gridss to ACB_gridss.vcf?

Yes, change to a .vcf or .vcf.gz suffix. The error occurs because GRIDSS cannot tell whether you want a .vcf, a .vcf.gz, or a .bcf file as your output format.
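
Since this failure only surfaces after the long-running steps have finished, it may be worth checking the output name before launching. The following is a minimal sketch, not part of GRIDSS: `has_vcf_extension` is a hypothetical helper that only mirrors the three extensions listed in the error message above, and it does not reproduce htsjdk's actual inference logic.

```shell
# Hypothetical pre-flight helper (not part of GRIDSS or htsjdk): succeeds only
# when the path ends in one of the extensions named in the error message.
has_vcf_extension() {
  case "$1" in
    *.vcf|*.vcf.gz|*.bcf) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use before starting a long run:
#   has_vcf_extension "ACB_gridss.vcf" || echo "choose a .vcf, .vcf.gz or .bcf output name" >&2
```

With this check, `ACB_gridss` would be rejected up front, while `ACB_gridss.vcf` would pass.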

> And how can I continue from ACB_gridss.allocated.vcf? It took a long time to produce the allocated.vcf, and I don't want to start again from the beginning.

Just change OUTPUT=ACB_gridss to OUTPUT=ACB_gridss.vcf and rerun. As long as you don't edit or delete anything in the .gridss.working directories, GRIDSS will recognise that the earlier steps have completed and only run the final step(s). In fact, unless you've specified --keepTempFiles on the command line, GRIDSS will automatically delete the .allocated.vcf file when the final VCF is generated.

@vitty0513
Author

Thank you! I changed my --output to ACB_gridss.vcf; however, it created a new directory named ACB_gridss.vcf.gridss.working and started from the beginning. Should I rename the old directory (ACB_gridss.gridss.working) to ACB_gridss.vcf.gridss.working and then rerun? I also have a question: the old directory contains only the ACB_gridss.allocated.vcf file and no other files. Is that because I didn't use --keepTempFiles? Does it mean I can't resume from xxx.allocated.vcf?
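
The two directory names reported in this thread suggest that the working directory name is derived from the --output value. A minimal sketch of that observed pattern follows; `working_dir_for` is a hypothetical helper, the `<output>.gridss.working` rule is inferred only from the names seen here, and whether renaming the old directory lets GRIDSS resume is not confirmed in this thread.

```shell
# Hypothetical helper: reproduce the "<output>.gridss.working" naming pattern
# observed in this thread. This is an assumption, not documented behaviour.
working_dir_for() {
  printf '%s.gridss.working\n' "$1"
}

# Example use:
#   working_dir_for "ACB_gridss"       # the directory from the first run
#   working_dir_for "ACB_gridss.vcf"   # the directory the rerun created
```

Under that assumption, changing the output name also changes the directory GRIDSS looks in, which would be consistent with the rerun not finding the earlier intermediate files.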
