Issue with BQSR maximum cycle value #129
Hi Irzam,
I think we might just need to disable bqsr by default when the long reads
option is used. That parameter of 20kb was meant to be the longest read to
examine, but it looks like you've got reads longer than that. Leave this
open, though, and I'll research it a bit more before the next software
release.
best,
Brian
On Wed, Nov 15, 2023 at 8:45 PM Irzam Sarfraz <***@***.***> wrote:
Hi,
Thanks for creating this useful pipeline!
I am running the pipeline using a singularity container on some long read
sequencing data. It seems to work well with smaller input fastq files but
as soon as the files tend to become larger, the workflow abruptly ends.
Below is the ending portion of the output:
01:22:26.587 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:26.588 INFO PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:26.588 INFO PrintReads - Executing as ***@***.*** on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:26.588 INFO PrintReads - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:26.588 INFO PrintReads - Start Date/Time: November 16, 2023 at 1:22:26 AM GMT
01:22:26.588 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - ------------------------------------------------------------
01:22:26.588 INFO PrintReads - HTSJDK Version: 2.23.0
01:22:26.589 INFO PrintReads - Picard Version: 2.23.3
01:22:26.589 INFO PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:26.589 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:26.589 INFO PrintReads - Deflater: IntelDeflater
01:22:26.589 INFO PrintReads - Inflater: IntelInflater
01:22:26.589 INFO PrintReads - GCS max retries/reopens: 20
01:22:26.589 INFO PrintReads - Requester pays: disabled
01:22:26.589 INFO PrintReads - Initializing engine
01:22:26.671 INFO PrintReads - Done initializing engine
01:22:26.685 INFO ProgressMeter - Starting traversal
01:22:26.685 INFO ProgressMeter - Current Locu
[2023-11-16 01:22:34,58] [info] WorkflowManagerActor WorkflowActor-1318276e-f6f8-460f-bd69-6fd7a1ef3d3b is in a terminal state: WorkflowFailedState
[2023-11-16 01:23:01,22] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2023-11-16 01:23:06,05] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmple2zpd52.json
[2023-11-16 01:23:06,08] [info] Workflow polling stopped
[2023-11-16 01:23:06,09] [info] 0 workflows released by cromid-0f089e9
[2023-11-16 01:23:06,09] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2023-11-16 01:23:06,10] [info] Aborting all running workflows.
[2023-11-16 01:23:06,10] [info] JobExecutionTokenDispenser stopped
[2023-11-16 01:23:06,10] [info] WorkflowStoreActor stopped
[2023-11-16 01:23:06,11] [info] WorkflowLogCopyRouter stopped
[2023-11-16 01:23:06,11] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor All workflows finished
[2023-11-16 01:23:06,11] [info] WorkflowManagerActor stopped
[2023-11-16 01:23:06,27] [info] Connection pools shut down
[2023-11-16 01:23:06,28] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2023-11-16 01:23:06,28] [info] SubWorkflowStoreActor stopped
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] JobStoreActor stopped
[2023-11-16 01:23:06,28] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] KvWriteActor Shutting down: 0 queued messages to process
[2023-11-16 01:23:06,28] [info] CallCacheWriteActor stopped
[2023-11-16 01:23:06,28] [info] IoProxy stopped
[2023-11-16 01:23:06,28] [info] ServiceRegistryActor stopped
[2023-11-16 01:23:06,29] [info] DockerHashActor stopped
[2023-11-16 01:23:06,30] [info] Database closed
[2023-11-16 01:23:06,30] [info] Stream materializer shut down
[2023-11-16 01:23:06,31] [info] WDL HTTP import resolver closed
Workflow 1318276e-f6f8-460f-bd69-6fd7a1ef3d3b transitioned to state Failed
Investigating a little further, I found that it is the BQSR call that causes the workflow to fail. Looking at the stderr of this call, it seems that as the input data becomes larger, the default value of 20000 for the --maximum-cycle-value argument is not enough. Please see below the output from stderr:
01:22:29.268 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:22:29.269 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
01:22:29.269 INFO ApplyBQSR - Executing as ***@***.*** on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
01:22:29.269 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v11.0.20.1+1-post-Ubuntu-0ubuntu120.04
01:22:29.269 INFO ApplyBQSR - Start Date/Time: November 16, 2023 at 1:22:29 AM GMT
01:22:29.269 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.269 INFO ApplyBQSR - ------------------------------------------------------------
01:22:29.270 INFO ApplyBQSR - HTSJDK Version: 2.23.0
01:22:29.270 INFO ApplyBQSR - Picard Version: 2.23.3
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:22:29.270 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:22:29.270 INFO ApplyBQSR - Deflater: IntelDeflater
01:22:29.270 INFO ApplyBQSR - Inflater: IntelInflater
01:22:29.270 INFO ApplyBQSR - GCS max retries/reopens: 20
01:22:29.270 INFO ApplyBQSR - Requester pays: disabled
01:22:29.270 INFO ApplyBQSR - Initializing engine
01:22:29.382 INFO ApplyBQSR - Done initializing engine
01:22:29.395 INFO ProgressMeter - Starting traversal
01:22:29.396 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
01:22:31.784 INFO ApplyBQSR - Shutting down engine
[November 16, 2023 at 1:22:31 AM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2147483648
------------------------------
A USER ERROR has occurred: The maximum allowed value for the cycle is 20000, but a larger cycle (20001) was detected. Please use the --maximum-cycle-value argument (when creating the recalibration table in BaseRecalibrator) to increase this value (at the expense of requiring more memory to run)
However, I can't find a way to use this argument to manually specify a larger value. Using the --no_bqsr argument with the singularity image does eliminate the issue and produces output, but I am not sure whether excluding BQSR would have a significant effect on the results. Also, smaller fastq files do not trigger this issue.
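To see how far past the default a given input goes, one can scan the FASTQ for its longest read before running the pipeline. A minimal sketch, assuming standard 4-line FASTQ records (`max_read_length` is a hypothetical helper, not part of the pipeline):

```python
import gzip

def max_read_length(fastq_path):
    """Return the length of the longest read in a (possibly gzipped) FASTQ."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    longest = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each 4-line FASTQ record
                longest = max(longest, len(line.rstrip("\n")))
    return longest
```

If the result exceeds 20000, BQSR with the default --maximum-cycle-value will fail as shown above.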
I am hoping there is a workaround to manually specify this parameter while using the singularity image.
I use the following command for execution:
singularity exec -e -B ${DATA_FOLDER}:${DATA_FOLDER} -B /tmp:/tmp \
  -B genome_path/ctat_genome_lib_build_dir:/ctat_genome_lib_dir:ro \
  ctat_mutations_latest.sif \
  /usr/local/src/ctat-mutations/ctat_mutations \
  --left ${DATA_FOLDER}/some_reads.fastq.gz \
  --is_long_reads \
  --sample_id test \
  --output ${DATA_FOLDER}/example.HC.singularity \
  --cpu 10 \
  --genome_lib_dir /ctat_genome_lib_dir
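For context, the change the error message asks for has to be made at the GATK level, when the recalibration table is created. A sketch of what such a manual BQSR round trip would look like outside the pipeline wrapper (file names are placeholders, and 30000 is simply an assumed value above the longest read; this is not a pipeline-supported invocation):

```shell
# Hypothetical manual BQSR with a raised cycle ceiling; all paths are placeholders.
gatk BaseRecalibrator \
    -I sample.bam \
    -R reference.fasta \
    --known-sites known_variants.vcf.gz \
    --maximum-cycle-value 30000 \
    -O recal.table

gatk ApplyBQSR \
    -I sample.bam \
    -R reference.fasta \
    --bqsr-recal-file recal.table \
    -O sample.recal.bam
```

Note that per the error message the flag belongs on BaseRecalibrator, not ApplyBQSR, since the ceiling is baked into the recalibration table.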
Thanks for looking into this!
Best,
Irzam
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas