Added germline CNV WDL workflows #1076
base: master
@mbabadi Finally done! Could you please review?
Thanks for doing this @asmirnov239! Looks good. My comments are stylistic for the most part.
@@ -0,0 +1,3 @@
/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam /home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam.bai
@LeeTL1220 is it OK to use potentially identifiable real data for testing?
This is the HCC1143 normal, right?
Just saw the reply below, ignore!
Actually, are you sure about it being NA12878, @LeeTL1220?
ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

# Panel WES
java -jar ~/cromwell-0.26.jar run /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json
Extract cromwell-0.26.jar as an environment variable CROMWELL_JAR.
Done
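One way the CROMWELL_JAR suggestion could look in the test script is sketched below. This is only an illustration, not the change that was actually committed: the `run_cromwell` function name and the fallback path are assumptions, and the function prints the command instead of launching Cromwell so that the sketch runs without the jar being present.

```shell
#!/bin/bash
# Hypothetical sketch: take the Cromwell jar location from CROMWELL_JAR,
# falling back to the path that was hard-coded in the original script.
CROMWELL_JAR="${CROMWELL_JAR:-$HOME/cromwell-0.26.jar}"

# run_cromwell WDL INPUTS_JSON
# Dry run: prints the command line that would be executed.
# Remove the leading "echo" to actually invoke Cromwell.
run_cromwell() {
    local wdl="$1" inputs="$2"
    echo java -jar "${CROMWELL_JAR}" run "${wdl}" "${inputs}"
}

# Panel WES
run_cromwell gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json
```

With this shape, a CI configuration only has to export CROMWELL_JAR once, and a Cromwell upgrade no longer touches every invocation line.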
scripts/cnv_wdl/germline/README.md
Outdated
#### Fields of germline CNV panel of normals creation workflow

``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
Perhaps we can be a bit more verbose here: contig_annotations -> contig_ploidy_annotations. In the description, you may want to say "germline contig ploidy annotations table".
Done
scripts/cnv_wdl/germline/README.md
Outdated
``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
path to transition priors table -> path to copy number transition priors table
Done
scripts/cnv_wdl/germline/README.md
Outdated
``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
Wrong description: change to "path to copy number transition prior for Y contig for XY-genotyped samples". Same for the other 3 fields.
Done
#
# - Example invocation:
# java -jar cromwell.jar run gCNV_cohort_calling_workflow.wdl myParameters.json
# See gCNV_cohort_calling_workflow.json for a template json file to modify with your own parameters (please save
We recommend taking gCNV_cohort_calling_workflow.json as a template json file and modifying it accordingly.
Done
# Workflow for creating a panel of normals for germline CNV pipeline
# Notes:
#
# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
Same comments as those on gCNV_cohort_calling_workflow.wdl. Also, please review the formatting (line breaks before and after example tables, wording, etc.) all other.
@@ -0,0 +1,159 @@
# Subworkflow for running GATK germline CNV on a single BAM. Supports both WGS and WES samples.
Same comments as gCNV_cohort_calling_workflow.wdl.
# Transition prior table files
File transition_prior_table
File? transition_matrix_XX_Y
Can we put these in an Array[File]? accompanying_transition_prior_files? Does Cromwell move around all files in the array?
Done
@@ -1311,7 +1312,7 @@ public Duplicable apply(final Map<String, Duplicable> parents) {
final INDArray zz_ll = zz_sll.get(NDArrayIndex.point(si), NDArrayIndex.all(), NDArrayIndex.all());
/* mean_W_contrib_t = \sum_{m,n} E[W_{tm}] E[W_{tn}] E[z_{sm} z_{sn}] */
final INDArray mean_W_contrib_t = W_tl.mmul(zz_ll).muli(W_tl).sum(1).transpose();
- WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()).assign(mean_W_contrib_t);
+ Nd4jUtils.getNDArrayByIndices(WzzWT_st, NDArrayIndex.point(si), NDArrayIndex.all(), numSamples).assign(mean_W_contrib_t);
I think we can get away with using WzzWT_st.getRow(si) in place of WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()) and it should solve the problem without needing to introduce this special method. I just looked at the implementation and it seems to treat vector NDArrays properly.
Done
SM-74NEG is a public domain sample. It's NA12878.
…On May 22, 2017 7:15 PM, "Mehrtash Babadi" ***@***.***> wrote:
***@***.**** requested changes on this pull request.
Thanks for doing this @asmirnov239 <https://github.com/asmirnov239>!
looks good. My comments are stylistic for the most part.
------------------------------
In scripts/cnv_cromwell_tests/germline/normal_bam_list.tsv:
> @@ -0,0 +1,3 @@
+/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam /home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam.bai
@LeeTL1220 <https://github.com/leetl1220> is it OK to use potentially
identifiable real data for testing?
------------------------------
In scripts/cnv_cromwell_tests/germline/run_cnv_germline_workflows.sh:
> @@ -0,0 +1,25 @@
+#!/bin/bash -l
+set -e
+#cd in the directory of the script in order to use relative paths
+script_path=$( cd "$(dirname "${BASH_SOURCE}")" ; pwd -P )
+cd "$script_path"
+
+ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/cnv_common_tasks.wdl
+ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl
+ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_single_sample_calling_workflow.wdl
+ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl
+
+# Panel WES
+java -jar ~/cromwell-0.26.jar run /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json
Extract cromwell-0.26.jar as an environment variable CROMWELL_JAR.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +### Which WDL should you use?
+- Building a panel of normals (PoN): ``gCNV_panel_creation_workflow.wdl``
+- Calling events on a single normal sample: ``gCNV_single_sample_calling_workflow.wdl``
+- Calling events on a cohort of normal samples: ``gCNV_cohort_calling_workflow.wdl``
+
+#### Setting up parameter json file for a run
+
+To get started, copy the relevant ``*_template.json`` for the workflow you wish to run and adjust parameters accordingly.
+You can find all required resource inputs needed to run the workflows in the ``/resources`` directory. These inputs could be run out-of-the-box.
+
+*Please note that there are task-level parameters that do not appear in the template files. These are set to reasonable values by default, but can also be adjusted if desired.
+
+#### Fields of germline CNV panel of normals creation workflow
+
+ ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
Perhaps we can be a bit more verbose here: contig_annotations ->
contig_ploidy_annotations. In the description, you may want to say
"germline contig ploidy annotations table".
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +- Building a panel of normals (PoN): ``gCNV_panel_creation_workflow.wdl``
+- Calling events on a single normal sample: ``gCNV_single_sample_calling_workflow.wdl``
+- Calling events on a cohort of normal samples: ``gCNV_cohort_calling_workflow.wdl``
+
+#### Setting up parameter json file for a run
+
+To get started, copy the relevant ``*_template.json`` for the workflow you wish to run and adjust parameters accordingly.
+You can find all required resource inputs needed to run the workflows in the ``/resources`` directory. These inputs could be run out-of-the-box.
+
+*Please note that there are task-level parameters that do not appear in the template files. These are set to reasonable values by default, but can also be adjusted if desired.
+
+#### Fields of germline CNV panel of normals creation workflow
+
+ ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
path to transition priors table -> path to copy number transition priors
table
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +- Calling events on a single normal sample: ``gCNV_single_sample_calling_workflow.wdl``
+- Calling events on a cohort of normal samples: ``gCNV_cohort_calling_workflow.wdl``
+
+#### Setting up parameter json file for a run
+
+To get started, copy the relevant ``*_template.json`` for the workflow you wish to run and adjust parameters accordingly.
+You can find all required resource inputs needed to run the workflows in the ``/resources`` directory. These inputs could be run out-of-the-box.
+
+*Please note that there are task-level parameters that do not appear in the template files. These are set to reasonable values by default, but can also be adjusted if desired.
+
+#### Fields of germline CNV panel of normals creation workflow
+
+ ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
Wrong description: change to path to copy number transition prior for Y
contig for XY-genotyped samples. Same for the other 3 fields.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +
+To get started, copy the relevant ``*_template.json`` for the workflow you wish to run and adjust parameters accordingly.
+You can find all required resource inputs needed to run the workflows in the ``/resources`` directory. These inputs could be run out-of-the-box.
+
+*Please note that there are task-level parameters that do not appear in the template files. These are set to reasonable values by default, but can also be adjusted if desired.
+
+#### Fields of germline CNV panel of normals creation workflow
+
+ ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,
between two autosomal chr -> on autosomal loci.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +
+*Please note that there are task-level parameters that do not appear in the template files. These are set to reasonable values by default, but can also be adjusted if desired.
+
+#### Fields of germline CNV panel of normals creation workflow
+
+ ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,
+ ``CNVGermlinePanelWorkflow.normal_bams_list`` -- TSV file consisting of corresponding bam and corresponding index files as described in gCNV_panel_creation_workflow.wdl
+ ``CNVGermlinePanelWorkflow.pon_output_path`` -- name of the final output directory
+ ``CNVGermlinePanelWorkflow.num_latents`` -- (advanced) number of principal components
update doc: (advanced) maximum number of principal components. Must be
strictly less than the number of samples. The recommended value is 20 ~ 30
for large cohorts. For smaller cohorts, use 0.5 * number of samples.
Unnecessary principal components are automatically pruned during PoN
creation.
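The sizing advice in the comment above can be turned into a small helper. The sketch below is hypothetical and not part of the workflows: it assumes one particular reading of the recommendation, namely half the number of samples (integer division) capped at 30, which keeps the result strictly below the sample count for any cohort of at least one sample. The comment does not say where "large" begins, so the cap is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: suggest a num_latents value from the cohort size.
# Assumption: merge the two rules from the review comment as
# min(30, num_samples / 2), using integer division.
suggest_num_latents() {
    local num_samples="$1"
    local half=$(( num_samples / 2 ))
    if (( half > 30 )); then
        echo 30
    else
        echo "$half"
    fi
}

suggest_num_latents 12    # prints 6
suggest_num_latents 200   # prints 30
```

Since unnecessary principal components are pruned during PoN creation, erring toward the upper end of the range should cost compute time rather than accuracy.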
------------------------------
In scripts/cnv_wdl/germline/README.md:
> +
+
+#### Fields of germline CNV single sample calling workflow
+
+The reference used must be the same between PoN and case samples.
+
+ ``gCNVSingleSampleWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``gCNVSingleSampleWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``gCNVSingleSampleWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,
+ ``gCNVSingleSampleWorkflow.output_path`` -- name of the final output directory
+ ``gCNVSingleSampleWorkflow.num_latents`` -- (advanced) number of principal components
It is fine to have num_latents in calling WDL for the time being. I made
an issue for enforcing the value dictated by the PoN which is a more
reasonable behavior.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> + ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,
+ ``CNVGermlinePanelWorkflow.normal_bams_list`` -- TSV file consisting of corresponding bam and corresponding index files as described in gCNV_panel_creation_workflow.wdl
+ ``CNVGermlinePanelWorkflow.pon_output_path`` -- name of the final output directory
+ ``CNVGermlinePanelWorkflow.num_latents`` -- (advanced) number of principal components
+ ``CNVGermlinePanelWorkflow.ref_fasta`` -- path to reference fasta file
+ ``CNVGermlinePanelWorkflow.ref_fasta_dict`` -- path to reference dict file
+ ``CNVGermlinePanelWorkflow.ref_fasta_fai`` -- path to reference fasta fai file
+ ``CNVGermlinePanelWorkflow.gatk_jar`` -- absolute path to gatk-protected.jar
+ ``CNVGermlinePanelWorkflow.targets`` -- (optional) Target file (NOT in bed format) that was used to describe the baits in capture (exome) samples. Please run ``ConvertBedToTargetFile`` to convert a BED file to a target file. If provided, then WES workflow will be run; otherwise, WGS workflow will be run.
Target file (NOT in BED format) corresponding to the genomic loci of
enriched targets in WES sample (e.g. Agilent, Illumina, etc). Please run
ConvertBedToTargetFile to convert a BED file to a target file. If
provided, then WES workflow will be run; otherwise, WGS workflow will be
run.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> + ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+ ``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``CNVGermlinePanelWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,
+ ``CNVGermlinePanelWorkflow.normal_bams_list`` -- TSV file consisting of corresponding bam and corresponding index files as described in gCNV_panel_creation_workflow.wdl
+ ``CNVGermlinePanelWorkflow.pon_output_path`` -- name of the final output directory
+ ``CNVGermlinePanelWorkflow.num_latents`` -- (advanced) number of principal components
+ ``CNVGermlinePanelWorkflow.ref_fasta`` -- path to reference fasta file
+ ``CNVGermlinePanelWorkflow.ref_fasta_dict`` -- path to reference dict file
+ ``CNVGermlinePanelWorkflow.ref_fasta_fai`` -- path to reference fasta fai file
+ ``CNVGermlinePanelWorkflow.gatk_jar`` -- absolute path to gatk-protected.jar
+ ``CNVGermlinePanelWorkflow.targets`` -- (optional) Target file (NOT in bed format) that was used to describe the baits in capture (exome) samples. Please run ``ConvertBedToTargetFile`` to convert a BED file to a target file. If provided, then WES workflow will be run; otherwise, WGS workflow will be run.
Fix all other instances.
------------------------------
In scripts/cnv_wdl/germline/README.md:
> + ``gCNVCohortCallingWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
+ ``gCNVCohortCallingWorkflow.transition_matrix_XX_X`` -- path to transition prior between XX and X chr; located in ``/resources`` directory
+ ``gCNVCohortCallingWorkflow.transition_matrix_XY_X`` -- path to transition prior between XY and X chr; located in ``/resources`` directory
+ ``gCNVCohortCallingWorkflow.transition_matrix_XX_Y`` -- path to transition prior between XX and Y chr; located in ``/resources`` directory
+ ``gCNVCohortCallingWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory
+ ``gCNVCohortCallingWorkflow.output_path`` -- name of the final output directory
+ ``gCNVCohortCallingWorkflow.num_latents`` -- (advanced) number of principal components
+ ``gCNVCohortCallingWorkflow.model_path`` -- absolute path of the PoN model (posterior_finals directory of the panel creation output)
+ ``gCNVCohortCallingWorkflow.normal_bams_list`` -- TSV file consisting of corresponding bam and corresponding index files as described in gCNV_cohort_calling_workflow.wdl
+ ``gCNVCohortCallingWorkflow.ref_fasta`` -- path to reference fasta file
+ ``gCNVCohortCallingWorkflow.ref_fasta_dict`` -- path to reference dict file
+ ``gCNVCohortCallingWorkflow.ref_fasta_fai`` -- path to reference fasta fai file
+ ``gCNVCohortCallingWorkflow.gatk_jar`` -- absolute path to gatk-protected.jar
+ ``gCNVCohortCallingWorkflow.targets`` -- (optional) Target file (NOT in bed format) that was used to describe the baits in capture (exome) samples. Please run ``ConvertBedToTargetFile`` to convert a BED file to a target file. If provided, then WES workflow will be run; otherwise, WGS workflow will be run.
+
+In additional, there are several task-level parameters that may be set by advanced users; for example:
In addition,
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> @@ -0,0 +1,94 @@
+# This workflow is used for running germline CNV on a cohort of germline samples
+# Notes:
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
See SexGenotypeTableReader.java class => Refer to the Javadoc of
SexGenotypeTableReader
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> @@ -0,0 +1,94 @@
+# This workflow is used for running germline CNV on a cohort of germline samples
+# Notes:
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
tab-separated
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> @@ -0,0 +1,94 @@
+# This workflow is used for running germline CNV on a cohort of germline samples
+# Notes:
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
+# SAMPLE_NAME SEX_GENOTYPE
+# sample_name_1 SEX_XX
+# sample_name_2 SEX_XY
+# sample_name_3 SEX_XY
+# sample_name_4 SEX_XX
+# where sex genotype identifiers must match those in tab-separated contig ploidy annotation table that should be formatted as follows:
Sex genotype identifiers (SEX_XX and SEX_XY in the above example) must
match those in the tab-separated germline contig ploidy annotation table.
The latter is formatted as follows:
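As an aside to the comment above, the requirement that sex genotype identifiers match the ploidy table can be checked before a run. The sketch below is a hypothetical helper, not part of GATK: it assumes both tables are tab-separated with a single header row, and that the genotype columns of the contig ploidy annotation table start at column 3, after CONTIG and CLASS. The example tables are throwaway data in the quoted format.

```shell
#!/bin/bash
# Hypothetical helper (not part of GATK).
# check_sex_genotypes SEX_TABLE PLOIDY_TABLE
# Prints each genotype identifier used in SEX_TABLE that has no matching
# column in the header of PLOIDY_TABLE; prints nothing when all match.
check_sex_genotypes() {
    local sex_table="$1" ploidy_table="$2"
    awk -F'\t' '
        NR == FNR {
            # First file (ploidy table): remember genotype column names.
            if (FNR == 1) for (i = 3; i <= NF; i++) known[$i] = 1
            next
        }
        # Second file (sex genotype table): skip header, report unknown IDs once.
        FNR > 1 && !($2 in known) && !seen[$2]++ { print $2 }
    ' "$ploidy_table" "$sex_table"
}

# Example with throwaway tables.
tmp_dir=$(mktemp -d)
printf 'CONTIG\tCLASS\tSEX_XX\tSEX_XY\n1\tAUTOSOMAL\t2\t2\n2\tAUTOSOMAL\t2\t2\n' > "$tmp_dir/ploidy.tsv"
printf 'SAMPLE_NAME\tSEX_GENOTYPE\nsample_name_1\tSEX_XX\nsample_name_2\tSEX_XY\n' > "$tmp_dir/sex.tsv"
check_sex_genotypes "$tmp_dir/sex.tsv" "$tmp_dir/ploidy.tsv"   # prints nothing: all identifiers match
rm -rf "$tmp_dir"
```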
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> +# sample_name_2 SEX_XY
+# sample_name_3 SEX_XY
+# sample_name_4 SEX_XX
+# where sex genotype identifiers must match those in tab-separated contig ploidy annotation table that should be formatted as follows:
+# CONTIG CLASS SEX_XX SEX_XY
+# 1 AUTOSOMAL 2 2
+# 2 AUTOSOMAL 2 2
+# ... ... ... ...
+# X ALLOSOMAL 2 0
+# Y ALLOSOMAL 1 1
+#
+# - Input file (normal_bams_list) must contain file paths to bam and bam index files separated by tabs in the following format:
+# normal_bam_1 bam_idx_1
+# normal_bam_2 bam_idx_2
+#
+# - The target file (targets) is required for the WES workflow and should be a TSV file with the column headers:
TSV -> tab-separated
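Because the workflow reads normal_bams_list with read_tsv, a list that uses spaces instead of tabs would not split into a BAM path and an index path. A pre-flight check along the following lines could catch that early; it is a hypothetical sketch (the `check_bam_list` name is an assumption), and it only validates the column count, not that the files exist.

```shell
#!/bin/bash
# Hypothetical helper: verify that every non-empty line of a BAM list has
# exactly two tab-separated fields (BAM path, BAM index path).
# Returns non-zero and reports the offending lines otherwise.
check_bam_list() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NF > 0 && NF != 2 {
            printf "line %d: expected 2 tab-separated fields, found %d\n", FNR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example with a throwaway list in the format quoted above.
list_file=$(mktemp)
printf 'normal_bam_1\tbam_idx_1\nnormal_bam_2\tbam_idx_2\n' > "$list_file"
check_bam_list "$list_file" && echo "bam list looks well-formed"
rm -f "$list_file"
```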
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> +# Y ALLOSOMAL 1 1
+#
+# - Input file (normal_bams_list) must contain file paths to bam and bam index files separated by tabs in the following format:
+# normal_bam_1 bam_idx_1
+# normal_bam_2 bam_idx_2
+#
+# - The target file (targets) is required for the WES workflow and should be a TSV file with the column headers:
+# contig start stop name
+# These targets will be padded on both sides by the amount specified by PadTargets.padding (default 250).
+#
+# - If a target file is not provided, then the WGS workflow will be run instead and the specified value of
+# wgs_bin_size (default 10000) will be used.
+#
+# - Example invocation:
+# java -jar cromwell.jar run gCNV_cohort_calling_workflow.wdl myParameters.json
+# See gCNV_cohort_calling_workflow.json for a template json file to modify with your own parameters (please save
We recommend taking gCNV_cohort_calling_workflow.json as a template json
file and modifying it accordingly.
------------------------------
In scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl:
> @@ -0,0 +1,191 @@
+# Workflow for creating a panel of normals for germline CNV pipeline
+# Notes:
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
Same comments as those on gCNV_cohort_calling_workflow.wdl. Also, please
review the formatting (line breaks before and after example tables,
wording, etc.) all other.
------------------------------
In scripts/cnv_wdl/germline/gCNV_single_sample_calling_workflow.wdl:
> @@ -0,0 +1,159 @@
+# Subworkflow for running GATK germline CNV on a single BAM. Supports both WGS and WES samples.
Same comments as gCNV_cohort_calling_workflow.wdl.
------------------------------
In scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl:
> +
+workflow gCNVCohortCallingWorkflow {
+ # Workflow input files
+ File? targets
+ File normal_bams_list
+ Array[Array[String]]+ normal_bams = read_tsv(normal_bams_list)
+ File ref_fasta
+ File ref_fasta_dict
+ File ref_fasta_fai
+ File sex_genotypes
+ File contig_annotations
+ String gatk_jar
+
+ # Transition prior table files
+ File transition_prior_table
+ File? transition_matrix_XX_Y
Can we put these in an Array[File]? accompanying_transition_prior_files?
does cromwell move around all files in the array?
------------------------------
In src/main/java/org/broadinstitute/hellbender/tools/coveragemodel/CoverageModelEMComputeBlock.java:
> @@ -1311,7 +1312,7 @@ public Duplicable apply(final Map<String, Duplicable> parents) {
final INDArray zz_ll = zz_sll.get(NDArrayIndex.point(si), NDArrayIndex.all(), NDArrayIndex.all());
/* mean_W_contrib_t = \sum_{m,n} E[W_{tm}] E[W_{tn}] E[z_{sm} z_{sn}] */
final INDArray mean_W_contrib_t = W_tl.mmul(zz_ll).muli(W_tl).sum(1).transpose();
- WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()).assign(mean_W_contrib_t);
+ Nd4jUtils.getNDArrayByIndices(WzzWT_st, NDArrayIndex.point(si), NDArrayIndex.all(), numSamples).assign(mean_W_contrib_t);
I think we can get away with using WzzWT_st.getRow(si) in place of
WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()) and it should
solve the problem without needing to introduce this special method. I just
looked at the implementation and it seems to treat vector NDArrays properly.
7737c90 to 1b666a2
@mbabadi Thanks for your comments! Back to you.
My apologies, it's HCC1143 normal.
…On May 28, 2017 09:09, "samuelklee" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In scripts/cnv_cromwell_tests/germline/normal_bam_list.tsv:
> @@ -0,0 +1,3 @@
+/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam /home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam.bai
Actually, are you sure about it being NA12878, @LeeTL1220
<https://github.com/leetl1220>?
@asmirnov239 looks good to me -- please:
@asmirnov239 Migration instructions for this branch: https://github.com/broadinstitute/gatk/wiki/Migrating-branches-from-gatk-protected-to-gatk
… added gCNV wdl files, set up travis testing of the gCNV WDL workflows, created gCNV template input files and gCNV resources directory
a16ea82 to 1d4ee9b
This PR includes 3 workflows and required files for travis automatic testing.
The workflows are: