Added germline CNV WDL workflows #1076

asmirnov239 · 2017-05-22T00:33:55Z

This PR adds three workflows and the files required for automated Travis testing.
The workflows are:

- Panel workflow for creating a PoN
- Single-sample calling workflow
- Cohort calling workflow that scatters across samples and calls the single-sample workflow

asmirnov239 · 2017-05-22T00:39:09Z

@mbabadi Finally done! Could you please review?

mbabadi

Thanks for doing this, @asmirnov239! Looks good. My comments are mostly stylistic.

mbabadi · 2017-05-22T22:01:16Z

scripts/cnv_cromwell_tests/germline/normal_bam_list.tsv

@@ -0,0 +1,3 @@
+/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam	/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam.bai


@LeeTL1220 is it OK to use potentially identifiable real data for testing?

samuelklee · 2017-05-28

This is the HCC1143 normal, right?

Just saw the reply below, ignore!

Actually, are you sure about it being NA12878, @LeeTL1220?

mbabadi · 2017-05-22T22:03:59Z

scripts/cnv_cromwell_tests/germline/run_cnv_germline_workflows.sh

+ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl
+
+# Panel WES
+java -jar ~/cromwell-0.26.jar run /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json


Extract cromwell-0.26.jar as an environment variable CROMWELL_JAR.
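A minimal sketch of what this suggestion could look like in the test script. The `CROMWELL_JAR` name comes from the comment above; the `WDL_DIR` variable, the `run_workflow` helper, and the `DRY_RUN` switch are hypothetical names introduced here for illustration and are not part of the PR.

```shell
# Assumption: CI may export CROMWELL_JAR; the default mirrors the path hard-coded in the script under review.
CROMWELL_JAR="${CROMWELL_JAR:-$HOME/cromwell-0.26.jar}"
# Assumption: WDL_DIR is a hypothetical variable holding the directory of the germline WDLs.
WDL_DIR="${WDL_DIR:-/home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline}"

# Hypothetical helper: $1 = WDL file name, $2 = input JSON.
# With DRY_RUN=1 the command is printed instead of executed.
run_workflow() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "java -jar $CROMWELL_JAR run $WDL_DIR/$1 $2"
  else
    java -jar "$CROMWELL_JAR" run "$WDL_DIR/$1" "$2"
  fi
}

# Panel WES (dry run; drop DRY_RUN=1 to actually launch Cromwell)
DRY_RUN=1 run_workflow gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json
```

With the jar path in one variable, a Cromwell version bump touches a single line rather than every invocation.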

mbabadi · 2017-05-22T22:25:47Z

scripts/cnv_wdl/germline/README.md

+#### Fields of germline CNV panel of normals creation workflow
+
+  ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+  ``CNVGermlinePanelWorkflow.contig_annotations`` --  path to the contig annotation table; located in ``/resources`` directory


Perhaps we can be a bit more verbose here: contig_annotations -> contig_ploidy_annotations. In the description, you may want to say "germline contig ploidy annotations table".

mbabadi · 2017-05-22T22:26:35Z

scripts/cnv_wdl/germline/README.md

+
+  ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+  ``CNVGermlinePanelWorkflow.contig_annotations`` --  path to the contig annotation table; located in ``/resources`` directory
+  ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory


path to transition priors table -> path to copy number transition priors table

mbabadi · 2017-05-22T22:28:41Z

scripts/cnv_wdl/germline/README.md

+  ``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
+  ``CNVGermlinePanelWorkflow.contig_annotations`` --  path to the contig annotation table; located in ``/resources`` directory
+  ``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
+  ``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory


Wrong description: change to path to copy number transition prior for Y contig for XY-genotyped samples. Same for the other 3 fields.

mbabadi · 2017-05-22T22:58:26Z

scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

+#
+# - Example invocation:
+#    java -jar cromwell.jar run gCNV_cohort_calling_workflow.wdl myParameters.json
+#   See gCNV_cohort_calling_workflow.json for a template json file to modify with your own parameters (please save


We recommend taking gCNV_cohort_calling_workflow.json as a template json file and modifying it accordingly.

mbabadi · 2017-05-22T23:02:30Z

scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl

+# Workflow for creating a panel of normals for germline CNV pipeline
+# Notes: 
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):


Same comments as those on gCNV_cohort_calling_workflow.wdl. Also, please review the formatting (line breaks before and after example tables, wording, etc.) everywhere else.

mbabadi · 2017-05-22T23:04:53Z

scripts/cnv_wdl/germline/gCNV_single_sample_calling_workflow.wdl

@@ -0,0 +1,159 @@
+# Subworkflow for running GATK germline CNV on a single BAM. Supports both WGS and WES samples.


Same comments as gCNV_cohort_calling_workflow.wdl.

mbabadi · 2017-05-22T23:06:55Z

scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

+
+  # Transition prior table files
+  File transition_prior_table
+  File? transition_matrix_XX_Y


Can we put these in an Array[File], e.g. accompanying_transition_prior_files? Does Cromwell move around all of the files in the array?

mbabadi · 2017-05-22T23:13:24Z

...main/java/org/broadinstitute/hellbender/tools/coveragemodel/CoverageModelEMComputeBlock.java

@@ -1311,7 +1312,7 @@ public Duplicable apply(final Map<String, Duplicable> parents) {
                final INDArray zz_ll = zz_sll.get(NDArrayIndex.point(si), NDArrayIndex.all(), NDArrayIndex.all());
                /* mean_W_contrib_t = \sum_{m,n} E[W_{tm}] E[W_{tn}] E[z_{sm} z_{sn}] */
                final INDArray mean_W_contrib_t = W_tl.mmul(zz_ll).muli(W_tl).sum(1).transpose();
-                WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()).assign(mean_W_contrib_t);
+                Nd4jUtils.getNDArrayByIndices(WzzWT_st, NDArrayIndex.point(si), NDArrayIndex.all(), numSamples).assign(mean_W_contrib_t);


I think we can get away with using WzzWT_st.getRow(si) in place of WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()) and it should solve the problem without needing to introduce this special method. I just looked at the implementation and it seems to treat vector NDArrays properly.

LeeTL1220 · 2017-05-23T12:43:29Z

SM-74NEG is a public domain sample. It's NA12878.

On May 22, 2017 7:15 PM, "Mehrtash Babadi" wrote:

scripts/cnv_wdl/germline/README.md

+  ``CNVGermlinePanelWorkflow.transition_matrix_autosomal`` -- path to transition prior between two autosomal chr; located in ``/resources`` directory,

between two autosomal chr -> on autosomal loci.

scripts/cnv_wdl/germline/README.md

+  ``CNVGermlinePanelWorkflow.num_latents`` -- (advanced) number of principal components

update doc: (advanced) maximum number of principal components. Must be strictly less than the number of samples. The recommended value is 20 ~ 30 for large cohorts. For smaller cohorts, use 0.5 * number of samples. Unnecessary principal components are automatically pruned during PoN creation.

scripts/cnv_wdl/germline/README.md

+  ``gCNVSingleSampleWorkflow.num_latents`` -- (advanced) number of principal components

It is fine to have num_latents in calling WDL for the time being. I made an issue for enforcing the value dictated by the PoN which is a more reasonable behavior.

scripts/cnv_wdl/germline/README.md

+  ``CNVGermlinePanelWorkflow.targets`` -- (optional) Target file (NOT in bed format) that was used to describe the baits in capture (exome) samples. Please run ``ConvertBedToTargetFile`` to convert a BED file to a target file. If provided, then WES workflow will be run; otherwise, WGS workflow will be run.

Target file (NOT in BED format) corresponding to the genomic loci of enriched targets in WES sample (e.g. Agilent, Illumina, etc). Please run ConvertBedToTargetFile to convert a BED file to a target file. If provided, then WES workflow will be run; otherwise, WGS workflow will be run.

Fix all other instances.

scripts/cnv_wdl/germline/README.md

+In additional, there are several task-level parameters that may be set by advanced users; for example:

In addition,

scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

@@ -0,0 +1,94 @@
+# This workflow is used for running germline CNV on a cohort of germline samples
+# Notes:
+#
+# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):

See SexGenotypeTableReader.java class => Refer to the Javadoc of SexGenotypeTableReader

tab-separated

scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

+# SAMPLE_NAME SEX_GENOTYPE
+# sample_name_1 SEX_XX
+# sample_name_2 SEX_XY
+# sample_name_3 SEX_XY
+# sample_name_4 SEX_XX
+# where sex genotype identifiers must match those in tab-separated contig ploidy annotation table that should be formatted as follows:

Sex genotype identifiers (SEX_XX and SEX_XY in the above example) must match those in the tab-separated germline contig ploidy annotation table. The latter is formatted as follows:

scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

+# - The target file (targets) is required for the WES workflow and should be a TSV file with the column headers:

TSV -> tab-separated
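The review asks that the sex genotype identifiers in the sex genotype table match the column headers of the germline contig ploidy annotation table. A minimal sketch of such a consistency check is below. It assumes both files are tab-separated with a single header row, and that genotype columns start at the third column of the ploidy table, as in the example tables quoted in the WDL comments. `check_sex_genotypes` is a hypothetical helper, not something the PR or GATK provides.

```shell
# Hypothetical helper: $1 = sex genotype table, $2 = contig ploidy annotation table.
# Exits non-zero if any sample uses a genotype that is not a column of the ploidy table.
check_sex_genotypes() {
  awk -F'\t' '
    # First file (ploidy table): collect genotype names from header columns 3..NF.
    NR == FNR {
      if (FNR == 1) {
        for (i = 3; i <= NF; i++) known[$i] = 1
      }
      next
    }
    # Second file (sex genotype table): skip the header, flag unknown genotypes.
    FNR > 1 && !($2 in known) {
      printf "unknown sex genotype %s for sample %s\n", $2, $1
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$2" "$1"
}
```

Usage would be along the lines of `check_sex_genotypes sex_genotypes.tsv contig_annotations.tsv` before launching a workflow; the file names here are placeholders.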

asmirnov239 · 2017-05-25T03:42:51Z

@mbabadi Thanks for your comments! Back to you.

LeeTL1220 · 2017-05-28T13:15:20Z

My apologies, it's HCC1143 normal.


mbabadi · 2017-05-30T15:16:44Z

@asmirnov239 looks good to me -- please:

rename "homo_sapiens_germline_HMM_priors.tsv" to "homo_sapiens_germline_CN_priors.tsv"
rename "TCGA_T_matrix_autosomal.tsv" to "homo_sapiens_germline_CN_transition_matrix_autosomal.tsv"
rename "TCGA_T_matrix_XX_X.tsv" to "homo_sapiens_germline_CN_transition_matrix_XX_X.tsv"
rename "TCGA_T_matrix_XY_X.tsv" to "homo_sapiens_germline_CN_transition_matrix_XY_X.tsv"
rename "TCGA_T_matrix_XX_Y.tsv" to "homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv"
rename "TCGA_T_matrix_XY_Y.tsv" to "homo_sapiens_germline_CN_transition_matrix_XY_Y.tsv"
make sure that you update the table "homo_sapiens_germline_CN_priors.tsv" accordingly
add a comment line to the top of each transition matrix: "#The following germline copy number transition matrix is obtained from analyzing Genome STRiP calls on a cohort of 170 blood normal TCGA samples"
on "homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv", please comment: "#A trivial transition matrix for enforcing zero ploidy on Y contig in XX samples"
update all JSON files appropriately
if tests pass, squash and merge :)
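
For illustration only, the checklist above could be scripted roughly as follows. This is a minimal sketch, not what was actually run on the branch. The function names (`header_for`, `replace_names`, `apply_renames`) are hypothetical, and it assumes that the priors table and the JSON inputs refer to the matrices by file name, and that the XX_Y comment replaces the general one rather than being added alongside it.

```python
from pathlib import Path

# Old name -> new name, copied from the rename list above.
RENAMES = {
    "homo_sapiens_germline_HMM_priors.tsv": "homo_sapiens_germline_CN_priors.tsv",
    "TCGA_T_matrix_autosomal.tsv": "homo_sapiens_germline_CN_transition_matrix_autosomal.tsv",
    "TCGA_T_matrix_XX_X.tsv": "homo_sapiens_germline_CN_transition_matrix_XX_X.tsv",
    "TCGA_T_matrix_XY_X.tsv": "homo_sapiens_germline_CN_transition_matrix_XY_X.tsv",
    "TCGA_T_matrix_XX_Y.tsv": "homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv",
    "TCGA_T_matrix_XY_Y.tsv": "homo_sapiens_germline_CN_transition_matrix_XY_Y.tsv",
}

DEFAULT_HEADER = (
    "#The following germline copy number transition matrix is obtained from "
    "analyzing Genome STRiP calls on a cohort of 170 blood normal TCGA samples"
)
XX_Y_HEADER = "#A trivial transition matrix for enforcing zero ploidy on Y contig in XX samples"


def header_for(new_name):
    """Return the comment line for a transition matrix, or None for other files."""
    if "transition_matrix" not in new_name:
        return None
    # Assumption: the XX_Y matrix gets its own comment instead of the general one.
    return XX_Y_HEADER if new_name.endswith("_XX_Y.tsv") else DEFAULT_HEADER


def replace_names(text):
    """Replace every old file name found in text with its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


def apply_renames(resources_dir, reference_files):
    """Rename the resource files, prepend comments, and update references."""
    resources_dir = Path(resources_dir)
    for old, new in RENAMES.items():
        src = resources_dir / old
        if not src.exists():
            continue  # tolerate a partially migrated directory
        dst = resources_dir / new
        src.rename(dst)
        header = header_for(new)
        if header is not None:
            dst.write_text(header + "\n" + dst.read_text())
    # Assumption: the priors table and the JSON inputs name the matrices directly.
    priors = resources_dir / RENAMES["homo_sapiens_germline_HMM_priors.tsv"]
    for path in [priors, *reference_files]:
        path = Path(path)
        if path.exists():
            path.write_text(replace_names(path.read_text()))
```

Calling `apply_renames("resources", ["some_inputs.json"])` would cover the rename, comment, and JSON steps in one pass; both the directory and the JSON file name here are placeholders. In a real checkout one would likely use `git mv` instead of a plain rename so the moves are staged.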

droazen · 2017-06-02T23:15:58Z

@asmirnov239 Migration instructions for this branch: https://github.com/broadinstitute/gatk/wiki/Migrating-branches-from-gatk-protected-to-gatk

asmirnov239 requested a review from mbabadi May 22, 2017 00:39

mbabadi suggested changes May 22, 2017

asmirnov239 force-pushed the as_germline_cnv_wdl branch 2 times, most recently from 7737c90 to 1b666a2 Compare May 25, 2017 03:23

asmirnov239 assigned asmirnov239 and mbabadi and unassigned asmirnov239 May 25, 2017

mbabadi approved these changes May 30, 2017

Fixed bug issue #1063, Updated cromwell version in travis.yml to v26,…

1d4ee9b

… added gCNV wdl files, set up travis testing of the gCNV WDL workflows, created gCNV template input files and gCNV resources directory

asmirnov239 force-pushed the as_germline_cnv_wdl branch from a16ea82 to 1d4ee9b Compare June 6, 2017 22:25

asmirnov239 mentioned this pull request Jun 8, 2017

Added germline CNV WDL workflows broadinstitute/gatk#3071

Merged

samuelklee mentioned this pull request Jun 12, 2017

Replace or remove Nd4jUtils.getNDArrayByIndices() method broadinstitute/gatk#3000

Closed

		@@ -0,0 +1,159 @@
		# Subworkflow for running GATK germline CNV on a single BAM. Supports both WGS and WES samples.
