This repository has been archived by the owner on Nov 9, 2019. It is now read-only.

Added germline CNV WDL workflows #1076

Open
wants to merge 1 commit into
base: master


asmirnov239
Contributor

This PR adds three workflows and the files required for automated Travis testing.
The workflows are:

  • Panel workflow for creating a panel of normals (PoN)
  • Single-sample calling workflow
  • Cohort calling workflow that scatters across samples and calls the single-sample workflow

@asmirnov239
Contributor Author

@mbabadi Finally done! Could you please review?

@asmirnov239 asmirnov239 requested a review from mbabadi May 22, 2017 00:39
Contributor

@mbabadi mbabadi left a comment

Thanks for doing this, @asmirnov239! Looks good. My comments are mostly stylistic.

@@ -0,0 +1,3 @@
/home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam /home/travis/build/broadinstitute/gatk-protected/src/test/resources/large/cnv_germline_workflows_test_files/inputs/bams/SM-74NEG_20xy-downsampled.bam.bai
Contributor

@LeeTL1220 is it OK to use potentially identifiable real data for testing?

Contributor

This is the HCC1143 normal, right?

Contributor

Just saw the reply below, ignore!

Contributor

Actually, are you sure about it being NA12878, @LeeTL1220?

ln -fs /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_cohort_calling_workflow.wdl

# Panel WES
java -jar ~/cromwell-0.26.jar run /home/travis/build/broadinstitute/gatk-protected/scripts/cnv_wdl/germline/gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json
Contributor

Extract cromwell-0.26.jar as an environment variable CROMWELL_JAR.
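
A minimal sketch of the suggestion above, assuming the Travis script runs in a Bourne-style shell. The default jar location, the `run_workflow` helper, and the `CROMWELL_DRY_RUN` switch are hypothetical and not taken from the PR:

```shell
# Assumption: fall back to the jar path used in the original script when
# CROMWELL_JAR is not already set in the environment.
CROMWELL_JAR="${CROMWELL_JAR:-$HOME/cromwell-0.26.jar}"

# Hypothetical helper: run one workflow ($1) with its inputs JSON ($2).
# When CROMWELL_DRY_RUN is non-empty, the command is printed instead of executed.
run_workflow() {
  ${CROMWELL_DRY_RUN:+echo} java -jar "$CROMWELL_JAR" run "$1" "$2"
}
```

With this in place, each test invocation might read `run_workflow gCNV_panel_creation_workflow.wdl gCNV_panel_creation_workflow_wes.json`, so a Cromwell upgrade touches only the one variable.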

Contributor Author

Done

#### Fields of germline CNV panel of normals creation workflow

``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
Contributor

Perhaps we can be a bit more verbose here: contig_annotations -> contig_ploidy_annotations. In the description, you may want to say "germline contig ploidy annotations table".

Contributor Author

Done


``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
Contributor

"path to transition priors table" -> "path to copy number transition priors table"

Contributor Author

Done

``CNVGermlinePanelWorkflow.sex_genotypes`` -- path to table of per-sample sex genotypes
``CNVGermlinePanelWorkflow.contig_annotations`` -- path to the contig annotation table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_prior_table`` -- path to transition priors table; located in ``/resources`` directory
``CNVGermlinePanelWorkflow.transition_matrix_XY_Y`` -- path to transition prior between XY and Y chr; located in ``/resources`` directory
Contributor

Wrong description: change to "path to copy number transition prior for Y contig for XY-genotyped samples". The same applies to the other 3 fields.

Contributor Author

Done

#
# - Example invocation:
# java -jar cromwell.jar run gCNV_cohort_calling_workflow.wdl myParameters.json
# See gCNV_cohort_calling_workflow.json for a template json file to modify with your own parameters (please save
Contributor

Suggested wording: "We recommend taking gCNV_cohort_calling_workflow.json as a template json file and modifying it accordingly."

Contributor Author

Done

# Workflow for creating a panel of normals for germline CNV pipeline
# Notes:
#
# -Basic sex genotype tab separated table for homo sapiens must be formatted as follows (See SexGenotypeTableReader.java class for full description):
Contributor

Same comments as those on gCNV_cohort_calling_workflow.wdl. Also, please review the formatting (line breaks before and after example tables, wording, etc.) in all the other files.

@@ -0,0 +1,159 @@
# Subworkflow for running GATK germline CNV on a single BAM. Supports both WGS and WES samples.
Contributor

Same comments as gCNV_cohort_calling_workflow.wdl.


# Transition prior table files
File transition_prior_table
File? transition_matrix_XX_Y
Contributor

Can we put these in an Array[File], e.g. accompanying_transition_prior_files? Does Cromwell move around all files in the array?

Contributor Author

Done

@@ -1311,7 +1312,7 @@ public Duplicable apply(final Map<String, Duplicable> parents) {
final INDArray zz_ll = zz_sll.get(NDArrayIndex.point(si), NDArrayIndex.all(), NDArrayIndex.all());
/* mean_W_contrib_t = \sum_{m,n} E[W_{tm}] E[W_{tn}] E[z_{sm} z_{sn}] */
final INDArray mean_W_contrib_t = W_tl.mmul(zz_ll).muli(W_tl).sum(1).transpose();
- WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()).assign(mean_W_contrib_t);
+ Nd4jUtils.getNDArrayByIndices(WzzWT_st, NDArrayIndex.point(si), NDArrayIndex.all(), numSamples).assign(mean_W_contrib_t);
Contributor

I think we can get away with using WzzWT_st.getRow(si) in place of WzzWT_st.get(NDArrayIndex.point(si), NDArrayIndex.all()); that should solve the problem without introducing this special method. I just looked at the implementation, and it seems to treat vector NDArrays properly.

Contributor Author

Done

@LeeTL1220
Contributor

LeeTL1220 commented May 23, 2017 via email

@asmirnov239 asmirnov239 force-pushed the as_germline_cnv_wdl branch 2 times, most recently from 7737c90 to 1b666a2 Compare May 25, 2017 03:23
@asmirnov239
Contributor Author

@mbabadi Thanks for your comments! Back to you.

@LeeTL1220
Contributor

LeeTL1220 commented May 28, 2017 via email

@mbabadi
Contributor

mbabadi commented May 30, 2017

@asmirnov239 looks good to me -- please:

  • rename "homo_sapiens_germline_HMM_priors.tsv" to "homo_sapiens_germline_CN_priors.tsv"
  • rename "TCGA_T_matrix_autosomal.tsv" to "homo_sapiens_germline_CN_transition_matrix_autosomal.tsv"
  • rename "TCGA_T_matrix_XX_X.tsv" to "homo_sapiens_germline_CN_transition_matrix_XX_X.tsv"
  • rename "TCGA_T_matrix_XY_X.tsv" to "homo_sapiens_germline_CN_transition_matrix_XY_X.tsv"
  • rename "TCGA_T_matrix_XX_Y.tsv" to "homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv"
  • rename "TCGA_T_matrix_XY_Y.tsv" to "homo_sapiens_germline_CN_transition_matrix_XY_Y.tsv"
  • make sure that you update the table "homo_sapiens_germline_CN_priors.tsv" accordingly
  • add a comment line to the top of each transition matrix: "#The following germline copy number transition matrix is obtained from analyzing Genome STRiP calls on a cohort of 170 blood normal TCGA samples". On "homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv", please comment: "#A trivial transition matrix for enforcing zero ploidy on Y contig in XX samples"
  • update all JSON files appropriately
  • if tests pass, squash and merge :)
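
The renames and header comments requested above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names and the directory argument are hypothetical, and plain `mv` stands in for whatever the repository would actually use (for example `git mv`):

```shell
# Hypothetical helper: rename the resource tables inside directory $1
# according to the old -> new pairs listed in the review comment.
rename_gcnv_resources() {
  dir="$1"
  while read -r old new; do
    # Skip silently if a file is absent so the function can be re-run safely.
    if [ -e "$dir/$old" ]; then
      mv "$dir/$old" "$dir/$new"
    fi
  done <<'EOF'
homo_sapiens_germline_HMM_priors.tsv homo_sapiens_germline_CN_priors.tsv
TCGA_T_matrix_autosomal.tsv homo_sapiens_germline_CN_transition_matrix_autosomal.tsv
TCGA_T_matrix_XX_X.tsv homo_sapiens_germline_CN_transition_matrix_XX_X.tsv
TCGA_T_matrix_XY_X.tsv homo_sapiens_germline_CN_transition_matrix_XY_X.tsv
TCGA_T_matrix_XX_Y.tsv homo_sapiens_germline_CN_transition_matrix_XX_Y.tsv
TCGA_T_matrix_XY_Y.tsv homo_sapiens_germline_CN_transition_matrix_XY_Y.tsv
EOF
}

# Hypothetical helper: prepend comment line $2 to file $1.
prepend_comment() {
  { printf '%s\n' "$2"; cat "$1"; } > "$1.tmp" && mv "$1.tmp" "$1"
}
```

The file names referenced inside the priors table and in the JSON inputs would still need to be updated separately; this sketch only covers the renames and the comment lines.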

@droazen
Contributor

droazen commented Jun 2, 2017

… added gCNV wdl files, set up travis testing of the gCNV WDL workflows, created gCNV template input files and gCNV resources directory
5 participants