(all of) gCNV exome joint calling #6554

Merged
merged 2 commits into from
Dec 23, 2020

ldgauthier
Contributor

I thought I'd PR this before it gets too big.

The idea is to do defragmentation and breakpoint clustering on the exome CNV variants and output the new coordinates with the copy number for each sample. This is sort of like the CombineGVCFs step. The next step, the GenotypeGVCFs equivalent, will be updating the quality scores for each variant. Since we changed the bounds, we have to recalculate QS, QA, QSS, and QSE. I think that should be possible using code similar to PostprocessGermlineCNVCalls, with the clustered breakpoints in place of the Viterbi segmentation. I guess we'll see.
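To make the defragmentation half of this concrete, here is a minimal Java sketch for a single sample. The class names (`CnvCall`, `Defragmenter`) and the `paddingFraction` parameter are hypothetical. The sketch pads in base pairs, whereas the commit list further down mentions bin-space defragmentation using the call intervals, so treat it as an illustration of the idea rather than the PR's algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-sample CNV call on a 1-based, closed interval [start, end]. */
final class CnvCall {
    final String contig;
    final int start;
    final int end;
    final int copyNumber;

    CnvCall(String contig, int start, int end, int copyNumber) {
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.copyNumber = copyNumber;
    }

    /** Assumes a diploid baseline: copy number below 2 is a deletion, above 2 a duplication. */
    boolean isDeletion() {
        return copyNumber < 2;
    }

    /** Length counting both ends. */
    int length() {
        return end - start + 1;
    }
}

final class Defragmenter {
    /**
     * Merges one sample's calls when two neighbors (in start order) are on the same
     * contig, have the same type (DEL vs DUP), and overlap once each is padded by
     * paddingFraction * its own length. Copy-neutral calls (CN 2) are dropped.
     */
    static List<CnvCall> defragment(List<CnvCall> calls, double paddingFraction) {
        List<CnvCall> sorted = new ArrayList<>(calls);
        sorted.sort(Comparator.comparing((CnvCall c) -> c.contig).thenComparingInt(c -> c.start));
        List<CnvCall> merged = new ArrayList<>();
        for (CnvCall call : sorted) {
            if (call.copyNumber == 2) {
                continue;  // nothing to defragment for a copy-neutral call
            }
            if (!merged.isEmpty()) {
                CnvCall last = merged.get(merged.size() - 1);
                long lastPaddedEnd = last.end + Math.round(paddingFraction * last.length());
                long callPaddedStart = call.start - Math.round(paddingFraction * call.length());
                if (last.contig.equals(call.contig)
                        && last.isDeletion() == call.isDeletion()
                        && callPaddedStart <= lastPaddedEnd) {
                    // Keep the copy number of the longer fragment (a simple heuristic).
                    int cn = last.length() >= call.length() ? last.copyNumber : call.copyNumber;
                    merged.set(merged.size() - 1,
                            new CnvCall(last.contig, last.start, Math.max(last.end, call.end), cn));
                    continue;
                }
            }
            merged.add(call);
        }
        return merged;
    }
}
```

With `paddingFraction = 0.25`, a deletion at chr1:1000-1999 and another at chr1:2100-2999 merge into chr1:1000-2999, while a duplication far downstream stays separate.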

@ldgauthier ldgauthier requested a review from mwalker174 April 17, 2020 20:54
@ldgauthier
Contributor Author

@mwalker174 I have some questions about your implementation. For example, is length calculated with both ends inclusive? Also, the current median calculation for the start and end of the SVCluster produces pretty lame results in my tests with tiny data.
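A small sketch of both questions, assuming 1-based closed intervals (as in VCF) and a cluster whose representative breakpoints are the medians of its members' starts and ends. Whether SVCluster uses the lower median, the average of the two middle values, or something else is not stated in this thread; `ClusterCoordinates` and its methods are hypothetical.

```java
import java.util.Arrays;

final class ClusterCoordinates {
    /** Length of a 1-based closed interval [start, end], counting both ends. */
    static int inclusiveLength(int start, int end) {
        return end - start + 1;
    }

    /** Lower median: with an even number of values, returns the smaller middle value. */
    static int lowerMedian(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    /**
     * Representative {start, end} of a cluster as the medians of member starts and ends.
     * Because every member satisfies start <= end, the k-th smallest start never exceeds
     * the k-th smallest end, so the result is always a valid interval.
     */
    static int[] representative(int[] starts, int[] ends) {
        return new int[] {lowerMedian(starts), lowerMedian(ends)};
    }
}
```

With only two members, the lower median simply picks one member's breakpoint, which is one way tiny test data can produce unintuitive cluster bounds.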

@ldgauthier
Contributor Author

Also this is a pretty spare set of tests, but I thought I'd get the ball rolling while I work on some more sophisticated ones.

Contributor

@mwalker174 mwalker174 left a comment


Thank you @ldgauthier! This is shaping up well. I have a number of comments and no major concerns, although it would be good to see some preliminary results.

Comment on lines 146 to +148
assert start_index >= 0
assert end_index < self.num_sites
assert end_index >= start_index
assert end_index >= start_index, \
Contributor


Thanks, I'm not a fan of using assert except for testing, but at least you've added an error message.

Contributor Author


I copied Mehrtash's style, but I'm open to suggestions.
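On the reviewer's point about `assert`: Python drops `assert` statements when run with `-O`, and the JVM skips Java `assert` unless assertions are enabled with `-ea`, so checks that guard real inputs are usually written as explicit exceptions. In the Python under review that would look like `if end_index < start_index: raise ValueError(...)`. The same pattern in Java, as a hypothetical helper mirroring the three quoted checks:

```java
final class IndexChecks {
    /**
     * Validates a closed index range [startIndex, endIndex] over numSites sites.
     * Unlike assert statements, these checks run regardless of interpreter or JVM flags.
     */
    static void validateRange(int startIndex, int endIndex, int numSites) {
        if (startIndex < 0) {
            throw new IllegalArgumentException("start_index must be >= 0 but was " + startIndex);
        }
        if (endIndex >= numSites) {
            throw new IllegalArgumentException(
                    "end_index must be < num_sites (" + numSites + ") but was " + endIndex);
        }
        if (endIndex < startIndex) {
            throw new IllegalArgumentException(
                    "end_index (" + endIndex + ") must be >= start_index (" + startIndex + ")");
        }
    }
}
```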

if (copyNumber == 2) return null;
final boolean isDel = copyNumber < 2;
final boolean startStrand = isDel ? true : false;
final boolean endStrand = isDel ? false : true;
Contributor Author


Now that I'm looking at this again, I don't understand this logic. Inversions, sure, but dupes?

Contributor


DELs are +/- (true/false) and DUPs are -/+ (false/true). Inversions are -/- and +/+.
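A minimal Java restatement of that convention, assuming a diploid baseline as the quoted `copyNumber == 2` check does. `CnvStrands` and `strandsForCopyNumber` are hypothetical names; the returned pair is `{startStrand, endStrand}` with `true` meaning `+`.

```java
final class CnvStrands {
    /**
     * Returns {startStrand, endStrand} for a CNV call (true means "+"),
     * or null for a copy-neutral call. Assumes a diploid baseline of 2.
     * DEL -> +/- (true/false), DUP -> -/+ (false/true).
     */
    static boolean[] strandsForCopyNumber(int copyNumber) {
        if (copyNumber == 2) {
            return null;  // copy-neutral: no event to describe
        }
        boolean isDel = copyNumber < 2;
        // Same result as `isDel ? true : false` and `isDel ? false : true`.
        return new boolean[] {isDel, !isDel};
    }
}
```

Because a copy-number change alone only implies a DEL or a DUP, the inversion strand pairs (-/- and +/+) never come out of this function. Calls on contigs whose baseline is not 2 (for example chrX in male samples) would need the contig ploidy in place of the constant.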

@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from db49909 to 4f8b4b0 Compare June 15, 2020 20:33
@gatk-bot

gatk-bot commented Jun 16, 2020

Travis reported job failures from build 30643
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 30643.5 logs
unit openjdk11 30643.12 logs
integration openjdk11 30643.11 logs
unit openjdk8 30643.3 logs
integration openjdk8 30643.2 logs

@gatk-bot

gatk-bot commented Jun 16, 2020

Travis reported job failures from build 30666
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 30666.12 logs
conda openjdk8 30666.5 logs
integration openjdk11 30666.11 logs
unit openjdk8 30666.3 logs
integration openjdk8 30666.2 logs

@gatk-bot

gatk-bot commented Jun 16, 2020

Travis reported job failures from build 30678
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 30678.12 logs
conda openjdk8 30678.5 logs
integration openjdk11 30678.11 logs
unit openjdk8 30678.3 logs
integration openjdk8 30678.2 logs

@gatk-bot

gatk-bot commented Jun 17, 2020

Travis reported job failures from build 30705
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 30705.12 logs
conda openjdk8 30705.5 logs
integration openjdk11 30705.11 logs
unit openjdk8 30705.3 logs
integration openjdk8 30705.2 logs

@gatk-bot

gatk-bot commented Jun 18, 2020

Travis reported job failures from build 30720
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 30720.1 logs
cloud openjdk11 30720.13 logs
unit openjdk11 30720.12 logs
conda openjdk8 30720.5 logs
unit openjdk8 30720.3 logs

@ldgauthier ldgauthier requested a review from mwalker174 June 19, 2020 21:01
@gatk-bot

gatk-bot commented Jun 29, 2020

Travis reported job failures from build 30828
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 30828.12 logs
conda openjdk8 30828.5 logs
unit openjdk8 30828.3 logs

@gatk-bot

gatk-bot commented Jul 16, 2020

Travis reported job failures from build 30974
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 30974.1 logs
unit openjdk11 30974.13 logs
conda openjdk8 30974.5 logs
integration openjdk11 30974.12 logs
unit openjdk8 30974.3 logs
integration openjdk8 30974.2 logs

@gatk-bot

gatk-bot commented Jul 24, 2020

Travis reported job failures from build 31019
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 31019.13 logs
integration openjdk11 31019.12 logs

@gatk-bot

gatk-bot commented Jul 24, 2020

Travis reported job failures from build 31021
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 31021.13 logs
conda openjdk8 31021.5 logs
integration openjdk11 31021.12 logs
unit openjdk8 31021.3 logs
integration openjdk8 31021.2 logs

@gatk-bot

gatk-bot commented Jul 30, 2020

Travis reported job failures from build 31043
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 31043.13 logs
conda openjdk8 31043.5 logs
integration openjdk11 31043.12 logs
unit openjdk8 31043.3 logs
integration openjdk8 31043.2 logs

@ldgauthier ldgauthier changed the title from "(first step of) gCNV exome joint calling" to "(all of) gCNV exome joint calling" Oct 19, 2020
@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from 7b42ed9 to 854619b Compare October 19, 2020 17:25
@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from 854619b to 0d6bb31 Compare October 19, 2020 20:32
@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from 4e23e91 to 8748484 Compare November 17, 2020 15:45
@gatk-bot

gatk-bot commented Nov 17, 2020

Travis reported job failures from build 32149
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 32149.1 logs
integration openjdk11 32149.12 logs
conda openjdk8 32149.5 logs
unit openjdk8 32149.3 logs
integration openjdk8 32149.2 logs

@gatk-bot

gatk-bot commented Nov 18, 2020

Travis reported job failures from build 32169
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 32169.1 logs
conda openjdk8 32169.5 logs
unit openjdk8 32169.3 logs

@gatk-bot

gatk-bot commented Nov 19, 2020

Travis reported job failures from build 32178
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud openjdk8 32178.1 logs
conda openjdk8 32178.5 logs
unit openjdk8 32178.3 logs

@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from c8bfcfe to 62b655c Compare November 19, 2020 17:03
Contributor

@mwalker174 mwalker174 left a comment


One small step for CNVs, one giant leap for SVs. 🧑‍🚀

I have some suggestions ranging from minor to outright pedantic. We may want to think about moving away from manipulating VariantContext and Genotype objects; they seem awkward to use for SVs (the DUP no-call business, for example). But I'll leave such issues to future work.

Edit: there is one potential major bug; see my comment about a possible getLeft/getRight mixup.

fullName = INPUT_INTERVALS_LONG_NAME,
optional = true
)
private File combinedIntervalsVCFFile = null;
Contributor


File inputs should be GATKPaths (also in JointGermlineCNVSegmentation)

Contributor Author


Since this is going to go to the underlying Python script as-is, I think it has to stay a File, right? I suspect it will crash and burn if it gets a GCS path.
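If the argument were typed as a GATKPath but the value still has to reach the Python script as a local file, one option is to fail fast on remote URIs before launching the script. The sketch below is a plain-Java illustration with hypothetical names (`LocalPathGuard`, `isLocalPath`, `requireLocal`); it assumes POSIX-style paths and does not use GATK's own path classes.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LocalPathGuard {
    /**
     * True for a plain path with no URI scheme or a file:// URI;
     * false for remote URIs such as gs://bucket/object. POSIX-style paths assumed.
     */
    static boolean isLocalPath(String pathString) {
        try {
            String scheme = new URI(pathString).getScheme();
            return scheme == null || scheme.equals("file");
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. contains spaces): treat it as a local path.
            return true;
        }
    }

    /** Throws before the Python script is launched if the path is not local. */
    static void requireLocal(String argName, String pathString) {
        if (!isLocalPath(pathString)) {
            throw new IllegalArgumentException(
                    argName + " must be a local file for the Python backend, got " + pathString);
        }
    }
}
```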

@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from 858fb47 to 61759e5 Compare December 8, 2020 21:13
@ldgauthier
Contributor Author

Back to you @mwalker174 -- the comments I left unresolved are not fully addressed or perhaps not to your satisfaction.

@gatk-bot

gatk-bot commented Dec 11, 2020

Travis reported job failures from build 32366
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 32366.13 logs
integration openjdk11 32366.12 logs
conda openjdk8 32366.5 logs
unit openjdk8 32366.3 logs
integration openjdk8 32366.2 logs

Contributor

@mwalker174 mwalker174 left a comment


Looks good @ldgauthier! It looks like at least one of the Travis tests failed legitimately:

org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCallsUnitTest > testPythonVCFReading FAILED

Good to merge once the tests are completing!

@gatk-bot

Travis reported job failures from build 32415
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 32415.5 logs

@gatk-bot

gatk-bot commented Dec 22, 2020

Travis reported job failures from build 32419
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 32419.13 logs

@gatk-bot

Travis reported job failures from build 32421
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 32421.13 logs

@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from fb519a9 to c5dc7c5 Compare December 22, 2020 19:00
mwalker174 and others added 2 commits December 22, 2020 15:54
Use call intervals for bin-space defragmentation
Adjust copy number for overlapping events (not super efficient)
Diploid genotypes and actually get ref base (if reference is supplied)
QS filtering and AC calculation
Filter by raw calls and filtered calls
New Python unit test runner
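The commits above mention diploid genotypes and AC calculation. Assuming ploidy 2, one plausible mapping for deletions is CN 0 → homozygous DEL, CN 1 → heterozygous DEL, CN 2 → homozygous reference. For duplications the allele dosage is ambiguous (CN 4 could be one allele carrying two extra copies or two alleles carrying one each), which is presumably the "DUP no-call business" raised in review; this sketch simply treats them as no-calls. `DiploidCnvGenotypes` and its methods are hypothetical and are not the PR's code.

```java
import java.util.List;

final class DiploidCnvGenotypes {
    /** Deletion alleles implied by a copy number under ploidy 2; -1 marks a no-call. */
    static int deletionAlleleCount(int copyNumber) {
        if (copyNumber < 0) {
            throw new IllegalArgumentException("copy number must be >= 0, got " + copyNumber);
        }
        if (copyNumber > 2) {
            return -1;  // duplication: allele dosage is ambiguous, so no-call
        }
        return 2 - copyNumber;  // CN 0 -> hom DEL, CN 1 -> het DEL, CN 2 -> hom ref
    }

    /** Allele count (AC) of the deletion allele across samples, skipping no-calls. */
    static int deletionAC(List<Integer> copyNumbers) {
        int ac = 0;
        for (int cn : copyNumbers) {
            int count = deletionAlleleCount(cn);
            if (count > 0) {
                ac += count;
            }
        }
        return ac;
    }
}
```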
@ldgauthier ldgauthier force-pushed the ldg_gcnv_exome_joint_calling branch from c5dc7c5 to c412399 Compare December 22, 2020 21:05
@gatk-bot

Travis reported job failures from build 32429
Failures in the following jobs:

Test Type JDK Job ID Logs
unit openjdk11 32429.13 logs

@ldgauthier ldgauthier merged commit 31df35b into master Dec 23, 2020
@ldgauthier ldgauthier deleted the ldg_gcnv_exome_joint_calling branch December 23, 2020 01:28
mwalker174 added a commit that referenced this pull request May 13, 2021
Move walker tools to walkers package

Some changes to PrintSVEvidence

Add plotting scripts

Move python launcher script packages

Various Depth model updates

Disable SimpleInterval coordinate check

Fix docker project

Update plots; remove unneeded cnv model outputs

Move filtering to new SVSelectVariants tools

Fix bug

Various bug fixes; add RDO tag; remove CNV-to-BND conversions

Some cleanup and refactoring

Remove sv interval filter

Fix some issues with record merging; add no-call genotypes to output

Rework sv cluster; break into 2 tools

Working refactor of SV clustering classes; untested

Improve cluster interface; refactor pesr aggregation

Improvements to pesr aggregator tool code

Fix compiler warnings

Fix cluster arguments collection

Fix max-clique subset filtering

Fix single-linkage null pointer exception

Delete debug line

Aggregator now preserves genotype attributes

Fix SR aggregation

Fix breakpointrefiner bug

Use IntervalTree cache for pesr aggregation

A bit of cleanup with defragmenter and preprocessor tool

Fix tests

Fix tests

Fix compiler warnings

Fix bugs in clustering algorithm; remove reciprocal overlap padding

Improve TrainDepth sample list handling

Fix test compiler error

Depth model tool fixes; infer ploidy from CNV vcf during depth aggregation

Improve cluster engine tests

Fix SR aggregation

Add contig ploidy collections back to depth aggregator

More cleanup and fixes

Clustering bug fixes

Tweaked clustering - good results on Manta

Use getAttributeAsInt

Start fixing cluster engine tests

Start fixing tests

Fix joint cnv seg defrag integration test params

Fix swapped CNVDefragmenter and BinnedCNVDefragmenter in joint cnv segmentation

Expose clustering parameters in joint segmentation

Fix parameter shenanigans

Remove collapser default strategy

Defragmenter tests and bug fixes

Fix cluster engine test

Implement SVCollapser tests

Implement SVClusterIntegrationTest

Add cluster test resources

Improve cluster engine test coverage

Fix tests

Start documentation

Finish tests for sv call record utils
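Several of these commits concern clustering (single-linkage and max-clique modes, reciprocal overlap). As a rough illustration of single-linkage clustering under a reciprocal-overlap criterion, here is a Java sketch using union-find over intervals on one contig. The class name and the `minFraction` threshold are hypothetical, and the real engine's linkage criteria are not shown in this thread. A max-clique variant would instead require every pair within a cluster to overlap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverlapClustering {
    /** Closed 1-based interval on a single contig. */
    static final class Interval {
        final int start;
        final int end;
        Interval(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start + 1; }
    }

    /** True if the overlap covers at least minFraction of EACH interval's length. */
    static boolean reciprocalOverlap(Interval a, Interval b, double minFraction) {
        int overlap = Math.min(a.end, b.end) - Math.max(a.start, b.start) + 1;
        if (overlap <= 0) {
            return false;
        }
        return overlap >= minFraction * a.length() && overlap >= minFraction * b.length();
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];  // path halving
            i = parent[i];
        }
        return i;
    }

    /**
     * Single-linkage: intervals joined by any chain of pairwise reciprocal overlaps
     * share a cluster. O(n^2) comparisons. Returns clusters as lists of input indices,
     * ordered by their first member.
     */
    static List<List<Integer>> singleLinkage(List<Interval> intervals, double minFraction) {
        int n = intervals.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (reciprocalOverlap(intervals.get(i), intervals.get(j), minFraction)) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }
}
```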
mwalker174 added a commit that referenced this pull request Aug 19, 2021
Move walker tools to walkers package

Some changes to PrintSVEvidence

Add plotting scripts

Move python launcher script packages

Various Depth model updates

Disable SimpleInterval coordinate check

Fix docker project

Update plots; remove unneeded cnv model outputs

Move filtering to new SVSelectVariants tools

Fix bug

Various bug fixes; add RDO tag; remove CNV-to-BND conversions

Some cleanup and refactoring

Remove sv interval filter

Fix some issues with record merging; add no-call genotypes to output

Rework sv cluster; break into 2 tools

Working refactor of SV clustering classes; untested

Improve cluster interface; refactor pesr aggregation

Improvements to pesr aggregator tool code

Fix compiler warnings

Fix cluster arguments collection

Fix max-clique subset filtering

Fix single-linkage null pointer exception

Delete debug line

Aggregator now preserves genotype attributes

Fix SR aggregation

Fix breakpointrefiner bug

Use IntervalTree cache for pesr aggregation

A bit of cleanup with defragmenter and preprocessor tool

Fix tests

Fix tests

Fix compiler warnings

Fix bugs in clustering algorithm; remove reciprocal overlap padding

Improve TrainDepth sample list handling

Fix test compiler error

Depth model tool fixes; infer ploidy from CNV vcf during depth aggregation

Improve cluster engine tests

Fix SR aggregation

Add contig ploidy collections back to depth aggregator

More cleanup and fixes

Clustering bug fixes

Tweaked clustering - good results on Manta

Use getAttributeAsInt

Start fixing cluster engine tests

Start fixing tests

Fix joint cnv seg defrag integration test params

Fix swapped CNVDefragmenter and BinnedCNVDefragmenter in joint cnv segmentation

Expose clustering parameters in joint segmentation

Fix parameter shenanigans

Remove collapser default strategy

Defragmenter tests and bug fixes

Fix cluster engine test

Implement SVCollapser tests

Implement SVClusterIntegrationTest

Add cluster test resources

Improve cluster engine test coverage

Fix tests

Start documentation

Finish tests for sv call record utils

Rework SVAnnotateOverlappingRegions

Fix compiler warning

Tweak how annotate regions works; fix svcluster end2 field type

Minor fix to caching evidence aggregator

Increase svgenotyper plot image size
4 participants