-
Notifications
You must be signed in to change notification settings - Fork 600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(all of) gCNV exome joint calling #6554
Conversation
@mwalker174 I have some questions about your implementation. For example, is length calculated as both ends inclusive? And then the current median calculation for start and end of the SVCluster produce pretty lame results in my tests with tiny data. |
Also this is a pretty spare set of tests, but I thought I'd get the ball rolling while I work on some more sophisticated ones. |
src/main/java/org/broadinstitute/hellbender/tools/sv/SVClusterEngine.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVClusterEngine.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVClusterEngine.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVClusterEngine.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVDepthOnlyCallDefragmenter.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVCallRecordWithEvidence.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVClusterEngine.java
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/sv/SVDepthOnlyCallDefragmenter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ldgauthier! This is shaping up well. I have a number of comments and no major concerns, although it would be good to see some preliminary results.
src/main/java/org/broadinstitute/hellbender/utils/codecs/SVCallRecordCodec.java
Outdated
Show resolved
Hide resolved
assert start_index >= 0 | ||
assert end_index < self.num_sites | ||
assert end_index >= start_index | ||
assert end_index >= start_index, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm not a fan of using assert
except for testing, but at least you've added an error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied Mehrtash's style, but I'm open to suggestions.
src/main/python/org/broadinstitute/hellbender/gcnvkernel/io/io_ploidy.py
Show resolved
Hide resolved
src/main/python/org/broadinstitute/hellbender/gcnvkernel/io/io_vcf_parsing.py
Show resolved
Hide resolved
...esources/org/broadinstitute/hellbender/tools/copynumber/cohort_determine_ploidy_and_depth.py
Show resolved
Hide resolved
if (copyNumber == 2) return null; | ||
final boolean isDel = copyNumber < 2; | ||
final boolean startStrand = isDel ? true : false; | ||
final boolean endStrand = isDel ? false : true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm looking at this again, I don't understand this logic. Inversions, sure, but dupes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DELs are +/- (true/false) and DUPs and -/+. Inversions are -/- and +/+.
db49909
to
4f8b4b0
Compare
.../broadinstitute/hellbender/tools/copynumber/arguments/CopyNumberArgumentValidationUtils.java
Outdated
Show resolved
Hide resolved
7b42ed9
to
854619b
Compare
854619b
to
0d6bb31
Compare
src/main/java/org/broadinstitute/hellbender/tools/copynumber/PostprocessGermlineCNVCalls.java
Outdated
Show resolved
Hide resolved
4e23e91
to
8748484
Compare
c8bfcfe
to
62b655c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small step for CNVs, one giant leap for SVs. 🧑🚀
I have some suggestions ranging from minor to outright pedantic. We may want to think about getting away from manipulating VariantContext and Genotype objects, they seem awkward to use for SVs, such as the DUP no-call business. But I'll leave such issues up to future work.
Edit: there is one potential major bug, see my comment about a possible getLeft/getRight mixup.
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/copynumber/PostprocessGermlineCNVCalls.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/engine/MultiVariantDataSource.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/engine/MultiVariantDataSource.java
Outdated
Show resolved
Hide resolved
fullName = INPUT_INTERVALS_LONG_NAME, | ||
optional = true | ||
) | ||
private File combinedIntervalsVCFFile = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File inputs should be GATKPath
s (also in JointGermlineCNVSegmentation
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is going to go to the underlying Python script as-is, I think it has to stay a File right? I suspect it will crash and burn if it gets a GCS path.
858fb47
to
61759e5
Compare
Back to you @mwalker174 -- the comments I left unresolved are not fully addressed or perhaps not to your satisfaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @ldgauthier! It looks like at least one of the travis tests failed legitimately:
org.broadinstitute.hellbender.tools.copynumber.PostprocessGermlineCNVCallsUnitTest > testPythonVCFReading FAILED
Good to merge once the tests are completing!
fb519a9
to
c5dc7c5
Compare
Use call intervals for bin-space defragmentation Adjust copy number for overlapping events (not super efficient) Diploid genotypes and actually get ref base (if reference is supplied) QS filtering and AC calculation Filter by raw calls and filtered calls New Python unit test runner
c5dc7c5
to
c412399
Compare
Move walker tools to walkers package Some changes to PrintSVEvidence Add plotting scripts Move python launcher script packages Various Depth model updates Disable SimpleInterval coordinate check Fix docker project Update plots; remove unneeded cnv model outputs Move filtering to new SVSelectVariants tools Fix bug Various bug fixes; add RDO tag; remove CNV-to-BND conversions Some cleanup and refactoring Remove sv interval filter Fix some issues with record merging; add no-call genotypes to output Rework sv cluster; break into 2 tools Working refactor of SV clustering classes; untested Improve cluster interface; refactor pesr aggregation Improvements to pesr aggregator tool code Fix compiler warnings Fix cluster arguments collection Fix max-clique subset filtering Fix single-linkage null pointer exception Delete debug line Aggregator now preserves genotype attributes Fix SR aggregation Fix breakpointrefiner bug Use IntervalTree cache for pesr aggregation A bit of cleanup with defragmenter and preprocessor tool Fix tests Fixt tests Fix compiler warnings Fix bugs in clustering algorithm; remove reciprocal overlap padding Improve TrainDepth sample list handling Fix test compiler error Depth model tool fixes; infer ploidy from CNV vcf during depth aggregation Improve cluster engine tests Fix SR aggregation Add contig ploidy collections back to depth aggregator More cleanup and fixes Clustering bug fixes Tweaked clustering - good results on Manta Use getAttributeAsInt Start fixing cluster engine tests Start fixing tests Fix join cnv seg defrag integration test params Fix swapped CNVDefragmenter and BinnedCNVDefragmenter in joint cnv segmentation Expose clustering parameters in joint segmentation Fix parameter shenanigans Remove collapser default strategy Defragmenter tests and bug fixes Fix cluster engine test Implement SVCollapser tests Implement SVClusterIntegrationTest Add cluster test resources Improve cluster engine test coverage Fix tests Start documentation Finish tests for sv call record utils
Move walker tools to walkers package Some changes to PrintSVEvidence Add plotting scripts Move python launcher script packages Various Depth model updates Disable SimpleInterval coordinate check Fix docker project Update plots; remove unneeded cnv model outputs Move filtering to new SVSelectVariants tools Fix bug Various bug fixes; add RDO tag; remove CNV-to-BND conversions Some cleanup and refactoring Remove sv interval filter Fix some issues with record merging; add no-call genotypes to output Rework sv cluster; break into 2 tools Working refactor of SV clustering classes; untested Improve cluster interface; refactor pesr aggregation Improvements to pesr aggregator tool code Fix compiler warnings Fix cluster arguments collection Fix max-clique subset filtering Fix single-linkage null pointer exception Delete debug line Aggregator now preserves genotype attributes Fix SR aggregation Fix breakpointrefiner bug Use IntervalTree cache for pesr aggregation A bit of cleanup with defragmenter and preprocessor tool Fix tests Fixt tests Fix compiler warnings Fix bugs in clustering algorithm; remove reciprocal overlap padding Improve TrainDepth sample list handling Fix test compiler error Depth model tool fixes; infer ploidy from CNV vcf during depth aggregation Improve cluster engine tests Fix SR aggregation Add contig ploidy collections back to depth aggregator More cleanup and fixes Clustering bug fixes Tweaked clustering - good results on Manta Use getAttributeAsInt Start fixing cluster engine tests Start fixing tests Fix join cnv seg defrag integration test params Fix swapped CNVDefragmenter and BinnedCNVDefragmenter in joint cnv segmentation Expose clustering parameters in joint segmentation Fix parameter shenanigans Remove collapser default strategy Defragmenter tests and bug fixes Fix cluster engine test Implement SVCollapser tests Implement SVClusterIntegrationTest Add cluster test resources Improve cluster engine test coverage Fix tests Start documentation Finish tests for sv call record utils Rework SVAnnotateOverlappingRegions Fix compiler warning Tweak how annotate regions works; fix svcluster end2 field type Minor fix to caching evidence aggregator Increase svgenotyper plot image size
I thought I'd PR this before it gets too big.
The idea is to do defragmentation and breakpoint clustering on the exome CNV variants and output the new coordinated with the copy number for each sample. This is sort of like the CombineGVCFs step. The next step, which is the GenotypeGVCFs equivalent, will be updating the quality scores for each variant. Since we changed the bounds, we have to recalculate QS, QA, QSS, QSE. I think that should be possible using similar code to PostprocessGCNVCalls and using the clustered breakpoints instead of the viterbi segmentation. I guess we'll see.