Changed filtering of normal hets on overlap with copy-ratio intervals… #4510
@sooheelee Are you comfortable reviewing this? The code changes are minimal. Note that there is no "regression test" since the test files do not give rise to the scenario you encountered in https://github.com/broadinstitute/dsde-docs/issues/2891. Perhaps we can make sure that we include such a test case when we address #4007.
Codecov Report
@@               Coverage Diff                @@
##             master     #4510      +/-    ##
==============================================
- Coverage     79.121%    79.12%   -0.001%
- Complexity     16677     16682       +5
==============================================
  Files           1051      1051
  Lines          60285     60293       +8
  Branches        9875      9876       +1
==============================================
+ Hits           47698     47704       +6
- Misses          8759      8761       +2
  Partials        3828      3828
I'm only in an introductory Java class, geared towards Android app development. So I cannot comment on the code. What I have done is take the branch and test it against the data that I have and I can say the counts now match up to the lower expected value. Furthermore, the four questionable sites are now absent. Let me know if and how I can help with your efforts in creating a test if you decide to test for such a scenario.
I tested the changes against a dataset that I know displays the discrepancy, and now the discrepant sites are absent from the hets.tsv, such that T-N.hets.normal.tsv and N.hets.tsv match in sites exactly.
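A check like the one described above (that two het files list exactly the same sites) could be scripted roughly as follows. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the file layout is assumed to be optional `@`-prefixed header lines, then one column-header row, then tab-separated rows whose first two columns are contig and position.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of GATK.
final class HetSiteComparison {

    // Collects "contig:position" keys from the lines of an allelic-counts TSV.
    // Assumed layout: '@' header lines, one column-header row, then data rows
    // whose first two tab-separated columns are contig and position.
    static Set<String> sites(final List<String> lines) {
        final Set<String> result = new LinkedHashSet<>();
        boolean columnHeaderSeen = false;
        for (final String line : lines) {
            if (line.isEmpty() || line.startsWith("@")) {
                continue; // skip blank lines and '@' header lines
            }
            if (!columnHeaderSeen) {
                columnHeaderSeen = true; // skip the column-header row
                continue;
            }
            final String[] fields = line.split("\t");
            result.add(fields[0] + ":" + fields[1]);
        }
        return result;
    }

    // Returns the sites present in the first set but missing from the second.
    static Set<String> onlyInFirst(final Set<String> first, final Set<String> second) {
        final Set<String> difference = new LinkedHashSet<>(first);
        difference.removeAll(second);
        return difference;
    }
}
```

The lines of each file can be read with `java.nio.file.Files.readAllLines`. Two files match in sites exactly when `onlyInFirst` is empty in both directions.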
OK, thanks! @MartonKN, can you take a quick look at the code?
… in ModelSegments to be consistent with filtering of case hets.
27f75f7 to 76b34e2
Done with my code review, everything looks good.
         hetAllelicCounts = new AllelicCountCollection(
                 metadata,
                 filteredAllelicCounts.getRecords().stream()
-                        .filter(ac -> hetNormalAllelicCountSites.contains(ac.getInterval()))
+                        .filter(ac -> hetNormalAllelicCounts.getOverlapDetector().overlapsAny(ac))
                         .collect(Collectors.toList()));
         final File hetAllelicCountsFile = new File(outputDir, outputPrefix + HET_ALLELIC_COUNTS_FILE_SUFFIX);
         hetAllelicCounts.write(hetAllelicCountsFile);
Looks good :)
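The changed line above filters the case sites with an overlap query against the normal het collection instead of a membership test on `hetNormalAllelicCountSites`. As a rough illustration of what an overlap-based filter does, here is a minimal sketch in plain Java. The `Interval` class and `keepOverlapping` method are hypothetical stand-ins, not GATK or htsjdk types, and the linear scan stands in for the indexed `OverlapDetector` lookup used in the actual code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not GATK code.
final class OverlapFilterSketch {

    // Stand-in for a genomic site or interval (1-based, closed coordinates).
    static final class Interval {
        final String contig;
        final int start;
        final int end;

        Interval(final String contig, final int start, final int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        // Two intervals overlap if they share a contig and at least one base.
        boolean overlaps(final Interval other) {
            return contig.equals(other.contig) && start <= other.end && other.start <= end;
        }
    }

    // Keeps each case site that overlaps at least one normal het site.
    // This scans the whole normal list per site; an index would be faster.
    static List<Interval> keepOverlapping(final List<Interval> caseSites,
                                          final List<Interval> normalHets) {
        return caseSites.stream()
                .filter(c -> normalHets.stream().anyMatch(c::overlaps))
                .collect(Collectors.toList());
    }
}
```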
     public OverlapDetector<RECORD> getOverlapDetector() {
-        return OverlapDetector.create(getRecords());
+        return overlapDetector.get();
     }

     public Comparator<Locatable> getComparator() {
👍
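The change to `getOverlapDetector()` replaces building a new detector on every call with `overlapDetector.get()`, which suggests the detector is now built once and cached. The type of that field is not shown in the diff, so the following is only a sketch of one common way to get that behavior; `Memoized` is a hypothetical class, not necessarily what GATK uses.

```java
import java.util.function.Supplier;

// Hypothetical memoizing wrapper: runs the initializer at most once,
// on the first call to get(), and returns the cached value afterwards.
final class Memoized<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private T value;
    private boolean initialized;

    Memoized(final Supplier<T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public synchronized T get() {
        if (!initialized) {
            value = initializer.get(); // expensive construction happens here, once
            initialized = true;
        }
        return value;
    }
}
```

With a wrapper like this, repeated overlap queries inside a stream filter reuse one index instead of rebuilding it for every record.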
@@ -67,7 +67,7 @@ public AlleleFractionSegmentCollection findSegmentation(final int maxNumChangepo
                 "Log-linear factor for the penalty on the number of changepoints per chromosome must be non-negative.");

         logger.info(String.format("Finding changepoints in %d data points and %d chromosomes...",
-                allelicCounts.getRecords().size(), allelicCountsPerChromosome.size()));
+                allelicCounts.size(), allelicCountsPerChromosome.size()));

         //loop over chromosomes, find changepoints, and create allele-fraction segments
         final List<AlleleFractionSegment> segments = new ArrayList<>();
👍
@@ -70,7 +70,7 @@ public CopyRatioSegmentCollection findSegmentation(final int maxNumChangepointsP
                 "Log-linear factor for the penalty on the number of changepoints per chromosome must be non-negative.");

         logger.info(String.format("Finding changepoints in %d data points and %d chromosomes...",
-                denoisedCopyRatios.getRecords().size(), denoisedCopyRatiosPerChromosome.size()));
+                denoisedCopyRatios.size(), denoisedCopyRatiosPerChromosome.size()));

         //loop over chromosomes, find changepoints, and create copy-ratio segments
         final List<CopyRatioSegment> segments = new ArrayList<>();
👍
@@ -143,7 +143,7 @@ public MultidimensionalSegmentCollection findSegmentation(final int maxNumChange
                 kernelVarianceCopyRatio, kernelVarianceAlleleFraction, kernelScalingAlleleFraction);

         logger.info(String.format("Finding changepoints in (%d, %d) data points and %d chromosomes...",
-                denoisedCopyRatios.getRecords().size(), allelicCounts.size(), multidimensionalPointsPerChromosome.size()));
+                denoisedCopyRatios.size(), allelicCounts.size(), multidimensionalPointsPerChromosome.size()));

         //loop over chromosomes, find changepoints, and create allele-fraction segments
         final List<MultidimensionalSegment> segments = new ArrayList<>();
👍
                     EXPECTED_METADATA.getSequenceDictionary(), hetNormalAllelicCounts.getMetadata().getSequenceDictionary()));
         }
         if (isAllelicCountsPresent && isNormalAllelicCountsPresent) {
             Assert.assertEquals(hetAllelicCounts.getIntervals(), hetNormalAllelicCounts.getIntervals());
         }
     }
👍
… in ModelSegments to be consistent with filtering of case hets.
Also some miscellaneous code cleanup.