Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a missing special case in MarkDuplicates ReadsKey code #4899

Merged
merged 2 commits into from
Jun 18, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -55,21 +55,39 @@ public Pair(final GATKRead read1, final GATKRead read2, final SAMFileHeader head
final int read1UnclippedStart = ReadUtils.getStrandedUnclippedStart(read1);
final int read2UnclippedStart = ReadUtils.getStrandedUnclippedStart(read2);

if( read1UnclippedStart < read2UnclippedStart ){
int read1ReferenceIndex = ReadUtils.getReferenceIndex(read1,header);
int read2ReferenceIndex = ReadUtils.getReferenceIndex(read2,header);

if( read1ReferenceIndex != read2ReferenceIndex ? read1ReferenceIndex < read2ReferenceIndex : read1UnclippedStart <= read2UnclippedStart ){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this would be easier to read as

if( read1ReferenceIndex <= read2ReferenceIndex && read1UnclippedStart <= read2UnclippedStart) {

unless I missed something there that makes them not equivalent

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actually that is not equivalent, if read2RefIndex > read1RefIndex BUT it starts earlier on its contig that statement would be false when it should not be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, heh, you're write. I didn't think that through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test for this edge case that we were missing.

first = read1;
second = read2;
} else {
first = read2;
second = read1;
}
// Keep track of the orientation of read1 and read2 as it is important for optical duplicate marking
wasFlipped = second.isFirstOfPair();

firstStartPosition = first.getAssignedStart();

// if the two read ends are in the same position, pointing in opposite directions,
// the orientation is undefined and the procedure above
// will depend on the order of the reads in the file.
// To avoid this, and match picard's behavior in this case, we ensure the orientation will be FR:
if (read1ReferenceIndex == read2ReferenceIndex &&
read1UnclippedStart == read2UnclippedStart &&
first.isReverseStrand() && !second.isReverseStrand()) {
// setting to FR for consistencies sake. (which involves flipping) if both reads had the same unclipped start
GATKRead tmp = first;
first = second;
second = tmp;
}

this.key = ReadsKey.getKeyForPair(header, first, second, headerLibraryMap);

isRead1ReverseStrand = first.isReverseStrand();
isRead2ReverseStrand = second.isReverseStrand();

// Keep track of the orientation of read1 and read2 as it is important for optical duplicate marking
wasFlipped = second.isFirstOfPair();

this.key = ReadsKey.getKeyForPair(header, first, second, headerLibraryMap);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,29 @@ public void testOpticalDuplicatesAndPCRDuplicatesOrientationDifference() {
tester.runTest();
}

@Test
// To match picard, when two reads are grouped into a duplicate group and have the same unclipped start position we will unify their orientations
// to FR if they point in opposite directions. This test asserts that two pairs of reads that ultimately all map to 93470499-93470499 with clipping
// and orientations will ultimately be marked and mapped together even though they have opposite orientations (one is FR while the other is RF).
public void testReadSameStartPositionOrientationOverride() {
final AbstractMarkDuplicatesTester tester = getTester();
tester.setExpectedOpticalDuplicate(0);
tester.addMatePair("HJYFJCCXX160204:8:1124:31659:21526", 19, 19, 93470380, 93470499, false, false, false, false, "31S111M9S", "64M87S", true, false, false, false, false, DEFAULT_BASE_QUALITY, "1"); // after clipping these both start on 93470499
tester.addMatePair("HJYFJCCXX160204:8:1124:31659:21527", 19, 19, 93470499, 93470380, false, false, true, true, "64M87S", "31S111M9S", false, true, false, false, false, DEFAULT_BASE_QUALITY, "1"); // after clipping these both start on 93470499
tester.runTest();
}

@Test
// This asserts that we are flipping reads in the Pair object based on both start position and contig index, this does not
// make a difference unless the start position is the same across two contigs, so we assert it is handled properly
public void testReadsHaveSameStartPositionButDifferentChromosomeNonEquivalence() {
final AbstractMarkDuplicatesTester tester = getTester();
tester.setExpectedOpticalDuplicate(0);
tester.addMatePair("HJYFJCCXX160204:8:1124:31659:21526", 19, 20, 93470380, 93470499, false, false, false, false, "31S111M9S", "64M87S", true, false, false, false, false, DEFAULT_BASE_QUALITY, "1"); // after clipping these both start on 93470499
tester.addMatePair("HJYFJCCXX160204:8:1124:31659:21527", 20, 19, 93470499, 93470380, false, false, true, true, "64M87S", "31S111M9S", false, true, false, false, false, DEFAULT_BASE_QUALITY, "1"); // after clipping these both start on 93470499
tester.runTest();
}

@Test
public void testOpticalDuplicateClusterSamePositionNoOpticalDuplicatesWithinPixelDistance() {
final AbstractMarkDuplicatesTester tester = getTester();
Expand Down