feat!: support reference alignment computation for negative stranded intervals #138

Veyron2121 · 2025-03-04T18:42:23Z

Fixes #130.

Previously, if a user requested the reference alignment for a negative-strand interval through apply_variants, they'd get a ValueError saying that it was not supported. This PR adds support for it.

The alignment is generated by first flipping the positive-strand alignment and then translating it so that it is returned in ascending order.
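
As a rough illustration of the two steps described above, here is a minimal sketch that handles integer offsets only. The function name and the meaning of `highest_index` (taken here to be the last 0-based offset of the interval) are assumptions, and insertion tuples are deliberately left out. The sample values are the ones from the test discussed later in the review, where their correctness is still being questioned.

```python
def flip_alignment(alignment, highest_index):
    """Convert a positive-strand alignment of integer offsets to the negative strand.

    Assumption: `highest_index` is the last 0-based offset of the interval
    (interval length - 1). Insertion tuples are not handled in this sketch.
    """
    # Step 1: mirror every offset around the interval; the result is descending.
    mirrored = [highest_index - i for i in alignment]
    # Step 2: reverse the list so the offsets are reported in ascending order.
    return mirrored[::-1]


positive = [-1, 0, 1, 2, 3, 4, 6, 7, 8, 9]
negative = flip_alignment(positive, highest_index=9)
print(negative)  # [0, 1, 2, 3, 5, 6, 7, 8, 9, 10]
```

Applying the same function to the negative-strand result gives back the positive-strand list, which is a quick sanity check on the mapping.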


ovesh · 2025-03-04T21:46:53Z

Breaks backward compat, so need to edit PR title to feat!

ovesh

In addition to the comments, I'd like to get @AliceDG's approval on the correctness of the tests.

ovesh · 2025-03-04T21:10:57Z

docs-src/develop.rst

@@ -47,7 +47,7 @@ by linking your source tree from python's ``site-packages``.

 Finally, run the all the tests::

-    python -m unittest discover
+    CI=1 python -m unittest discover


I gave you bad advice. This is to skip running some tests in test_data_manager.py and test_gk_data.py that require cloud credentials, but unfortunately this will also skip some tests it's better to run. I'll think of a better solution, in the meantime can you revert the 2 changes to this file?
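
For readers unfamiliar with this kind of gating, the sketch below shows one way an environment variable such as `CI` can be used to skip credential-dependent tests. It is not taken from the GenomeKit test suite: the class, test names, and `run_example` helper are hypothetical, and the real tests may gate on `CI` differently.

```python
import os
import unittest


class ExampleTests(unittest.TestCase):
    def test_needs_cloud(self):
        # Hypothetical credential-dependent test, skipped whenever CI is set.
        if os.environ.get("CI"):
            self.skipTest("requires cloud credentials")
        self.assertTrue(True)

    def test_local_only(self):
        # A test with no external requirements always runs.
        self.assertEqual(1 + 1, 2)


def run_example():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_example()
print(f"ran={result.testsRun} skipped={len(result.skipped)}")
```

Because one variable controls every gated test, setting it skips all of them at once, which is the side effect described in the comment above.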

ovesh · 2025-03-04T21:14:19Z

genome_kit/_apply_variants.py

+        if isinstance(i, tuple):  # If the item is a tuple, add it to the temp list
+            tmp.append(i)
+        else:
+            if tmp:


I prefer not to rely on falsey values.

Suggested change

-            if tmp:
+            if len(tmp) > 0:

ovesh · 2025-03-04T21:26:33Z

genome_kit/_apply_variants.py

+    This encompasses two steps: firstly, it flips the provided alignment, since
+    negative strand intervals are defined from the 3' to the 5' end. Secondly,
+    it translates the alignment so that it is reported in ascending order
+    instead of descending order.


replace with a small example

ovesh · 2025-03-04T21:39:47Z

genome_kit/_apply_variants.py

@@ -195,7 +233,8 @@ def apply_variants(sequence, variants, interval, reference_alignment=False):
        var_sequence = reverse_complement(var_sequence)

        if reference_alignment:
-            raise ValueError("Reference alignment only work on forward strand.")
+            # raise ValueError("Reference alignment only work on forward strand.")


Suggested change

-            # raise ValueError("Reference alignment only work on forward strand.")

ovesh · 2025-03-04T21:43:12Z

genome_kit/_apply_variants.py

+        else:
+            flipped_alignment.append(highest_index-i)
+    return flipped_alignment
+

 def apply_variants(sequence, variants, interval, reference_alignment=False):


Can you document the format of the returned reference_alignment?



s22chan · 2025-03-05T21:17:57Z

tests/test_interval.py

+        interval = Interval('chr1', '-', 5, 10, 'hg19', 8)
+        self.assertEqual(interval.anchor_offset, 0)
+


why add this test but not mirror the assert on the anchor value?

s22chan · 2025-03-05T21:19:40Z

genome_kit/_apply_variants.py

+    This encompasses two steps: firstly, it flips the provided alignment, since
+    negative strand intervals are defined from the 3' to the 5' end. Secondly,
+    it translates the alignment so that it is reported in ascending order


why don't we do this in a single pass?

In the implementation, this is done in a single pass. I thought separating out the steps in the docstring would make it easier to understand!

It's iterating over the sequence twice as far as I can tell
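
To make the single-pass question concrete, here is a hedged sketch for integer offsets only. Both function names are hypothetical, `highest_index` is assumed to be the interval length minus one, and neither function is the code under review. The first version mirrors and then reverses (two traversals); the second walks the input once from the end.

```python
def flip_two_pass(alignment, highest_index):
    # First traversal: mirror each offset around the interval.
    mirrored = [highest_index - i for i in alignment]
    # Second traversal: reverse into ascending order.
    mirrored.reverse()
    return mirrored


def flip_single_pass(alignment, highest_index):
    # One traversal: read the input back to front while mirroring.
    return [highest_index - i for i in reversed(alignment)]


sample = [-1, 0, 1, 2, 3, 4, 6, 7, 8, 9]
print(flip_single_pass(sample, 9) == flip_two_pass(sample, 9))  # True
```

Handling insertion tuples in one pass would need extra bookkeeping that this sketch does not attempt.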


s22chan · 2025-03-05T21:59:19Z

tests/test_apply_variants.py

        self.assertEqual(reference_alignment, [-1, 0, 1, 2, 3, 4, 6, 7, 8, 9])

+        reference_alignment = apply_variants(genome37.dna, variants, negative_strand_interval, reference_alignment=True)[1]
+        self.assertEqual(reference_alignment, [0, 1, 2, 3, 5, 6, 7, 8, 9, 10])


I might be getting confused but I'm not sure the original case is correct even: I see a deletion of 2, where the anchor is in the middle?

see https://deepgenomics.github.io/GenomeKit/anchors.html#deletions-at-the-anchor

    5p-->3p +'ve
     5  6  7  8  9 10 11 12 13 14 15
     0  1  2  3  4  5  6  7  8  9
     C  G  T  A  C  G  T  A  C  G
     G  C  A  T  G  C  A  T  G  C
     9  8  7  6  5  4  3  2  1  0

deleting pos [10,12) should delete 5,6

     4  5  6  7  8  9 10 11 12 13 14 15 16
    -1  0  1  2  3  4  5  6  7  8  9 10
     A  C  G  T  A  C  . |.  A  C  G  T
     T  G  C  A  T  G  . |.  T  G  C  A
    10  9  8  7  6  5  4  3  2  1  0 -1

I'll take a look later, but it's likely not splitting the var over apply left + right.
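
The offset arithmetic in the first diagram can be checked with a small sketch. The helper name is hypothetical, and it only maps genomic positions to per-strand offsets; it does not model how the anchor redistributes sequence around a deletion, which is the behaviour the linked documentation describes.

```python
def deleted_offsets(start, end, del_start, del_end, strand):
    """Offsets (relative to the interval's 5' end) of the reference bases removed
    by deleting genomic positions [del_start, del_end) from the interval [start, end)."""
    lo, hi = max(start, del_start), min(end, del_end)
    if strand == "+":
        # On the positive strand, offset 0 is the base at `start`.
        return [pos - start for pos in range(lo, hi)]
    # On the negative strand, offset 0 is the base at `end - 1`.
    return sorted(end - 1 - pos for pos in range(lo, hi))


print(deleted_offsets(5, 15, 10, 12, "+"))  # [5, 6]
print(deleted_offsets(5, 15, 10, 12, "-"))  # [3, 4]
```

For the interval [5, 15), deleting [10, 12) removes positive-strand offsets 5 and 6, which correspond to negative-strand offsets 4 and 3 in the diagram.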

s22chan · 2025-03-05T22:22:43Z

genome_kit/_apply_variants.py

+        (if ``reference_alignment=True``). The alignment is comprised of integers
+        and (int, int) tuples, with the tuples signifying an insertion of
+        nucleotides. The first index of these tuples denotes the index of the
+        nucleotide immediately proceeding the insertion sequence, while the
+        second index denotes the index of that nucleotide relative to all the
+        other nucleotides included in that insertion operation.


a bit wordy and the second index can be misinterpreted, this might be more concise

Suggested change

-        (if ``reference_alignment=True``). The alignment is comprised of integers
-        and (int, int) tuples, with the tuples signifying an insertion of
-        nucleotides. The first index of these tuples denotes the index of the
-        nucleotide immediately proceeding the insertion sequence, while the
-        second index denotes the index of that nucleotide relative to all the
-        other nucleotides included in that insertion operation.
+        (if ``reference_alignment=True``). The alignment is comprised of
+        int/tuple[int, int] offsets to `interval.5p`.
+        The tuples denote insertions, where tuple[0] is the offset after the
+        insertion and tuple[1] is the indexes the inserted sequence.
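
To illustrate the format described in the suggestion, here is a small sketch that separates plain offsets from insertion tuples. The example alignment is made up for illustration (it is not from the tests), and the helper name is hypothetical; it follows the reading that `tuple[0]` is the reference offset following the insertion and `tuple[1]` indexes into the inserted sequence.

```python
def summarize_alignment(alignment):
    """Split an alignment into reference offsets and insertions.

    Returns (reference_offsets, insertions), where `insertions` maps the
    reference offset that follows an insertion to the indexes of the
    inserted nucleotides.
    """
    reference_offsets = []
    insertions = {}
    for entry in alignment:
        if isinstance(entry, tuple):
            following_offset, index_in_insertion = entry
            insertions.setdefault(following_offset, []).append(index_in_insertion)
        else:
            reference_offsets.append(entry)
    return reference_offsets, insertions


# Made-up example: two nucleotides inserted just before reference offset 2.
example = [0, 1, (2, 0), (2, 1), 2, 3]
refs, ins = summarize_alignment(example)
print(refs)  # [0, 1, 2, 3]
print(ins)   # {2: [0, 1]}
```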

ovesh and others added 8 commits February 25, 2025 13:14

test: add negative strand tests

1003a69

docs: added CI=1 instruction

b668ab7

feat: added negative strand tests to test_reference_alignment_no_anchor

1c4e605

feat: added negative strand tests to test_reference_alignment_interior_anchor

b5f39ca

feat: added negative strand tests to test_reference_alignment_other_anchor_cases

3bbd0bf

feat: missed a test case in test_reference_alignment_interior_anchor

ef702bf

fix: remove redundant test

2844a03

feat: added implementation for reverse strand alignment

61ae2c9

ovesh requested review from s22chan and ovesh March 4, 2025 21:25

ovesh requested changes Mar 4, 2025


Veyron2121 added 3 commits March 5, 2025 09:39

Revert "docs: added CI=1 instruction"

030c67c

This reverts commit b668ab7.

fix: use len(temp)>0

dbf4231

chore: remove commented code

570ea04

s22chan requested changes Mar 5, 2025


Veyron2121 changed the title ~~feat: support reference alignment computation for negative stranded intervals~~ feat!: support reference alignment computation for negative stranded intervals Mar 5, 2025

Veyron2121 added 3 commits March 5, 2025 15:01

docs: added example for alignment flip

2673c4d

docs: documentation on formatting for reference_alignment

fb0e908

fix: use gk._utils.reverse_complement instead

637c620

s22chan reviewed Mar 5, 2025


docs: fixed test comment to be more accurate

c8bce59

s22chan reviewed Mar 5, 2025
