-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: support reference alignment computation for negative stranded intervals #138
Open
Veyron2121
wants to merge
15
commits into
deepgenomics:main
Choose a base branch
from
Veyron2121:varun/ref-align-neg-strand
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+367
−39
Open
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
1003a69
test: add negative strand tests
ovesh b668ab7
docs: added CI=1 instruction
Veyron2121 1c4e605
feat: added negative strand tests to test_reference_alignment_no_anchor
Veyron2121 b5f39ca
feat: added negative strand tests to test_reference_alignment_interio…
Veyron2121 3bbd0bf
feat: added negative strand tests to test_reference_alignment_other_a…
Veyron2121 ef702bf
feat: missed a test case in test_reference_alignment_interior_anchor
Veyron2121 2844a03
fix: remove redundant test
Veyron2121 61ae2c9
feat: added implementation for reverse strand alignment
Veyron2121 030c67c
Revert "docs: added CI=1 instruction"
Veyron2121 dbf4231
fix: use len(temp)>0
Veyron2121 570ea04
chore: remove commented code
Veyron2121 2673c4d
docs: added example for alignment flip
Veyron2121 fb0e908
docs: documentation on formatting for reference_alignment
Veyron2121 637c620
fix: use gk._utils.reverse_complement instead
Veyron2121 c8bce59
docs: fixed test comment to be more accurate
Veyron2121 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -53,6 +53,60 @@ def check_variants_list(reference_genome, variants): | |||||||||||||||||||||
|
||||||||||||||||||||||
return variants | ||||||||||||||||||||||
|
||||||||||||||||||||||
def _reverse_tuples_in_alignment(alignment): | ||||||||||||||||||||||
result = [] | ||||||||||||||||||||||
tmp = [] | ||||||||||||||||||||||
|
||||||||||||||||||||||
for i in alignment: | ||||||||||||||||||||||
if isinstance(i, tuple): # If the item is a tuple, add it to the temp list | ||||||||||||||||||||||
tmp.append(i) | ||||||||||||||||||||||
else: | ||||||||||||||||||||||
if len(tmp) > 0: | ||||||||||||||||||||||
# If we encountered a non-tuple and there was a tuple sequence, reverse it | ||||||||||||||||||||||
result.extend(tmp[::-1]) | ||||||||||||||||||||||
tmp = [] | ||||||||||||||||||||||
result.append(i) | ||||||||||||||||||||||
|
||||||||||||||||||||||
# If there were any tuples left in temp at the end, reverse and add them | ||||||||||||||||||||||
if len(tmp) > 0: | ||||||||||||||||||||||
result.extend(tmp[::-1]) | ||||||||||||||||||||||
|
||||||||||||||||||||||
return result | ||||||||||||||||||||||
|
||||||||||||||||||||||
def _flip_alignment_for_negative_strand(interval, alignment): | ||||||||||||||||||||||
"""Given an alignment for the positive strand, flip it so that it represents the alignment for the negative strand. | ||||||||||||||||||||||
|
||||||||||||||||||||||
This encompasses two steps: firstly, it flips the provided alignment, since | ||||||||||||||||||||||
negative strand intervals are defined from the 3' to the 5' end. Secondly, | ||||||||||||||||||||||
it translates the alignment so that it is reported in ascending order | ||||||||||||||||||||||
instead of descending order. Here's an example: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Say we had Interval('chr1', '-', 0, 10, "hg37"), with 2 nt insertion at | ||||||||||||||||||||||
position 5 and a deletion of position 7. If these variants are applied to | ||||||||||||||||||||||
the positive strand, we'd get the following alignment back: | ||||||||||||||||||||||
|
||||||||||||||||||||||
[0, 1, 2, 3, 4, (5, 0), (5, 1), 5, 6, 8] | ||||||||||||||||||||||
|
||||||||||||||||||||||
In order to convert this to the negative strand, we'd follow these steps: | ||||||||||||||||||||||
|
||||||||||||||||||||||
1. Flip the alignment: [8, 6, 5, (5, 1), (5, 0), 4, 3, 2, 1, 0] | ||||||||||||||||||||||
2. Reverse the insertion alignment: [8, 6, 5, (5, 0), (5, 1), 4, 3, 2, 1, 0] | ||||||||||||||||||||||
3. Translate the alignment to be in ascending order: | ||||||||||||||||||||||
3a. [9-8, 9-6, 9-5, (9-5, 0), (9-5, 1), 9-4, 9-3, 9-2, 9-1, 9-0] | ||||||||||||||||||||||
3b. [1, 3, 4, (4, 0), (4, 1), 5, 6, 7, 8, 9] | ||||||||||||||||||||||
4. Increment the tuples by 1 to anchor them to the nucleotide following them: | ||||||||||||||||||||||
[1, 3, 4, (5, 0), (5, 1), 5, 6, 7, 8, 9] | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
highest_index = len(interval) - 1 | ||||||||||||||||||||||
flipped_alignment = [] | ||||||||||||||||||||||
alignment = _reverse_tuples_in_alignment(alignment) | ||||||||||||||||||||||
for i in alignment[::-1]: | ||||||||||||||||||||||
if isinstance(i, tuple): | ||||||||||||||||||||||
flipped_alignment.append((highest_index-i[0]+1, i[1])) | ||||||||||||||||||||||
s22chan marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
else: | ||||||||||||||||||||||
flipped_alignment.append(highest_index-i) | ||||||||||||||||||||||
return flipped_alignment | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
def apply_variants(sequence, variants, interval, reference_alignment=False): | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you document the format of the returned reference_alignment? |
||||||||||||||||||||||
"""Apply variants to a sequence interval. | ||||||||||||||||||||||
|
@@ -81,7 +135,12 @@ def apply_variants(sequence, variants, interval, reference_alignment=False): | |||||||||||||||||||||
|
||||||||||||||||||||||
:py:class:`list` | ||||||||||||||||||||||
The alignment of the variant sequence with the reference sequence | ||||||||||||||||||||||
(if ``reference_alignment=True``). | ||||||||||||||||||||||
(if ``reference_alignment=True``). The alignment is comprised of integers | ||||||||||||||||||||||
and (int, int) tuples, with the tuples signifying an insertion of | ||||||||||||||||||||||
nucleotides. The first index of these tuples denotes the index of the | ||||||||||||||||||||||
nucleotide immediately proceeding the insertion sequence, while the | ||||||||||||||||||||||
second index denotes the index of that nucleotide relative to all the | ||||||||||||||||||||||
other nucleotides included in that insertion operation. | ||||||||||||||||||||||
Comment on lines
+138
to
+143
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a bit wordy and the second index can be misinterpreted, this might be more concise
Suggested change
|
||||||||||||||||||||||
|
||||||||||||||||||||||
""" | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
@@ -195,7 +254,7 @@ def apply_variants(sequence, variants, interval, reference_alignment=False): | |||||||||||||||||||||
var_sequence = reverse_complement(var_sequence) | ||||||||||||||||||||||
|
||||||||||||||||||||||
if reference_alignment: | ||||||||||||||||||||||
raise ValueError("Reference alignment only work on forward strand.") | ||||||||||||||||||||||
alignment = _flip_alignment_for_negative_strand(interval, alignment) | ||||||||||||||||||||||
|
||||||||||||||||||||||
if reference_alignment: | ||||||||||||||||||||||
return var_sequence, alignment | ||||||||||||||||||||||
|
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we do this in a single pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the implementation, this is done in a single pass. I thought separating out the steps in the docstring would make it easier to understand!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's iterating over the sequence twice as far as I can tell