Speed up from O(n^2) iteration over all left/right primer hits in OffTargetDetector._to_amplicons #94

Open
ameynert opened this issue Nov 18, 2024 · 1 comment · May be fixed by #99

@ameynert

OffTargetDetector._to_amplicons uses itertools.product over the left and right primer hits and evaluates every combination in a loop to identify valid left/right hit pairings for the pair, which is O(n^2) in the number of hits.

Suggestion:

  1. Get hits by primer sequence from mappings_of
  2. Split into left and right hits
  3. Group hits by refname
  4. For each refname, split into left +, right -, left -, right + hits
  5. Build amplicons from left + and right - (amplicon strand +), left - and right + (amplicon strand -)

Building amplicons: at this point we are dealing only with positive and negative strand hits in the correct orientation, known to be on the same reference, with a known amplicon strand.

  1. itertools.product over positive & negative primer hits
  2. Check product size against min/max amplicon size range
  3. If in range, return the product hit as positive start to negative end with the known reference name and amplicon strand
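
A minimal sketch of the amplicon-building step above. The `Hit` and `Amplicon` dataclasses and the `build_amplicons` function are hypothetical stand-ins written for illustration, not the project's actual classes, and coordinates are assumed to be 1-based inclusive.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Hit:
    """Hypothetical primer hit (illustrative only, not the project's class)."""
    refname: str
    start: int      # assumed 1-based, inclusive
    end: int        # assumed 1-based, inclusive
    negative: bool  # True if the hit is on the negative strand


@dataclass(frozen=True)
class Amplicon:
    """Hypothetical amplicon span with a known strand."""
    refname: str
    start: int
    end: int
    strand: str


def build_amplicons(
    positive: Iterable[Hit],
    negative: Iterable[Hit],
    refname: str,
    strand: str,
    min_size: int,
    max_size: int,
) -> Iterator[Amplicon]:
    """Pair positive and negative hits already known to share a reference.

    Orientation and refname are not re-checked here; the caller is assumed
    to have done the grouping described in steps 1-5.
    """
    for pos, neg in product(positive, negative):
        # Product size runs from the positive hit's start to the negative hit's end.
        size = neg.end - pos.start + 1
        if min_size <= size <= max_size:
            yield Amplicon(refname, pos.start, neg.end, strand)
```

This is still a product over the two lists, but each list is now restricted to one reference and one strand, so the number of pairs checked should be much smaller than with a product over all left and right hits.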
@tfenne
Member

tfenne commented Dec 4, 2024

I think this sounds like a great idea. I would make a couple of suggestions also:

  1. You could probably do the splitting into left/right/+/- and by refname all in one pass if you created a simple dataclass that held four lists (one for each combination of left/right primer and +/- strand). You could then have a dict[refname, new_class] and place hits into the appropriate place.
  2. If you sort all the sub-collections by start_pos you can short-circuit more, to avoid checking more combinations. I.e. for a given "left" primer you can stop checking "right" primer hits once you get beyond the maximum amplicon size.
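
A rough sketch of how these two suggestions could fit together. The names `Hit`, `HitsByStrand`, `group_hits` and `pair_sorted` are hypothetical and coordinates are assumed to be 1-based inclusive. As a variation on stopping early, this sorts the negative-strand hits by end position (the coordinate that determines product size) and uses a binary search to select only the hits that fall inside the size window.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Hit:
    """Hypothetical primer hit (illustrative only, not the project's class)."""
    refname: str
    start: int      # assumed 1-based, inclusive
    end: int        # assumed 1-based, inclusive
    negative: bool  # True if the hit is on the negative strand


@dataclass
class HitsByStrand:
    """Four lists: one per combination of primer side and strand."""
    left_pos: list[Hit] = field(default_factory=list)
    left_neg: list[Hit] = field(default_factory=list)
    right_pos: list[Hit] = field(default_factory=list)
    right_neg: list[Hit] = field(default_factory=list)


def group_hits(left: Iterable[Hit], right: Iterable[Hit]) -> dict[str, HitsByStrand]:
    """Bucket hits by refname, primer side and strand in a single pass per list."""
    grouped: dict[str, HitsByStrand] = defaultdict(HitsByStrand)
    for hit in left:
        bucket = grouped[hit.refname]
        (bucket.left_neg if hit.negative else bucket.left_pos).append(hit)
    for hit in right:
        bucket = grouped[hit.refname]
        (bucket.right_neg if hit.negative else bucket.right_pos).append(hit)
    return grouped


def pair_sorted(
    positive: Iterable[Hit], negative: Iterable[Hit], min_size: int, max_size: int
) -> Iterator[tuple[int, int]]:
    """Yield (start, end) spans whose size lies within [min_size, max_size]."""
    by_end = sorted(negative, key=lambda h: h.end)
    ends = [h.end for h in by_end]
    for pos in positive:
        # size = neg.end - pos.start + 1, so the valid ends form a contiguous window.
        lo = bisect_left(ends, pos.start + min_size - 1)
        hi = bisect_right(ends, pos.start + max_size - 1)
        for neg in by_end[lo:hi]:
            yield pos.start, neg.end
```

For a "+" amplicon one would call `pair_sorted(bucket.left_pos, bucket.right_neg, ...)` per refname, and for a "-" amplicon `pair_sorted(bucket.right_pos, bucket.left_neg, ...)`. Negative hits outside the size window are never visited, which is where the saving over a full product comes from.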
