Intensity Table Concat Processing #1118
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1118      +/-   ##
==========================================
+ Coverage   88.95%   89.08%   +0.13%
==========================================
  Files         127      128       +1
  Lines        4824     4891      +67
==========================================
+ Hits         4291     4357      +66
- Misses        533      534       +1
Continue to review full report at Codecov.
review checkpoint
concatenated = IntensityTable.concatanate_intensity_tables(
    [it1, it2], overlap_strategy=OverlapStrategy.TAKE_MAX)

# The overlap section hits half of the spots from each intensity table, 5 from it1
wait what? if it hits 5 of the spots from it1, then shouldn't we get a total of 25 spots?
both the sel and remove_area_of_xarray methods are inclusive... so one boundary spot ends up in both the comparison count and the concatenation... maybe this is wrong?
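To make the inclusivity point concrete, here is a minimal sketch in plain xarray (no starfish code; the names and coordinates are illustrative) showing how bounds that are inclusive on both the selection side and the removal side count a boundary spot twice:

```python
import numpy as np
import xarray as xr

# Ten spots at x = 0..9; treat x in [0, 4] as the overlap region.
spots = xr.DataArray(np.arange(10), dims=["x"], coords={"x": np.arange(10.0)})

# xarray's label-based .sel with a slice is inclusive of BOTH endpoints,
# so the spot at x == 4 is selected here...
in_overlap = spots.sel(x=slice(0, 4))

# ...and is also kept by a selection that treats 4 as the start of the
# remaining region, so the boundary spot is counted on both sides.
remaining = spots.sel(x=slice(4, 9))

print(in_overlap.sizes["x"], remaining.sizes["x"])  # 5 and 6 -> 11 spots from 10
```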
I would dump the table to make sure it is consistent with your understanding, though I suspect you are correct. :)
but please see comments.
""" | ||
all_overlaps: List[Tuple[int, int]] = list() | ||
for idx1, idx2 in itertools.combinations(range(len(xarrays)), 2): |
so this is, as you pointed out, an n^2 operation, but each comparison also requires scanning the table to find the min/max for the coordinates. can you get some numbers from @ambrosejcarr for realistic FOV and spot counts, build a set of tables that reflects that, and see what the perf is like? we can likely shrink the cost significantly by precomputing a min/max for each xarray and reusing that. it would also help inform whether we need to do the nlogn approach.
Finally, it might be worth trying to actually merge a large set of intensity tables, even if synthetic, to see if there are any performance implications to any of the xarray ops used during the merge.
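A rough sketch of that precompute idea (the bounding-box helpers and the "xc" / "yc" coordinate names are assumptions for illustration, not the PR's API): extract each table's min/max once, then do the pairwise comparison against the cached boxes instead of rescanning coordinates.

```python
import itertools
from typing import List, Tuple

import xarray as xr


def bounding_box(xarr: xr.DataArray) -> Tuple[float, float, float, float]:
    # one coordinate scan per table, done up front
    return (
        float(xarr.xc.min()), float(xarr.xc.max()),
        float(xarr.yc.min()), float(xarr.yc.max()),
    )


def boxes_overlap(b1, b2) -> bool:
    x1_min, x1_max, y1_min, y1_max = b1
    x2_min, x2_max, y2_min, y2_max = b2
    return x1_min <= x2_max and x2_min <= x1_max and y1_min <= y2_max and y2_min <= y1_max


def find_overlaps(xarrays: List[xr.DataArray]) -> List[Tuple[int, int]]:
    boxes = [bounding_box(x) for x in xarrays]           # O(n) scans
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(xarrays)), 2)
        if boxes_overlap(boxes[i], boxes[j])              # O(n^2) cheap comparisons
    ]
```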
overlap_method = OVERLAP_STRATEGY_MAP[overlap_strategy]
idx1, idx2 = indices
# modify IntensityTables based on overlap strategy
it1, it2 = overlap_method(its[idx1], its[idx2])
I think this strategy might break down where you have three intensity tables that overlap in one area. It might be easier to illustrate this if I'm not making sense.
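A toy illustration of the concern (purely hypothetical; counts stand in for spots in the single shared overlap region, and take_max here is a stand-in rather than the PR's implementation): when three tables overlap in one area, later pairwise comparisons run against tables that earlier pairs have already trimmed.

```python
from itertools import combinations

# Spot counts that each table contributes to the single shared overlap region.
overlap_counts = {"A": 5, "B": 4, "C": 3}


def take_max(c1: int, c2: int) -> tuple:
    # keep the overlap spots of the larger table, drop the smaller table's
    return (c1, 0) if c1 >= c2 else (0, c2)


for n1, n2 in combinations(overlap_counts, 2):
    print(f"comparing {n1}={overlap_counts[n1]} with {n2}={overlap_counts[n2]}")
    overlap_counts[n1], overlap_counts[n2] = take_max(
        overlap_counts[n1], overlap_counts[n2])

# The final (B, C) comparison sees 0 vs 0, not the original 4 vs 3, because
# both tables were already trimmed by their comparisons against A.
```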
This PR introduces the idea of processing a list of IntensityTables with an overlap strategy before concatenating them. It also explicitly adds the TAKE_MAX strategy described by @berl, in which we compare overlapping intensity tables and remove spots from the one with fewer spots in the overlap.
The PR also includes unit tests for overlapping_util methods.
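As a rough sketch of that TAKE_MAX idea (the "xc" / "yc" coordinate names and the explicit bounds are illustrative assumptions, not the exact function added in this PR): count each table's spots inside the overlap, then drop the overlap spots from whichever table has fewer.

```python
from typing import Tuple

import xarray as xr


def spots_in_area(it: xr.DataArray, x_min, x_max, y_min, y_max) -> xr.DataArray:
    # boolean mask over the features dimension: True where a spot falls inside the overlap
    return ((it.xc >= x_min) & (it.xc <= x_max) &
            (it.yc >= y_min) & (it.yc <= y_max))


def take_max(it1: xr.DataArray, it2: xr.DataArray,
             x_min, x_max, y_min, y_max) -> Tuple[xr.DataArray, xr.DataArray]:
    """Keep the overlap's spots in whichever table has more of them; drop them from the other."""
    mask1 = spots_in_area(it1, x_min, x_max, y_min, y_max)
    mask2 = spots_in_area(it2, x_min, x_max, y_min, y_max)
    if int(mask1.sum()) >= int(mask2.sum()):
        return it1, it2.where(~mask2, drop=True)
    return it1.where(~mask1, drop=True), it2
```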
NOTE:
In an effort to make the code easier to understand, I went with an O(n^2) approach to finding overlaps within a list of IntensityTables (just comparing each one to every other one). But there is an O(n log n) approach that involves sorting the list by x/y coordinates first. If we think we'll need to optimize this process for large lists, I can refactor.
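For reference, one sort-and-sweep sketch of that O(n log n) idea (operating on precomputed bounding-box tuples rather than IntensityTables; worst case is still quadratic when everything overlaps in x, but typical FOV grids stay close to n log n):

```python
from typing import List, Tuple


def find_overlaps(boxes: List[Tuple[float, float, float, float]]) -> List[Tuple[int, int]]:
    """Boxes are (x_min, x_max, y_min, y_max); returns index pairs of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])  # sort by x_min
    active: List[int] = []
    overlaps: List[Tuple[int, int]] = []
    for i in order:
        x_min, x_max, y_min, y_max = boxes[i]
        # retire boxes whose x extent ends before this one starts
        active = [j for j in active if boxes[j][1] >= x_min]
        for j in active:
            # x ranges already overlap; check the y ranges
            if boxes[j][2] <= y_max and y_min <= boxes[j][3]:
                overlaps.append((min(i, j), max(i, j)))
        active.append(i)
    return overlaps
```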