Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intensity Table Concat Processing #1118

Merged
merged 23 commits into from
Apr 10, 2019
Merged

Intensity Table Concat Processing #1118

merged 23 commits into from
Apr 10, 2019

Conversation

shanaxel42
Copy link
Collaborator

@shanaxel42 shanaxel42 commented Mar 29, 2019

This PR introduces the idea of processing a list of IntensityTables with an overlap strategy before concatenating them. It also explicitly adds the TAKE_MAX strategy described by @berl in which we compare overlapping intensity tables and remove spots from the one with less spots in the overlap.

The PR also includes unit tests for overlapping_util methods.

NOTE:
In an effort to make the code easier to understand I went with O(n^2) approach to finding overlaps within a list of IntensityTables (just comparing each on to each other one). But there is a O(nlogn) approach that involves sorting the list by x/y coordinates first. If we think we'll need to optimize this process for large lists I can refactor.

@shanaxel42 shanaxel42 requested a review from ttung March 29, 2019 23:48
@codecov-io
Copy link

codecov-io commented Mar 29, 2019

Codecov Report

Merging #1118 into master will increase coverage by 0.13%.
The diff coverage is 97.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1118      +/-   ##
==========================================
+ Coverage   88.95%   89.08%   +0.13%     
==========================================
  Files         127      128       +1     
  Lines        4824     4891      +67     
==========================================
+ Hits         4291     4357      +66     
- Misses        533      534       +1
Impacted Files Coverage Δ
starfish/types/__init__.py 100% <ø> (ø) ⬆️
starfish/types/_constants.py 100% <100%> (ø) ⬆️
starfish/intensity_table/intensity_table.py 94.44% <100%> (+1.31%) ⬆️
starfish/util/overlap_utils.py 96.15% <96.15%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb37ba5...4da1a19. Read the comment docs.

@shanaxel42 shanaxel42 requested a review from berl March 29, 2019 23:48
Copy link
Collaborator

@ttung ttung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review checkpoint

starfish/intensity_table/intensity_table.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/test/test_overlap_utils.py Outdated Show resolved Hide resolved
concatenated = IntensityTable.concatanate_intensity_tables(
[it1, it2], overlap_strategy=OverlapStrategy.TAKE_MAX)

# The overlap section hits half of the spots from each intensity table, 5 from it1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait what? if it hits 5 of the spots from it1, then shouldn't we get a total of 25 spots?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both the sel and remove_area_of_xarray methods are inclusive...so we get one spot in the comparison count and the concatenation...maybe this is wrong?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would dump the table to make sure it is consistent with your understanding, though I suspect you are correct. :)

@shanaxel42 shanaxel42 requested review from ttung and ambrosejcarr April 8, 2019 17:47
Copy link
Collaborator

@ttung ttung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but please see comments.


"""
all_overlaps: List[Tuple[int, int]] = list()
for idx1, idx2 in itertools.combinations(range(len(xarrays)), 2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is, as you pointed out, a n^2 operation. but each operation requires scanning the table to find the min/max for the coordinates. can you get some numbers from @ambrosejcarr for realistic FOV and spot counts, and build a set of tables that reflect that, and see what the perf is like? we can likely significantly shrink the cost by precomputing a min-max for each xarray and reusing that. it would also help inform whether we need to do the nlogn approach.

Finally, it might be worth trying to actually merge a large set of intensity tables, even if synthetic, to see if there are any performance implications to any of the xarray ops used during the merge.

starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
starfish/intensity_table/intensity_table.py Outdated Show resolved Hide resolved
overlap_method = OVERLAP_STRATEGY_MAP[overlap_strategy]
idx1, idx2 = indices
# modify IntensityTables based on overlap strategy
it1, it2 = overlap_method(its[idx1], its[idx2])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this strategy might break down where you have three intensity tables that overlap in one area. It might be easier to illustrate this if I'm not making sense.

starfish/intensity_table/intensity_table.py Outdated Show resolved Hide resolved
starfish/intensity_table/intensity_table.py Outdated Show resolved Hide resolved
concatenated = IntensityTable.concatanate_intensity_tables(
[it1, it2], overlap_strategy=OverlapStrategy.TAKE_MAX)

# The overlap section hits half of the spots from each intensity table, 5 from it1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would dump the table to make sure it is consistent with your understanding, though I suspect you are correct. :)

starfish/test/test_overlap_utils.py Outdated Show resolved Hide resolved
starfish/util/overlap_utils.py Outdated Show resolved Hide resolved
@shanaxel42 shanaxel42 merged commit 1bd8abf into master Apr 10, 2019
@shanaxel42 shanaxel42 deleted the saxelrod-special-concat branch April 10, 2019 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants