[ENH] Allows users to get microarray samples in mask #155

Merged Sep 29, 2020 (8 commits).

@rmarkello (Owner) commented Sep 23, 2020

Closes #46.

Adds the function abagen.get_samples_in_mask(mask), which lets users provide a mask (binary or otherwise 🤷‍♂️) and returns all samples within tolerance of the mask boundaries, along with the samples' MNI coordinates. If no mask is provided (i.e., mask=None), all available tissue samples are returned instead.
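To make "within tolerance of mask boundaries" concrete, here is a minimal sketch of the idea. It is not abagen's implementation: the helper samples_in_mask, its arguments, and the "distance to the nearest mask voxel center" rule are hypothetical simplifications, assuming the mask is a NumPy array with a voxel-to-MNI affine and that tolerance is measured in mm.

```python
import numpy as np


def samples_in_mask(coords, mask=None, affine=None, tolerance=2.0):
    """Return indices of samples inside, or within `tolerance` mm of, a mask.

    Hypothetical helper for illustration; not abagen's API.
    coords    : (N, 3) sample MNI coordinates in mm
    mask      : 3D array, nonzero voxels count as "in mask"; None keeps all
    affine    : (4, 4) voxel-to-MNI affine of `mask`
    tolerance : maximum distance (mm) to the nearest mask voxel center
    """
    coords = np.asarray(coords, dtype=float)
    if mask is None:
        # like mask=None in the PR: every available sample is returned
        return np.arange(len(coords))

    # MNI (mm) coordinates of every nonzero mask voxel
    ijk = np.column_stack(np.nonzero(mask))
    if len(ijk) == 0:
        return np.array([], dtype=int)
    xyz = ijk @ affine[:3, :3].T + affine[:3, 3]

    # brute-force distance from each sample to its nearest mask voxel
    dists = np.linalg.norm(coords[:, None, :] - xyz[None, :, :], axis=-1)
    return np.flatnonzero(dists.min(axis=1) <= tolerance)


# toy 10x10x10 mask with 1 mm voxels and an identity affine
mask = np.zeros((10, 10, 10), dtype=bool)
mask[2:4, 2:4, 2:4] = True
coords = [[2, 2, 2], [3.5, 3, 3], [4.5, 3, 3], [9, 9, 9]]
print(samples_in_mask(coords, mask, np.eye(4)))  # [0 1 2]
print(samples_in_mask(coords))                   # [0 1 2 3]
```

With 1 mm isotropic voxels, a sample inside a mask voxel is at most about 0.87 mm from that voxel's center, so a 2 mm tolerance keeps every in-mask sample plus those just outside the boundary. The brute-force distance matrix is only suitable for small toy masks.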

This required adding a region_agg=None option to abagen.get_expression_data(), though I am choosing not to document this functionality in favor of having people use get_samples_in_mask() directly so that the relevant sample coordinates are returned alongside expression data.

To do:

  • Add smoke tests for get_samples_in_mask()
  • Update documentation for function

Allows users to specify a binary mask; all samples within the mask
boundaries are returned (along with the samples' MNI coordinates).

If no mask is specified, all samples are returned.

@codecov bot commented Sep 23, 2020

Codecov Report

Merging #155 into master will increase coverage by 0.03%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
+ Coverage   90.82%   90.85%   +0.03%     
==========================================
  Files          32       32              
  Lines        2201     2253      +52     
==========================================
+ Hits         1999     2047      +48     
- Misses        202      206       +4     
Impacted Files               Coverage Δ
abagen/allen.py              93.07% <93.47%> (-0.03%) ⬇️
abagen/__init__.py           100.00% <100.00%> (ø)
abagen/correct.py            98.60% <100.00%> (ø)
abagen/tests/test_allen.py   100.00% <100.00%> (ø)
abagen/samples_.py           99.21% <0.00%> (-0.79%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0cfe5f7...43f94a5.

Mostly smoke tests to ensure that the functions run and return
reasonable-looking outputs 🤷‍♂️

@rmarkello (Owner, Author) commented:

Alright, so the docs failure is a typo (it should have been just exp rather than exp.head()), but it did highlight an issue with gene normalization when using get_samples_in_mask().

Namely, if a relatively small ROI is provided as the mask, there are often only a few samples within the mask for each donor. If normalization is set to 'srs' (the default), then for every gene 2 of each donor's N samples will have expression values of exactly 0 or 1 (due to the rescaling). This is not normally an issue when aggregating over donors, but when returning sample-level expression data these 0/1 values are retained and can make for very confusing interpretations, since the resulting expression matrix will have one 0 value and one 1 value for each donor.

As such, normalization should probably be done over all available samples, not just those matched to the provided mask. However, this will require modifying get_expression_data() in such a way that the behavior of that workflow does not change (since we generally want to retain the functionality there, I think; though now I might reconsider how that process should work...).
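The 0/1 artifact can be reproduced with toy numbers. This sketch assumes the standard scaled robust sigmoid (a logistic of the median-centered values scaled by IQR/1.35, followed by min-max rescaling) and that abagen's 'srs' option follows it; the names srs, donor and in_mask are hypothetical.

```python
import numpy as np


def srs(x):
    """Scaled robust sigmoid of one gene's values, rescaled to [0, 1].

    Standard SRS formulation; assumed, not verified, to match
    abagen's normalization='srs'.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    q75, q25 = np.percentile(x, [75, 25])
    sig = 1 / (1 + np.exp(-(x - med) / ((q75 - q25) / 1.35)))
    # min-max rescaling: the smallest value becomes 0, the largest 1
    return (sig - sig.min()) / (sig.max() - sig.min())


# toy values for one gene across all of one donor's samples
donor = np.linspace(-3, 3, 101)
in_mask = np.arange(40, 61, 5)  # a small ROI: only 5 matched samples

after = srs(donor[in_mask])   # normalize the matched samples only
before = srs(donor)[in_mask]  # normalize all samples, then subset
print(after.min(), after.max())            # 0.0 1.0, forced by rescaling
print(before.min() > 0, before.max() < 1)  # True True
```

Because the min-max step always maps the extremes of whatever set it sees to 0 and 1, normalizing only the matched samples guarantees those values; normalizing over all of a donor's samples first avoids them unless the global extremes happen to fall in the mask.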

The point at which normalization is performed during get_expression_data() is
now controllable via the `norm_matched` parameter, which performs
normalization either before (False) or after (True) unmatched samples
are dropped.
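The `norm_matched` switch amounts to choosing the order of two steps. The sketch below is a hypothetical illustration, not abagen's code: normalize_and_drop and minmax are invented names, the input layout (a samples x genes DataFrame with a 'donor' column) is assumed, and plain per-donor min-max rescaling (the final step of SRS) stands in for the real gene normalization.

```python
import pandas as pd


def minmax(col):
    # stand-in for gene normalization: rescale each gene to [0, 1]
    return (col - col.min()) / (col.max() - col.min())


def normalize_and_drop(expr, matched, norm_matched=True):
    """Hypothetical illustration of the `norm_matched` ordering.

    expr    : samples x genes DataFrame with a 'donor' column
    matched : boolean Series, True for samples matched to the mask
    norm_matched=True  -> drop unmatched samples, then normalize
    norm_matched=False -> normalize all samples, then drop unmatched ones
    """
    genes = [c for c in expr.columns if c != 'donor']
    if norm_matched:
        expr = expr[matched]
    # normalize each gene separately within each donor
    normed = expr.groupby('donor')[genes].transform(minmax)
    if not norm_matched:
        normed = normed[matched]
    return normed


expr = pd.DataFrame({'donor': [1, 1, 1, 1, 2, 2, 2, 2],
                     'gene': [0., 1., 2., 3., 10., 20., 30., 40.]})
matched = pd.Series([False, True, True, False, False, True, True, False])
print(normalize_and_drop(expr, matched, norm_matched=True)['gene'].tolist())
# [0.0, 1.0, 0.0, 1.0]
print(normalize_and_drop(expr, matched, norm_matched=False)['gene'].round(3).tolist())
# [0.333, 0.667, 0.333, 0.667]
```

With only two matched samples per donor, normalizing after dropping produces nothing but 0s and 1s, which is exactly the confusing output described in the comment above; normalizing first keeps the matched samples' positions within each donor's full distribution.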

rmarkello merged commit 0b99f46 into master on Sep 29, 2020.
rmarkello deleted the mask_samples branch on Sep 29, 2020 at 14:25.
Linked issue: Get expression samples inside a supplied mask