(feat): refactor base AnnData class to use a AnnDataBase abstract class #949

Closed
wants to merge 15 commits into from

ilan-gold
Contributor

This PR refactors the AnnData class to inherit from an AnnDataBase class so that we can develop new candidate AnnData classes more easily and in a structured way. At the moment there are a number of issues I see here:

  1. Generating the different axis classes (i.e., AxisArray for obsm and DataFrame for var) is not particularly uniform, so we can't just set up AnnData classes with different classes as attributes. Instead we have these _assign_XXXX methods.
  2. The current AnnData class performs a series of checks that are not easily decoupled from the initialization process. I've taken a stab at this but it will require more refinement.
  3. The obs_names and var_names (which are basically indices for the AnnData object at the moment) are coupled to the existence of obs and var, so it's a bit of a dog chasing its tail. This gets back to the first point: if we had an easy, uniform way to generate "empty" versions of the wrapping classes like DataFrame, this wouldn't be such an issue. I'm not really sure this is an issue, though. We could set up a check on the obs_names property/setter to just look for the existence of obs first and handle the case where it doesn't exist.
  4. I'm not sure the abstraction is so great. It defines a sequence of methods in _init_as_actual that are both somewhat logical and based on the current AnnData method, but there is probably a cleaner way of doing this. My hope would be that people could fill in these methods without touching _init_as_actual.
  5. I'm not sure _init_as_view needs to be abstracted. It's not so long.
  6. There are A LOT of abstract methods, which may be confusing.

There might be other things, but these are my first thoughts.
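
To make points 3 and 4 concrete, here is a rough sketch of the shape I have in mind. It is not the code in this PR: only `_init_as_actual`, the `_assign_*` naming, and `obs_names` come from the description above, while `AnnDataBaseSketch`, `InMemorySketch`, `_check_dimensions`, and `_n_obs` are hypothetical names, and `obs` is assumed to be any object exposing an `.index`.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class AnnDataBaseSketch(ABC):
    """Hypothetical stand-in for the base class; not the code from this PR."""

    def __init__(self, X=None, obs=None):
        self._init_as_actual(X, obs)

    def _init_as_actual(self, X, obs):
        # Fixed order of steps; subclasses only fill in the hooks below.
        self._assign_X(X)
        self._assign_obs(obs)
        self._check_dimensions()

    @abstractmethod
    def _assign_X(self, X):
        """Store X and set self._n_obs."""

    @abstractmethod
    def _assign_obs(self, obs):
        """Store obs (or None) on self._obs."""

    def _check_dimensions(self):
        # Concrete default, kept separate from the assignment hooks.
        if self._obs is not None and len(self._obs.index) != self._n_obs:
            raise ValueError("obs does not match the first dimension of X")

    @property
    def obs_names(self):
        # Look for obs first; fall back to positional names if it is missing.
        if self._obs is None:
            return list(range(self._n_obs))
        return list(self._obs.index)


class InMemorySketch(AnnDataBaseSketch):
    """Hypothetical subclass that only implements the assignment hooks."""

    def _assign_X(self, X):
        self._X = X
        self._n_obs = 0 if X is None else len(X)

    def _assign_obs(self, obs):
        self._obs = obs


no_obs = InMemorySketch(X=[[1, 2], [3, 4]])
print(no_obs.obs_names)  # [0, 1]

with_obs = InMemorySketch(X=[[1, 2]], obs=SimpleNamespace(index=["cell_a"]))
print(with_obs.obs_names)  # ['cell_a']
```

The subclass never touches `_init_as_actual`, and the dimension check lives in one overridable method rather than inside the constructor logic.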

@ilan-gold
Contributor Author

  1. How do we determine what contract obs, obsm, etc. should fulfill to work with AnnData's indexing/view system? Should we be particular about this?

@codecov
codecov bot commented Mar 9, 2023

Codecov Report

Attention: Patch coverage is 76.23762% with 24 lines in your changes missing coverage. Please review.

Project coverage is 82.95%. Comparing base (d4cde5c) to head (dc9f12f).
Report is 213 commits behind head on main.

Files with missing lines | Patch % | Lines
anndata/_core/anndata_base.py | 75.25% | 24 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #949      +/-   ##
==========================================
- Coverage   83.08%   82.95%   -0.14%     
==========================================
  Files          34       35       +1     
  Lines        5404     5502      +98     
==========================================
+ Hits         4490     4564      +74     
- Misses        914      938      +24     

Files with missing lines | Coverage Δ
anndata/_core/anndata.py | 85.08% <100.00%> (+0.01%) ⬆️
anndata/_core/anndata_base.py | 75.25% <75.25%> (ø)

Member

@ivirshup ivirshup left a comment

Looks interesting! Put some comments down, but have some additional thoughts:

  • Should __getitem__(self, idx) -> AbstractAnnData be an abstract method?
  • Should write_h5ad / write_zarr be abstract methods?

The obs_names and var_names (which are basically indices for the AnnData object at the moment) are coupled to the existence of obs and var, so it's a bit of a dog chasing its tail. This gets back to the first point: if we had an easy, uniform way to generate "empty" versions of the wrapping classes like DataFrame, this wouldn't be such an issue.

For pandas dataframes, this is easy: pd.DataFrame({}, index=index)

But one could imagine using a pyarrow Table, which doesn't have an index.
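
As a small illustration of the pandas case, the snippet below builds an "empty" obs that carries only names. The index values and the `obs_names` label are made up for the example.

```python
import pandas as pd

# Hypothetical observation names; any index-like sequence would do.
index = pd.Index(["cell_0", "cell_1", "cell_2"], name="obs_names")

# No columns, but the index (and therefore the length) is preserved.
empty_obs = pd.DataFrame({}, index=index)

print(empty_obs.shape)        # (3, 0)
print(list(empty_obs.index))  # ['cell_0', 'cell_1', 'cell_2']
```

A backend without an index has no equivalent of this, so the names would presumably have to be stored somewhere other than obs itself.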


There are A LOT of abstract methods, which may be confusing.

Yeah. It would be good to think about including more concrete implementations, instead of just a list of things to implement.

I'm wondering if we could even just explicitly say layers, obsm, varm, obsp, varp are aligned mappings.

How do we determine what contract obs, obsm, etc. should fulfill to work with AnnData's indexing/view system? Should we be particular about this?

For sure on things like shape. Which would be addressed by my point above.

Other things (like indexing behavior) are difficult to specify or determine statically, and can be expensive to verify dynamically.
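
A minimal sketch of what the cheap part of such a contract could look like, assuming the only requirement is a `shape` whose leading dimension matches the axis length. `HasShape`, `check_aligned`, and `FakeArray` are hypothetical names for illustration, not anndata API, and indexing behavior is deliberately left unchecked.

```python
from typing import Mapping, Protocol, Tuple, runtime_checkable


@runtime_checkable
class HasShape(Protocol):
    """Anything exposing a `shape` attribute (hypothetical contract)."""

    @property
    def shape(self) -> Tuple[int, ...]: ...


def check_aligned(mapping: Mapping[str, object], n: int, axis: int = 0) -> None:
    """Raise if any value lacks `shape` or disagrees with `n` along `axis`."""
    for key, value in mapping.items():
        if not isinstance(value, HasShape):
            raise TypeError(f"{key!r} has no shape attribute")
        if value.shape[axis] != n:
            raise ValueError(
                f"{key!r} has length {value.shape[axis]} along axis {axis}, expected {n}"
            )


class FakeArray:
    """Stand-in for an array-like value; only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


# Passes silently: the leading dimension matches n_obs = 3.
check_aligned({"X_pca": FakeArray((3, 2))}, n=3)
```

This check only reads an attribute per entry, so it stays cheap however large the underlying arrays are.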

@ilan-gold
Contributor Author

No need after #1247

@ilan-gold ilan-gold closed this Aug 26, 2024