Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Consider porting extension accessor from xarray #14781

Closed
chris-b1 opened this issue Dec 1, 2016 · 7 comments
Closed

API: Consider porting extension accessor from xarray #14781

chris-b1 opened this issue Dec 1, 2016 · 7 comments
Labels
API Design Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@chris-b1
Copy link
Contributor

chris-b1 commented Dec 1, 2016

xarray provides a decorator to register a custom accessor on a Dataset - see docs here.

It's a bit of a blunt tool, in that it applies to all instances, but outside of that, I found it to be a nice alternative to subclassing or a composition class.

I haven't looked at the implementation, so not sure how hard it would be retrofit onto DataFrame

@shoyer - any thoughts or feedback from people using it?

@chris-b1 chris-b1 added API Design Needs Discussion Requires discussion from core team before further action labels Dec 1, 2016
@shoyer
Copy link
Member

shoyer commented Dec 1, 2016

This is pretty lightweight and self-contained -- would be easy to copy into pandas: https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py

The only change in the works is switching to warning instead of erroring when overriding a pre-existing attribute: pydata/xarray#1149

@jbrockmendel
Copy link
Member

Following up on this instead of opening a new issue: are accessors considered user-facing? If so, is there a One True Way to put one together? The str accessor comes from mixing in strings.StringAccessorMixin, while the dt and cat accessors are constructed inside the Series class definition.

Related 1: are _make_dt_accessor and _make_cat_accessor needed in the Series namespace? Would those make sense to put into _dir_deletions?

Related 2: Right now ser.dt == ser.dt returns False. If an enterprising young soul were to make a PR implementing an __eq__ method, what class would that belong in?

@jreback
Copy link
Contributor

jreback commented Jul 3, 2017

@jbrockmendel not sure what u are asking here

@jbrockmendel
Copy link
Member

a) Are users supposed to be able to implement custom accessors?
b) Is there a recipe for doing so?

The docstring example in @shoyer's suggestion is pretty similar to what I have in mind.

@chris-b1
Copy link
Contributor Author

chris-b1 commented Jul 3, 2017

No, the existing accessor machinery is not user facing, this issue was about potentially adding some that is.

I would be supportive of adding something like the xarray extension decorators, so PR would be welcome if you're interested in implementing something like that. I think you could basically copy that implementation, most of the work would be docs and tests.

@jbrockmendel
Copy link
Member

Back-migrating question #16890

Are [make_dt_accessor, _make_cat_accessor, _make_str_accesor] needed in the user-facing namespace for some reason? If not, can we wedge them into _dir_deletions? I know the leading underscore makes it private, but dir(pd.DataFrame) and dir(pd.Series) have 444 and 441 members respectively. This gets pretty imposing.

not sure what you are mean here. accessor are by-definition namespaces that make the discovery easier and much more geared to the dtype.

I communicated poorly by using _add_series_or_dataframe_operations as an example instead of the more-pertinent _make_foo_accessor. My claim is that we can have e.g. the dt accessor without _make_dt_accessor being in the Series namespace. Is there a reason it is needed in the namespace other than creating the accessor?

The "wedge into _dir_deletions" comment referred more generally to items that aren't so easy to get rid of entirely, but may be namespace-clutter. This is not all that pertinent to the accessor topic.

@shoyer
Copy link
Member

shoyer commented Jul 12, 2017

Is there a reason it is needed in the namespace other than creating the accessor?

I think these are basically one-time use methods. We're been somewhat sloppy about adding lots of private methods in the past, but since IPython exposes them in tab-completition there is indeed some virtue to cleaning things up later. Better still might be making these functions instead, so they never pollute the namespace in the first place.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 18, 2017
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 8, 2018
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jan 9, 2018
Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing pandas-dev#18767
outside of pandas.

Closes pandas-dev#14781
@jreback jreback added this to the 0.23.0 milestone Jan 10, 2018
TomAugspurger added a commit that referenced this issue Jan 16, 2018
* ENH: Added public accessor registrar

Adds new methods for registing custom accessors to pandas objects.

This will be helpful for implementing #18767
outside of pandas.

Closes #14781

* PEP8

* Moved to extensions

* More docs

* Fix see also

* DOC: Added whatsnew

* Move to api

* Update post review

* flake8

* Raise the underlying error instead of a RuntimeError

* str validate

* DOC: Moved to developer

* REF: Use public registrars for accessors

* cleanup

* Implemented optional caching

* Document cache

* Tests passing

* Use for plot

* Fix autodoc

* Fix the class instantiation

* Refactor again.

1. Removed optional caching
2. Refactored `Properties` to create the indexes it uses on demand
3. Moved accessor definitions to classes for clarity

* Fix API files

* Remove stale comment

* Tests pass

* DOC: some cleanup

* No need to assign doc

* Rename, shared docs

* Doc __new__

* Use UserWarning

* Update test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants