WIP: Refactor accessors, unify usage, make "recipe" #17042

Closed
wants to merge 21 commits into from
Changes from 6 commits
21 commits
b77e103
Move PandasDelegate and AccessorProperty; update imports
jbrockmendel Jul 19, 2017
dbc149d
Move apply _shared_docs to functions and attach to methods with copy
jbrockmendel Jul 20, 2017
3c77d94
Implement _make_accessor as classmethod on StringMethods
jbrockmendel Jul 20, 2017
19f7ff6
Add example/recipe
jbrockmendel Jul 20, 2017
d152421
Test to go along with example/recipe
jbrockmendel Jul 20, 2017
101e7e5
Transition to _make_accessor
jbrockmendel Jul 20, 2017
774a35d
Merge branch 'master' into accessory
jbrockmendel Jul 20, 2017
ccec595
Merge branch 'master' into accessory
jbrockmendel Jul 20, 2017
74e4539
Remove unused import that was causing a lint error
jbrockmendel Jul 20, 2017
953598a
merge pulled
jbrockmendel Jul 20, 2017
22d4892
Wrap long line
jbrockmendel Jul 22, 2017
014fae0
Refactor tests and documentation
jbrockmendel Jul 22, 2017
dd8315c
Typos, flake8 fixes, rearrange comments
jbrockmendel Jul 22, 2017
74a237b
Simplify categorical make_accessor args
jbrockmendel Jul 23, 2017
c931d4b
Rename PandasDelegate subclasses FooDelegate
jbrockmendel Jul 25, 2017
6c771b4
Revert import rearrangement; update names FooDelegate
jbrockmendel Jul 25, 2017
d3a4460
Deprecate StringAccessorMixin
jbrockmendel Jul 25, 2017
48f3b4d
Merge branch 'master' into accessory
jbrockmendel Jul 25, 2017
73a0633
lint fixes
jbrockmendel Jul 25, 2017
aa793ad
Merge branch 'accessory' of https://github.com/jbrockmendel/pandas in…
jbrockmendel Jul 25, 2017
264a7e7
Merge branch 'master' into accessory
jbrockmendel Sep 20, 2017
345 changes: 345 additions & 0 deletions pandas/core/accessors.py
@@ -0,0 +1,345 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""

An example/recipe for creating a custom accessor.
Contributor

I would remove everything about the custom use of this. As a refactor this is fine, but it would really need a good use case to justify adding a custom delegate. Happy to have this in internals.rst if it's necessary.

Member Author

@shoyer: You commented on this in #14781. Have you found that custom accessors are frequently used by xarray users?

@jreback I can delete it or move it somewhere out of the way. The test_accessors file might make a good place to park the discussion seeing as how it uses the same example anyway.

Contributor

I'm +1 on the general idea of exposing a way to register custom accessors, though the api in your example seems a little clunky compared to what xarray does? That said, may be better to punt that piece to a follow-up PR.

Contributor

Agree that the docs should live in internals.rst, as a logical companion to the subclassing docs.

Member Author

> example seems a little clunky compared to what xarray does

The examples are for pretty different use cases. The xarray example defines a center attribute in terms of multiple existing [column]s of a [DataFrame]. That's why it doesn't need any vectorization boilerplate.

The example I used vectorizes properties/methods of the elements in a Series. StringMethods or CategoricalAccessor work roughly this way. (CombinedDatetimelikeProperties doesn't need to apply the property/method point-wise because they already exist in the underlying Series/Index.)

That said, if this isn't already obvious, that is a shortcoming of the documentation.

There is some avoidable clunkiness in that we are requiring every accessible attribute to be explicitly listed.
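For comparison, here is a minimal sketch of the decorator-style registration discussed in this thread, applied to the element-wise use case. It assumes a pandas release that provides `pd.api.extensions.register_series_accessor`, which is not part of this PR. The accessor name `state_demo` and the trimmed `State` class are hypothetical and exist only for illustration.

```python
import pandas as pd


class State:
    """Trimmed, hypothetical version of the State class from the recipe."""

    _ABBREV = {"California": "CA", "Alabama": "AL"}

    def __init__(self, name):
        self.name = name

    @property
    def abbrev(self):
        return self._ABBREV[self.name]


@pd.api.extensions.register_series_accessor("state_demo")
class StateAccessor:
    def __init__(self, series):
        # Validation plays the role that _make_accessor plays in this PR.
        if not series.map(lambda x: isinstance(x, State)).all():
            raise AttributeError("Can only use .state_demo with State objects")
        self._series = series

    @property
    def abbrev(self):
        # Element-wise delegation: one attribute lookup per State object.
        return self._series.map(lambda state: state.abbrev)


ser = pd.Series([State("Alabama"), State("California")])
abbrevs = ser.state_demo.abbrev
```

In this style each exposed attribute is written out by hand on the accessor class, so the explicit listing does not go away; it moves from a decorator argument into the class body.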

Member Author
jbrockmendel Jul 22, 2017

Just pushed commits that address some of these. Took the example out of the module docstring, put a comment in PandasDelegate.__doc__ pointing readers towards the existing examples.



The primary use case for accessors is when a Series contains instances
of a particular class and we want to access properties/methods of these
instances in Series form.

Suppose we have a custom State class representing US states:

class State(object):
    def __repr__(self):
        return repr(self.name)

    def __init__(self, name):
        self.name = name
        self._abbrev_dict = {'California': 'CA', 'Alabama': 'AL'}

    @property
    def abbrev(self):
        return self._abbrev_dict[self.name]

    @abbrev.setter
    def abbrev(self, value):
        self._abbrev_dict[self.name] = value

    def fips(self):
        return {'California': 6, 'Alabama': 1}[self.name]


We can construct a series of these objects:

>>> ser = pd.Series([State('Alabama'), State('California')])
>>> ser
0 'Alabama'
1 'California'
dtype: object

We would like direct access to the `abbrev` property and `fips` method.
One option is to access these manually with `apply`:

>>> ser.apply(lambda x: x.fips())
0 1
1 6
dtype: int64

But doing that repeatedly gets old in a hurry, so we decide to make a
custom accessor. This entails subclassing `PandasDelegate` to specify
what should be accessed and how.

There are four methods that *may* be defined in this subclass, one of which
*must* be defined. The mandatory method is a classmethod called
`_make_accessor`. `_make_accessor` is responsible for validating the
input to the accessor. In this case, the input must be a Series
containing State objects.


class StateDelegate(PandasDelegate):

    def __init__(self, values):
        self.values = values

    @classmethod
    def _make_accessor(cls, data):
        if not isinstance(data, pd.Series):
            raise ValueError('Input must be a Series of States')
        elif not data.apply(lambda x: isinstance(x, State)).all():
            raise ValueError('All entries must be State objects')
        return StateDelegate(data)


With `_make_accessor` defined, we have enough to create the accessor, but
not enough to actually do anything useful with it. In order to access
*methods* of State objects, we implement `_delegate_method`.
`_delegate_method` calls the underlying method for each object in the
series and wraps these in a new Series. The simplest version looks like:

    def _delegate_method(self, name, *args, **kwargs):
        state_method = lambda x: getattr(x, name)(*args, **kwargs)
        return self.values.apply(state_method)

Similarly in order to access *properties* of State objects, we need to
implement `_delegate_property_get`:

    def _delegate_property_get(self, name):
        state_property = lambda x: getattr(x, name)
        return self.values.apply(state_property)


On occasion, we may want to be able to *set* the property being accessed.
This is discouraged, but allowed (as long as the class being accessed
allows the property to be set). Doing so requires implementing
`_delegate_property_set`:

    def _delegate_property_set(self, name, new_values):
        for (obj, val) in zip(self.values, new_values):
            setattr(obj, name, val)


With these implemented, `StateDelegate` knows how to handle methods and
properties. We just need to tell it which methods and properties it is
supposed to handle. This is done by decorating the `StateDelegate`
class with `pandas.core.accessors.wrap_delegate_names`. We apply the
decorator once with a list of all the methods the accessor should
recognize and once with a list of all the properties it should recognize.


@wrap_delegate_names(delegate=State,
                     accessors=["fips"],
                     typ="method")
@wrap_delegate_names(delegate=State,
                     accessors=["abbrev"],
                     typ="property")
class StateDelegate(PandasDelegate):
    [...]


We can now pin the `state` accessor to the pd.Series class (we could
alternatively pin it to the pd.Index class with a slightly different
implementation above):

pd.Series.state = accessors.AccessorProperty(StateDelegate)


>>> ser = pd.Series([State('Alabama'), State('California')])
>>> isinstance(ser.state, StateDelegate)
True

>>> ser.state.abbrev
0 AL
1 CA
dtype: object

>>> ser.state.fips()
0 1
1 6
dtype: int64

>>> ser.state.abbrev = ['Foo', 'Bar']
>>> ser.state.abbrev
0 Foo
1 Bar
dtype: object



"""
from pandas.core.base import PandasObject
from pandas.core import common as com


class PandasDelegate(PandasObject):
    """ an abstract base class for delegating methods/properties

    Usage: To make a custom accessor, start by subclassing `PandasDelegate`.
    See example in the module-level docstring.

    """

    def __init__(self, values):
        self.values = values
        # #self._freeze()

    @classmethod
    def _make_accessor(cls, data):  # pragma: no cover
        raise NotImplementedError(
Contributor

use AbstractMethodError

            'It is up to subclasses to implement '
            '_make_accessor. This does input validation on the object to '
            'which the accessor is being pinned. '
            'It should return an instance of `cls`.')

    def _delegate_property_get(self, name, *args, **kwargs):
        raise TypeError("You cannot access the "
                        "property {name}".format(name=name))

    def _delegate_property_set(self, name, value, *args, **kwargs):
        raise TypeError("The property {name} cannot be set".format(name=name))

    def _delegate_method(self, name, *args, **kwargs):
        raise TypeError("You cannot call method {name}".format(name=name))
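The review suggestion to use `AbstractMethodError` might look roughly like the sketch below. It assumes a pandas version in which `pandas.errors.AbstractMethodError` accepts a `methodtype` argument. `DelegateBase` and `Incomplete` are hypothetical stand-ins, not classes from this PR.

```python
from pandas.errors import AbstractMethodError


class DelegateBase:
    """Hypothetical stand-in for PandasDelegate."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def _make_accessor(cls, data):
        # Subclasses are expected to validate `data` and return cls(data).
        raise AbstractMethodError(cls, methodtype="classmethod")


class Incomplete(DelegateBase):
    """Hypothetical subclass that forgot to override _make_accessor."""


try:
    Incomplete._make_accessor([1, 2, 3])
except NotImplementedError as err:
    # AbstractMethodError subclasses NotImplementedError, so callers that
    # already catch NotImplementedError keep working.
    message = str(err)
```

The resulting message names the concrete class that is missing the override, which the hand-written `NotImplementedError` text above does not.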


class AccessorProperty(object):
    """Descriptor for implementing accessor properties like Series.str
    """

    def __init__(self, accessor_cls, construct_accessor=None):
        self.accessor_cls = accessor_cls

        if construct_accessor is None:
            # accessor_cls._make_accessor must be a classmethod
            construct_accessor = accessor_cls._make_accessor

        self.construct_accessor = construct_accessor
        self.__doc__ = accessor_cls.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            # this ensures that Series.str.<method> is well defined
            return self.accessor_cls
        return self.construct_accessor(instance)

    def __set__(self, instance, value):
        raise AttributeError("can't set attribute")

    def __delete__(self, instance):
        raise AttributeError("can't delete attribute")
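To make the descriptor's behaviour concrete, here is a pandas-free sketch. `AccessorProperty` is a trimmed copy of the class above. `Container` (standing in for Series) and `UpperAccessor` are hypothetical names used only for this illustration.

```python
class AccessorProperty:
    """Trimmed copy of the descriptor above, kept pandas-free for the demo."""

    def __init__(self, accessor_cls, construct_accessor=None):
        self.accessor_cls = accessor_cls
        if construct_accessor is None:
            construct_accessor = accessor_cls._make_accessor
        self.construct_accessor = construct_accessor

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access returns the accessor class itself.
            return self.accessor_cls
        return self.construct_accessor(instance)

    def __set__(self, instance, value):
        raise AttributeError("can't set attribute")


class UpperAccessor:
    """Hypothetical accessor that upper-cases every element."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def _make_accessor(cls, data):
        # Input validation happens once, when the accessor is requested.
        if not all(isinstance(x, str) for x in data.items):
            raise AttributeError("only usable with string data")
        return cls(data)

    def upper(self):
        return [x.upper() for x in self.values.items]


class Container:
    """Hypothetical stand-in for Series."""

    text = AccessorProperty(UpperAccessor)

    def __init__(self, items):
        self.items = list(items)


box = Container(["a", "b"])
class_level = Container.text   # the accessor class, not an instance
result = box.text.upper()      # a fresh accessor is built on each access
```

Because `__get__` calls `construct_accessor` every time, validation reruns on each attribute access. Whether to cache the accessor is a design choice this sketch does not address.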


class Delegator(object):
    """ Delegator class contains methods that are used by PandasDelegate
    and Accessor subclasses, but that do not ultimately belong in
    the namespaces of user-facing classes.

    Many of these methods *could* be module-level functions, but are
    retained as staticmethods for organization purposes.
    """

    @staticmethod
    def create_delegator_property(name, delegate):
        # Note: we really only need the `delegate` here for the docstring

        def _getter(self):
            return self._delegate_property_get(name)

        def _setter(self, new_values):
            return self._delegate_property_set(name, new_values)
            # TODO: not hit in tests; not sure this is something we
            # really want anyway

        _getter.__name__ = name
        _setter.__name__ = name
        _doc = getattr(delegate, name).__doc__
        return property(fget=_getter, fset=_setter, doc=_doc)

    @staticmethod
    def create_delegator_method(name, delegate):
        # Note: we really only need the `delegate` here for the docstring

        def func(self, *args, **kwargs):
            return self._delegate_method(name, *args, **kwargs)

        if callable(name):
Member Author

Note to self: you tried this option and decided against it.

            # A function/method was passed directly instead of a name
            # This may also render the `delegate` arg unnecessary.
            func.__name__ = name.__name__  # TODO: is this generally valid?
            func.__doc__ = name.__doc__
        else:
            func.__name__ = name
            func.__doc__ = getattr(delegate, name).__doc__
        return func

    @staticmethod
    def delegate_names(delegate, accessors, typ, overwrite=False):
        """
        delegate_names decorates class definitions, e.g:

        @delegate_names(Categorical, ["categories", "ordered"], "property")
        class CategoricalAccessor(PandasDelegate):

            @classmethod
            def _make_accessor(cls, data):
                [...]


        This replaces the older usage in which following a class definition
Contributor

You don't need to say what it was, just what it does now.

        we would use `Foo._add_delegate_accessors(...)`. The motivation
        is that we would like to keep as much of a class's internals inside
        the class definition. For things that we cannot keep directly
        in the class definition, a decorator is more directly tied to
        the definition than a method call outside the definition.

Member Author

The paragraph above may be helpful for the dev discussion, but probably doesn't belong in the docstring.

        """
        # Note: we really only need the `delegate` here for the docstring

        def add_delegate_accessors(cls):
            """
            add accessors to cls from the delegate class

            Parameters
            ----------
            cls : the class to add the methods/properties to
            delegate : the class to get methods/properties & doc-strings
            accessors : string list of accessors to add
            typ : 'property' or 'method'
            overwrite : boolean, default False
                overwrite the method/property in the target class if it exists
            """
Contributor

Maybe add some asserts to validate the types of the accessors and such.

            for name in accessors:
                if typ == "property":
                    func = Delegator.create_delegator_property(name, delegate)
                else:
                    func = Delegator.create_delegator_method(name, delegate)

                # Allow for a callable to be passed instead of a name.
                title = com._get_callable_name(name)
Member Author

Note to self: you decided against this option.

                title = title or name
                # don't overwrite existing methods/properties unless
                # specifically told to do so
                if overwrite or not hasattr(cls, title):
                    setattr(cls, title, func)

            return cls

        return add_delegate_accessors
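The decorator mechanics, together with the up-front checks requested in review, can be sketched without pandas as follows. This is an illustrative reimplementation, not the PR's code. The validation rules, the `Word` element type and the `WordsDelegate` class are assumptions made for the example, and the sketch accepts only string names (no callables).

```python
def delegate_names(delegate, accessors, typ, overwrite=False):
    """Sketch of a class decorator that forwards names to _delegate_* hooks."""
    # Hypothetical checks along the lines suggested in review.
    if typ not in ("property", "method"):
        raise ValueError("typ must be 'property' or 'method'")
    missing = [name for name in accessors if not hasattr(delegate, name)]
    if missing:
        raise AttributeError("delegate lacks: {}".format(missing))

    def make_property(name):
        def getter(self):
            return self._delegate_property_get(name)
        return property(getter, doc=getattr(delegate, name).__doc__)

    def make_method(name):
        def method(self, *args, **kwargs):
            return self._delegate_method(name, *args, **kwargs)
        method.__name__ = name
        method.__doc__ = getattr(delegate, name).__doc__
        return method

    def decorate(cls):
        for name in accessors:
            make = make_property if typ == "property" else make_method
            # Existing attributes win unless overwrite=True.
            if overwrite or not hasattr(cls, name):
                setattr(cls, name, make(name))
        return cls

    return decorate


class Word:
    """Hypothetical element type."""

    def __init__(self, text):
        self.text = text

    @property
    def length(self):
        """Number of characters."""
        return len(self.text)

    def shout(self, suffix="!"):
        """Upper-case the word and append a suffix."""
        return self.text.upper() + suffix


@delegate_names(Word, ["shout"], "method")
@delegate_names(Word, ["length"], "property")
class WordsDelegate:
    def __init__(self, values):
        self.values = values

    def _delegate_property_get(self, name):
        return [getattr(v, name) for v in self.values]

    def _delegate_method(self, name, *args, **kwargs):
        return [getattr(v, name)(*args, **kwargs) for v in self.values]


words = WordsDelegate([Word("ab"), Word("cde")])
lengths = words.length
shouts = words.shout(suffix="?")
```

Failing at decoration time means a misspelled accessor name surfaces when the module is imported rather than on first use.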


wrap_delegate_names = Delegator.delegate_names
# TODO: the `delegate` arg to `wrap_delegate_names` is really only relevant
# for a docstring. It'd be nice if we didn't require it and could duck-type
# instead.

# TODO: There are 2-3 implementations of `_delegate_method`
# and `_delegate_property_get` that are common enough that we should consider
# making them the defaults. First, if the series being accessed itself has
# the `name` method/property:
#
#    def _delegate_method(self, name, *args, **kwargs):
#        result = getattr(self.values, name)(*args, **kwargs)
#        return result
#
#    def _delegate_property_get(self, name):
#        result = getattr(self.values, name)
#        return result
#
#
# Alternately if the series being accessed does not have this attribute,
# but is a series of objects that do have the attribute:
#
#    def _delegate_method(self, name, *args, **kwargs):
#        meth = lambda x: getattr(x, name)(*args, **kwargs)
#        return self.values.apply(meth)
#
#    def _delegate_property_get(self, name):
#        prop = lambda x: getattr(x, name)
#        return self.values.apply(prop)
#
#
# `apply` would need to be changed to `map` if self.values is an Index.
#
# The third thing to consider moving into the general case is
# core.strings.StringMethods._wrap_result, which handles a bunch of cases
# for how to wrap delegated outputs.
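The `apply` versus `map` point in the TODO above can be sketched as a single helper. The function name `delegate_elementwise` is hypothetical. The sketch relies only on `Series.apply` and `Index.map`, both public pandas methods, and is not proposed as the default implementation.

```python
import pandas as pd


def delegate_elementwise(values, name, *args, **kwargs):
    """Call method `name` on every element of a Series or Index."""
    def call(x):
        return getattr(x, name)(*args, **kwargs)

    if isinstance(values, pd.Index):
        # Index has no `apply`; `map` is its element-wise counterpart.
        return values.map(call)
    return values.apply(call)


ser_result = delegate_elementwise(pd.Series(["ab", "cd"]), "upper")
idx_result = delegate_elementwise(pd.Index(["ab", "cd"]), "replace", "a", "z")
```

Since `Series.map` also accepts a callable, using `map` in both branches would remove the type check. The sketch keeps the branch to mirror the wording of the TODO.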