POC: 2D EAs via composition #27015

Closed
wants to merge 19 commits into from


jbrockmendel

There are plenty of kludges and linting errors in here; I just want to push it to add composition to the discussion.

Instead of patching existing EAs, this introduces ReshapeableArray, which wraps other EAs and implements the reshape methods. EAs that natively support 2D can set _allows_2d = True to avoid being wrapped.
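A minimal sketch of the composition idea, assuming a plain Python list stands in for a 1-D ExtensionArray. Only the ReshapeableArray name and the `_allows_2d` flag come from the description above; the `_1dvalues` attribute and the `maybe_wrap` helper are hypothetical and are not this PR's code. The wrapper never copies the data. It only records a shape, and it allows only the shapes a single-column Block needs: (N,), (1, N) and (N, 1).

```python
from math import prod
from typing import Any, Optional, Sequence, Tuple


class ReshapeableArray:
    """Sketch: wrap a 1-D array-like and let it present a 2-D shape.

    Hypothetical illustration; only (N,), (1, N) and (N, 1) views are allowed.
    """

    def __init__(self, values: Sequence[Any], shape: Optional[Tuple[int, ...]] = None):
        n = len(values)
        if shape is None:
            shape = (n,)
        if shape not in ((n,), (1, n), (n, 1)):
            raise ValueError(f"cannot view {n} values as shape {shape}")
        self._1dvalues = values  # hypothetical name: the wrapped data, never copied
        self.shape = shape

    @property
    def ndim(self) -> int:
        return len(self.shape)

    def reshape(self, *shape):
        # Accept reshape(1, -1) as well as reshape((1, -1)), as numpy does.
        if len(shape) == 1 and isinstance(shape[0], tuple):
            shape = shape[0]
        n = len(self._1dvalues)
        known = prod(d for d in shape if d != -1)
        # Resolve a single -1 placeholder against the total number of values.
        shape = tuple(n // known if d == -1 else d for d in shape)
        return ReshapeableArray(self._1dvalues, shape)

    def ravel(self) -> "ReshapeableArray":
        # Back to the plain 1-D view of the same data.
        return ReshapeableArray(self._1dvalues)


def maybe_wrap(values):
    """Hypothetical helper: wrap only arrays that cannot do 2-D themselves."""
    if getattr(values, "_allows_2d", False):
        return values
    return ReshapeableArray(values)


# Usage: a 3-element array viewed as a single-row Block.
arr = maybe_wrap([10, 20, 30])
row = arr.reshape(1, -1)
print(row.shape, row.ndim)  # (1, 3) 2
```

Because the wrapper only tracks a shape, the underlying EA keeps its 1-D contract; the cost is that every EA method the Block calls has to be forwarded or reimplemented on the wrapper.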

In the process of getting this passing, I found a handful of new issues/bugs. I will try to push fixes for those independently.

pep8speaks commented Jun 24, 2019

Hello @jbrockmendel! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-27 15:47:41 UTC

@TomAugspurger left a comment


So, at a high level, this makes Block.values a ReshapeableArray, which is an ExtensionArray implementing the 2-D interface. Then a DataFrame is made up of a collection of Blocks whose values are all reshapeable, either by being an ndarray or an ExtensionArray with _allows_2d = True?

Is this your preferred approach for fixing Block.shape == Block.values.shape going forward?

@@ -105,6 +105,9 @@ def _ensure_data(values, dtype=None):
else:
# Datetime
from pandas import DatetimeIndex
from pandas.core.arrays import unwrap_reshapeable

Is this for something like factorize(EA)? Shouldn't the EA (or your wrapper) do this in _values_for_factorize?
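Tom's suggestion can be sketched as follows, assuming the wrapper keeps the wrapped 1-D array in a hypothetical `_1dvalues` attribute. In the pandas ExtensionArray interface, `_values_for_factorize` returns a pair (values, na_value); here the wrapper simply delegates to the array it wraps, so the factorize code path never sees the wrapper. `ToyEA` and `toy_factorize` are hypothetical stand-ins, not pandas code.

```python
from typing import Any, List, Tuple


class ToyEA:
    """Hypothetical stand-in for a 1-D ExtensionArray."""

    def __init__(self, data: List[Any]):
        self._data = data

    def _values_for_factorize(self) -> Tuple[List[Any], Any]:
        # Same contract as the pandas EA method: (values, na_value).
        return list(self._data), None


class ReshapeableArray:
    """Minimal wrapper; only the delegation relevant to factorize is shown."""

    def __init__(self, values, shape):
        self._1dvalues = values  # hypothetical attribute name
        self.shape = shape

    def _values_for_factorize(self):
        # Delegate to the wrapped 1-D array; the 2-D shape is irrelevant here.
        return self._1dvalues._values_for_factorize()


def toy_factorize(arr):
    """Hypothetical factorize: integer codes plus uniques in order of appearance."""
    values, na_value = arr._values_for_factorize()
    uniques, codes, seen = [], [], {}
    for v in values:
        if v is na_value:
            codes.append(-1)  # missing values get the sentinel code -1
            continue
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        codes.append(seen[v])
    return codes, uniques


wrapped = ReshapeableArray(ToyEA(["a", "b", "a", None]), shape=(1, 4))
print(toy_factorize(wrapped))  # ([0, 1, 0, -1], ['a', 'b'])
```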

jbrockmendel (author) replied:

_values_for_factorize might do it; I'll check. But at this point we don't necessarily have an EA, so we need a conditional unwrapper.

Making DatetimeArray (DTA) validate that its inputs are 1D can be done separately from the rest of this, and should resolve this particular part of the diff.

pandas/core/algorithms.py (outdated, resolved)
@jbrockmendel (author) commented:

So, at a high level, this makes Block.values a ReshapeableArray, which is an ExtensionArray implementing the 2-D interface. Then a DataFrame is made up of a collection of Blocks whose values are all reshapeable, either by being an ndarray or an ExtensionArray with _allows_2d = True?

Yes.

Is this your preferred approach for fixing Block.shape == Block.values.shape going forward?

No, this is my second-best option. The first-best would be to require EAs to handle the (1, N) case themselves, so we wouldn't need this extra layer. But I definitely prefer this to the metaclass approach, which I wasn't able to get working at all (MRO issues).
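For contrast, the "first-best" option, where each EA handles the (1, N) case itself, might look like the toy class below. Everything here is hypothetical: `Toy2DArray` keeps its data 1-D, sets `_allows_2d = True` so no wrapper is needed, and supports only the (N,) and (1, N) layouts that a single-column Block uses.

```python
from typing import Any, List


class Toy2DArray:
    """Hypothetical EA-like array that handles the (1, N) case natively."""

    _allows_2d = True  # flag from the PR description: do not wrap this array

    def __init__(self, data: List[Any], ndim: int = 1):
        if ndim not in (1, 2):
            raise ValueError("only 1-D and (1, N) 2-D layouts are supported")
        self._data = data  # storage stays 1-D in both layouts
        self.ndim = ndim

    @property
    def shape(self):
        n = len(self._data)
        return (n,) if self.ndim == 1 else (1, n)

    def reshape(self, *shape):
        # Accept reshape(1, -1) as well as reshape((1, -1)).
        if len(shape) == 1 and isinstance(shape[0], tuple):
            shape = shape[0]
        n = len(self._data)
        if shape in ((n,), (-1,)):
            return Toy2DArray(self._data, ndim=1)
        if shape in ((1, n), (1, -1)):
            return Toy2DArray(self._data, ndim=2)
        raise ValueError(f"cannot reshape array of size {n} into shape {shape}")

    def __getitem__(self, key):
        # In the (1, N) layout, row 0 is the whole 1-D array.
        if self.ndim == 2 and key == 0:
            return Toy2DArray(self._data, ndim=1)
        if self.ndim == 1:
            return self._data[key]
        raise IndexError(key)


arr = Toy2DArray(["x", "y"])
block_values = arr.reshape(1, -1)
print(block_values.shape)     # (1, 2)
print(block_values[0].shape)  # (2,)
```

The trade-off discussed above is visible here: the 2-D bookkeeping lives inside every EA instead of in one shared wrapper.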

@jbrockmendel mentioned this pull request Jun 25, 2019
codecov bot commented Jun 25, 2019

Codecov Report

Merging #27015 into master will decrease coverage by 50.09%.
The diff coverage is 51.26%.


@@            Coverage Diff             @@
##           master   #27015      +/-   ##
==========================================
- Coverage   91.99%    41.9%   -50.1%     
==========================================
  Files         180      181       +1     
  Lines       50774    51124     +350     
==========================================
- Hits        46711    21422   -25289     
- Misses       4063    29702   +25639
Flag         Coverage Δ
#multiple    ?
#single      41.9% <51.26%> (-0.02%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/indexing.py               53.64% <ø> (-39.85%) ⬇️
pandas/core/groupby/ops.py            19.67% <0%> (-76.33%) ⬇️
pandas/core/groupby/generic.py        14.74% <0%> (-74.59%) ⬇️
pandas/core/generic.py                38.18% <0%> (-56.03%) ⬇️
pandas/core/arrays/base.py            59.89% <100%> (-39.55%) ⬇️
pandas/core/dtypes/concat.py          53.55% <100%> (-43.04%) ⬇️
pandas/core/arrays/categorical.py     42.09% <100%> (-53.84%) ⬇️
pandas/core/internals/concat.py       72.48% <100%> (-24.01%) ⬇️
pandas/io/formats/format.py           50.63% <100%> (-47.28%) ⬇️
pandas/core/arrays/datetimelike.py    41.49% <100%> (-56.44%) ⬇️
... and 154 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ea2d08...6e4f207.

codecov bot commented Jun 25, 2019

Codecov Report

Merging #27015 into master will decrease coverage by 50.12%.
The diff coverage is 53.89%.


@@             Coverage Diff             @@
##           master   #27015       +/-   ##
===========================================
- Coverage   92.03%   41.91%   -50.13%     
===========================================
  Files         180      181        +1     
  Lines       50714    51086      +372     
===========================================
- Hits        46675    21412    -25263     
- Misses       4039    29674    +25635
Flag         Coverage Δ
#multiple    ?
#single      41.91% <53.89%> (+0.04%) ⬆️

Impacted Files                        Coverage Δ
pandas/core/indexing.py               53.64% <ø> (-39.66%) ⬇️
pandas/core/groupby/ops.py            19.67% <0%> (-76.33%) ⬇️
pandas/core/groupby/generic.py        14.74% <0%> (-74.59%) ⬇️
pandas/core/generic.py                38.18% <0%> (-56.03%) ⬇️
pandas/core/arrays/base.py            59.89% <100%> (-39.55%) ⬇️
pandas/core/dtypes/concat.py          53.58% <100%> (-43.46%) ⬇️
pandas/core/arrays/categorical.py     42.09% <100%> (-53.84%) ⬇️
pandas/core/internals/concat.py       73.04% <100%> (-23.81%) ⬇️
pandas/io/formats/format.py           50.63% <100%> (-47.28%) ⬇️
pandas/core/arrays/datetimelike.py    41.36% <100%> (-56.57%) ⬇️
... and 154 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1452e71...f6b8d23.

@gfyoung added the Enhancement and ExtensionArray (Extending pandas with custom dtypes or arrays.) labels Jun 26, 2019
5 participants