Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes NumericIndex constructor dtype aware #29529

Merged
merged 18 commits into from
Nov 17, 2019
Merged

Makes NumericIndex constructor dtype aware #29529

merged 18 commits into from
Nov 17, 2019

Conversation

oguzhanogreden
Copy link
Contributor

@oguzhanogreden oguzhanogreden commented Nov 10, 2019

Copy link
Member

@jschendel jschendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Approach looks good. Can you add an entry to the 1.0 whatsnew under the "Bug Fixes - Numeric" subsection (around line 339)?

pandas/tests/indexes/test_base.py Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
pandas/tests/indexes/test_base.py Outdated Show resolved Hide resolved
pandas/tests/indexes/test_base.py Outdated Show resolved Hide resolved
@@ -57,7 +57,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, fastpath=None):
return cls._simple_new(data, name=name)

# is_scalar, generators handled in coerce_to_ndarray
data = cls._coerce_to_ndarray(data)
data = cls._coerce_to_ndarray(data, dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should rely on dtype being explicitly passed in (it's None by default) so maybe make this dtype=self._dtype for now since the dtype parameter isn't being validated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initially made sense to me.

However, numpy seems to complicate this suggestion since it'll cast input if dtype is explicitly passed:

In [1]: import numpy as np 
   ...: np.__version__                                                                                                                                                                                                                                                                                                         
Out[1]: '1.17.3'

In [2]: np.array([1, "2"]) 
   ...:                                                                                                                                                                                                                                                                                                                        
Out[2]: array(['1', '2'], dtype='<U21')

In [3]: np.array(["1", "2"])  
   ...:                                                                                                                                                                                                                                                                                                                        
Out[3]: array(['1', '2'], dtype='<U1')

In [4]: np.array(["1", "2"], dtype='int')                                                                                                                                                                                                                                                                                      
Out[4]: array([1, 2])

This test suggests that pandas keeps the same behavior. With 458c25e, I tried to check the type of elements. The side effect of that is consuming iterators, leading to the current errors.

Copy link
Contributor Author

@oguzhanogreden oguzhanogreden Nov 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numpy behavior I report above is brought up here but not acted on.

@jschendel jschendel added this to the 1.0 milestone Nov 11, 2019
@jschendel jschendel added Bug Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses labels Nov 11, 2019
@oguzhanogreden
Copy link
Contributor Author

I added the what's new entry but I must admit I haven't yet managed to set up building docs in a performant way. I'll check that soon and let you know.

@oguzhanogreden
Copy link
Contributor Author

The failing tast was this. Now there are more of them :) Perhaps it's too late now, I'll take a fresh look over the weekend.

I had some questions as well...

  1. Index._coerce_to_ndarray() has practically become Index._check_types_and_coerce_to_ndarray(). Does it make sense to make an issue/PR to split these?
  2. The change I made is in the method of an abstract Index class. This method has effectively become a "numeric-only" method. However, CategoricalIndex also inherits from Index. Does it make sense to change the conditional added in 458c25e to reflect this, something like
# use _typ instead of inferred_type property since this method, if ever used, will be used before the instance exists
if cls._typ != 'categoricalindex' and lib.infer_dtype(data) == 'string':

@pep8speaks
Copy link

pep8speaks commented Nov 16, 2019

Hello @oguzhanogreden! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-17 21:30:54 UTC

@oguzhanogreden
Copy link
Contributor Author

Sorry about a bit of back and forth here.

Finally:

  1. The solution depends on explicit passing of dtype.
  2. @jschendel 's point about argument validation seems to be addressed centrally here.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some small requests. ping on green.

pandas/core/indexes/base.py Outdated Show resolved Hide resolved
pandas/core/indexes/base.py Outdated Show resolved Hide resolved
@oguzhanogreden oguzhanogreden changed the title Makes UInt64Index constructor dtype aware Makes NumericIndex constructor dtype aware Nov 17, 2019
# is_scalar, generators handled in coerce_to_ndarray
data = cls._coerce_to_ndarray(data)
# Coerce to ndarray if not already ndarray or Index
if not isinstance(data, (np.ndarray, Index)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah, this is what we had quite some time ago. thanks

@jreback jreback merged commit 7038f85 into pandas-dev:master Nov 17, 2019
@jreback
Copy link
Contributor

jreback commented Nov 17, 2019

thanks @oguzhanogreden

Copy link
Member

@jschendel jschendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've been a little behind on reviewing and noticed a couple small things that could be done as a follow up. Edit: Nevermind, this can be ignored.

pandas/tests/indexes/test_numeric.py Show resolved Hide resolved
pandas/core/indexes/numeric.py Show resolved Hide resolved
pandas/tests/indexes/test_numeric.py Show resolved Hide resolved
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-unique integers coerced to float during UInt64Index creation with explicit
4 participants