-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/BUG: Raise when int-dtype coercions fail #21456
Conversation
e895c12
to
0387ae1
Compare
raise OverflowError("The elements provided in the data cannot all be " | ||
"casted to the dtype {dtype}".format(dtype=dtype)) | ||
|
||
if np.array_equal(arr, casted): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/indexes/test_numeric.py
Outdated
@pytest.mark.parametrize("int_dtype", ["uint8", "uint16", "uint32", | ||
"uint64", "int32", "int64", | ||
"int16", "int8"]) | ||
@pytest.mark.parametrize("float_dtype", ["float16", "float32", "float64"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these (i.e. int_dtype
, float_dtype
, uint_dtype
below) should be conftest.py
fixtures, if there is sufficient use for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add them as fixtures to this PR and then can see after merging what can be done with them for other tests in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback : BTW, I didn't realize that the comment in your "review" was referencing the one in this diff
(I thought I was deleting a duplicate comment). Oops.
0387ae1
to
3a65ee9
Compare
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -36,6 +36,7 @@ Datetimelike API Changes | |||
Other API Changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
- Series and Index constructors now raise when the data is incompatible with the specified dtype (:issue:`15832`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double backticks around Series/Index
say with a passed dtype=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
incompatible with integer/unsigned integer dtypes. | ||
|
||
.. versionadded:: 0.24.0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this duplicates maybe_downcast_to_dtype which is used internally, rather have the doc-string of that updated / examples (and can add the copy=)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. We're not always down-casting e.g.
Series(np.array([1, 2], dtype="int32"), dtype="int64")
Silly? Yes, but it should work.
Codecov Report
@@ Coverage Diff @@
## master #21456 +/- ##
==========================================
+ Coverage 91.92% 91.92% +<.01%
==========================================
Files 153 153
Lines 49570 49586 +16
==========================================
+ Hits 45566 45582 +16
Misses 4004 4004
Continue to review full report at Codecov.
|
3a65ee9
to
7d5c732
Compare
pandas/conftest.py
Outdated
|
||
|
||
@pytest.fixture(params=["float16", "float32", "float64"]) | ||
def float_dtype(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also asked to add these fixtures in #21432. I think it might be best to merge #21432 first to make backporting easier since it's tagged for 0.23.2? Not sure if it really matters though.
Your code looks 95% identical to mine, so I'll just modify mine to match yours prior to pushing so there shouldn't be any significant conflicts regardless of which PR gets merged first. The only potential sources of differences:
- I was asked to not include float16 since it's barely supported (BUG: Fix Series.nlargest for integer boundary values #21432 (comment))
- Looks like the existing convention is to not include a blank line between the docstring and
return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jschendel : I'll address all of your questions:
-
The merging doesn't particularly matter at this point. I would hold on modifying your fixtures for the moment. Let's ask @jreback in your PR.
-
float16
, ah that's true. I'll remove that. -
I generally prefer a newline after the docstring. Facilitates reading IMO.
pandas/conftest.py
Outdated
return request.param | ||
|
||
|
||
@pytest.fixture(params=SIGNED_INT_DTYPES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params
should be ALL_INT_DTYPES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we rename either the list or the fixture to make them consistent ("all" vs. "any")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Oops, good catch 😄
-
So this was a decision in semantics. The list is "all" of the integer dtypes, hence the name. However,
all_int_dtype
as a parameter to the test makes less sense vs.any_int_dtype.
7d5c732
to
5568230
Compare
# see gh-15832 | ||
msg = "Trying to coerce float values to integers" | ||
with tm.assert_raises_regex(ValueError, msg): | ||
Series([1, 2, 3.5], dtype=any_int_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is going to cartesian product these, but these look like independent cases, can you split to 2 tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pandas/tests/indexes/test_numeric.py
Outdated
# see gh-15832 | ||
msg = "Trying to coerce float values to integers" | ||
with tm.assert_raises_regex(ValueError, msg): | ||
Index([1, 2, 3.5], dtype=any_int_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, can you split to 2 tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
msg = "could not convert string to float" | ||
with tm.assert_raises_regex(ValueError, msg): | ||
Series(["a", "b", "c"], dtype=float) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have these tests for Index as well (I think we do)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😮 : That's a bug!
>>> Index(["a", "b", "c"], dtype=float)
Index([["a", "b", "c"], dtype=object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #21311
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jschendel @jreback : I'll add an xfail
test for this.
5568230
to
b85efa7
Compare
@gfyoung if you can rebase your fixures should cleanly merge |
7185086
to
390d914
Compare
@jreback : Tests are finally green again! PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question, also I think its worthwhile to make a sub-section in the whatsnew with a mini-example
pandas/core/indexes/base.py
Outdated
elif inferred in ['floating', 'mixed-integer-float']: | ||
if isna(data).any(): | ||
raise ValueError('cannot convert float ' | ||
'NaN to integer') | ||
|
||
if inferred == "mixed-integer-float": | ||
maybe_cast_to_integer_array(data, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need data =
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Also added mini-section to whatsnew
.
390d914
to
2be269b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you run some constructor asv's to see if any effect.
# | ||
# We didn't do this earlier because NumPy | ||
# doesn't handle `uint64` correctly. | ||
arr = np.asarray(arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this? at this point its either an ndarray, Series, Index, which is all ok here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. list
can also propagate here. This was prompted by #21432, which introduced an annoying but necessary corner case with uint64
.
pandas/core/series.py
Outdated
@@ -4068,6 +4069,9 @@ def _try_cast(arr, take_fast_path): | |||
return arr | |||
|
|||
try: | |||
if is_float_dtype(dtype) or is_integer_dtype(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
2be269b
to
b3d0b4e
Compare
@jreback : Tests are green again. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment. perf check?
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -71,6 +71,32 @@ Datetimelike API Changes | |||
Other API Changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
.. _whatsnew_0240.api.other.incompatibilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a colon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
* Related to the Index and Series constructors. Closes pandas-devgh-15832. * Add integer dtype fixtures to conftest.py Can used for subsequent refactoring.
b3d0b4e
to
27caec9
Compare
@jreback : Oh sorry, forgot to respond on that. Perf looks good! |
@jreback : Comments addressed, perf looks good, and CI is green. PTAL. |
thanks @gfyoung nice patch! |
|
||
if is_unsigned_integer_dtype(dtype) and (arr < 0).any(): | ||
raise OverflowError("Trying to coerce negative values " | ||
"to unsigned integers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfyoung should maybe_cast_to_integer_array also check for overflows like
arr = np.array([1, 200, 923442])
dtype = np.dtype(np.int8)
?
Related to the Index and Series constructors.
Closes #15832.
cc @ucals (since this is mostly based off what you did in #15859)