API/BUG: Raise when int-dtype coercions fail #21456

gfyoung · 2018-06-13T06:39:59Z

Related to the Index and Series constructors.

cc @ucals (since this is mostly based off what you did in #15859)

gfyoung · 2018-06-13T07:18:12Z

pandas/core/dtypes/cast.py

+        raise OverflowError("The elements provided in the data cannot all be "
+                            "casted to the dtype {dtype}".format(dtype=dtype))
+
+    if np.array_equal(arr, casted):


Since this is in part due to numpy's self-conflict regarding element-wise comparisons, should we care about this showing up (perhaps we swallow the warning)?
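The round-trip check in this hunk can be illustrated with a minimal NumPy-only sketch. The helper name `cast_to_integer_checked` and its error message are hypothetical and are not the pandas implementation; the sketch only assumes that a cast is accepted when `np.array_equal` reports the values unchanged.

```python
import numpy as np


def cast_to_integer_checked(arr, dtype):
    """Hypothetical helper: cast to an integer dtype, refusing lossy casts."""
    arr = np.asarray(arr)
    # astype() silently truncates floats and wraps out-of-range integers.
    casted = arr.astype(dtype)

    # Accept the cast only if every value survived it unchanged.
    if np.array_equal(arr, casted):
        return casted

    raise ValueError(
        "values cannot be represented exactly as {dtype}".format(dtype=dtype)
    )
```

With this sketch, `[1.0, 2.0]` casts cleanly to `int64`, while `[1, 2, 3.5]` is rejected because `3.5` would be truncated. It does not address the element-wise comparison warning raised in the question above.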

gfyoung · 2018-06-13T07:18:42Z

pandas/tests/indexes/test_numeric.py

+    @pytest.mark.parametrize("int_dtype", ["uint8", "uint16", "uint32",
+                                           "uint64", "int32", "int64",
+                                           "int16", "int8"])
+    @pytest.mark.parametrize("float_dtype", ["float16", "float32", "float64"])


Perhaps these (i.e. int_dtype, float_dtype, uint_dtype below) should be conftest.py fixtures, if there is sufficient use for them?

I'll add them as fixtures to this PR and then can see after merging what can be done with them for other tests in the codebase.

@jreback : BTW, I didn't realize that the comment in your "review" was referencing the one in this diff (I thought I was deleting a duplicate comment). Oops.

jreback · 2018-06-13T10:26:53Z

doc/source/whatsnew/v0.24.0.txt

@@ -36,6 +36,7 @@ Datetimelike API Changes
 Other API Changes
 ^^^^^^^^^^^^^^^^^

+- Series and Index constructors now raise when the data is incompatible with the specified dtype (:issue:`15832`)


double backticks around Series/Index

say with a passed dtype=

jreback · 2018-06-13T10:28:10Z

pandas/core/dtypes/cast.py

+    incompatible with integer/unsigned integer dtypes.
+
+    .. versionadded:: 0.24.0
+


This duplicates maybe_downcast_to_dtype, which is used internally. I'd rather have the docstring and examples of that function updated (and copy= can be added there).

Not quite. We're not always down-casting, e.g.:

Series(np.array([1, 2], dtype="int32"), dtype="int64")

Silly? Yes, but it should work.
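For reference, the up-casting case from the reply can be run as a standalone snippet. This is only a sketch of the behavior being described and assumes NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

# The input is int32; the requested dtype is the wider int64,
# so this is an up-cast rather than a down-cast.
values = np.array([1, 2], dtype="int32")
ser = pd.Series(values, dtype="int64")

result_dtype = str(ser.dtype)
```

No values are lost here, so the constructor is expected to succeed and return an `int64` Series.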

codecov · 2018-06-13T10:57:35Z

Codecov Report

Merging #21456 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21456      +/-   ##
==========================================
+ Coverage   91.92%   91.92%   +<.01%     
==========================================
  Files         153      153              
  Lines       49570    49586      +16     
==========================================
+ Hits        45566    45582      +16     
  Misses       4004     4004

Flag	Coverage Δ
#multiple	`90.32% <100%> (ø)`	⬆️
#single	`41.81% <66.66%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/base.py	`96.62% <100%> (ø)`	⬆️
pandas/core/dtypes/cast.py	`88.49% <100%> (+0.26%)`	⬆️
pandas/core/series.py	`94.19% <100%> (+0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c2da06c...27caec9.

jschendel · 2018-06-13T23:31:52Z

pandas/conftest.py

+
+
+@pytest.fixture(params=["float16", "float32", "float64"])
+def float_dtype(request):


I was also asked to add these fixtures in #21432. I think it might be best to merge #21432 first to make backporting easier since it's tagged for 0.23.2? Not sure if it really matters though.

Your code looks 95% identical to mine, so I'll just modify mine to match yours prior to pushing so there shouldn't be any significant conflicts regardless of which PR gets merged first. The only potential sources of differences:

I was asked to not include float16 since it's barely supported (BUG: Fix Series.nlargest for integer boundary values #21432 (comment))

Looks like the existing convention is to not include a blank line between the docstring and return

@jschendel : I'll address all of your questions:

The merging doesn't particularly matter at this point. I would hold on modifying your fixtures for the moment. Let's ask @jreback in your PR.

float16, ah that's true. I'll remove that.

I generally prefer a newline after the docstring. Facilitates reading IMO.

jschendel · 2018-06-13T23:35:08Z

pandas/conftest.py

+    return request.param
+
+
+@pytest.fixture(params=SIGNED_INT_DTYPES)


params should be ALL_INT_DTYPES

Also, should we rename either the list or the fixture to make them consistent ("all" vs. "any")?

Oops, good catch 😄

So this was a decision in semantics. The list is "all" of the integer dtypes, hence the name. However, all_int_dtype as a parameter to the test makes less sense vs. any_int_dtype.
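The naming discussed above can be sketched as a small conftest.py fragment. `SIGNED_INT_DTYPES`, `ALL_INT_DTYPES`, and `any_int_dtype` are the names used in this thread, and the dtype strings come from the parametrization in the earlier test diff; `UNSIGNED_INT_DTYPES` and `sint_dtype` are assumed names added for illustration, and the exact layout of the real file may differ.

```python
import pytest

# Assumed name for the unsigned list; the signed and "all" names
# are the ones mentioned in the review thread.
UNSIGNED_INT_DTYPES = ["uint8", "uint16", "uint32", "uint64"]
SIGNED_INT_DTYPES = ["int8", "int16", "int32", "int64"]
ALL_INT_DTYPES = UNSIGNED_INT_DTYPES + SIGNED_INT_DTYPES


@pytest.fixture(params=SIGNED_INT_DTYPES)
def sint_dtype(request):
    """Parametrized fixture for signed integer dtypes (assumed name)."""

    return request.param


@pytest.fixture(params=ALL_INT_DTYPES)
def any_int_dtype(request):
    """Parametrized fixture for any integer dtype, signed or unsigned."""

    return request.param
```

The list is named "all" because it holds every integer dtype, while the fixture is named "any" because each test invocation receives one of them.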

jreback · 2018-06-14T10:15:37Z

pandas/tests/series/test_constructors.py

+        # see gh-15832
+        msg = "Trying to coerce float values to integers"
+        with tm.assert_raises_regex(ValueError, msg):
+            Series([1, 2, 3.5], dtype=any_int_dtype)


This is going to take the cartesian product of these parameters, but they look like independent cases. Can you split this into 2 tests?
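A rough sketch of what two independent checks could look like is given below. It assumes the two cases are the float-to-integer coercion shown in this hunk and the negative-to-unsigned coercion that raises `OverflowError` elsewhere in this PR; the function names are hypothetical, and plain `try`/`except` is used instead of the pandas test helpers so the snippet stands alone.

```python
import pandas as pd


def float_to_int_raises(int_dtype):
    """Hypothetical check: non-integral floats cannot become integers."""
    try:
        pd.Series([1, 2, 3.5], dtype=int_dtype)
    except ValueError:
        return True
    return False


def negative_to_uint_raises(uint_dtype):
    """Hypothetical check: negative values cannot become unsigned integers."""
    try:
        pd.Series([-1], dtype=uint_dtype)
    except OverflowError:
        return True
    return False
```

Each check depends on a single dtype parameter, so neither needs to be crossed with the other's parametrization.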

jreback · 2018-06-14T10:16:24Z

pandas/tests/indexes/test_numeric.py

+        # see gh-15832
+        msg = "Trying to coerce float values to integers"
+        with tm.assert_raises_regex(ValueError, msg):
+            Index([1, 2, 3.5], dtype=any_int_dtype)


same here, can you split to 2 tests

jreback · 2018-06-14T10:17:01Z

pandas/tests/series/test_constructors.py

+        msg = "could not convert string to float"
+        with tm.assert_raises_regex(ValueError, msg):
+            Series(["a", "b", "c"], dtype=float)
+


do we have these tests for Index as well (I think we do)?

😮 : That's a bug!

>>> Index(["a", "b", "c"], dtype=float)
Index(["a", "b", "c"], dtype=object)

@jschendel @jreback : I'll add an xfail test for this.

jreback · 2018-06-15T17:22:39Z

@gfyoung if you rebase, your fixtures should merge cleanly

gfyoung · 2018-06-16T07:25:40Z

@jreback : Tests are finally green again! PTAL.

jreback

One question. Also, I think it's worthwhile to make a sub-section in the whatsnew with a mini-example.

jreback · 2018-06-18T22:49:16Z

pandas/core/indexes/base.py

                        elif inferred in ['floating', 'mixed-integer-float']:
                            if isna(data).any():
                                raise ValueError('cannot convert float '
                                                 'NaN to integer')

+                            if inferred == "mixed-integer-float":
+                                maybe_cast_to_integer_array(data, dtype)


do you need data = here?

Fixed. Also added mini-section to whatsnew.

jreback

Can you run some constructor asv benchmarks to see if there is any effect?

jreback · 2018-06-19T00:20:42Z

pandas/core/dtypes/cast.py

+    #
+    # We didn't do this earlier because NumPy
+    # doesn't handle `uint64` correctly.
+    arr = np.asarray(arr)


Why do you need this? At this point it's either an ndarray, Series, or Index, all of which are OK here.

Not quite. list can also propagate here. This was prompted by #21432, which introduced an annoying but necessary corner case with uint64.
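A small sketch shows why the conversion matters once a plain list reaches this point. It assumes the later checks are element-wise comparisons such as the `(arr < 0).any()` shown elsewhere in this diff; it does not reproduce the `uint64` corner case itself.

```python
import numpy as np

data = [1, 2, 3]  # a plain Python list, not an ndarray

# Python lists do not support element-wise comparison against a scalar.
try:
    data < 0
    list_supports_elementwise = True
except TypeError:
    list_supports_elementwise = False

# After conversion, the comparison is element-wise and `.any()` exists.
arr = np.asarray(data)
has_negative = bool((arr < 0).any())
```

For an input that is already an ndarray, `np.asarray` returns it without copying, so the call is cheap in the common case.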

jreback · 2018-06-19T00:22:48Z

pandas/core/series.py

@@ -4068,6 +4069,9 @@ def _try_cast(arr, take_fast_path):
                return arr

        try:
+            if is_float_dtype(dtype) or is_integer_dtype(dtype):


can you add a comment here

gfyoung · 2018-06-19T06:50:47Z

@jreback : Tests are green again. PTAL.

jreback

minor comment. perf check?

jreback · 2018-06-19T11:24:47Z

doc/source/whatsnew/v0.24.0.txt

@@ -71,6 +71,32 @@ Datetimelike API Changes
 Other API Changes
 ^^^^^^^^^^^^^^^^^

+.. _whatsnew_0240.api.other.incompatibilities


need a colon

Oops. Fixed.

* Related to the Index and Series constructors. Closes gh-15832.
* Add integer dtype fixtures to conftest.py. Can be used for subsequent refactoring.

gfyoung · 2018-06-19T17:01:20Z

perf check?

@jreback : Oh sorry, forgot to respond on that. Perf looks good!

gfyoung · 2018-06-20T07:39:17Z

@jreback : Comments addressed, perf looks good, and CI is green. PTAL.

jreback · 2018-06-20T10:35:18Z

thanks @gfyoung nice patch!

jbrockmendel · 2020-12-20T17:34:51Z

pandas/core/dtypes/cast.py

+
+    if is_unsigned_integer_dtype(dtype) and (arr < 0).any():
+        raise OverflowError("Trying to coerce negative values "
+                            "to unsigned integers")


@gfyoung should maybe_cast_to_integer_array also check for overflows like

arr = np.array([1, 200, 923442])
dtype = np.dtype(np.int8)

?
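One way such a range check could look is sketched below using `np.iinfo`. The helper `fits_in_integer_dtype` is hypothetical and is not part of pandas; it assumes the input already holds integers and only asks whether every value lies within the bounds of the target dtype.

```python
import numpy as np


def fits_in_integer_dtype(arr, dtype):
    """Hypothetical helper: True if all integers in arr fit in dtype."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return True

    info = np.iinfo(dtype)
    # Convert to Python ints so the comparison is exact for any width.
    return int(arr.min()) >= info.min and int(arr.max()) <= info.max


arr = np.array([1, 200, 923442])
dtype = np.dtype(np.int8)
fits = fits_in_integer_dtype(arr, dtype)
```

For the example in the question, `int8` covers -128 through 127, so both `200` and `923442` fall outside the range.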

gfyoung added the "Dtype Conversions" (unexpected or buggy dtype conversions) and "Error Reporting" (incorrect or improved errors from pandas) labels Jun 13, 2018

gfyoung requested a review from jreback June 13, 2018 06:40

gfyoung force-pushed the int-dtype-coercions branch from e895c12 to 0387ae1 Compare June 13, 2018 07:16

gfyoung commented Jun 13, 2018

gfyoung added this to the 0.24.0 milestone Jun 13, 2018

gfyoung force-pushed the int-dtype-coercions branch from 0387ae1 to 3a65ee9 Compare June 13, 2018 08:09

jreback requested changes Jun 13, 2018

pandas-dev deleted a comment from jreback Jun 13, 2018

gfyoung force-pushed the int-dtype-coercions branch from 3a65ee9 to 7d5c732 Compare June 13, 2018 21:36

jschendel reviewed Jun 13, 2018

gfyoung mentioned this pull request Jun 13, 2018

BUG: Fix Series.nlargest for integer boundary values #21432

Merged

4 tasks

gfyoung force-pushed the int-dtype-coercions branch from 7d5c732 to 5568230 Compare June 13, 2018 23:50

jreback requested changes Jun 14, 2018

gfyoung force-pushed the int-dtype-coercions branch from 5568230 to b85efa7 Compare June 15, 2018 00:43

gfyoung force-pushed the int-dtype-coercions branch 2 times, most recently from 7185086 to 390d914 Compare June 15, 2018 23:54

jreback requested changes Jun 18, 2018

gfyoung force-pushed the int-dtype-coercions branch from 390d914 to 2be269b Compare June 18, 2018 23:41

jreback requested changes Jun 19, 2018

gfyoung force-pushed the int-dtype-coercions branch from 2be269b to b3d0b4e Compare June 19, 2018 00:30

jreback requested changes Jun 19, 2018

API/BUG: Raise when int-dtype coercions fail

27caec9

gfyoung force-pushed the int-dtype-coercions branch from b3d0b4e to 27caec9 Compare June 19, 2018 17:01

jreback approved these changes Jun 20, 2018

jreback merged commit b36b451 into pandas-dev:master Jun 20, 2018

gfyoung deleted the int-dtype-coercions branch June 20, 2018 16:53

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jun 26, 2018

DOC: fixup old whatsnew for dtype coercing change (pandas-dev#21456)

fa6da50

jorisvandenbossche mentioned this pull request Jun 26, 2018

DOC: fixup old whatsnew for dtype coercing change (#21456) #21634

Merged

jorisvandenbossche added a commit that referenced this pull request Jun 26, 2018

DOC: fixup old whatsnew for dtype coercing change (#21456) (#21634)

5188c00

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

API/BUG: Raise when int-dtype coercions fail (pandas-dev#21456)

4344468

Closes pandas-devgh-15832.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: fixup old whatsnew for dtype coercing change (pandas-dev#21456) (p…

72a3f52

…andas-dev#21634)

TomAugspurger mentioned this pull request Oct 3, 2018

Safer is dtype #22975

Merged

jbrockmendel reviewed Dec 20, 2020

jbrockmendel mentioned this pull request May 22, 2021

API/BUG: Series(floating, dtype=intlike) ignores dtype, DataFrame casts #40110

Closed
