Changes to support Numpy >= 1.24 #13325
Conversation
python/cudf/cudf/tests/test_csv.py
Outdated
@@ -150,8 +150,8 @@ def make_all_numeric_extremes_dataframe():
     np_type = pdf_dtypes[gdf_dtype]
     if np.issubdtype(np_type, np.integer):
         itype = np.iinfo(np_type)
-        extremes = [0, +1, -1, itype.min, itype.max]
-        df[gdf_dtype] = np.array(extremes * 4, dtype=np_type)[:20]
+        extremes = [itype.min, itype.max]
+        df[gdf_dtype] = np.array(extremes * 10, dtype=np_type)[:20]
Need to change the comments at the beginning of these tests
Could you elaborate? Is this a task you want to accomplish in this PR?
I know that 0, +1, and -1 aren't extrema for integer types, but is there a reason you removed them from these tests? I suppose perhaps `np.uint8(-1)` now raises OverflowError or something?
`np.uint8(-1)` in particular raises a deprecation warning. But the reason I gave up on this test was that I couldn't figure out why this was happening:
In [2]: np.array([-1]).astype("uint64")
Out[2]: array([18446744073709551615], dtype=uint64)
In [3]: np.array([18446744073709551615]).astype("uint64")
Out[3]: array([18446744073709551615], dtype=uint64)
In [4]: np.array([-1, 18446744073709551615]).astype("uint64")
<ipython-input-4-03014ed268fc>:1: RuntimeWarning: invalid value encountered in cast
np.array([-1, 18446744073709551615]).astype("uint64")
Out[4]: array([18446744073709551615, 0], dtype=uint64)
I've gone ahead and filtered out that warning from this test.
But the reason I gave up on this test was because I couldn't figure out why this was happening:
`np.array([-1, 2**64 - 1]).dtype == "float64"`, which is lossy.
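The promotion behind that answer is easy to verify; this is a quick illustration of my own, not code from the PR:

```python
import numpy as np

# No integer dtype can hold both -1 and 2**64 - 1, so NumPy promotes the
# mixed literal to float64.
mixed = np.array([-1, 2**64 - 1])
print(mixed.dtype)  # float64

# float64 has a 53-bit mantissa, so 2**64 - 1 rounds to 2**64 and the
# round-trip is lossy.
print(int(mixed[1]))  # 18446744073709551616
```

This is why the subsequent cast to uint64 raises the "invalid value encountered in cast" RuntimeWarning: it is casting from float64, not from Python ints.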
python/cudf/cudf/__init__.py
Outdated
@@ -96,6 +97,7 @@
 _setup_numba_linker(_PTX_FILE)

+patch_numba_codegen_if_needed()
This hack will go away
I think it can be removed from this PR itself. Now that we've verified numpy 1.24 support, I would recommend removing the numba-related changes in this PR that you're using in order to allow running numba 0.57 (which is necessary to use numpy 1.24). We'll still get it tested because our CUDA 12 wheel builds will patch us to use 0.57 anyway (but with CUDA 12 we don't use cubinlinker/ptxcompiler so we don't need any edits for those). Then when we bump our numba to 0.57 tests should pass thanks to this PR.
I have one other question on this PR, would wait to make changes here until everything else is resolved in case you need to run more tests locally.
Can you remove the numpy upper bound pinnings in dependencies.yaml and the cudf meta.yaml? Then in the CUDA 12.0 CI let's make sure that at least one run installs numpy 1.24 and still passes tests.
Success! https://github.com/rapidsai/cudf/actions/runs/4953167580/jobs/8860521944?pr=13325#step:10:828 shows that numpy 1.24 is now getting installed into the 12.0.1 wheel build, and tests are passing
@@ -398,8 +398,8 @@ def test_column_view_string_slice(slc):
         cudf.core.column.as_column([], dtype="uint8"),
     ),
     (
-        cp.array([453], dtype="uint8"),
-        cudf.core.column.as_column([453], dtype="uint8"),
+        cp.array([255], dtype="uint8"),
+        cudf.core.column.as_column([255], dtype="uint8"),
I guess it doesn't matter what value we choose here? Just wondering if it's important to use 453-256.
I don't think it matters. I use 255 just because it's `np.iinfo(uint8).max`.
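For illustration (my own sketch, not code from the PR): 255 needs no wrapping, while the old literal 453 only ever fit in uint8 by wrapping modulo 256.

```python
import numpy as np

# 255 is np.iinfo(np.uint8).max, so it is representable exactly.
print(np.iinfo(np.uint8).max)  # 255

# The old value 453 reached uint8 storage only by wrapping modulo 256;
# an explicit (unsafe) cast still shows the wrapped value.
wrapped = np.array([453]).astype("uint8")
print(wrapped[0])  # 197, i.e. 453 - 256
```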
@@ -194,6 +194,7 @@ def test_to_numeric_downcast_int(data, downcast):
     assert_eq(expected, got)


+@pytest.mark.filterwarnings("ignore:invalid value encountered in cast")
Instead of applying this to the whole test, can we just wrap the `pd.to_numeric` call? This doesn't affect the `cudf.to_numeric` call, does it?

Also, should we be handling the warning conditionally? i.e. I assume this happens when trying to downcast a signed to an unsigned type or something?
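One way to scope the filter to a single call, as suggested above (a sketch using a stand-in float-to-unsigned cast rather than the actual `pd.to_numeric` call):

```python
import warnings
import numpy as np

# Suppress the RuntimeWarning only around the one offending operation,
# instead of marking the whole test with @pytest.mark.filterwarnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    out = np.array([-1.0]).astype("uint8")

print(out.dtype)  # uint8
```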
It seems like there are some strange things going on in the changes in the tests?
python/cudf/cudf/tests/test_csv.py
Outdated
-        extremes = [0, +1, -1, itype.min, itype.max]
-        df[gdf_dtype] = np.array(extremes * 4, dtype=np_type)[:20]
+        extremes = [itype.min, itype.max]
+        df[gdf_dtype] = np.array(extremes * 10, dtype=np_type)[:20]
Any reason to change from 4 pairs of extrema to 10?
     nrows = request.param

     # Create a pandas dataframe with random data of mixed types
     test_pdf = pd.DataFrame(
-        [list(range(ncols * i, ncols * (i + 1))) for i in range(nrows)],
-        columns=pd.Index([f"col_{typ}" for typ in types], name="foo"),
+        {f"col_{typ}": np.random.randint(0, nrows, nrows) for typ in types}
All of the columns in this dataframe now have type int64, no? Since they are never downcast with `astype`.
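The reviewer's point can be checked directly; this sketch uses stand-in fixture values (`types`, `nrows`) rather than the test's actual ones:

```python
import numpy as np
import pandas as pd

types = ["int8", "int16", "float32"]  # stand-in for the fixture's type list
nrows = 4

# Building the frame from np.random.randint output alone leaves every
# column at NumPy's default integer width, whatever dtype its name
# advertises.
pdf = pd.DataFrame(
    {f"col_{typ}": np.random.randint(0, nrows, nrows) for typ in types}
)
print(set(pdf.dtypes))

# A per-column astype restores the mixed-dtype fixture.
pdf = pdf.astype({f"col_{typ}": typ for typ in types})
print(list(pdf.dtypes))
```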
python/cudf/cudf/tests/test_json.py
Outdated
     nrows = request.param

     # Create a pandas dataframe with random data of mixed types
     test_pdf = pd.DataFrame(
-        [list(range(ncols * i, ncols * (i + 1))) for i in range(nrows)],
-        columns=pd.Index([f"col_{typ}" for typ in types], name="foo"),
+        {f"col_{typ}": np.random.randint(0, nrows, nrows) for typ in types}
This one in contrast is cast to the appropriate type.
python/cudf/cudf/tests/test_rank.py
Outdated
@@ -125,7 +125,7 @@ def test_rank_error_arguments(pdf):
     np.full((3,), np.inf),
     np.full((3,), -np.inf),
 ]
-sort_dtype_args = [np.int32, np.int64, np.float32, np.float64]
+sort_dtype_args = [np.float32, np.float64]
This means we now don't run some tests with integer dtypes. Is it that they don't make sense any more?
     slr_device = cudf.Scalar(slr, dtype=dtype)

+    if op.__name__ == "neg" and np.dtype(dtype).kind == "u":
+        # TODO: what do we want to do here?
+        return
NumPy is fine with this, right? Right? Negation of unsigned integers is totally well-defined.
Well:
In [2]: -np.uint16(1)
<ipython-input-2-5156426c8f88>:1: RuntimeWarning: overflow encountered in scalar negative
-np.uint16(1)
Out[2]: 65535
Should we just go ahead and ignore that warning in this test? (I've resorted to doing that in most other cases)
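A minimal sketch of what that suppression looks like (my illustration, not the test code):

```python
import warnings
import numpy as np

# Negating an unsigned scalar wraps modulo 2**16, which is well-defined,
# but NumPy >= 1.24 emits a RuntimeWarning ("overflow encountered in
# scalar negative"); ignoring it keeps the wrapped value.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    val = -np.uint16(1)

print(int(val))  # 65535
```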
python/cudf/cudf/utils/queryutils.py
Outdated
@@ -137,7 +137,7 @@ def query_compile(expr):
     key "args" is a sequence of name of the arguments.
     """
-    funcid = f"queryexpr_{np.uintp(hash(expr)):x}"
+    funcid = f"queryexpr_{np.uintp(abs(hash(expr))):x}"
This is incorrect: `hash` returns in the semi-open interval `[-2**63, 2**63)`, but `abs` folds this to the closed interval `[0, 2**63]` (so you alias just shy of 50% of the values). Instead, you want to shift, I suspect, and then you don't need numpy in the loop at all:
-    funcid = f"queryexpr_{np.uintp(abs(hash(expr))):x}"
+    funcid = f"queryexpr_{hash(expr) + 2**63:x}"
That said, strings are hashable, so this seems like a weird way of constructing a cache key (it's somehow deliberately making it more likely that you get hash collisions and produce the wrong value).
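The aliasing claim is easy to check (a quick illustration of my own, not from the PR):

```python
# abs() maps x and -x to the same value, so roughly half of hash()'s
# range [-2**63, 2**63) collides after folding.
a, b = -7, 7
assert abs(a) == abs(b)

# Shifting by 2**63 is a bijection from [-2**63, 2**63) onto [0, 2**64),
# so distinct hashes stay distinct and the result is always non-negative.
assert a + 2**63 != b + 2**63
print(f"{a + 2**63:x}")  # 7ffffffffffffff9
```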
I would have thought that this would do the trick:
@functools.cache
def query_compile(expr):
    name = "queryexpr"  # these are only looked up locally so names can collide
    info = query_parser(expr)
    fn = query_builder(info, name)
    args = info["args"]
    devicefn = cuda.jit(device=True)(fn)
    kernel = _wrap_query_expr(f"kernel_{name}", devicefn, args)
    info["kernel"] = kernel
    return info
Approving ops-codeowner file changes.

@vyasr just resolved the conflicts here. Yeah, I think it could use a quick skim.
LGTM, thanks.

/merge
Description
Closes #13301
Checklist