`cudf.dtype` function #8949

shwina · 2021-08-04T13:32:48Z

galipremsagar

Overall the changes look good; some comments below.

galipremsagar · 2021-08-04T16:44:40Z

python/cudf/cudf/api/types.py

+        # no NumPy type corresponding to this type
+        # always object?
+        return np.dtype("object")


This seems reasonable to me, as pandas is moving towards object as default type if no dtype is provided:

>>> pd.Series()
<stdin>:1: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)

galipremsagar · 2021-08-04T16:46:37Z

python/cudf/cudf/api/types.py

+            np_dtype = np.dtype("<m8[ns]")
+        elif np_dtype.str == "<M8":
+            np_dtype = np.dtype("<M8[ns]")
+        return np_dtype


If someone calls cudf.dtype('complex'), I think we would end up returning np.dtype('complex') here. Should we validate that the dtype exists in our cudf type map before returning?

>>> np.dtype('complex')
dtype('complex128')

Fixed - let me know if you think the way I'm handling unsupported NumPy types is OK
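For readers following this thread, here is a minimal sketch of the behaviour under discussion, using only NumPy. The helper name `resolve_numpy_dtype` and the choice to raise `TypeError` are assumptions made for illustration; they are not taken from the PR. The kind string and the unit-less datetime/timedelta handling mirror the diff fragments quoted in this review.

```python
import numpy as np

# Kinds accepted by the check quoted in this review ("biufUOMm").
ALLOWED_KINDS = "biufUOMm"


def resolve_numpy_dtype(arbitrary):
    """Hypothetical helper: return a NumPy dtype or raise TypeError."""
    np_dtype = np.dtype(arbitrary)
    # Reject kinds outside the allowed set, e.g. complex (kind "c").
    if np_dtype.kind not in ALLOWED_KINDS:
        raise TypeError(f"Unsupported type {np_dtype}")
    # Unit-less timedelta/datetime dtypes get a nanosecond unit, as in the diff.
    if np_dtype.str == "<m8":
        return np.dtype("<m8[ns]")
    if np_dtype.str == "<M8":
        return np.dtype("<M8[ns]")
    return np_dtype


print(resolve_numpy_dtype("datetime64"))
print(resolve_numpy_dtype("int32"))
```

With this sketch, `resolve_numpy_dtype("complex128")` raises instead of silently returning a dtype that has no cudf counterpart.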

galipremsagar · 2021-08-04T16:48:36Z

python/cudf/cudf/core/column/column.py

                    dtype = pd.api.types.pandas_dtype(dtype)
-                    np_type = np.dtype(dtype).type
+                    np_type = cudf.dtype(dtype).type


Can we now squeeze these two separate dtype calls into a single cudf.dtype call? Or is there something specific about calling pd.api.types.pandas_dtype first?

np_type = cudf.dtype(dtype).type

python/cudf/cudf/tests/test_dtypes.py

galipremsagar · 2021-08-09T21:45:07Z

python/cudf/cudf/api/types.py

+    except TypeError:
+        pass
+    else:
+        if np_dtype.kind not in "biufUOMm":


Suggested change

-        if np_dtype.kind not in "biufUOMm":
+        if np_dtype not in cudf._lib.types.np_to_cudf_types:

To make this maintainable, should we just look up our np<->libcudf type map here? This way, any newly added dtype support will automatically be picked up by cudf.dtype.

Agree -- but it would be nicer if the source of truth was in a more obviously named constant. For example, something like: cudf._lib.types.SUPPORTED_NUMPY_TYPES.

+1 to having a cudf._lib.types.SUPPORTED_NUMPY_TYPES

There's a slight problem: <M8 is an acceptable return type here, but it's not a SUPPORTED_NUMPY_TYPE (the supported types are <M8[unit]).
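The mismatch raised in this thread can be reproduced with plain NumPy. In the sketch below, `SUPPORTED_NUMPY_TYPES` is a hypothetical stand-in for the proposed constant; its contents are assumed for illustration and do not reflect cudf's real type map.

```python
import numpy as np

# Kind string from the original check in the diff.
SUPPORTED_KINDS = "biufUOMm"

# Hypothetical stand-in for the proposed constant; real contents may differ.
SUPPORTED_NUMPY_TYPES = frozenset(
    np.dtype(name)
    for name in (
        "bool", "int8", "int16", "int32", "int64",
        "float32", "float64", "<M8[ns]", "<m8[ns]",
    )
)


def allowed_by_kind(dtype_like):
    """Coarse check: only the dtype's kind character is inspected."""
    return np.dtype(dtype_like).kind in SUPPORTED_KINDS


def allowed_by_lookup(dtype_like):
    """Strict check: the exact dtype must be a member of the set."""
    return np.dtype(dtype_like) in SUPPORTED_NUMPY_TYPES


for name in ("<M8", "<M8[ns]", "complex128"):
    print(name, allowed_by_kind(name), allowed_by_lookup(name))
```

The unit-less `<M8` passes the kind check but fails the exact lookup, so a lookup-based check would need to run after any unit normalization, or treat unit-less datetime and timedelta dtypes as a special case.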

vyasr

Mostly looks good, some small suggestions. We also should replace all instances of np.dtype throughout cudf if possible.

python/cudf/cudf/api/types.py

vyasr · 2021-08-09T21:57:34Z

python/cudf/cudf/tests/test_dtypes.py

+
+
+@pytest.mark.parametrize(
+    "in_dtype,expect",


We probably want to test more inputs that don't translate to numpy dtypes, specifically more cudf- and pandas-specific extension types.

The cuDF specific types (i.e., instances of cudf._BaseDtype) are less interesting since we just return those as-is. But I did add a few more tests.

Maybe also worth testing pandas interval/datetime/timedelta dtypes.

Added interval cases, but as far as I know, Pandas uses numpy datetime/timedelta types as their dtype for DatetimeIndex/TimedeltaIndex.

Co-authored-by: Vyas Ramasubramani <[email protected]>

codecov · 2021-08-09T23:39:33Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@2e980b8). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head e826428 differs from pull request most recent head 2a684be. Consider uploading reports for the commit 2a684be to get more accurate results

@@               Coverage Diff               @@
##             branch-21.10    #8949   +/-   ##
===============================================
  Coverage                ?   10.59%           
===============================================
  Files                   ?      114           
  Lines                   ?    19080           
  Branches                ?        0           
===============================================
  Hits                    ?     2022           
  Misses                  ?    17058           
  Partials                ?        0

shwina · 2021-08-11T20:25:23Z

rerun tests

vyasr

Looks like something you did got you stuck triggering isort/circular import issues? Anyway, the import changes generally look good along with the main changes. I had a couple of minor additional comments, but nothing pressing.

vyasr · 2021-08-11T23:21:09Z

python/cudf/cudf/_lib/copying.pyx

@@ -787,12 +787,13 @@ cdef class _CPackedColumns:
        """
        Construct a ``PackedColumns`` object from a ``cudf.DataFrame``.
        """
-        from cudf.core import RangeIndex, dtypes
+        import cudf.core.dtypes


Why not just import _BaseDtype? Not a big deal either way, just curious.

In [17]: %%timeit
    ...: import cudf.core.dtypes
    ...: cudf.core.dtypes._BaseDtype
    ...:
    ...:
407 ns ± 3.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [19]: %%timeit
    ...: from cudf.core.dtypes import _BaseDtype
    ...: _BaseDtype
    ...:
    ...:
875 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Oh sure, if it's for performance, that works for me.
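The comparison above can be approximated outside IPython with the standard-library `timeit` module. This sketch uses `collections.abc.Mapping` as a stand-in for `cudf.core.dtypes._BaseDtype` so that it runs without cudf installed. The absolute numbers, and possibly the ordering, will vary by machine and Python version.

```python
import timeit

N = 100_000  # Repetitions per measurement.

# Import the module, then reach the class by attribute access.
via_module = timeit.timeit(
    "import collections.abc\ncollections.abc.Mapping", number=N
)

# Import the name directly from the module.
via_from = timeit.timeit(
    "from collections.abc import Mapping\nMapping", number=N
)

print(f"import module + attribute access: {via_module / N * 1e9:.0f} ns per loop")
print(f"from module import name:          {via_from / N * 1e9:.0f} ns per loop")
```

Both statements hit the module cache after the first iteration, so the measurement reflects the per-call overhead of each import form rather than the cost of loading the module.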

vyasr · 2021-08-11T23:29:11Z

python/cudf/cudf/tests/test_dtypes.py

+
+
+@pytest.mark.parametrize(
+    "in_dtype,expect",


Maybe also worth testing pandas interval/datetime/timedelta dtypes.

shwina · 2021-08-12T15:59:33Z

@gpucibot merge

galipremsagar · 2021-08-12T16:41:04Z

Thanks for working on this @shwina ! This greatly helps other dtype related APIs especially with the changes coming up on the cuIO side.

shwina · 2021-08-13T13:31:29Z

@gpucibot merge

shwina added 2 commits August 4, 2021 08:35

Replace cudf.dtype -> np.dtype

60c7c87

First stab at cudf.dtype

5e50f52

github-actions bot added the Python Affects Python cuDF API. label Aug 4, 2021

shwina added 2 commits August 4, 2021 10:12

Handle datetimes/timedeltas in cudf.dtype

367b743

Fix test

d04a5f1

galipremsagar reviewed Aug 4, 2021

shwina and others added 7 commits August 5, 2021 12:35

Handle disallowed numpy types

85351e9

Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …

3c9dd97

…cudf-dtype-function

Update python/cudf/cudf/tests/test_dtypes.py

67cca8a

Co-authored-by: GALI PREM SAGAR <[email protected]>

Some fixes

a10eae0

Remaining failures

89ac918

Merge branch 'cudf-dtype-function' of github.com:shwina/cudf into cud…

acda2ee

…f-dtype-function

Style

64a3290

shwina marked this pull request as ready for review August 9, 2021 21:39

shwina requested a review from a team as a code owner August 9, 2021 21:39

shwina requested review from galipremsagar and isVoid August 9, 2021 21:39

shwina added the non-breaking (Non-breaking change), tech debt, and improvement (Improvement / enhancement to an existing function) labels Aug 9, 2021

galipremsagar requested changes Aug 9, 2021

vyasr requested changes Aug 9, 2021

Update python/cudf/cudf/api/types.py

a62ab32

Co-authored-by: Vyas Ramasubramani <[email protected]>

shwina added 2 commits August 10, 2021 15:30

cudf.dtype -> np.dtype

f79e59f

Merge branch 'cudf-dtype-function' of github.com:shwina/cudf into cud…

9dceb80

…f-dtype-function

shwina requested a review from a team as a code owner August 10, 2021 19:32

Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …

d0bef49

…cudf-dtype-function

shwina added 4 commits August 11, 2021 13:57

Progress

3eba47c

More fix

048629c

Early returns

40736c4

More tests

550c7ba

shwina added 2 commits August 11, 2021 18:12

Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …

1cfa67c

…cudf-dtype-function

Resolve circular import issues

72d6304

shwina requested review from galipremsagar and vyasr August 11, 2021 23:05

vyasr approved these changes Aug 11, 2021

galipremsagar requested changes Aug 12, 2021

shwina added 3 commits August 12, 2021 10:48

Unused import

c8925f5

Space

26df99a

Add interval tests

fec34d9

galipremsagar approved these changes Aug 12, 2021

galipremsagar added the 5 - Ready to Merge (Testing and reviews complete, ready to merge) label Aug 12, 2021

shwina added 3 commits August 12, 2021 13:28

:(

5fc19a9

Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …

11156f5

…cudf-dtype-function

Merge branch 'branch-21.10' of https://github.com/rapidsai/cudf into …

2a684be

…cudf-dtype-function

quasiben approved these changes Aug 13, 2021

rapids-bot bot merged commit 2b92220 into rapidsai:branch-21.10 Aug 13, 2021

galipremsagar added the breaking (Breaking change) label and removed the non-breaking (Non-breaking change) label Aug 17, 2021

dantegd mentioned this pull request Aug 18, 2021

[BUG] cudaErrorInvalidValue when creating cudf.Series from float16 CuPy Series #9065

Closed

beckernick mentioned this pull request Aug 20, 2021

[BUG] Series UDFs documentation bug #9084

Closed
