Refactor cudf.Series.__init__ #14450

mroeschke · 2023-11-18T03:01:19Z

Description

I found a few issues related to reindexing and name priority when index= and name= are given, so I added unit tests for those.

I also added some typing to the from_pandas methods.

The main approach here is to first collect the potential column from the if/elif branches, call super().__init__({name: columns}, index=index), and then apply the additional keywords to the result.
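A minimal sketch of this approach in plain Python follows. It is a hypothetical, simplified model, not cuDF's code: the MiniSeries class, its list-based storage, and applying dtype by calling a Python type are all assumptions for illustration. Each input branch only collects a column together with whatever index and name the data already carries, one shared construction step builds the object, and only then are index=, dtype= and name= applied. That ordering is what lets an explicit name= take priority over the data's own name, and makes index= act as a reindex (label selection) when the input is already labeled.

```python
class MiniSeries:
    """Toy labeled 1-D container (hypothetical, not cuDF's implementation)."""

    def __init__(self, data=None, index=None, dtype=None, name=None):
        reindex_to = None

        # Step 1: each branch only collects a column plus any index/name
        # that the input data already carries.
        if isinstance(data, MiniSeries):
            column = list(data.values)
            index_from_data = list(data.index)
            name_from_data = data.name
            reindex_to = index  # index= selects labels from labeled input
        elif isinstance(data, dict) or data is None:
            data = {} if data is None else data  # treat None as an empty dict
            column = list(data.values())
            index_from_data = list(data.keys())
            name_from_data = None
            reindex_to = index
        else:
            column = list(data)
            if index is None:
                index_from_data = list(range(len(column)))
            else:
                index_from_data = list(index)  # index= labels unlabeled input
                if len(index_from_data) != len(column):
                    raise ValueError("index length does not match data length")
            name_from_data = None

        # Step 2: a single construction step shared by every branch.
        self.values = column
        self.index = index_from_data
        self.name = name_from_data

        # Step 3: apply the remaining keywords to the constructed result.
        if reindex_to is not None:
            lookup = dict(zip(self.index, self.values))
            self.values = [lookup.get(label) for label in reindex_to]
            self.index = list(reindex_to)
        if dtype is not None:
            self.values = [None if v is None else dtype(v) for v in self.values]
        if name is not None:
            self.name = name  # an explicit name= wins over the data's name
```

For example, MiniSeries(MiniSeries([1, 2, 3], index=["a", "b", "c"], name="x"), index=["c", "a", "z"], name="y") ends up with values [3, 1, None], index ["c", "a", "z"] and name "y", while omitting name= keeps the inner name "x".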

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

isVoid · 2024-01-02T04:41:16Z

python/cudf/cudf/core/series.py

+            name_from_data = data.name
+            column = as_column(data, nan_as_null=nan_as_null, dtype=dtype)
+            if isinstance(data, (pd.Series, Series)):
+                index, index_from_data = as_index(data.index), index


I find the naming on this line confusing. index is the passed-in argument, and index_from_data is the index extracted from data. However, on this line the meanings seem to be swapped?

mroeschke · 2024-01-05

Yeah, I can see how this is confusing. I'll swap the naming throughout __init__ to make it clearer.

isVoid · 2024-01-02T04:43:29Z

python/cudf/cudf/core/series.py

-                data = data.astype(dtype)
-
-        if isinstance(data, dict):
+        elif isinstance(data, dict) or data is None:


Perhaps separating these two conditions into two branches would be clearer?

mroeschke · 2024-01-05

I can replace data=None with data={} earlier and simplify this condition.

isVoid · 2024-01-02T04:47:07Z

python/cudf/cudf/core/series.py

+        if dtype is not None:
+            column = column.astype(dtype)


After passing dtype into every as_column call above, why do we still need to cast the column type here? Just curious!

mroeschke · 2024-01-05

This is mostly defensive for now, as I am not sure that as_column currently always respects the dtype argument (e.g. I recently found a case in https://github.com/rapidsai/cudf/pull/14686/files). I think this cast could be removed in the future!
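The defensive cast can be illustrated with a small sketch. Everything here is hypothetical: Column, as_column_sketch, astype and build_column are toy stand-ins, not cuDF functions, and as_column_sketch is deliberately written to return an existing Column unchanged (ignoring dtype) to model the kind of converter gap described above. The unconditional cast after conversion, mirroring the diff, guarantees the requested type either way.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Type


@dataclass
class Column:
    values: List[Any]
    dtype: Type


def as_column_sketch(data: Any, dtype: Optional[Type] = None) -> Column:
    """Hypothetical converter that does NOT always honor `dtype`."""
    if isinstance(data, Column):
        return data  # existing column returned as-is: dtype silently ignored
    values = list(data)
    target = dtype or (type(values[0]) if values else float)
    return Column([target(v) for v in values], target)


def astype(column: Column, dtype: Type) -> Column:
    """Return a new column with every value converted to `dtype`."""
    return Column([dtype(v) for v in column.values], dtype)


def build_column(data: Any, dtype: Optional[Type] = None) -> Column:
    column = as_column_sketch(data, dtype=dtype)
    # Defensive cast, as in the diff: redundant when the converter already
    # applied dtype, but it covers the cases where it did not.
    if dtype is not None:
        column = astype(column, dtype)
    return column
```

Comparing column.dtype with dtype before casting would skip the redundant conversion; once the converter is known to always respect dtype, the cast can be dropped entirely, as the reply suggests.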

isVoid · 2024-01-02T05:51:04Z

python/cudf/cudf/core/series.py

+        if index_from_data is not None:
+            # TODO: Is there a better way to do this?
+            index_from_data = as_index(index_from_data)
+            reindexed = self.reindex(index=index_from_data, copy=False)


There seems to be a _reindex function that accepts an inplace=True parameter.

mroeschke · 2024-01-05

I just tried this, and there were some test failures because data types were not preserved by _reindex. I'll keep the current approach for now.
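The trade-off between a reindex that returns a new object and an in-place _reindex can be sketched as follows. This is a hypothetical toy (Labeled, reindex and adopt_reindexed are illustrative names, not cuDF's API), and it assumes the constructor keeps the result of self.reindex(...) by taking over that object's internal state; the diff excerpt stops at the reindex call, so that step is an assumption. The idea is that an out-of-place method can still provide an in-place effect without relying on a separate in-place code path.

```python
class Labeled:
    """Hypothetical toy labeled container (not cuDF's API)."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def reindex(self, index):
        """Return a NEW object aligned to `index`; missing labels become None."""
        lookup = dict(zip(self.index, self.values))
        return Labeled([lookup.get(label) for label in index], index)

    def adopt_reindexed(self, index):
        """Reindex in place by adopting the state of the out-of-place result."""
        result = self.reindex(index)
        self.values, self.index = result.values, result.index
```

Because reindex leaves the original object untouched, the in-place variant is just "compute, then adopt", so both paths share one implementation of the alignment logic.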

isVoid

Thanks!

mroeschke · 2024-01-08T22:21:50Z

/merge

mroeschke added 4 commits November 17, 2023 12:34

Refactor from_pandas

9010bbb

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

94dee27

Consolidate Series.__init__

055d153

Undo dataframe.py

70747ce

mroeschke added the Python (Affects Python cuDF API), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Nov 18, 2023

mroeschke requested a review from a team as a code owner November 18, 2023 03:01

mroeschke requested review from galipremsagar and isVoid November 18, 2023 03:01

mroeschke added 5 commits November 20, 2023 15:23

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

30d5e65

Fix more series constructor issues

6f7adde

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

a444e1d

Fix test_series_from_named_object_name_priority

8254f43

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

a3753a8

mroeschke changed the title from REF: cudf.Series.__init__ to cudf.Series.__init__ Nov 28, 2023

mroeschke changed the title from cudf.Series.__init__ to Refactor cudf.Series.__init__ Nov 28, 2023

Merge branch 'branch-24.02' into ref/from_pandas

f0c1ecb

mroeschke mentioned this pull request Dec 29, 2023

Fix nan_as_null not being respected when passing cudf object #14687 (Closed)

mroeschke added 3 commits December 29, 2023 15:05

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

eacc27d

Consolidate branches

d6c460e

Merge branch 'ref/from_pandas' of https://github.com/mroeschke/cudf into ref/from_pandas

6f4f598

isVoid reviewed Jan 2, 2024

mroeschke added 3 commits January 4, 2024 17:20

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

3b31a43

Simplify condition

7e9fa74

Less confusing index naming

177b198

isVoid approved these changes Jan 5, 2024

Merge remote-tracking branch 'upstream/branch-24.02' into ref/from_pandas

64b560e

rapids-bot bot merged commit c4e5e8c into rapidsai:branch-24.02 Jan 8, 2024
67 checks passed

mroeschke deleted the ref/from_pandas branch January 8, 2024 22:21
