Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: Remove instances of pd.core #14421

Merged
merged 7 commits into from
Nov 20, 2023

Conversation

mroeschke
Copy link
Contributor

Description

pandas.core is technically private and methods could be moved at any time. Avoiding places in the codepace where they could be avoided

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 15, 2023
@mroeschke mroeschke requested a review from a team as a code owner November 15, 2023 23:18
@mroeschke mroeschke requested review from wence- and bdice November 15, 2023 23:18
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's target new PRs to 24.02 since we're in burndown, unless you think this is essential for 23.12.

@mroeschke
Copy link
Contributor Author

Ah OK. Not essential for 23.12 so will retarget for 24.02

@mroeschke mroeschke changed the base branch from branch-23.12 to branch-24.02 November 16, 2023 01:31
@mroeschke mroeschke requested review from a team as code owners November 16, 2023 01:31
@mroeschke mroeschke changed the base branch from branch-24.02 to branch-23.12 November 16, 2023 01:31
@mroeschke mroeschke changed the base branch from branch-23.12 to branch-24.02 November 16, 2023 03:06
@wence- wence- removed request for a team and robertmaynard November 16, 2023 14:30
Comment on lines 2313 to 2320
result = self.to_pandas().to_dict(orient=orient, into=into)
if orient == "series":
# Special case needed to avoid converting
# cudf.Series objects into pd.Series
into_c = pd.core.common.standardize_mapping(into)
return into_c((k, v) for k, v in self.items())

return self.to_pandas().to_dict(orient=orient, into=into)
# Ensure values are cudf.Series
converted = ((k, Series(v)) for k, v in result.items())
if isinstance(into, defaultdict):
return type(result)(into.default_factory, converted)
return type(result)(converted)
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: ‏This change means that the "special-case" orient="series" which previously did not induce a copy now produces both a DtoH and HtoD copy.

We should just handle the case correctly:

if orient == "series":
    if not inspect.isclass(into):
        cons = type(into)
        if isinstance(into, defaultdict):
            cons = partial(cons, into.default_factory)
    elif issubclass(into, Mapping):
        cons = into
        if issubclass(into, defaultdict):
           raise TypeError("Must provide initialised defaultdict")
    else:
       raise TypeError(...)
    return cons(self.items())

Aside, this is a mad interface, one should just be on the hook for providing something that has the same __init__ signature as dict (i.e. *args, **kwargs -> Mapping).

(I note the implementation in pandas has a bug because it doesn't preserve the type of the output if the input is a subclass of defaultdict)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now produces both a DtoH and HtoD copy.

Oof thanks for the catch! Any tips on generally knowing when a DtoH or HtoD occurs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any time you call to_pandas or to_arrow or .values_host or similar to get a host-side version of the object. Then in the other direction, any time you have a host-side object.

python/cudf/cudf/core/index.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/index.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/index.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_rolling.py Show resolved Hide resolved
@wence-
Copy link
Contributor

wence- commented Nov 20, 2023

/merge

@rapids-bot rapids-bot bot merged commit 2afb784 into rapidsai:branch-24.02 Nov 20, 2023
64 of 65 checks passed
@mroeschke mroeschke deleted the ref/pandas_core branch November 20, 2023 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants