Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Add GroupBy.dtypes #12783

Merged
merged 8 commits into from
Feb 16, 2023

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Feb 15, 2023

Description

This PR adds dtypes property to GroupBy, this will also fix some upstream dask breaking changes introduced in: dask/dask#9889

Issue was discovered in: #12768 (comment)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Feb 15, 2023
@galipremsagar galipremsagar self-assigned this Feb 15, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner February 15, 2023 18:06
@github-actions github-actions bot added the Python Affects Python cuDF API. label Feb 15, 2023
}
)
df.index = index
return df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much more simple than the pandas implementation, but I guess it makes sense.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion to simplify, but otherwise LGTM.

python/cudf/cudf/core/groupby/groupby.py Outdated Show resolved Hide resolved
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Feb 15, 2023
@galipremsagar
Copy link
Contributor Author

Currently blocked by an unrelated error:

2023-02-15T00:56:38.2183067Z [32m┌────────────────────────┐[0m
2023-02-15T00:56:38.2183513Z [32m|    pytest custreamz    |[0m
2023-02-15T00:56:38.2184134Z [32m└────────────────────────┘[0m
2023-02-15T00:56:38.2184156Z 
2023-02-15T00:56:38.2184513Z /__w/cudf/cudf/python/custreamz /__w/cudf/cudf
2023-02-15T00:56:39.2568849Z ImportError while loading conftest '/__w/cudf/cudf/python/custreamz/custreamz/tests/conftest.py'.
2023-02-15T00:56:39.2570733Z custreamz/__init__.py:3: in <module>
2023-02-15T00:56:39.2571402Z     from .kafka import Consumer
2023-02-15T00:56:39.2572094Z custreamz/kafka.py:3: in <module>
2023-02-15T00:56:39.2572830Z     from cudf_kafka._lib.kafka import KafkaDatasource
2023-02-15T00:56:39.2574920Z E   ImportError: /opt/conda/envs/test/lib/python3.8/site-packages/cudf_kafka/_lib/kafka.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3fmt2v96detail18thousands_sep_implIcEENS1_20thousands_sep_resultIT_EENS1_10locale_refE

Discussing offline to resolve this.

@vyasr
Copy link
Contributor

vyasr commented Feb 15, 2023

In the interest of unblocking CI, I've made some small changes to patch this issue. I'll need to do a bit of a deeper investigation to figure out where cudf (or perhaps arrow) is leaking this dependency from; I'm not 100% convinced that it should be needed, but I could be mistaken (in which case my current patch won't need to be modified).

@galipremsagar
Copy link
Contributor Author

CI passed, merging this PR now

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit d787ff2 into rapidsai:branch-23.04 Feb 16, 2023
rapids-bot bot pushed a commit that referenced this pull request Feb 16, 2023
The changes to spdlog/fmt packaging in rmm caused an undefined symbol issue in cudf_kafka. To unblock CI in #12783, I simply added fmt to the list of required libraries for the Python package (see caf7adf and [the explanation](#12783 (comment))). The underlying issue turns out to be that by default usage of fmt results in the dependent assuming that the headers are compiled in not header-only mode. When using CMake to manage the build, fmt relies on use of the appropriate target `fmt::fmt-header-only` to configure the build such that anything using fmt knows to use it in header-only mode, which sets the `FMT_HEADER_ONLY` preprocessor macro under the hood. Since the Python cudf_kafka package is not built using CMake, however, this information was not propagated to its build from its dependencies (libcudf/libcudf_kafka). As a result, it was compiled expecting an external definition of some fmt symbols rather than inlining them. This PR removes the undesirable library dependency on fmt introduced in #12783 in favor of properly telling fmt to expect inlined symbols.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #12796
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants