[FEA] Introduce the pylibcudf API and subpackage #13921

Closed
GregoryKimball opened this issue Aug 18, 2023 · 3 comments
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. pylibcudf Issues specific to the pylibcudf package

Comments

@GregoryKimball
Contributor

GregoryKimball commented Aug 18, 2023

Background

cuDF-python is the common way for Python users to interact with libcudf, the CUDA/C++ computational core for RAPIDS dataframe and database operations. However, cuDF-python is designed to correspond closely with the pandas API, and in doing so it adds some semantic overhead on top of libcudf algorithms. For Python applications that need accelerated dataframe operations but gain nothing from API matching, "pylibcudf" provides a direct way for the Python ecosystem to use libcudf. pylibcudf also makes libcudf APIs available to the Python ecosystem even if they are not supported in pandas (e.g. TEXT).

In addition, we can improve the performance and design of cuDF-python by building it on a "pylibcudf" foundation and refactoring away extra complexity in cuDF-python's Cython layer.

Example performance

Here is an example of the API design we have in mind for pylibcudf.

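    import pylibcudf

    # Draft API design. cv1, cv2 and cv3 are ColumnView objects.
    tv = pylibcudf.TableView([cv1, cv2])  # key columns to group by
    gb = pylibcudf.GroupBy(tv)

    # Request a sum aggregation over the values in cv3.
    req = pylibcudf.AggregationRequest(cv3, [pylibcudf.GroupbyAggregation.sum()])
    keys, results = gb.aggregate([req])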
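
For readers unfamiliar with the groupby request pattern, here is a minimal CPU-only sketch of what the draft call above is meant to compute: group rows by the key columns and sum a value column per group. It uses plain Python lists in place of ColumnView objects; `groupby_sum` is a hypothetical helper written for illustration and is not part of pylibcudf or libcudf.

```python
from collections import defaultdict


def groupby_sum(key_columns, value_column):
    """Sum value_column within each group of distinct key tuples.

    Hypothetical helper for illustration only, not a pylibcudf API.
    """
    totals = defaultdict(int)
    # zip(*key_columns) yields one key tuple per row.
    for key, value in zip(zip(*key_columns), value_column):
        totals[key] += value
    # Unique keys in first-seen order, with their per-group sums.
    keys = list(totals)
    results = [totals[k] for k in keys]
    return keys, results


# Plain lists standing in for cv1, cv2 (keys) and cv3 (values).
cv1 = [1, 1, 2, 2]
cv2 = ["a", "a", "b", "c"]
cv3 = [10, 20, 30, 40]

keys, results = groupby_sum([cv1, cv2], cv3)
print(keys)
print(results)
```

The first-seen ordering of groups here is an artifact of this sketch; a GPU implementation may return groups in a different order.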

Here are some draft performance results from our prototype, showing good throughput and low overhead.
[Image: draft performance results from the prototype]

@GregoryKimball GregoryKimball added feature request New feature or request 0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. labels Aug 18, 2023
@GregoryKimball GregoryKimball moved this to Story Issue in libcudf Aug 18, 2023
rapids-bot bot pushed a commit that referenced this issue Feb 6, 2024
Contributes to #13921

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #14972
rapids-bot bot pushed a commit that referenced this issue Feb 6, 2024
Contributes to #13921

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)

URL: #14970
rapids-bot bot pushed a commit that referenced this issue Feb 8, 2024
Contributes to #13921

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #14982
rapids-bot bot pushed a commit that referenced this issue Feb 8, 2024
Contributes to #13921

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #15005
@vyasr
Contributor

vyasr commented Feb 13, 2024

Before we can close this issue we will also need to add comprehensive testing. For now, every pylibcudf API is being developed as a back-end for a cuDF Python API, so the existing Python test suite gives us sufficient coverage. We will want to come back and remedy this gap before actually extracting pylibcudf as a separate package.
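
As a rough illustration of the coverage argument, and of one way the gap could show up, here is a toy two-layer sketch. Every name in it (`lowlevel_sum`, `Series`, `skip_nulls`) is a hypothetical stand-in and not an actual pylibcudf or cuDF API. A test written against the wrapper exercises the low-level routine indirectly, but only along the paths the wrapper happens to call.

```python
def lowlevel_sum(values, skip_nulls=True):
    """Hypothetical low-level routine (stand-in for a pylibcudf-style API)."""
    if skip_nulls:
        return sum(v for v in values if v is not None)
    # With skip_nulls=False, any null makes the whole result null.
    if any(v is None for v in values):
        return None
    return sum(values)


class Series:
    """Hypothetical pandas-like wrapper (stand-in for a cuDF Python API)."""

    def __init__(self, values):
        self._values = list(values)

    def sum(self):
        # The wrapper only ever uses the default skip_nulls=True path.
        return lowlevel_sum(self._values)


# Wrapper-level test: covers lowlevel_sum indirectly.
wrapper_result = Series([1, None, 2]).sum()

# Direct low-level test: reaches a path the wrapper never calls.
direct_result = lowlevel_sum([1, None, 2], skip_nulls=False)
```

In this toy setup the `skip_nulls=False` branch would only be covered by a test that calls the low-level function directly, which is the kind of test a standalone package would need.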

@vyasr
Contributor

vyasr commented Feb 27, 2024

pylibcudf development is now under way. I have created a project board for tracking as well as a number of issues to discuss more specific topics. I am therefore now closing this issue in favor of tracking using those.

@GregoryKimball
Contributor Author

FYI #15162 is the spiritual successor to this issue

@vyasr vyasr added the pylibcudf Issues specific to the pylibcudf package label May 28, 2024
@vyasr vyasr moved this from Todo to Done in cuDF Python May 28, 2024