Commit
Interchange dataframe protocol (#9071)
This PR is a basic implementation of the [interchange dataframe protocol](https://github.com/data-apis/dataframe-api/blob/main/protocol/dataframe_protocol.py) for cudf.

As is well known, there are many dataframe libraries, where one library's weakness is handled by another. To work across these libraries, we rely on `pandas` through methods like `from_pandas` and `to_pandas`. This is a bad design, as each library must maintain an additional dependency on pandas and its peculiarities.

This protocol provides a high-level API that dataframe libraries must implement to communicate with one another. Thus, we get rid of the tight coupling with pandas and depend only on the protocol API, leaving each library free to choose its implementation details.

To illustrate:

- `df_obj = cudf_dataframe.__dataframe__()`
  `df_obj` can be consumed by any library implementing the protocol.
- `df = cudf.from_dataframe(any_supported_dataframe)`
  Here we create a `cudf dataframe` from any dataframe object supporting the protocol.

So far, it supports the following:

- Column dtypes: `uint8`, `int`, `float`, `bool` and `categorical`.
- Missing values are handled for all these dtypes.
- `string` support is on the way.

Additionally, we support dataframes from CPU devices, such as `pandas`. However, this is not testable here, as pandas has not yet adopted the protocol. We've tested it locally with a monkey-patched pandas implementation of the protocol.

Authors:
- Ismaël Koné (https://github.com/iskode)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Ashwin Srinath (https://github.com/shwina)
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #9071
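
The producer/consumer pattern described in this PR can be sketched in pure Python. This is a toy illustration only, not the cudf implementation: `ToyInterchangeFrame`, `ProducerFrame`, `ConsumerFrame` and this `from_dataframe` are hypothetical names, and columns are plain Python lists, whereas the real protocol exposes column objects backed by buffers. It only shows how a consumer can depend on `__dataframe__()` rather than on another library's concrete type.

```python
from typing import Any, Dict, List


class ToyInterchangeFrame:
    """Hypothetical, heavily simplified stand-in for the interchange object."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns

    def num_columns(self) -> int:
        return len(self._columns)

    def num_rows(self) -> int:
        # Assumption: all columns have the same length.
        return len(next(iter(self._columns.values()), []))

    def column_names(self) -> List[str]:
        return list(self._columns)

    def get_column_by_name(self, name: str) -> List[Any]:
        # Simplification: the real protocol returns a column object
        # exposing buffers; a plain list keeps this sketch short.
        return self._columns[name]


class ProducerFrame:
    """Hypothetical dataframe type from a 'producer' library."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self._data = {name: list(values) for name, values in data.items()}

    def __dataframe__(self) -> ToyInterchangeFrame:
        # Entry point of the protocol: hand out the interchange object.
        return ToyInterchangeFrame(self._data)


class ConsumerFrame:
    """Hypothetical dataframe type from a 'consumer' library."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self.data = data


def from_dataframe(df: Any) -> ConsumerFrame:
    """Build a ConsumerFrame from any object exposing __dataframe__()."""
    if not hasattr(df, "__dataframe__"):
        raise ValueError("object does not support the interchange protocol")
    xdf = df.__dataframe__()
    # Copy column by column; missing values are represented as None here.
    return ConsumerFrame(
        {name: list(xdf.get_column_by_name(name)) for name in xdf.column_names()}
    )


source = ProducerFrame({"a": [1, 2, None], "b": [0.5, 1.5, 2.5]})
result = from_dataframe(source)
```

The consumer never imports or inspects `ProducerFrame`; it only calls methods on the object returned by `__dataframe__()`, which is the decoupling from pandas that the PR description argues for.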