Register read_parquet and read_csv with dask-expr (#16535)
After dask/dask-expr#1114, Dask cuDF must register its own `read_parquet` and `read_csv` functions so they are used when query planning is enabled (the default).

**This PR is required for CI to pass with dask>2024.8.0**

**NOTE**: It probably doesn't make sense to add specific tests for this change. Once the 2024.7.1 dask pin is removed, all `dask_cudf` tests using `read_parquet` and `read_csv` will fail without this change.
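
For context, a minimal usage sketch of what this registration enables (an illustration, not part of the commit; the dataset paths are hypothetical, and the `dask.config` / `dask.dataframe` calls shown are standard Dask APIs):

```python
# Minimal usage sketch (not part of this diff): with query planning
# enabled (the default) and the "cudf" dataframe backend selected,
# dd.read_parquet / dd.read_csv dispatch to the backend functions
# registered in this change.
import dask
import dask.dataframe as dd

dask.config.set({"dataframe.backend": "cudf"})

# Hypothetical glob paths, for illustration only.
df_pq = dd.read_parquet("data/*.parquet")
df_csv = dd.read_csv("data/*.csv")

print(type(df_pq._meta))  # expected to be a cudf.DataFrame
```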

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Benjamin Zaitlen (https://github.com/quasiben)

URL: #16535
rjzamora authored Aug 13, 2024
1 parent 3801f81 commit 5780c4d
Showing 1 changed file with 35 additions and 0 deletions.
35 changes: 35 additions & 0 deletions python/dask_cudf/dask_cudf/backends.py
@@ -667,6 +667,41 @@ def from_dict(
            constructor=constructor,
        )

    @staticmethod
    def read_parquet(*args, engine=None, **kwargs):
        import dask_expr as dx

        from dask_cudf.io.parquet import CudfEngine

        return _default_backend(
            dx.read_parquet, *args, engine=CudfEngine, **kwargs
        )

    @staticmethod
    def read_csv(
        path,
        *args,
        header="infer",
        dtype_backend=None,
        storage_options=None,
        **kwargs,
    ):
        import dask_expr as dx
        from fsspec.utils import stringify_path

        if not isinstance(path, str):
            path = stringify_path(path)
        return dx.new_collection(
            dx.io.csv.ReadCSV(
                path,
                dtype_backend=dtype_backend,
                storage_options=storage_options,
                kwargs=kwargs,
                header=header,
                dataframe_backend="cudf",
            )
        )

    @staticmethod
    def read_json(*args, **kwargs):
        from dask_cudf.io.json import read_json as read_json_impl
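A note on the `_default_backend` helper used in `read_parquet` above: it is defined elsewhere in `backends.py` and is not shown in this diff. A rough sketch of the idea, offered as an assumption rather than the actual implementation, is that it calls the wrapped dask/dask-expr function under the default dataframe backend so the dispatch does not recurse back into the cudf entrypoint:

```python
import dask

def _default_backend_sketch(func, *args, **kwargs):
    # Hypothetical re-creation for illustration only: run `func` with
    # the default "pandas" dataframe backend so that, e.g., calling
    # dx.read_parquet from inside the cudf backend entrypoint does not
    # dispatch back into this entrypoint recursively.
    with dask.config.set({"dataframe.backend": "pandas"}):
        return func(*args, **kwargs)
```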
