Use fsspec.parquet for improved read_parquet performance from remote storage (#9589)

**Important Note**: ~Marking this as WIP until the `fsspec.parquet` module is available in a filesystem_spec release~ (fsspec.parquet module is available)

This PR modifies `cudf.read_parquet` and `dask_cudf.read_parquet` to leverage the new `fsspec.parquet.open_parquet_file` function for optimized data transfer/caching from remote storage. The ~long-term~ goal is to remove the temporary data-transfer optimizations that we currently use in `cudf.read_parquet`.

**Performance Motivation**:

```python
In [1]: import cudf, dask_cudf
   ...: path = [
   ...:     "gs://my-bucket/criteo-parquet/day_0.parquet",
   ...:     "gs://my-bucket/criteo-parquet/day_1.parquet",
   ...: ]

# cudf BEFORE
In [2]: %time df = cudf.read_parquet(path, columns=["I10"], storage_options=…)
CPU times: user 11.1 s, sys: 11.5 s, total: 22.6 s
Wall time: 24.4 s

# cudf AFTER
In [2]: %time df = cudf.read_parquet(path, columns=["I10"], storage_options=…)
CPU times: user 3.48 s, sys: 722 ms, total: 4.2 s
Wall time: 6.32 s

# (Threaded) Dask-cudf BEFORE
In [2]: %time df = dask_cudf.read_parquet(path, columns=["I10"], storage_options=…).compute()
CPU times: user 27.1 s, sys: 15.5 s, total: 42.6 s
Wall time: 57.6 s

# (Threaded) Dask-cudf AFTER
In [2]: %time df = dask_cudf.read_parquet(path, columns=["I10"], storage_options=…).compute()
CPU times: user 3.43 s, sys: 851 ms, total: 4.28 s
Wall time: 13.1 s
```

Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
- https://github.com/brandon-b-miller
- Benjamin Zaitlen (https://github.com/quasiben)

URL: #9589
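For readers unfamiliar with `fsspec.parquet.open_parquet_file`, the following is a minimal sketch of how the function can be called directly. It is not the code added by this PR: the helper name `read_selected_columns` is hypothetical, and pandas stands in for cudf so that the example does not require a GPU. It assumes `fsspec`, `pandas`, and a Parquet engine such as `pyarrow` are installed.

```python
def read_selected_columns(path, columns, storage_options=None):
    """Read only `columns` from a (possibly remote) Parquet file into pandas.

    Hypothetical helper for illustration; not part of cudf or fsspec.
    """
    # Imported lazily so that defining the helper needs no third-party packages.
    import fsspec.parquet
    import pandas as pd

    # open_parquet_file reads the Parquet footer and pre-fetches the byte
    # ranges for the requested columns rather than transferring the whole file.
    with fsspec.parquet.open_parquet_file(
        path,
        columns=columns,
        storage_options=storage_options,
    ) as f:
        # The returned file-like object serves reads from the cached ranges.
        return pd.read_parquet(f, columns=columns)
```

A call such as `read_selected_columns("gs://my-bucket/criteo-parquet/day_0.parquet", ["I10"], storage_options=...)` would mirror the benchmark above, with the storage options supplied by the caller. Passing the same `columns` list to both `open_parquet_file` and the reader matters: the first controls which bytes are transferred, the second controls which columns are decoded.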
Showing 7 changed files with 257 additions and 221 deletions.