Is your feature request related to a problem? Please describe.
For Parquet columns that are highly compressible, the page size can be very small. For example, we observe 50-100 KB pages for the NDS-H column `l_shipinstruct` in `lineitems` with ZSTD compression. This column has a cardinality of 4, encodes well with run-length encoding, and may offer other compression opportunities as well.
>>> df['l_shipinstruct'].unique()
0 DELIVER IN PERSON
1 TAKE BACK RETURN
2 NONE
3 COLLECT COD
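To illustrate why a column with so few distinct values shrinks to such small pages, here is a minimal run-length encoding sketch in plain Python. The `run_length_encode` helper and the sample data are illustrative assumptions; this is not Parquet's actual RLE/bit-packed hybrid encoding, only the basic idea behind it.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Hypothetical sample: a low-cardinality column with runs of repeated values.
column = ["NONE"] * 4 + ["COLLECT COD"] * 3 + ["NONE"] * 2

encoded = run_length_encode(column)
print(encoded)  # [('NONE', 4), ('COLLECT COD', 3), ('NONE', 2)]
```

Nine input values collapse to three pairs here. How much a real column benefits depends on how its values are ordered within each page.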
When performing IO on these pages, libcudf falls back to a host read through pageable memory instead of a pinned-memory read via kvikIO.
With an async memory resource, the host read from pageable memory performs about the same as the kvikIO option. However, with a managed memory pool resource, the host read from pageable memory appears to be ~2x slower in the IO step.
Describe the solution you'd like
We should consider adjusting libcudf to always use kvikIO instead.
I believe we should consider refactoring datasource to not use a read threshold, and also decouple the `gds` labels from dispatch to kvikIO for host reads.
Additional context
We've collected evidence that with a managed memory pool, the HtoD host copy from pageable to managed is slower than the copy from pageable to device, at least for these small pages.
…ed (#17260)
Issue #17259
Avoid checking the `_gds_read_preferred_threshold` value when deciding whether `device_read`/`device_write` is preferred to host IO + copy. The reasons are twofold:
1. KvikIO already has an internal threshold for GDS use so we don't need to check on our end as well.
2. Without actual GDS use, kvikIO uses a pinned bounce buffer to efficiently copy to/from the device.
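
The change in dispatch can be sketched with a toy model in Python. This is not libcudf code: the `DatasourceModel` class, its method names, and the 128 KiB threshold are hypothetical stand-ins chosen only to show how a size threshold sends small pages down the host path, and how removing the check sends every read to the device-read path.

```python
from dataclasses import dataclass


@dataclass
class DatasourceModel:
    """Toy model of the dispatch decision; all names here are hypothetical."""

    supports_device_io: bool
    # Stands in for `_gds_read_preferred_threshold`; the value is assumed.
    read_threshold_bytes: int = 128 * 1024

    def prefers_device_read_old(self, size_bytes: int) -> bool:
        # Before: reads below the threshold fall back to host IO + copy.
        return self.supports_device_io and size_bytes >= self.read_threshold_bytes

    def prefers_device_read_new(self, size_bytes: int) -> bool:
        # After: no size check; the IO library decides internally
        # between direct IO and a pinned bounce buffer.
        return self.supports_device_io


source = DatasourceModel(supports_device_io=True)
small_page = 64 * 1024  # a page in the 50-100 KB range from the issue

print(source.prefers_device_read_old(small_page))  # False
print(source.prefers_device_read_new(small_page))  # True
```

Under these assumptions, the small pages described in the issue move from the pageable host path to the device-read path once the threshold check is dropped.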
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Tianyu Liu (https://github.com/kingcrimsontianyu)
- Basit Ayantunde (https://github.com/lamarrr)
URL: #17260