Setting a threshold for KvikIO IO #12841
@GregoryKimball or @vuule, can I get one of you to confirm that it fixes #12841?
Thank you Mads for sharing this solution! One of us will take a look and get back to you soon.
Ran benchmarks locally (no GDS support); big gains on the ORC writer side, no significant impact on the Parquet reader (I assume ORC reader would behave the same way). Performance might look different on a GDS-enabled system.
Looks like a good change to merge.
Co-authored-by: Nghia Truong <[email protected]>
@ttnghia, thanks for the review. I have renamed `_gds_read_preferred_threshold` and `_gds_write_preferred_threshold`.
/merge
Fixes #178

Adding a GDS threshold option, which is the minimum size to use GDS.

In order to improve performance of small IO, `.pread()` and `.pwrite()` implement a shortcut that circumvents the threadpool and uses the POSIX backend directly.

This should remove the final performance regression of the KvikIO backend observed in rapidsai/cudf#12841

<details>
<summary>cuDF ORC WRITE performance on a DGX-1</summary>

These details _remain_ **hidden** until expanded.

### `LIBCUDF_CUFILE_POLICY=OFF`
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=OFF ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.176 s | 12.60% |    1.176 s | 12.60% |        456457427 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.176 s | 19.83% |    1.176 s | 19.83% |        456525931 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     29x | 506.960 ms |  4.55% | 506.955 ms |  4.55% |       1059011363 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 499.540 ms |  1.22% | 499.535 ms |  1.22% |       1074740259 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     14x | 985.967 ms |  8.75% | 985.965 ms |  8.75% |        544512912 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.049 s | 15.51% |    1.049 s | 15.51% |        511739947 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 494.754 ms |  1.76% | 494.749 ms |  1.76% |       1085137163 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 487.722 ms |  1.19% | 487.717 ms |  1.19% |       1100783818 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      6x |    1.300 s |  0.50% |    1.300 s |  0.50% |        412835052 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.137 s |  0.41% |    1.137 s |  0.41% |        472025812 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 481.990 ms |  1.39% | 481.984 ms |  1.39% |       1113876547 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 475.133 ms |  1.41% | 475.127 ms |  1.41% |       1129952705 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.194 s |  0.30% |    1.194 s |  0.30% |        449806715 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |     13x |    1.231 s |  0.73% |    1.231 s |  0.73% |        436166059 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 479.830 ms |  1.05% | 479.824 ms |  1.05% |       1118890411 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     33x | 467.041 ms |  3.32% | 467.036 ms |  3.32% |       1149528753 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.131 ms |  0.76% | 447.125 ms |  0.76% |       1200717349 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |     25x | 617.968 ms |  0.67% | 617.964 ms |  0.67% |        868774327 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |      5x | 452.829 ms |  0.46% | 452.823 ms |  0.46% |       1185608038 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 466.512 ms |  1.52% | 466.506 ms |  1.52% |       1150833558 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     46x | 332.880 ms |  1.02% | 332.874 ms |  1.02% |       1612837327 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.183 ms |  0.95% | 367.177 ms |  0.95% |       1462157417 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 421.991 ms |  1.58% | 421.985 ms |  1.58% |       1272251333 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 423.722 ms |  1.22% | 423.716 ms |  1.22% |       1267053977 |       683.858 MiB |        49.200 MiB |
```

### `LIBCUDF_CUFILE_POLICY=KVIKIO` (with this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.117 s |  6.71% |    1.117 s |  6.71% |        480440387 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     14x |    1.077 s |  2.63% |    1.077 s |  2.63% |        498567238 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     30x | 501.035 ms |  1.00% | 501.030 ms |  1.00% |       1071534335 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 500.984 ms |  1.10% | 500.980 ms |  1.10% |       1071642316 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     13x |    1.152 s | 21.69% |    1.152 s | 21.70% |        466206065 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.084 s | 13.24% |    1.084 s | 13.24% |        495359475 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 498.005 ms |  2.03% | 498.000 ms |  2.03% |       1078053921 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 490.966 ms |  1.87% | 490.961 ms |  1.87% |       1093510944 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.333 s |  0.45% |    1.333 s |  0.45% |        402632204 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.153 s |  0.32% |    1.153 s |  0.32% |        465578006 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     31x | 482.111 ms |  1.54% | 482.105 ms |  1.54% |       1113597063 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 477.450 ms |  1.27% | 477.444 ms |  1.27% |       1124468186 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.40% |    1.224 s |  0.40% |        438723846 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      5x |    1.254 s |  0.34% |    1.254 s |  0.34% |        428072718 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     31x | 483.396 ms |  1.32% | 483.391 ms |  1.32% |       1110635468 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     32x | 467.038 ms |  1.51% | 467.033 ms |  1.51% |       1149536489 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.051 ms |  0.94% | 447.046 ms |  0.94% |       1200929426 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      5x | 617.419 ms |  0.50% | 617.415 ms |  0.50% |        869546716 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 445.136 ms |  1.19% | 445.131 ms |  1.19% |       1206097674 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.527 ms |  1.77% | 467.521 ms |  1.77% |       1148335104 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 333.658 ms |  1.23% | 333.652 ms |  1.23% |       1609076322 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.980 ms |  1.06% | 367.973 ms |  1.06% |       1458994436 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 423.013 ms |  1.67% | 423.007 ms |  1.67% |       1269177781 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 424.873 ms |  1.23% | 424.868 ms |  1.23% |       1263619162 |       683.858 MiB |        49.200 MiB |
```

### `LIBCUDF_CUFILE_POLICY=KVIKIO` (**without** this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     12x |    1.195 s |  7.58% |    1.195 s |  7.58% |        449191663 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.113 s |  2.17% |    1.113 s |  2.17% |        482223468 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     24x | 621.309 ms |  1.45% | 621.304 ms |  1.45% |        864102762 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     27x | 559.675 ms |  1.21% | 559.670 ms |  1.21% |        959263320 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     12x |    1.253 s | 17.82% |    1.253 s | 17.82% |        428429247 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.154 s |  9.07% |    1.154 s |  9.07% |        465144594 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     23x | 655.856 ms |  1.64% | 655.852 ms |  1.64% |        818585291 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     26x | 587.785 ms |  1.43% | 587.781 ms |  1.43% |        913386635 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.327 s |  0.21% |    1.327 s |  0.21% |        404688167 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.152 s |  0.11% |    1.152 s |  0.11% |        466042735 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 482.019 ms |  1.64% | 482.012 ms |  1.65% |       1113811263 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |      5x | 473.683 ms |  0.30% | 473.677 ms |  0.30% |       1133411483 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.45% |    1.224 s |  0.45% |        438631758 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      9x |    1.254 s |  0.50% |    1.254 s |  0.50% |        427995911 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 481.819 ms |  1.15% | 481.813 ms |  1.15% |       1114271697 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |      5x | 462.816 ms |  0.37% | 462.810 ms |  0.37% |       1160025243 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.425 ms |  0.92% | 447.419 ms |  0.92% |       1199928350 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      9x | 618.225 ms |  0.48% | 618.221 ms |  0.48% |        868412207 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 447.361 ms |  2.01% | 447.356 ms |  2.01% |       1200098149 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.867 ms |  1.08% | 467.861 ms |  1.08% |       1147500067 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 335.043 ms |  1.06% | 335.037 ms |  1.06% |       1602424306 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |      5x | 366.788 ms |  0.24% | 366.782 ms |  0.24% |       1463733567 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 422.473 ms |  1.26% | 422.467 ms |  1.26% |       1270798385 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 426.112 ms |  1.70% | 426.107 ms |  1.70% |       1259945083 |       683.858 MiB |        49.200 MiB |
```

</details>

cc. @GregoryKimball

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #190
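To make the effect in the tables easier to read, here is a small sketch that computes the relative change in CPU time for the `FILEPATH` rows with `run_length=32`, which is where the regression is most visible. The numbers are copied from the three tables above; the variable and function names are only illustrative. On these rows, CPU time with the PR comes out roughly 10% to 24% lower than KVIKIO without the PR, and within 1.5% of `LIBCUDF_CUFILE_POLICY=OFF`.

```python
# CPU Time in milliseconds for the FILEPATH rows with run_length=32,
# copied from the benchmark tables above. Keys are (compression, cardinality).
policy_off = {
    ("SNAPPY", 0): 506.960,
    ("SNAPPY", 1000): 499.540,
    ("NONE", 0): 494.754,
    ("NONE", 1000): 487.722,
}
kvikio_with_pr = {
    ("SNAPPY", 0): 501.035,
    ("SNAPPY", 1000): 500.984,
    ("NONE", 0): 498.005,
    ("NONE", 1000): 490.966,
}
kvikio_without_pr = {
    ("SNAPPY", 0): 621.309,
    ("SNAPPY", 1000): 559.675,
    ("NONE", 0): 655.856,
    ("NONE", 1000): 587.785,
}


def relative_change(new_ms: float, old_ms: float) -> float:
    """Percentage change of new_ms relative to old_ms (negative means faster)."""
    return (new_ms - old_ms) / old_ms * 100.0


for key in policy_off:
    vs_before = relative_change(kvikio_with_pr[key], kvikio_without_pr[key])
    vs_off = relative_change(kvikio_with_pr[key], policy_off[key])
    print(f"{key}: {vs_before:+.1f}% vs. KVIKIO without the PR, {vs_off:+.1f}% vs. OFF")
```

These are single-machine measurements with the noise levels shown in the tables, so the percentages are indicative rather than exact.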
Description
For small reads and writes, the overhead of using cuFile and/or KvikIO becomes significant. This PR applies the threshold already used by `GDS` to the `KVIKIO` backend as well.

Closes #12780
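The idea of a size threshold can be illustrated with a host-only sketch: requests below the threshold take a single blocking POSIX call on the calling thread, while larger requests are split into tasks and handed to a thread pool. This is an analogue, not the cuDF or KvikIO implementation. `SMALL_IO_THRESHOLD`, `TASK_SIZE`, `pread`, and `_read_exact` are hypothetical names and values chosen for illustration, and device buffers and cuFile are not modeled at all.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical values for illustration only; not the defaults of any library.
SMALL_IO_THRESHOLD = 64 * 1024   # requests smaller than this skip the thread pool
TASK_SIZE = 128 * 1024           # chunk size used when splitting large requests

_pool = ThreadPoolExecutor(max_workers=4)


def _read_exact(fd: int, size: int, offset: int) -> bytes:
    """Read up to `size` bytes at `offset` with POSIX pread, retrying on short reads."""
    buf = bytearray()
    while len(buf) < size:
        chunk = os.pread(fd, size - len(buf), offset + len(buf))
        if not chunk:  # end of file
            break
        buf += chunk
    return bytes(buf)


def pread(fd: int, size: int, offset: int = 0) -> bytes:
    """Read `size` bytes at `offset`, taking a direct path for small requests."""
    if size < SMALL_IO_THRESHOLD:
        # Small request: scheduling overhead would dominate, so read directly.
        return _read_exact(fd, size, offset)
    # Large request: split into TASK_SIZE chunks and read them in parallel.
    futures = [
        _pool.submit(_read_exact, fd, min(TASK_SIZE, size - start), offset + start)
        for start in range(0, size, TASK_SIZE)
    ]
    return b"".join(f.result() for f in futures)


if __name__ == "__main__":
    data = os.urandom(300 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        fd = os.open(f.name, os.O_RDONLY)
        try:
            assert pread(fd, 1000, 10) == data[10:1010]   # small path
            assert pread(fd, len(data)) == data           # pooled path
        finally:
            os.close(fd)
```

Both paths return the same bytes; only the dispatch cost differs, which is why a threshold of this kind mainly matters for workloads that issue many small reads or writes.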
Future work
Let's optimize KvikIO for small reads and writes so we don't need this threshold.
Tracking here: rapidsai/kvikio#178
Checklist
cc. @GregoryKimball, @vuule