Optimize small reads and writes (#190)
Fixes #178

Adds a GDS threshold option, which is the minimum IO size at which GDS is used. To improve the performance of small IO, `.pread()` and `.pwrite()` implement a shortcut that circumvents the threadpool and uses the POSIX backend directly.

This should remove the final performance regression of the KvikIO backend observed in rapidsai/cudf#12841.

<details>
<summary>cuDF ORC WRITE performance on a DGX-1</summary>

These details _remain_ **hidden** until expanded.

### `LIBCUDF_CUFILE_POLICY=OFF`
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=OFF ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.176 s | 12.60% |    1.176 s | 12.60% |        456457427 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.176 s | 19.83% |    1.176 s | 19.83% |        456525931 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     29x | 506.960 ms |  4.55% | 506.955 ms |  4.55% |       1059011363 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 499.540 ms |  1.22% | 499.535 ms |  1.22% |       1074740259 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     14x | 985.967 ms |  8.75% | 985.965 ms |  8.75% |        544512912 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.049 s | 15.51% |    1.049 s | 15.51% |        511739947 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 494.754 ms |  1.76% | 494.749 ms |  1.76% |       1085137163 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 487.722 ms |  1.19% | 487.717 ms |  1.19% |       1100783818 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      6x |    1.300 s |  0.50% |    1.300 s |  0.50% |        412835052 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.137 s |  0.41% |    1.137 s |  0.41% |        472025812 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 481.990 ms |  1.39% | 481.984 ms |  1.39% |       1113876547 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 475.133 ms |  1.41% | 475.127 ms |  1.41% |       1129952705 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.194 s |  0.30% |    1.194 s |  0.30% |        449806715 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |     13x |    1.231 s |  0.73% |    1.231 s |  0.73% |        436166059 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 479.830 ms |  1.05% | 479.824 ms |  1.05% |       1118890411 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     33x | 467.041 ms |  3.32% | 467.036 ms |  3.32% |       1149528753 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.131 ms |  0.76% | 447.125 ms |  0.76% |       1200717349 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |     25x | 617.968 ms |  0.67% | 617.964 ms |  0.67% |        868774327 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |      5x | 452.829 ms |  0.46% | 452.823 ms |  0.46% |       1185608038 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 466.512 ms |  1.52% | 466.506 ms |  1.52% |       1150833558 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     46x | 332.880 ms |  1.02% | 332.874 ms |  1.02% |       1612837327 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.183 ms |  0.95% | 367.177 ms |  0.95% |       1462157417 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 421.991 ms |  1.58% | 421.985 ms |  1.58% |       1272251333 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 423.722 ms |  1.22% | 423.716 ms |  1.22% |       1267053977 |       683.858 MiB |        49.200 MiB |
```

### `LIBCUDF_CUFILE_POLICY=KVIKIO` (with this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.117 s |  6.71% |    1.117 s |  6.71% |        480440387 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     14x |    1.077 s |  2.63% |    1.077 s |  2.63% |        498567238 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     30x | 501.035 ms |  1.00% | 501.030 ms |  1.00% |       1071534335 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 500.984 ms |  1.10% | 500.980 ms |  1.10% |       1071642316 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     13x |    1.152 s | 21.69% |    1.152 s | 21.70% |        466206065 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.084 s | 13.24% |    1.084 s | 13.24% |        495359475 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 498.005 ms |  2.03% | 498.000 ms |  2.03% |       1078053921 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 490.966 ms |  1.87% | 490.961 ms |  1.87% |       1093510944 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.333 s |  0.45% |    1.333 s |  0.45% |        402632204 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.153 s |  0.32% |    1.153 s |  0.32% |        465578006 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     31x | 482.111 ms |  1.54% | 482.105 ms |  1.54% |       1113597063 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 477.450 ms |  1.27% | 477.444 ms |  1.27% |       1124468186 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.40% |    1.224 s |  0.40% |        438723846 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      5x |    1.254 s |  0.34% |    1.254 s |  0.34% |        428072718 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     31x | 483.396 ms |  1.32% | 483.391 ms |  1.32% |       1110635468 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     32x | 467.038 ms |  1.51% | 467.033 ms |  1.51% |       1149536489 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.051 ms |  0.94% | 447.046 ms |  0.94% |       1200929426 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      5x | 617.419 ms |  0.50% | 617.415 ms |  0.50% |        869546716 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 445.136 ms |  1.19% | 445.131 ms |  1.19% |       1206097674 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.527 ms |  1.77% | 467.521 ms |  1.77% |       1148335104 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 333.658 ms |  1.23% | 333.652 ms |  1.23% |       1609076322 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.980 ms |  1.06% | 367.973 ms |  1.06% |       1458994436 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 423.013 ms |  1.67% | 423.007 ms |  1.67% |       1269177781 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 424.873 ms |  1.23% | 424.868 ms |  1.23% |       1263619162 |       683.858 MiB |        49.200 MiB |
```

### `LIBCUDF_CUFILE_POLICY=KVIKIO` (**without** this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     12x |    1.195 s |  7.58% |    1.195 s |  7.58% |        449191663 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.113 s |  2.17% |    1.113 s |  2.17% |        482223468 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     24x | 621.309 ms |  1.45% | 621.304 ms |  1.45% |        864102762 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     27x | 559.675 ms |  1.21% | 559.670 ms |  1.21% |        959263320 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     12x |    1.253 s | 17.82% |    1.253 s | 17.82% |        428429247 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.154 s |  9.07% |    1.154 s |  9.07% |        465144594 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     23x | 655.856 ms |  1.64% | 655.852 ms |  1.64% |        818585291 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     26x | 587.785 ms |  1.43% | 587.781 ms |  1.43% |        913386635 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.327 s |  0.21% |    1.327 s |  0.21% |        404688167 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.152 s |  0.11% |    1.152 s |  0.11% |        466042735 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 482.019 ms |  1.64% | 482.012 ms |  1.65% |       1113811263 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |      5x | 473.683 ms |  0.30% | 473.677 ms |  0.30% |       1133411483 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.45% |    1.224 s |  0.45% |        438631758 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      9x |    1.254 s |  0.50% |    1.254 s |  0.50% |        427995911 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 481.819 ms |  1.15% | 481.813 ms |  1.15% |       1114271697 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |      5x | 462.816 ms |  0.37% | 462.810 ms |  0.37% |       1160025243 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.425 ms |  0.92% | 447.419 ms |  0.92% |       1199928350 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      9x | 618.225 ms |  0.48% | 618.221 ms |  0.48% |        868412207 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 447.361 ms |  2.01% | 447.356 ms |  2.01% |       1200098149 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.867 ms |  1.08% | 467.861 ms |  1.08% |       1147500067 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 335.043 ms |  1.06% | 335.037 ms |  1.06% |       1602424306 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |      5x | 366.788 ms |  0.24% | 366.782 ms |  0.24% |       1463733567 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 422.473 ms |  1.26% | 422.467 ms |  1.26% |       1270798385 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 426.112 ms |  1.70% | 426.107 ms |  1.70% |       1259945083 |       683.858 MiB |        49.200 MiB |
```

</details>

cc. @GregoryKimball

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #190
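
The small-IO shortcut described in this commit can be illustrated with a minimal Python sketch. This is not KvikIO's implementation: `SMALL_IO_THRESHOLD`, `TASK_SIZE`, the pool size, and this standalone `pread` helper are hypothetical names and values chosen for illustration, and the sketch only models the host-side dispatch (it does not touch GDS). The idea it shows is that requests below a size threshold are served by a single direct POSIX call, while larger requests are split into chunks and handed to a thread pool.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical values for illustration only; not KvikIO's defaults.
SMALL_IO_THRESHOLD = 1 << 20  # bytes; requests below this skip the thread pool
TASK_SIZE = 4 << 20           # bytes per chunk on the parallel path

_pool = ThreadPoolExecutor(max_workers=4)


def pread(fd: int, size: int, offset: int = 0) -> tuple[bytes, str]:
    """Read `size` bytes at `offset`; return the data and the path taken."""
    if size < SMALL_IO_THRESHOLD:
        # Small request: one direct POSIX call, no task-submission overhead.
        return os.pread(fd, size, offset), "direct"
    # Large request: split into chunks and read them concurrently.
    end = offset + size
    futures = [
        _pool.submit(os.pread, fd, min(TASK_SIZE, end - start), start)
        for start in range(offset, end, TASK_SIZE)
    ]
    # Results are joined in submission order, which matches file order.
    return b"".join(f.result() for f in futures), "threadpool"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(9 << 20))
        f.flush()
        for nbytes in (4096, 9 << 20):
            data, path = pread(f.fileno(), nbytes)
            print(f"{nbytes:>8} bytes via {path}")
```

For a small request, the cost of submitting a task and waiting on a future can exceed the cost of the read itself, which is consistent with the FILEPATH `run_length=32` rows above improving once the shortcut is in place. A production version would also need to handle short reads and errors from `os.pread`, which this sketch ignores.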