Setting a threshold for KvikIO IO #12841

Merged on Mar 13, 2023 (14 commits)

Conversation

madsbk
Member

@madsbk madsbk commented Feb 24, 2023

Description

For small reads and writes, the overhead of using cuFile and/or KvikIO becomes significant. This PR applies the size threshold already used by the GDS backend to the KvikIO backend as well.
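
As a rough illustration of the idea (not the libcudf code itself), a size threshold of this kind amounts to a simple dispatch on the transfer size. The function name, the path labels, and the 128 KiB value below are all assumptions made for the sketch.

```python
# Hypothetical sketch of threshold-based dispatch; the names and the
# threshold value are assumptions, not taken from libcudf or KvikIO.
DIRECT_IO_THRESHOLD = 128 * 1024  # bytes (assumed value)


def choose_io_path(size: int, threshold: int = DIRECT_IO_THRESHOLD) -> str:
    """Pick an IO path for a transfer of `size` bytes.

    Transfers at or above the threshold go to the direct (cuFile/KvikIO-style)
    path; smaller ones fall back to a plain host read or write, where the
    fixed per-call overhead is lower.
    """
    return "direct" if size >= threshold else "host_fallback"


if __name__ == "__main__":
    for size in (4 * 1024, 128 * 1024, 16 * 1024 * 1024):
        print(size, choose_io_path(size))
```

Whether the comparison should be inclusive at the boundary is a design detail; the sketch treats a transfer exactly at the threshold as large.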

Closes #12780

Future work

Let's optimize KvikIO for small reads and writes so we don't need this threshold.
Tracking here: rapidsai/kvikio#178

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

cc. @GregoryKimball, @vuule

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 24, 2023
@github-actions github-actions bot added conda libcudf Affects libcudf (C++/CUDA) code. labels Feb 24, 2023
@madsbk madsbk marked this pull request as ready for review February 24, 2023 14:21
@madsbk madsbk requested a review from a team as a code owner February 24, 2023 14:21
@madsbk
Member Author

madsbk commented Feb 24, 2023

@GregoryKimball or @vuule, can one of you confirm that this fixes #12780?

@GregoryKimball
Contributor

Thank you Mads for sharing this solution! One of us will take a look and get back to you soon.
😄

Contributor

@vuule vuule left a comment

Ran benchmarks locally (no GDS support); big gains on the ORC writer side, no significant impact on the Parquet reader (I assume ORC reader would behave the same way). Performance might look different on a GDS-enabled system.
Looks like a good change to merge.

@madsbk
Member Author

madsbk commented Mar 13, 2023

@ttnghia, thanks for the review. I have renamed _gds_io_preferred_threshold to:

_gds_read_preferred_threshold
_gds_write_preferred_threshold

@vuule
Contributor

vuule commented Mar 13, 2023

/merge

@rapids-bot rapids-bot bot merged commit 3584739 into rapidsai:branch-23.04 Mar 13, 2023
rapids-bot bot pushed a commit to rapidsai/kvikio that referenced this pull request Apr 14, 2023
Fixes #178

Adds a GDS threshold option: the minimum transfer size for which GDS is used. To improve the performance of small IO, `.pread()` and `.pwrite()` implement a shortcut that circumvents the thread pool and uses the POSIX backend directly.

This should remove the final performance regression of the KvikIO backend observed in rapidsai/cudf#12841
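
The shortcut described above can be sketched roughly as follows. This is a minimal standalone illustration built on `os.pread` and `concurrent.futures`, not KvikIO's implementation; the threshold, the chunk size, the pool size, and the function itself are assumptions, and short reads are not handled.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# All names and sizes are hypothetical; this is not KvikIO's code.
SMALL_IO_THRESHOLD = 1 << 20  # assumed: below this, skip the thread pool
TASK_SIZE = 4 << 20           # assumed chunk size for parallel reads
_pool = ThreadPoolExecutor(max_workers=4)


def pread(fd: int, size: int, offset: int = 0) -> tuple:
    """Read `size` bytes at `offset`; return (data, path_taken)."""
    if size < SMALL_IO_THRESHOLD:
        # Shortcut: a single POSIX read on the calling thread, so a small
        # request pays no task-scheduling or synchronization cost.
        return os.pread(fd, size, offset), "direct"
    # Large request: split into chunks and read them concurrently.
    end = offset + size
    starts = range(offset, end, TASK_SIZE)
    chunks = _pool.map(lambda s: os.pread(fd, min(TASK_SIZE, end - s), s), starts)
    return b"".join(chunks), "threadpool"
```

The trade-off is that a small request loses any parallelism, which is acceptable when the cost of handing work to the pool exceeds the cost of the read itself.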





<details>
<summary>cuDF ORC WRITE performance on a DGX-1</summary>


### `LIBCUDF_CUFILE_POLICY=OFF`
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=OFF ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression
|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.176 s | 12.60% |    1.176 s | 12.60% |        456457427 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.176 s | 19.83% |    1.176 s | 19.83% |        456525931 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     29x | 506.960 ms |  4.55% | 506.955 ms |  4.55% |       1059011363 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 499.540 ms |  1.22% | 499.535 ms |  1.22% |       1074740259 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     14x | 985.967 ms |  8.75% | 985.965 ms |  8.75% |        544512912 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.049 s | 15.51% |    1.049 s | 15.51% |        511739947 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 494.754 ms |  1.76% | 494.749 ms |  1.76% |       1085137163 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 487.722 ms |  1.19% | 487.717 ms |  1.19% |       1100783818 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      6x |    1.300 s |  0.50% |    1.300 s |  0.50% |        412835052 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.137 s |  0.41% |    1.137 s |  0.41% |        472025812 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 481.990 ms |  1.39% | 481.984 ms |  1.39% |       1113876547 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 475.133 ms |  1.41% | 475.127 ms |  1.41% |       1129952705 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.194 s |  0.30% |    1.194 s |  0.30% |        449806715 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |     13x |    1.231 s |  0.73% |    1.231 s |  0.73% |        436166059 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 479.830 ms |  1.05% | 479.824 ms |  1.05% |       1118890411 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     33x | 467.041 ms |  3.32% | 467.036 ms |  3.32% |       1149528753 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.131 ms |  0.76% | 447.125 ms |  0.76% |       1200717349 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |     25x | 617.968 ms |  0.67% | 617.964 ms |  0.67% |        868774327 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |      5x | 452.829 ms |  0.46% | 452.823 ms |  0.46% |       1185608038 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 466.512 ms |  1.52% | 466.506 ms |  1.52% |       1150833558 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     46x | 332.880 ms |  1.02% | 332.874 ms |  1.02% |       1612837327 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.183 ms |  0.95% | 367.177 ms |  0.95% |       1462157417 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 421.991 ms |  1.58% | 421.985 ms |  1.58% |       1272251333 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 423.722 ms |  1.22% | 423.716 ms |  1.22% |       1267053977 |       683.858 MiB |        49.200 MiB |
```

### `LIBCUDF_CUFILE_POLICY=KVIKIO` (with this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression

|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     13x |    1.117 s |  6.71% |    1.117 s |  6.71% |        480440387 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     14x |    1.077 s |  2.63% |    1.077 s |  2.63% |        498567238 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     30x | 501.035 ms |  1.00% | 501.030 ms |  1.00% |       1071534335 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     30x | 500.984 ms |  1.10% | 500.980 ms |  1.10% |       1071642316 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     13x |    1.152 s | 21.69% |    1.152 s | 21.70% |        466206065 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.084 s | 13.24% |    1.084 s | 13.24% |        495359475 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     30x | 498.005 ms |  2.03% | 498.000 ms |  2.03% |       1078053921 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     31x | 490.966 ms |  1.87% | 490.961 ms |  1.87% |       1093510944 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.333 s |  0.45% |    1.333 s |  0.45% |        402632204 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.153 s |  0.32% |    1.153 s |  0.32% |        465578006 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     31x | 482.111 ms |  1.54% | 482.105 ms |  1.54% |       1113597063 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |     32x | 477.450 ms |  1.27% | 477.444 ms |  1.27% |       1124468186 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.40% |    1.224 s |  0.40% |        438723846 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      5x |    1.254 s |  0.34% |    1.254 s |  0.34% |        428072718 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     31x | 483.396 ms |  1.32% | 483.391 ms |  1.32% |       1110635468 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |     32x | 467.038 ms |  1.51% | 467.033 ms |  1.51% |       1149536489 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.051 ms |  0.94% | 447.046 ms |  0.94% |       1200929426 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      5x | 617.419 ms |  0.50% | 617.415 ms |  0.50% |        869546716 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 445.136 ms |  1.19% | 445.131 ms |  1.19% |       1206097674 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.527 ms |  1.77% | 467.521 ms |  1.77% |       1148335104 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 333.658 ms |  1.23% | 333.652 ms |  1.23% |       1609076322 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |     41x | 367.980 ms |  1.06% | 367.973 ms |  1.06% |       1458994436 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 423.013 ms |  1.67% | 423.007 ms |  1.67% |       1269177781 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 424.873 ms |  1.23% | 424.868 ms |  1.23% |       1263619162 |       683.858 MiB |        49.200 MiB |
```


### `LIBCUDF_CUFILE_POLICY=KVIKIO` (**without** this PR)
```
CUDF_BENCHMARK_DROP_CACHE=1 LIBCUDF_CUFILE_POLICY=KVIKIO ./ORC_WRITER_NVBENCH --devices 7 --benchmark orc_write_io_compression
|     io      | compression | cardinality | run_length | Samples |  CPU Time  | Noise  |  GPU Time  | Noise  | bytes_per_second | peak_memory_usage | encoded_file_size |
|-------------|-------------|-------------|------------|---------|------------|--------|------------|--------|------------------|-------------------|-------------------|
|    FILEPATH |      SNAPPY |           0 |          1 |     12x |    1.195 s |  7.58% |    1.195 s |  7.58% |        449191663 |         1.670 GiB |       486.275 MiB |
|    FILEPATH |      SNAPPY |        1000 |          1 |     13x |    1.113 s |  2.17% |    1.113 s |  2.17% |        482223468 |         1.679 GiB |       354.557 MiB |
|    FILEPATH |      SNAPPY |           0 |         32 |     24x | 621.309 ms |  1.45% | 621.304 ms |  1.45% |        864102762 |         1.197 GiB |        41.990 MiB |
|    FILEPATH |      SNAPPY |        1000 |         32 |     27x | 559.675 ms |  1.21% | 559.670 ms |  1.21% |        959263320 |         1.206 GiB |        23.796 MiB |
|    FILEPATH |        NONE |           0 |          1 |     12x |    1.253 s | 17.82% |    1.253 s | 17.82% |        428429247 |       690.244 MiB |       489.816 MiB |
|    FILEPATH |        NONE |        1000 |          1 |     13x |    1.154 s |  9.07% |    1.154 s |  9.07% |        465144594 |       684.575 MiB |       483.465 MiB |
|    FILEPATH |        NONE |           0 |         32 |     23x | 655.856 ms |  1.64% | 655.852 ms |  1.64% |        818585291 |       690.236 MiB |        57.157 MiB |
|    FILEPATH |        NONE |        1000 |         32 |     26x | 587.785 ms |  1.43% | 587.781 ms |  1.43% |        913386635 |       683.858 MiB |        49.200 MiB |
| HOST_BUFFER |      SNAPPY |           0 |          1 |      5x |    1.327 s |  0.21% |    1.327 s |  0.21% |        404688167 |         1.670 GiB |       486.275 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |          1 |      5x |    1.152 s |  0.11% |    1.152 s |  0.11% |        466042735 |         1.679 GiB |       354.557 MiB |
| HOST_BUFFER |      SNAPPY |           0 |         32 |     32x | 482.019 ms |  1.64% | 482.012 ms |  1.65% |       1113811263 |         1.197 GiB |        41.990 MiB |
| HOST_BUFFER |      SNAPPY |        1000 |         32 |      5x | 473.683 ms |  0.30% | 473.677 ms |  0.30% |       1133411483 |         1.206 GiB |        23.796 MiB |
| HOST_BUFFER |        NONE |           0 |          1 |      5x |    1.224 s |  0.45% |    1.224 s |  0.45% |        438631758 |       690.244 MiB |       489.816 MiB |
| HOST_BUFFER |        NONE |        1000 |          1 |      9x |    1.254 s |  0.50% |    1.254 s |  0.50% |        427995911 |       684.575 MiB |       483.465 MiB |
| HOST_BUFFER |        NONE |           0 |         32 |     32x | 481.819 ms |  1.15% | 481.813 ms |  1.15% |       1114271697 |       690.236 MiB |        57.157 MiB |
| HOST_BUFFER |        NONE |        1000 |         32 |      5x | 462.816 ms |  0.37% | 462.810 ms |  0.37% |       1160025243 |       683.858 MiB |        49.200 MiB |
|        VOID |      SNAPPY |           0 |          1 |     34x | 447.425 ms |  0.92% | 447.419 ms |  0.92% |       1199928350 |         1.670 GiB |       486.275 MiB |
|        VOID |      SNAPPY |        1000 |          1 |      9x | 618.225 ms |  0.48% | 618.221 ms |  0.48% |        868412207 |         1.679 GiB |       354.557 MiB |
|        VOID |      SNAPPY |           0 |         32 |     34x | 447.361 ms |  2.01% | 447.356 ms |  2.01% |       1200098149 |         1.197 GiB |        41.990 MiB |
|        VOID |      SNAPPY |        1000 |         32 |     33x | 467.867 ms |  1.08% | 467.861 ms |  1.08% |       1147500067 |         1.206 GiB |        23.796 MiB |
|        VOID |        NONE |           0 |          1 |     45x | 335.043 ms |  1.06% | 335.037 ms |  1.06% |       1602424306 |       690.244 MiB |       489.816 MiB |
|        VOID |        NONE |        1000 |          1 |      5x | 366.788 ms |  0.24% | 366.782 ms |  0.24% |       1463733567 |       684.575 MiB |       483.465 MiB |
|        VOID |        NONE |           0 |         32 |     36x | 422.473 ms |  1.26% | 422.467 ms |  1.26% |       1270798385 |       690.236 MiB |        57.157 MiB |
|        VOID |        NONE |        1000 |         32 |     36x | 426.112 ms |  1.70% | 426.107 ms |  1.70% |       1259945083 |       683.858 MiB |        49.200 MiB |
```





</details>
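
For a quick read of the tables, the relative change for the rows where the two KVIKIO runs differ most (FILEPATH, run_length 32) can be computed from the CPU times. The values below are copied from the tables; the helper name is only illustrative.

```python
# CPU times in ms, copied from the FILEPATH / run_length=32 rows of the two
# KVIKIO tables. Keys are (compression, cardinality).
cpu_ms_without_pr = {
    ("SNAPPY", 0): 621.309,
    ("SNAPPY", 1000): 559.675,
    ("NONE", 0): 655.856,
    ("NONE", 1000): 587.785,
}
cpu_ms_with_pr = {
    ("SNAPPY", 0): 501.035,
    ("SNAPPY", 1000): 500.984,
    ("NONE", 0): 498.005,
    ("NONE", 1000): 490.966,
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old time to the new time; above 1.0 means faster."""
    return before_ms / after_ms


speedups = {
    key: speedup(cpu_ms_without_pr[key], cpu_ms_with_pr[key])
    for key in cpu_ms_without_pr
}

for key, ratio in sorted(speedups.items()):
    print(key, f"{ratio:.2f}x")
```

For these four rows the ratio comes out at roughly 1.1x to 1.3x. The HOST_BUFFER and VOID rows are essentially unchanged between the two runs.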




cc. @GregoryKimball

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #190
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Writing ORC files with KvikIO is 5x slower
4 participants