Refactor pinned memory vector and ORC+Parquet writers #13206

ttnghia · 2023-04-24T17:37:17Z

Currently, cudf::detail::pinned_allocator is used in various places to implement a pinned host vector. This standardizes such usage, removing cudf::detail::pinned_allocator from the usage sites and replacing its usage by a standard cudf::detail::pinned_host_vector instead.

Some small changes are also made for ORC/Parquet writer classes, replacing bool _single_write_mode by SingleWriteMode _single_write_mode.

This is needed for #13076.

This reverts commit 01d35ea.

ajschmidt8

Approving ops-codeowner file changes

PointKernel · 2023-04-24T17:44:13Z

cpp/include/cudf/detail/utilities/pinned_host_vector.hpp

IIUC, pinned memory is always on the host side thus not sure this renaming is really needed.

IMO the name *_host_vector is better expressing its purpose, similar to having thurst::host_vector instead of just thrust::vector.

I read the name as host_vector in pinned memory, so the name looks good.

cpp/src/io/orc/writer_impl.cu

Co-authored-by: Yunsong Wang <[email protected]>

vuule · 2023-04-24T19:14:38Z

cpp/src/io/orc/writer_impl.cu

- * @brief Helper for pinned host memory
- */
-template <typename T>
-using pinned_buffer = std::unique_ptr<T, decltype(&cudaFreeHost)>;


This looks good, with one minor concern. For places that already used host_vector, there's no change. But, for this use case, we are introducing initialization into the memory that does not need to be initialized (and previously wasn't). If you don't mind, please run ORC or Parquet writer benchmarks.

Sorry I didn't see where is the initialization that you mentioned? The new pinned_host_vector uses allocator that also doesn't initialize the internal buffer:

__host__ inline pointer allocate(size_type cnt, const_pointer /*hint*/ = 0) { if (cnt > this->max_size()) { throw std::bad_alloc(); } // end if pointer result(0); CUDF_CUDA_TRY(cudaMallocHost(reinterpret_cast<void**>(&result), cnt * sizeof(value_type))); return result; }

Please see benchmarks here: #13206 (comment)

I thought that host_vector has its own initialization outside of the allocator (same as std::vector). Either way, it does not seem to impact the overall performance.

ttnghia · 2023-04-24T20:26:21Z

@vuule Here is some bechmarks for ORC writer:

['./before.json', 'after.json']
# orc_write_encode

## [0] Quadro RTX 6000

|    data_type    |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-----------------|---------------|--------------|------------|-------------|------------|-------------|--------------|---------|----------|
| INTEGRAL_SIGNED |       0       |      1       | 127.838 ms |       3.63% | 126.039 ms |       2.71% | -1799.682 us |  -1.41% |   PASS   |
| INTEGRAL_SIGNED |     1000      |      1       | 254.980 ms |       1.54% | 257.532 ms |       1.94% |     2.552 ms |   1.00% |   PASS   |
| INTEGRAL_SIGNED |       0       |      32      |  52.873 ms |       3.76% |  54.257 ms |       4.01% |     1.385 ms |   2.62% |   PASS   |
| INTEGRAL_SIGNED |     1000      |      32      |  55.094 ms |       1.51% |  55.554 ms |       2.56% |   459.615 us |   0.83% |   PASS   |
|     STRING      |       0       |      1       | 545.909 ms |       0.02% | 547.088 ms |       0.37% |     1.178 ms |   0.22% |   FAIL   |
|     STRING      |     1000      |      1       | 813.042 ms |       0.11% | 813.765 ms |       0.15% |   722.949 us |   0.09% |   PASS   |
|     STRING      |       0       |      32      | 545.776 ms |       0.07% | 546.026 ms |       0.10% |   250.000 us |   0.05% |   PASS   |
|     STRING      |     1000      |      32      | 650.413 ms |       0.07% | 650.709 ms |       0.12% |   295.630 us |   0.05% |   PASS   |
|      LIST       |       0       |      1       | 133.945 ms |       0.31% | 134.238 ms |       0.35% |   293.109 us |   0.22% |   PASS   |
|      LIST       |     1000      |      1       | 336.120 ms |       0.10% | 337.486 ms |       0.82% |     1.367 ms |   0.41% |   FAIL   |
|      LIST       |       0       |      32      |  88.915 ms |       0.35% |  88.975 ms |       0.40% |    60.099 us |   0.07% |   PASS   |
|      LIST       |     1000      |      32      | 104.201 ms |       0.34% | 104.000 ms |       0.30% |  -201.286 us |  -0.19% |   PASS   |
|     STRUCT      |       0       |      1       | 547.717 ms |       0.16% | 548.105 ms |       0.15% |   387.793 us |   0.07% |   PASS   |
|     STRUCT      |     1000      |      1       | 902.261 ms |       0.43% | 901.978 ms |       0.08% |  -282.324 us |  -0.03% |   PASS   |
|     STRUCT      |       0       |      32      | 527.460 ms |       0.08% | 528.633 ms |       0.25% |     1.173 ms |   0.22% |   FAIL   |
|     STRUCT      |     1000      |      32      | 607.662 ms |       0.27% | 608.766 ms |       0.52% |     1.103 ms |   0.18% |   PASS   |

# orc_write

## [0] Quadro RTX 6000

|  num_columns  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------------|------------|-------------|------------|-------------|------------|---------|----------|
|       8       | 165.981 ms |       0.30% | 166.784 ms |       0.49% | 803.009 us |   0.48% |   FAIL   |
|      64       |    1.558 s |       1.41% |    1.591 s |       3.23% |  32.855 ms |   2.11% |   FAIL   |

# orc_chunked_write

## [0] Quadro RTX 6000

|  num_columns  |  num_chunks  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|---------------|--------------|------------|-------------|------------|-------------|---------------|---------|----------|
|       8       |      8       | 393.974 ms |       0.51% | 425.155 ms |       2.94% |     31.181 ms |   7.91% |   FAIL   |
|      64       |      8       |    3.969 s |        inf% |    4.332 s |        inf% |    362.998 ms |   9.15% |   PASS   |
|       8       |      64      |    1.831 s |       0.12% |    1.976 s |       0.95% |    144.625 ms |   7.90% |   FAIL   |
|      64       |      64      |    7.401 s |        inf% |    7.366 s |        inf% | -34970.378 us |  -0.47% |   PASS   |

vuule

Thank you for taking the time to make these improvements.

ttnghia · 2023-04-25T02:54:49Z

/merge

Similar to #12949, this refactors Parquet writer to support retry mechanism. The internal `writer::impl::write()` function is rewritten such that it is separated into multiple pieces: * A free function that performs compressing/encoding the input table into intermediate results. These intermediate results are totally independent of the writer. * After having the intermediate results in the previous step, these results will be actually applied to the output data sink to start the actual data writing. Closes: * #13042 Depends on: * #13206 Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - https://github.com/nvdbaranec URL: #13076

ttnghia added 9 commits April 24, 2023 09:35

Add pinned_host_vector

01d35ea

Revert "Add pinned_host_vector"

e2393ed

This reverts commit 01d35ea.

Move pinned_host_vector into pinned_allocator.hpp

7aa1d75

Switch to use pinned_host_vector

ca18ce9

Adopt pinned_host_vector

d7070e7

Fix bug in type alias

7358c78

Fix bug and rename variable

a266de6

Switch to use SingleWriteMode

4d43687

Merge branch 'branch-23.06' into refactor_pinned_allocator

f6c6422

ttnghia added 3 - Ready for Review Ready for review by team libcudf blocker libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 24, 2023

ttnghia requested a review from a team as a code owner April 24, 2023 17:37

ttnghia self-assigned this Apr 24, 2023

ttnghia requested a review from a team as a code owner April 24, 2023 17:37

ttnghia requested review from robertmaynard, vyasr and vuule April 24, 2023 17:37

github-actions bot added the conda label Apr 24, 2023

ttnghia changed the title ~~Refactor pinned memory vector~~ Refactor pinned memory vector and ORC+Parquet writers Apr 24, 2023

ttnghia mentioned this pull request Apr 24, 2023

Refactor Parquet chunked writer #13076

Merged

ajschmidt8 approved these changes Apr 24, 2023

View reviewed changes

PointKernel reviewed Apr 24, 2023

View reviewed changes

ttnghia and others added 4 commits April 24, 2023 12:08

Update cpp/src/io/orc/writer_impl.cu

d33ad7d

Co-authored-by: Yunsong Wang <[email protected]>

Rename variable

155f99f

Rename SingleWriteMode into single_write_mode

4d110d1

Fix copyright year

21c3139

ttnghia added 3 commits April 24, 2023 11:46

Switch to use host_span

2b01e44

Reorganize parameters

2230619

More span adoption

9170fef

vuule reviewed Apr 24, 2023

View reviewed changes

ttnghia added 3 commits April 24, 2023 13:36

Fix docs

9cc92dd

Reorder parameters again

4282b94

Simplify code by changing device_buffer into device_uvector

156463c

vuule approved these changes Apr 24, 2023

View reviewed changes

ttnghia requested a review from PointKernel April 25, 2023 02:15

PointKernel approved these changes Apr 25, 2023

View reviewed changes

rapids-bot bot merged commit 848dd5a into rapidsai:branch-23.06 Apr 25, 2023

ttnghia deleted the refactor_pinned_allocator branch April 25, 2023 02:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor pinned memory vector and ORC+Parquet writers #13206

Refactor pinned memory vector and ORC+Parquet writers #13206

ttnghia commented Apr 24, 2023 •

edited

Loading

ajschmidt8 left a comment

PointKernel Apr 24, 2023

ttnghia Apr 25, 2023 •

edited

Loading

vuule Apr 25, 2023

vuule Apr 24, 2023

ttnghia Apr 24, 2023

ttnghia Apr 24, 2023

vuule Apr 24, 2023 •

edited

Loading

ttnghia commented Apr 24, 2023

vuule left a comment

ttnghia commented Apr 25, 2023

Refactor pinned memory vector and ORC+Parquet writers #13206

Refactor pinned memory vector and ORC+Parquet writers #13206

Conversation

ttnghia commented Apr 24, 2023 • edited Loading

ajschmidt8 left a comment

Choose a reason for hiding this comment

PointKernel Apr 24, 2023

Choose a reason for hiding this comment

ttnghia Apr 25, 2023 • edited Loading

Choose a reason for hiding this comment

vuule Apr 25, 2023

Choose a reason for hiding this comment

vuule Apr 24, 2023

Choose a reason for hiding this comment

ttnghia Apr 24, 2023

Choose a reason for hiding this comment

ttnghia Apr 24, 2023

Choose a reason for hiding this comment

vuule Apr 24, 2023 • edited Loading

Choose a reason for hiding this comment

ttnghia commented Apr 24, 2023

vuule left a comment

Choose a reason for hiding this comment

ttnghia commented Apr 25, 2023

ttnghia commented Apr 24, 2023 •

edited

Loading

ttnghia Apr 25, 2023 •

edited

Loading

vuule Apr 24, 2023 •

edited

Loading