Reduce peak memory use when writing compressed ORC files. #12963
Conversation
This does increase memory usage when compression is not used.
}
if (!t) { strm_desc[stripe_id][stream_id].stream_size = dst_ptr - strm0.data_ptrs[cid]; }
No need to set the stream size; it's been computed on the host.
// Allow extra space for alignment
stripe_size += strm.lengths[strm_type] + uncomp_block_align - 1;
This is a potential bug fix for extreme corner cases where alignment can push the writing of encoded data into the next stream.
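For context, a minimal sketch of the arithmetic, assuming a hypothetical align_up helper (not cudf code): rounding a stream's start offset up to the block alignment can move the write forward by up to uncomp_block_align - 1 bytes, which is exactly the headroom the new line reserves.

#include <cstddef>

// Round `offset` up to the next multiple of `align`.
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
  return (offset + align - 1) / align * align;
}

// Worst case: the aligned start exceeds the unaligned one by align - 1 bytes,
// so reserving length + align - 1 per stream keeps the write in bounds.
static_assert(align_up(5, 4) - 5 == 3);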
Potential bug as in we haven't seen this before in practice? Do we have a test for it? Should we?
It would be nice to have this test, but it's not trivial to come up with the failing input.
Maybe a decimal column + ZSTD, since we use the exact size (and it doesn't have to be a multiple of 4, unlike floats)? I'll look into this, just not for this PR :)
The new logic is much simpler. Only some non-blocking suggestions.
Thanks for the work!
auto const& ck = chunks[col_idx][rg_idx];
auto& strm = col_streams[rg_idx];
// per-stripe, per-stream owning buffers
std::vector<std::vector<rmm::device_uvector<uint8_t>>> encoded_data(segmentation.num_stripes());
Not in this PR, but this reminds me that we could extract the owning_buffer strong type here (cudf/cpp/include/cudf/io/datasource.hpp, line 352 in e37bddb: class owning_buffer : public buffer {). This also helps unify the data type by using std::byte consistently.
Also, the owning buffer could be enforced via unique_ptr.
I don't think I get this one. rmm::device_uvector already implies an owning buffer. We know that this type deallocates the memory in the destructor.
In the datasource::buffer we have the distinction between owning and non-owning buffers because we use the abstract buffer type, which does not imply ownership, only the access API to the contained data.
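To illustrate the point, a minimal sketch (not cudf code): device_uvector is a move-only RAII type, so ownership is already explicit in the type itself.

#include <cstdint>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// device_uvector owns its allocation: it is move-only and frees the device
// memory in its destructor, so no extra wrapper is needed for ownership.
rmm::device_uvector<std::uint8_t> make_stream_buffer(std::size_t n,
                                                     rmm::cuda_stream_view stream)
{
  rmm::device_uvector<std::uint8_t> buf(n, stream);  // stream-ordered device allocation
  return buf;                                        // ownership moves to the caller
}  // a buffer that is not moved out would be freed here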
rmm::device_uvector already implies an owning buffer
My intention was to use a strong type like:
struct owning_buffer {
  ...
  std::unique_ptr<rmm::device_uvector<uint8_t>> data;
};
to explicitly represent owning buffers in cuIO.
std::vector<StripeInformation> gather_stripes(size_t num_index_streams,
                                              file_segmentation const& segmentation,
                                              encoded_data* enc_data,
                                              hostdevice_2dvector<gpu::StripeStream>* strm_desc,
Nit: for non-iterator parameters, try to avoid pointers.

-  hostdevice_2dvector<gpu::StripeStream>* strm_desc,
+  hostdevice_2dvector<gpu::StripeStream>& strm_desc,
We used a pointer here to make it obvious at the call site that the parameter will be modified. I personally prefer references for non-optional parameters.
Edit: to clarify - the use of a pointer here was requested in the code review; I initially used a reference.
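A toy illustration of the trade-off being discussed (Desc and the functions below are hypothetical stand-ins, not the actual cudf signatures):

#include <vector>

struct Desc {  // hypothetical stand-in for the descriptor type
  std::vector<int> sizes;
};

void fill_by_pointer(Desc* d) { d->sizes.push_back(0); }
void fill_by_reference(Desc& d) { d.sizes.push_back(0); }

int main()
{
  Desc d;
  fill_by_pointer(&d);   // the & at the call site signals that d may be modified
  fill_by_reference(d);  // mutation is invisible without reading the signature
}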
Co-authored-by: Nghia Truong <[email protected]>
Ran unit tests with cuda-memcheck on this branch, no errors found 👍
Looks good to me. Have a question about the bug fix, but if we want to do something there I would suggest a new issue.
-// blockDim {1024,1,1}
-__global__ void __launch_bounds__(1024)
+// blockDim {compact_streams_block_size,1,1}
+__global__ void __launch_bounds__(compact_streams_block_size)
🔥
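For reference, a sketch of the pattern being praised (the kernel body and names here are illustrative, not the actual cudf kernel): a single named constexpr keeps __launch_bounds__ and the launch configuration in sync.

constexpr int compact_streams_block_size = 1024;  // single source of truth

// A future block-size change can no longer leave the attribute and the
// launch configuration out of sync, since both use the same constant.
__global__ void __launch_bounds__(compact_streams_block_size)
  example_kernel(int* data)  // hypothetical kernel
{
  data[threadIdx.x] = threadIdx.x;
}

// launch: example_kernel<<<num_blocks, compact_streams_block_size, 0, stream>>>(d_data);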
/merge
for (size_t col_idx = 0; col_idx < num_columns; col_idx++) {
  for (int strm_type = 0; strm_type < gpu::CI_NUM_STREAMS; ++strm_type) {
why mix post-increment and pre-increment here?
// gathered stripes - per-stripe, per-stream (same as encoded_data.data)
std::vector<std::vector<rmm::device_uvector<uint8_t>>> gathered_stripes(enc_data->data.size());
for (auto& stripe_data : gathered_stripes) {
  std::generate_n(std::back_inserter(stripe_data), enc_data->data[0].size(), [&]() {
would a stripe_data.reserve(enc_data->data[0].size()); help here?
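The suggestion in a minimal form (element type simplified to int): reserving up front performs one allocation, so back_inserter never triggers a reallocation — which matters here because reallocating a vector of device_uvectors moves every element.

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

std::vector<int> make_buffers(std::size_t count)
{
  std::vector<int> bufs;
  bufs.reserve(count);  // one allocation up front; the appends below never reallocate
  std::generate_n(std::back_inserter(bufs), count, [] { return 0; });
  return bufs;
}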
Description
This PR changes how the buffer for encoded data is allocated in the ORC writer. Instead of a single buffer for the whole table, each stream of each stripe is allocated separately.
Since the size of the encoded data is not known in advance, buffers are oversized in most cases (decimal types and dictionary-encoded data being the exceptions). Resizing these buffers to the exact encoded data size before compression reduces peak memory usage.
The resizing of the encoded buffers is done in the step where row groups are gathered to make each encoded stripe contiguous in memory. This way we don't incur additional copies (compared to the previous approach in gather_stripes).

Other changes:

- Removed compute_offsets because it is not needed with separate buffers for each stripe/stream.
- Refactored parts of encode_columns to initialize data buffers and stream descriptors one stripe at a time, allowing future separation into per-stripe processing (e.g. for pipelining).

Impact: internal benchmarks show an average reduction of 14% in peak memory use when SNAPPY compression is enabled, with minimal impact on performance.
cub::DeviceMemcpy::Batched can now be used in the ORC writer.
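A sketch of what that could look like (a hypothetical wrapper; the device-side pointer and size arrays are assumed to be prepared elsewhere):

#include <cstddef>
#include <cstdint>
#include <cub/device/device_memcpy.cuh>
#include <cuda_runtime.h>

// Copies many independent buffers in a single launch. The first call only
// queries the temporary storage size; the second performs the copies.
void batched_gather(void** d_srcs, void** d_dsts, std::size_t* d_sizes,
                    std::uint32_t num_buffers, cudaStream_t stream)
{
  std::size_t temp_bytes = 0;
  cub::DeviceMemcpy::Batched(nullptr, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);
  void* d_temp = nullptr;
  cudaMallocAsync(&d_temp, temp_bytes, stream);
  cub::DeviceMemcpy::Batched(d_temp, temp_bytes, d_srcs, d_dsts, d_sizes,
                             num_buffers, stream);
  cudaFreeAsync(d_temp, stream);
}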