Partial clean up of ORC writer #7324
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7324 +/- ##
===============================================
+ Coverage 81.80% 81.86% +0.05%
===============================================
Files 101 101
Lines 16695 16884 +189
===============================================
+ Hits 13658 13822 +164
- Misses 3037 3062 +25
Added @cwharris in case he wants to review the span-related changes.
std::accumulate(stripe_bounds.front().cbegin(),
stripe_bounds.back().cend(),
This looks odd; maybe I should rename these APIs.
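For readers wondering why iterating from `front().cbegin()` to `back().cend()` can cover every element: it works when all rows live in one contiguous row-major allocation, so the end of one row is the start of the next. The following is a minimal host-only sketch of that idea. `span_sketch`, `span2d_sketch`, and `sum_all` are hypothetical stand-ins written for illustration, not the cudf types touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 1D view: a pointer plus a length (not the cudf host_span).
template <typename T>
struct span_sketch {
  T* ptr;
  std::size_t len;
  T const* cbegin() const { return ptr; }
  T const* cend() const { return ptr + len; }
};

// Hypothetical 2D view over ONE contiguous row-major allocation.
template <typename T>
struct span2d_sketch {
  T* ptr;
  std::size_t rows;
  std::size_t cols;

  // Row r starts at offset r * cols in the flat storage.
  span_sketch<T> operator[](std::size_t r) const
  {
    assert(r < rows);
    return {ptr + r * cols, cols};
  }
  span_sketch<T> front() const { return (*this)[0]; }
  span_sketch<T> back() const { return (*this)[rows - 1]; }
};

// Sums every element by walking from the first row's begin to the last
// row's end. This is only valid because the rows are adjacent in memory.
inline int sum_all(span2d_sketch<int const> view)
{
  return std::accumulate(view.front().cbegin(), view.back().cend(), 0);
}
```

With a flat `std::vector<int>` holding `{1, 2, 3, 4, 5, 6}` viewed as 2 rows by 3 columns, `sum_all` visits all six elements. The same call would be incorrect for a vector-of-vectors layout, which is part of why the pattern reads oddly at the call site.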
add missing newline
Looks good to me.
operator cudf::detail::host_span<T>() { return {h_data, max_elements}; }
operator cudf::detail::host_span<T const>() const { return {h_data, max_elements}; }
It might also make sense to store the device side data of `hostdevice_vector` in a `rmm::device_uvector` instead of a `device_buffer`. This will probably re-use the `device_span` constructor:
template <typename C, std::enable_if_t<is_device_span_supported_container<C>::value>* = nullptr>
constexpr device_span(C& in) : base(thrust::raw_pointer_cast(in.data()), in.size())
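The constructor quoted above is enabled only for containers that a trait marks as supported. As a rough host-only illustration of that pattern, the sketch below opts in `std::vector` and builds a view from `data()` and `size()`. The names `span_sketch` and `is_supported_container_sketch` are hypothetical, and `thrust::raw_pointer_cast` is omitted because plain pointers are used here; this is not the cudf implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical opt-in trait: containers the view may be constructed from.
template <typename C>
struct is_supported_container_sketch : std::false_type {};

template <typename T, typename Alloc>
struct is_supported_container_sketch<std::vector<T, Alloc>> : std::true_type {};

// Hypothetical non-owning view (pointer + size).
template <typename T>
class span_sketch {
 public:
  span_sketch(T* data, std::size_t size) : data_(data), size_(size) {}

  // Participates in overload resolution only for supported containers,
  // so the size is taken from the container instead of a separate argument.
  template <typename C,
            std::enable_if_t<is_supported_container_sketch<C>::value>* = nullptr>
  span_sketch(C& in) : span_sketch(in.data(), in.size())
  {
  }

  T* data() const { return data_; }
  std::size_t size() const { return size_; }
  T& operator[](std::size_t i) const
  {
    assert(i < size_);
    return data_[i];
  }

 private:
  T* data_;
  std::size_t size_;
};
```

In this sketch a `std::vector<int>` converts to `span_sketch<int>` directly, while a `std::string` does not convert to `span_sketch<char>` because the trait was never specialized for it.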
AFAICT, we would still need the conversion operators to `device_span` here, but we wouldn't have to pass the size separately. I do like this change in general; we should look into polishing `hostdevice_vector` at some point.
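To make the conversion-operator point concrete, here is a small host-only sketch of an owning buffer that converts implicitly to a view, so callees receive pointer and size together. `host_span_sketch`, `owning_buffer_sketch`, and `sum` are hypothetical names for illustration; the real `hostdevice_vector` also manages device memory, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view (not cudf::detail::host_span).
template <typename T>
struct host_span_sketch {
  T* ptr;
  std::size_t len;
  T* data() const { return ptr; }
  std::size_t size() const { return len; }
};

// Hypothetical owning buffer with implicit conversions to the view type.
template <typename T>
class owning_buffer_sketch {
 public:
  explicit owning_buffer_sketch(std::size_t n) : storage_(n) {}

  T& operator[](std::size_t i)
  {
    assert(i < storage_.size());
    return storage_[i];
  }

  // Mutable view from a non-const buffer; read-only view from any buffer.
  operator host_span_sketch<T>() { return {storage_.data(), storage_.size()}; }
  operator host_span_sketch<T const>() const { return {storage_.data(), storage_.size()}; }

 private:
  std::vector<T> storage_;
};

// The callee takes a view, so no separate size parameter is needed.
inline long sum(host_span_sketch<int const> s)
{
  long total = 0;
  for (std::size_t i = 0; i < s.size(); ++i) { total += s.data()[i]; }
  return total;
}
```

A call such as `sum(buf)` with an `owning_buffer_sketch<int>` selects the `const` conversion operator, which is the same shape as the pair of operators added in this diff.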
@gpucibot merge
Issue rapidsai#6763

Clean up of the code surrounding the column data encode in the ORC writer:

1. Add a 2D version of `hostdevice_vector` (single allocation);
2. Add 2D versions of `host_span` and `device_span`;
3. Add implicit conversions from `hostdevice_vector` to `host_span` and `device_span`.
4. Use the new types to represent collections that currently use flattened `hostdevice_vectors`;
5. Separated a part of `EncChunk` into a separate class, `encoder_chunk_streams`, as this is the only part used after data encode;
6. Add `orc_streams` to represent per-column streams and compute offsets.
7. Partial `writer_impl.cu` code "modernization".
8. Removed redundant size parameters (since 2dspan and 2dvector hold the size info).
9. Use `device_uvector` instead of `device_vector`.

Authors:
- Vukasin Milovanovic (@vuule)

Approvers:
- Jake Hemstad (@jrhemstad)
- Kumar Aatish (@kaatish)
- Ram (Ramakrishna Prabhu) (@rgsl888prabhu)

URL: rapidsai#7324