Sort dictionary data alphabetically in the ORC writer #14295
auto const is_str_dict =
  ck.type_kind == TypeKind::STRING and ck.encoding_kind == DICTIONARY_V2;
ck.dict_index = is_str_dict ? column.host_stripe_dict(stripe.id).index.data() : nullptr;
ck.dict_data_order =
  is_str_dict ? column.host_stripe_dict(stripe.id).data_order.data() : nullptr;
ck.dtype_len = (ck.type_kind == TypeKind::STRING) ? 1 : column.type_width();
ck.scale     = column.scale();
ck.decimal_offsets =
  (ck.type_kind == TypeKind::DECIMAL) ? column.decimal_offsets() : nullptr;
Some of these fields were left uninitialized when unused; they are now always initialized.
…bug-sort-orc-dict
…o bug-sort-orc-dict
For my internal test, our diff vs the CPU went from 22% to 5%, which is really impressive. Thanks for working on this.
Do you expect a 3% slowdown in writes because of the sort for dictionary-encoded data?
The slowdown is up to 22%, unfortunately. Sorting is not cheap :(
…bug-sort-orc-dict
Co-authored-by: David Wendt <[email protected]>
Looks good, just one question
cpp/src/io/orc/writer_impl.cu
stripe_dicts.host_to_device_async(stream);

// Sort stripe dictionaries alphabetically
auto streams = cudf::detail::fork_streams(stream, std::min<size_t>(dict_order_owner.size(), 8));
Is 8 streams an empirical choice?
it is
I tried powers of two up to 32 and 8 was the fastest one. There wasn't a big difference compared to other 4+ values, though.
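The stream-count choice discussed here can be sketched on the host as follows. This is only an illustration under stated assumptions: `num_sort_streams` and `assign_streams` are hypothetical helpers, and the round-robin assignment of dictionaries to streams is an assumption that the snippet above does not show. Only the `std::min<size_t>(..., 8)` cap comes from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of streams to fork: one per dictionary, capped at max_streams.
// The default of 8 mirrors the empirically chosen cap in the diff.
std::size_t num_sort_streams(std::size_t num_dicts, std::size_t max_streams = 8)
{
  return std::min<std::size_t>(num_dicts, max_streams);
}

// Hypothetical round-robin assignment: result[i] is the stream index
// that would sort dictionary i.
std::vector<std::size_t> assign_streams(std::size_t num_dicts, std::size_t max_streams = 8)
{
  auto const n_streams = num_sort_streams(num_dicts, max_streams);
  std::vector<std::size_t> assignment(num_dicts);
  for (std::size_t i = 0; i < num_dicts; ++i) {
    assert(n_streams > 0);  // holds whenever the loop body runs
    assignment[i] = i % n_streams;
  }
  return assignment;
}
```

With three dictionaries only three streams would be forked; with twenty, the cap of eight applies and dictionaries 0, 8, and 16 share stream 0.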
Before we merge, mind if I run ORC benchmarks on our stuff? I should be able to get these back to you tomorrow. @vuule
With the above we believe default=on makes sense but we really like having the flag you added @vuule, because it allows us to experiment easily and you never know what pathological cases we may run into. Thank you!!
/merge
… to ORC (#14595)
Changes in #14295 introduced a synchronization issue in `build_dictionaries`. After stripe_dicts are initialized on the host, we copy them to the device and then launch kernels that read the device copy of the dicts. After launching these kernels, we deallocate buffers that are no longer needed and clear the dicts' views to these buffers on the host. Without synchronization after the H2D copy, the host-side modification can happen before the copy is actually performed, so the kernels run with the altered state. This PR adds a sync point to make sure the copy is done before the host-side modification.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Alessandro Bellina (https://github.com/abellina)
- Bradley Dice (https://github.com/bdice)
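The ordering hazard described in this commit message can be reproduced with a toy model. The sketch below is not cuDF or CUDA code: `FakeStream` and `build_dictionaries_model` are hypothetical names, and the "stream" simply defers queued work until `synchronize()` to imitate an asynchronous copy that reads host memory later than the call that requested it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model of an asynchronous stream: tasks run in order, but only
// when synchronize() is called. Hypothetical; not the CUDA runtime.
class FakeStream {
 public:
  void enqueue(std::function<void()> task)
  {
    assert(task && "task must be callable");
    tasks_.push(std::move(task));
  }
  void synchronize()
  {
    while (!tasks_.empty()) {
      tasks_.front()();
      tasks_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Returns how many dictionary entries the "kernel" sees in the device copy.
std::size_t build_dictionaries_model(bool sync_after_copy)
{
  FakeStream stream;
  std::vector<int> host_dicts{1, 2, 3};  // host-side state
  std::vector<int> device_dicts;         // stands in for the device copy
  std::size_t seen_by_kernel = 0;

  // "H2D copy": reads host_dicts when the task runs, not when it is enqueued.
  stream.enqueue([&] { device_dicts = host_dicts; });
  if (sync_after_copy) { stream.synchronize(); }  // the added sync point

  // "Kernel" that reads the device copy.
  stream.enqueue([&] { seen_by_kernel = device_dicts.size(); });

  // Host-side cleanup after the kernel launch: alters the host state.
  host_dicts.clear();

  stream.synchronize();
  return seen_by_kernel;
}
```

In this model, `build_dictionaries_model(false)` returns 0 because the copy runs after the host state was cleared, while `build_dictionaries_model(true)` returns 3. A real asynchronous copy is nondeterministic rather than always late, which is what makes this kind of bug intermittent.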
Description
Strings in the dictionary data streams are now sorted alphabetically.
Reduces file size in some cases because compression can be more efficient.
Reduces throughput by up to 22% when writing strings columns (there is a 3% speedup when dictionary encoding is not used, though!).
Benchmark data does not demonstrate the compression difference, but we have some user data that compresses almost 30% better.
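As a rough host-side illustration of what sorting the dictionary data involves, the sketch below sorts a small string dictionary and builds a map from each entry's original position to its sorted position, so that row indices can be rewritten. It is a sketch under assumptions, not the PR's GPU implementation: `SortedDict`, `sort_dictionary`, and `remap_indices` are hypothetical names, and treating `data_order` as an old-index to sorted-index map is an assumption about the field seen in the diff above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a stripe dictionary; not the cuDF type.
struct SortedDict {
  std::vector<std::string> data;     // dictionary strings in sorted order
  std::vector<uint32_t> data_order;  // data_order[old_index] == sorted index
};

SortedDict sort_dictionary(std::vector<std::string> const& dict)
{
  // Argsort: sorted_to_old[k] is the original index of the k-th smallest string.
  std::vector<uint32_t> sorted_to_old(dict.size());
  std::iota(sorted_to_old.begin(), sorted_to_old.end(), 0u);
  std::sort(sorted_to_old.begin(), sorted_to_old.end(),
            [&](uint32_t a, uint32_t b) { return dict[a] < dict[b]; });

  SortedDict out;
  out.data.reserve(dict.size());
  out.data_order.resize(dict.size());
  for (uint32_t k = 0; k < sorted_to_old.size(); ++k) {
    out.data.push_back(dict[sorted_to_old[k]]);
    out.data_order[sorted_to_old[k]] = k;  // invert the permutation
  }
  return out;
}

// Rewrite per-row dictionary indices so they point into the sorted dictionary.
std::vector<uint32_t> remap_indices(std::vector<uint32_t> const& row_indices,
                                    std::vector<uint32_t> const& data_order)
{
  std::vector<uint32_t> out;
  out.reserve(row_indices.size());
  for (auto idx : row_indices) {
    assert(idx < data_order.size());
    out.push_back(data_order[idx]);
  }
  return out;
}
```

For the dictionary `{"pear", "apple", "fig"}`, the sorted data is `{"apple", "fig", "pear"}` and `data_order` is `{2, 0, 1}`, so rows that referenced entry 0 ("pear") now reference entry 2. Sorted entries tend to share prefixes with their neighbors, which is one plausible reason a general-purpose compressor can do better on the sorted stream.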