Add support for list type in ORC writer #8723

Merged · 96 commits · Jul 21, 2021

Commits
1bf7e93
fail a bit later :)
vuule May 24, 2021
ed5c971
rename some "id" variables
vuule May 25, 2021
c526d1a
add column_view to orc_column_view
vuule May 26, 2021
6032dfc
preorder flatten on device too
vuule May 27, 2021
ca6c03c
stop using table_view directly - decimal_chunk_sizes
vuule May 27, 2021
e5b25b4
stop using table_view - encode_columns
vuule May 28, 2021
830f445
stop using table_view - init_dictionaries
vuule May 28, 2021
afed17b
style fix
vuule May 28, 2021
838199f
remove `num_columns` because it became ambigous
vuule May 28, 2021
32b94b0
finish removing num_columns
vuule May 28, 2021
938bd2a
dict - allow string column to have different sizes
vuule May 29, 2021
98d6cb1
add orc_table_view
vuule Jun 1, 2021
14e5566
first steps in rowgroup size computation
vuule Jun 2, 2021
31ea8e0
style; remove div_by_rowgroups
vuule Jun 2, 2021
6813822
fix cudf_column; add basic rg size comp
vuule Jun 2, 2021
90e0094
use transform
vuule Jun 2, 2021
a7d7f43
finish rowgroup range computation
vuule Jun 3, 2021
7a722d5
use struct for the range of rows in rowgroup
vuule Jun 3, 2021
e6e6a6d
use rowgroup sizes in init_dictionaries
vuule Jun 3, 2021
da9b9d5
move orc_column_device_view to orc.h
vuule Jun 3, 2021
c6c1a84
use rowgroup ranges in gather_stripe_info
vuule Jun 3, 2021
21c256d
use rowgroup sizes in decimal_chunk_sizes
vuule Jun 5, 2021
9f2a604
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 5, 2021
3edc675
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 14, 2021
d325b2c
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 15, 2021
1c0c66e
start refactoring stream creation
vuule Jun 17, 2021
dd96056
add helper for RLE stream size
vuule Jun 19, 2021
c07ddc0
use enums
vuule Jun 19, 2021
c7bd5b9
don't reuse column stream len for chunk stream len
vuule Jun 19, 2021
05c5120
use full encoded column size in column streams, calc separately in ch…
vuule Jun 19, 2021
ce7e2d7
use rowgroup boundaries in gather_stripes pt 1
vuule Jun 21, 2021
1e021b9
merge 21.08
vuule Jun 22, 2021
1718de9
finish gather_stripes
vuule Jun 22, 2021
502c703
remove last wrong use of row_index_stride_
vuule Jun 22, 2021
6aff602
fix for gather_statistic_blobs w/ nested columns
vuule Jun 23, 2021
1b93b34
nested types statistics
vuule Jun 24, 2021
45d1e70
footer
vuule Jun 24, 2021
7594858
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 25, 2021
3189b89
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 25, 2021
4753e2c
towards the rowgroup size fix
vuule Jun 26, 2021
8070577
calc only string size for stripes
vuule Jun 29, 2021
b5e4758
rowgroup segmentation POC
vuule Jun 29, 2021
ed09f3c
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 29, 2021
ee28f69
recalc rowgroups and use them
vuule Jun 29, 2021
4992709
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jun 30, 2021
1e5369e
list encoding
vuule Jul 1, 2021
b1ebf99
fix stats
vuule Jul 2, 2021
b2ba442
remove incorrect (unused) rowgroup logic
vuule Jul 2, 2021
097b8d6
fix column naming with nested columns
vuule Jul 3, 2021
52b456b
fix rowgroup size for nested string columns
vuule Jul 3, 2021
54f43b1
replace statically sized str dict indices
vuule Jul 5, 2021
d6518d8
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 6, 2021
8b64446
use zero initialized buffers for string dictionaries
vuule Jul 8, 2021
bad7557
disable dict encoding for columns with large rowgroups
vuule Jul 9, 2021
3d63e7f
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 13, 2021
1b9ee47
missed a file in merge
vuule Jul 13, 2021
3a35d01
tests
vuule Jul 13, 2021
f07a0e0
rename flat_index
vuule Jul 13, 2021
73fc43c
rename d_orc_columns (type already device_span, no need for prefix)
vuule Jul 13, 2021
70dbd6a
style; renames
vuule Jul 13, 2021
76b6c94
bit more renaming
vuule Jul 13, 2021
d06ade9
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 13, 2021
d15198e
style fix maybe?
vuule Jul 13, 2021
9f9a765
docs
vuule Jul 13, 2021
4b06752
Merge branch 'fea-orc-write-list' of https://github.com/vuule/cudf in…
vuule Jul 13, 2021
c1eff06
docs fixes
vuule Jul 13, 2021
42856b2
style fix, third attempt
vuule Jul 13, 2021
71cdeb8
kernel clean up
vuule Jul 13, 2021
dc29389
replace map with vector
vuule Jul 13, 2021
dc1af0c
use optional instead of `-1` flag
vuule Jul 14, 2021
091d5d4
tidy up orc_column_view
vuule Jul 14, 2021
83d680e
use uint32_t for indexes; replace raw loops
vuule Jul 14, 2021
573c416
add tests; minor clean up; fix for valid_mask logic
vuule Jul 15, 2021
f052855
fix an issue with valid_buf in columns encode
vuule Jul 15, 2021
7e0121b
review suggestions pt1
vuule Jul 15, 2021
ad6fea8
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 17, 2021
1b946b8
isort style fix
vuule Jul 17, 2021
fe5baac
remove optionals from append_orc_column
vuule Jul 17, 2021
12671f9
typo fix
vuule Jul 17, 2021
7faaf75
review feedback 2
vuule Jul 17, 2021
26295d0
add test with many rows in child column (over stripe row limit)
vuule Jul 17, 2021
470cd4e
yet another style fix
vuule Jul 17, 2021
7c319fb
move type_kind out of Stream to stay consistent with the spec
vuule Jul 19, 2021
7855372
make a few lines more readable
vuule Jul 19, 2021
a5d16f1
add nulls to child columns in a test
vuule Jul 19, 2021
bb69c90
refactor tests
vuule Jul 19, 2021
42bcd75
limit reduction ot valid count
vuule Jul 19, 2021
8ecc7a9
noexcept + formatting
vuule Jul 19, 2021
bc0cce7
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 19, 2021
b3d47d4
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 19, 2021
a48df92
2dvector
vuule Jul 20, 2021
ecd8d22
magic numbers
vuule Jul 20, 2021
0c308aa
add list coverage to ORC benchmarks
vuule Jul 20, 2021
31485b9
Merge branch 'branch-21.08' of https://github.com/rapidsai/cudf into …
vuule Jul 20, 2021
6140691
update type support spreadsheet
vuule Jul 20, 2021
9589cf5
typo fix
vuule Jul 21, 2021
1 change: 1 addition & 0 deletions cpp/benchmarks/io/orc/orc_reader_benchmark.cpp
@@ -160,6 +160,7 @@ RD_BENCHMARK_DEFINE_ALL_SOURCES(ORC_RD_BM_INPUTS_DEFINE, integral, type_group_id
RD_BENCHMARK_DEFINE_ALL_SOURCES(ORC_RD_BM_INPUTS_DEFINE, floats, type_group_id::FLOATING_POINT);
RD_BENCHMARK_DEFINE_ALL_SOURCES(ORC_RD_BM_INPUTS_DEFINE, timestamps, type_group_id::TIMESTAMP);
RD_BENCHMARK_DEFINE_ALL_SOURCES(ORC_RD_BM_INPUTS_DEFINE, string, cudf::type_id::STRING);
RD_BENCHMARK_DEFINE_ALL_SOURCES(ORC_RD_BM_INPUTS_DEFINE, list, cudf::type_id::LIST);

BENCHMARK_DEFINE_F(OrcRead, column_selection)
(::benchmark::State& state) { BM_orc_read_varying_options(state); }
1 change: 1 addition & 0 deletions cpp/benchmarks/io/orc/orc_writer_benchmark.cpp
@@ -103,6 +103,7 @@ WR_BENCHMARK_DEFINE_ALL_SINKS(ORC_WR_BM_INOUTS_DEFINE, integral, type_group_id::
WR_BENCHMARK_DEFINE_ALL_SINKS(ORC_WR_BM_INOUTS_DEFINE, floats, type_group_id::FLOATING_POINT);
WR_BENCHMARK_DEFINE_ALL_SINKS(ORC_WR_BM_INOUTS_DEFINE, timestamps, type_group_id::TIMESTAMP);
WR_BENCHMARK_DEFINE_ALL_SINKS(ORC_WR_BM_INOUTS_DEFINE, string, cudf::type_id::STRING);
WR_BENCHMARK_DEFINE_ALL_SINKS(ORC_WR_BM_INOUTS_DEFINE, list, cudf::type_id::LIST);

BENCHMARK_DEFINE_F(OrcWrite, writer_options)
(::benchmark::State& state) { BM_orc_write_varying_options(state); }
5 changes: 4 additions & 1 deletion cpp/include/cudf/utilities/span.hpp
@@ -258,11 +258,14 @@ class base_2dspan {
return row * size.second + column;
}

constexpr RowType<T, dynamic_extent> operator[](size_t row)
constexpr RowType<T, dynamic_extent> operator[](size_t row) const
{
return {this->data() + flatten_index(row, 0, this->size()), this->size().second};
}

constexpr RowType<T, dynamic_extent> front() const { return (*this)[0]; }
constexpr RowType<T, dynamic_extent> back() const { return (*this)[size().first - 1]; }

constexpr base_2dspan subspan(size_t first_row, size_t num_rows) const noexcept
{
return base_2dspan(
176 changes: 87 additions & 89 deletions cpp/src/io/orc/dict_enc.cu
@@ -31,15 +31,14 @@ namespace cudf {
namespace io {
namespace orc {
namespace gpu {
constexpr uint32_t max_dict_entries = default_row_index_stride;
constexpr int init_hash_bits = 12;
constexpr int init_hash_bits = 12;

struct dictinit_state_s {
uint32_t nnz;
uint32_t total_dupes;
DictionaryChunk chunk;
volatile uint32_t scratch_red[32];
uint32_t dict[max_dict_entries];
uint32_t* dict;
union {
uint16_t u16[1 << (init_hash_bits)];
uint32_t u32[1 << (init_hash_bits - 1)];
@@ -113,20 +112,17 @@ static __device__ void LoadNonNullIndices(volatile dictinit_state_s* s,

/**
* @brief Gather all non-NULL string rows and compute total character data size
*
* @param[in] chunks DictionaryChunk device array [rowgroup][column]
* @param[in] num_columns Number of string columns
*/
// blockDim {block_size,1,1}
template <int block_size>
__global__ void __launch_bounds__(block_size, 2)
gpuInitDictionaryIndices(DictionaryChunk* chunks,
const table_device_view view,
uint32_t* dict_data,
uint32_t* dict_index,
size_t row_index_stride,
size_type* str_col_ids,
uint32_t num_columns)
gpuInitDictionaryIndices(device_2dspan<DictionaryChunk> chunks,
device_span<orc_column_device_view const> orc_columns,
device_span<device_span<uint32_t>> dict_data,
device_span<device_span<uint32_t>> dict_index,
device_span<device_span<uint32_t>> tmp_indices,
device_2dspan<rowgroup_rows const> rowgroup_bounds,
device_span<uint32_t const> str_col_indexes)
{
__shared__ __align__(16) dictinit_state_s state_g;

@@ -139,22 +135,23 @@
} temp_storage;

dictinit_state_s* const s = &state_g;
uint32_t col_id = blockIdx.x;
uint32_t group_id = blockIdx.y;
// Index of the column in the `str_col_indexes` array
uint32_t const str_col_idx = blockIdx.x;
// Index of the column in the `orc_columns` array
auto const col_idx = str_col_indexes[str_col_idx];
uint32_t group_id = blockIdx.y;
auto const num_str_cols = str_col_indexes.size();
uint32_t nnz, start_row, dict_char_count;
int t = threadIdx.x;

if (t == 0) {
column_device_view* leaf_column_view = view.begin() + str_col_ids[col_id];
s->chunk = chunks[group_id * num_columns + col_id];
s->chunk.leaf_column = leaf_column_view;
s->chunk.dict_data =
dict_data + col_id * leaf_column_view->size() + group_id * row_index_stride;
s->chunk.dict_index = dict_index + col_id * leaf_column_view->size();
s->chunk.start_row = group_id * row_index_stride;
s->chunk.num_rows =
min(row_index_stride,
max(static_cast<size_t>(leaf_column_view->size() - s->chunk.start_row), size_t{0}));
s->chunk = chunks[group_id][str_col_idx];
s->chunk.leaf_column = &orc_columns[col_idx].cudf_column;
s->chunk.dict_data = dict_data[str_col_idx].data() + rowgroup_bounds[group_id][col_idx].begin;
s->chunk.dict_index = dict_index[str_col_idx].data();
s->chunk.start_row = rowgroup_bounds[group_id][col_idx].begin;
s->chunk.num_rows = rowgroup_bounds[group_id][col_idx].size();
s->dict = tmp_indices[str_col_idx].data() + s->chunk.start_row;
Review comment (Contributor): Where is this used?

Reply (vuule, author): It's used for some intermediate results when the string dictionaries are built. I don't know all the details (yet); definitely need to dig into this part, add comments, and improve naming (e.g. multiple things are called "dict index").
}
for (uint32_t i = 0; i < sizeof(s->map) / sizeof(uint32_t); i += block_size) {
if (i + t < sizeof(s->map) / sizeof(uint32_t)) s->map.u32[i + t] = 0;
@@ -168,9 +165,9 @@
s->chunk.string_char_count = 0;
s->total_dupes = 0;
}
nnz = s->nnz;
dict_data = s->chunk.dict_data;
start_row = s->chunk.start_row;
nnz = s->nnz;
auto t_dict_data = s->chunk.dict_data;
start_row = s->chunk.start_row;
for (uint32_t i = 0; i < nnz; i += block_size) {
uint32_t ck_row = 0;
uint32_t hash = 0;
@@ -185,7 +182,7 @@
if (t == 0) s->chunk.string_char_count += len;
if (i + t < nnz) {
atomicAdd(&s->map.u32[hash >> 1], 1 << ((hash & 1) ? 16 : 0));
dict_data[i + t] = start_row + ck_row;
t_dict_data[i + t] = start_row + ck_row;
}
__syncthreads();
}
@@ -216,10 +213,13 @@ }
}
// Put the indices back in hash order
for (uint32_t i = 0; i < nnz; i += block_size) {
uint32_t ck_row = 0, pos = 0, hash = 0, pos_old, pos_new, sh, colliding_row;
bool collision;
uint32_t ck_row = 0;
uint32_t hash = 0;
uint32_t pos = 0;
uint32_t pos_old = 0;
uint32_t sh = 0;
if (i + t < nnz) {
ck_row = dict_data[i + t] - start_row;
ck_row = t_dict_data[i + t] - start_row;
string_view string_val = s->chunk.leaf_column->element<string_view>(ck_row + start_row);
hash = hash_string(string_val);
sh = (hash & 1) ? 16 : 0;
@@ -233,7 +233,9 @@
s->dict[pos] = ck_row;
}
__syncthreads();
collision = false;
bool collision = false;
uint32_t colliding_row = 0;
uint32_t pos_new = 0;
if (i + t < nnz) {
pos_new = s->map.u16[hash];
collision = (pos != pos_old && pos_new > pos_old + 1);
@@ -273,7 +275,7 @@
if (!t) { s->total_dupes += dupes_in_block; }
if (i + t < nnz) {
if (!is_dupe) {
dict_data[i + t - dupes_before] = ck_row + start_row;
t_dict_data[i + t - dupes_before] = ck_row + start_row;
} else {
s->chunk.dict_index[ck_row + start_row] = (ck_row_ref + start_row) | (1u << 31);
}
@@ -283,16 +285,16 @@
// while making any future changes.
dict_char_count = block_reduce(temp_storage.reduce_storage).Sum(dict_char_count);
if (!t) {
chunks[group_id * num_columns + col_id].num_strings = nnz;
chunks[group_id * num_columns + col_id].string_char_count = s->chunk.string_char_count;
chunks[group_id * num_columns + col_id].num_dict_strings = nnz - s->total_dupes;
chunks[group_id * num_columns + col_id].dict_char_count = dict_char_count;
chunks[group_id * num_columns + col_id].leaf_column = s->chunk.leaf_column;
chunks[group_id][str_col_idx].num_strings = nnz;
chunks[group_id][str_col_idx].string_char_count = s->chunk.string_char_count;
chunks[group_id][str_col_idx].num_dict_strings = nnz - s->total_dupes;
chunks[group_id][str_col_idx].dict_char_count = dict_char_count;
chunks[group_id][str_col_idx].leaf_column = s->chunk.leaf_column;

chunks[group_id * num_columns + col_id].dict_data = s->chunk.dict_data;
chunks[group_id * num_columns + col_id].dict_index = s->chunk.dict_index;
chunks[group_id * num_columns + col_id].start_row = s->chunk.start_row;
chunks[group_id * num_columns + col_id].num_rows = s->chunk.num_rows;
chunks[group_id][str_col_idx].dict_data = s->chunk.dict_data;
chunks[group_id][str_col_idx].dict_index = s->chunk.dict_index;
chunks[group_id][str_col_idx].start_row = s->chunk.start_row;
chunks[group_id][str_col_idx].num_rows = s->chunk.num_rows;
}
}

@@ -305,9 +307,8 @@
*/
// blockDim {1024,1,1}
extern "C" __global__ void __launch_bounds__(1024)
gpuCompactChunkDictionaries(StripeDictionary* stripes,
DictionaryChunk const* chunks,
uint32_t num_columns)
gpuCompactChunkDictionaries(device_2dspan<StripeDictionary> stripes,
device_2dspan<DictionaryChunk const> chunks)
{
__shared__ __align__(16) StripeDictionary stripe_g;
__shared__ __align__(16) DictionaryChunk chunk_g;
@@ -321,16 +322,16 @@
const uint32_t* src;
uint32_t* dst;

if (t == 0) stripe_g = stripes[stripe_id * num_columns + col_id];
if (t == 0) stripe_g = stripes[stripe_id][col_id];
__syncthreads();
if (!stripe_g.dict_data) { return; }
if (t == 0) chunk_g = chunks[stripe_g.start_chunk * num_columns + col_id];
if (t == 0) chunk_g = chunks[stripe_g.start_chunk][col_id];
__syncthreads();
dst = stripe_g.dict_data + chunk_g.num_dict_strings;
for (uint32_t g = 1; g < stripe_g.num_chunks; g++) {
if (!t) {
src = chunks[(stripe_g.start_chunk + g) * num_columns + col_id].dict_data;
chunk_len = chunks[(stripe_g.start_chunk + g) * num_columns + col_id].num_dict_strings;
src = chunks[stripe_g.start_chunk + g][col_id].dict_data;
chunk_len = chunks[stripe_g.start_chunk + g][col_id].num_dict_strings;
ck_curptr_g = src;
ck_curlen_g = chunk_len;
}
@@ -365,7 +366,7 @@ struct build_state_s {
// blockDim {1024,1,1}
template <int block_size>
__global__ void __launch_bounds__(block_size)
gpuBuildStripeDictionaries(StripeDictionary* stripes, uint32_t num_columns)
gpuBuildStripeDictionaries(device_2dspan<StripeDictionary> stripes)
{
__shared__ __align__(16) build_state_s state_g;
using block_reduce = cub::BlockReduce<uint32_t, block_size>;
@@ -383,7 +384,7 @@
uint32_t dict_char_count;
int t = threadIdx.x;

if (t == 0) s->stripe = stripes[stripe_id * num_columns + col_id];
if (t == 0) s->stripe = stripes[stripe_id][col_id];
if (t == 31 * 32) { s->total_dupes = 0; }
__syncthreads();
num_strings = s->stripe.num_strings;
@@ -419,63 +420,60 @@
}
dict_char_count = block_reduce(temp_storage.reduce_storage).Sum(dict_char_count);
if (t == 0) {
stripes[stripe_id * num_columns + col_id].num_strings = num_strings - s->total_dupes;
stripes[stripe_id * num_columns + col_id].dict_char_count = dict_char_count;
stripes[stripe_id][col_id].num_strings = num_strings - s->total_dupes;
stripes[stripe_id][col_id].dict_char_count = dict_char_count;
}
}

/**
* @copydoc cudf::io::orc::gpu::InitDictionaryIndices
*/
void InitDictionaryIndices(const table_device_view& view,
DictionaryChunk* chunks,
uint32_t* dict_data,
uint32_t* dict_index,
size_t row_index_stride,
size_type* str_col_ids,
uint32_t num_columns,
uint32_t num_rowgroups,
void InitDictionaryIndices(device_span<orc_column_device_view const> orc_columns,
device_2dspan<DictionaryChunk> chunks,
device_span<device_span<uint32_t>> dict_data,
device_span<device_span<uint32_t>> dict_index,
device_span<device_span<uint32_t>> tmp_indices,
device_2dspan<rowgroup_rows const> rowgroup_bounds,
device_span<uint32_t const> str_col_indexes,
rmm::cuda_stream_view stream)
{
static constexpr int block_size = 512;
dim3 dim_block(block_size, 1);
dim3 dim_grid(num_columns, num_rowgroups);
dim3 dim_grid(str_col_indexes.size(), rowgroup_bounds.size().first);
gpuInitDictionaryIndices<block_size><<<dim_grid, dim_block, 0, stream.value()>>>(
chunks, view, dict_data, dict_index, row_index_stride, str_col_ids, num_columns);
chunks, orc_columns, dict_data, dict_index, tmp_indices, rowgroup_bounds, str_col_indexes);
}

/**
* @copydoc cudf::io::orc::gpu::BuildStripeDictionaries
*/
void BuildStripeDictionaries(StripeDictionary* stripes,
StripeDictionary* stripes_host,
DictionaryChunk const* chunks,
uint32_t num_stripes,
uint32_t num_rowgroups,
uint32_t num_columns,
void BuildStripeDictionaries(device_2dspan<StripeDictionary> d_stripes_dicts,
host_2dspan<StripeDictionary const> h_stripe_dicts,
device_2dspan<DictionaryChunk const> chunks,
rmm::cuda_stream_view stream)
{
auto const num_stripes = h_stripe_dicts.size().first;
auto const num_columns = h_stripe_dicts.size().second;

dim3 dim_block(1024, 1); // 1024 threads per chunk
dim3 dim_grid_build(num_columns, num_stripes);
gpuCompactChunkDictionaries<<<dim_grid_build, dim_block, 0, stream.value()>>>(
stripes, chunks, num_columns);
for (uint32_t i = 0; i < num_stripes * num_columns; i++) {
if (stripes_host[i].dict_data != nullptr) {
thrust::device_ptr<uint32_t> dict_data_ptr =
thrust::device_pointer_cast(stripes_host[i].dict_data);
column_device_view* string_column = stripes_host[i].leaf_column;
// NOTE: Requires the --expt-extended-lambda nvcc flag
thrust::sort(rmm::exec_policy(stream),
dict_data_ptr,
dict_data_ptr + stripes_host[i].num_strings,
[string_column] __device__(const uint32_t& lhs, const uint32_t& rhs) {
return string_column->element<string_view>(lhs) <
string_column->element<string_view>(rhs);
});
gpuCompactChunkDictionaries<<<dim_grid_build, dim_block, 0, stream.value()>>>(d_stripes_dicts,
chunks);
for (uint32_t stripe_idx = 0; stripe_idx < num_stripes; ++stripe_idx) {
for (auto const& stripe_dict : h_stripe_dicts[stripe_idx]) {
if (stripe_dict.dict_data != nullptr) {
auto const dict_data_ptr = thrust::device_pointer_cast(stripe_dict.dict_data);
auto const string_column = stripe_dict.leaf_column;
// NOTE: Requires the --expt-extended-lambda nvcc flag
thrust::sort(rmm::exec_policy(stream),
dict_data_ptr,
dict_data_ptr + stripe_dict.num_strings,
[string_column] __device__(const uint32_t& lhs, const uint32_t& rhs) {
return string_column->element<string_view>(lhs) <
string_column->element<string_view>(rhs);
});
}
}
}
gpuBuildStripeDictionaries<1024>
<<<dim_grid_build, dim_block, 0, stream.value()>>>(stripes, num_columns);
<<<dim_grid_build, dim_block, 0, stream.value()>>>(d_stripes_dicts);
}

} // namespace gpu
20 changes: 19 additions & 1 deletion cpp/src/io/orc/orc.h
@@ -19,6 +19,7 @@
#include "orc_common.h"

#include <io/comp/io_uncomp.h>
#include <cudf/column/column_device_view.cuh>
#include <cudf/io/datasource.hpp>
#include <cudf/io/orc_metadata.hpp>
#include <cudf/utilities/error.hpp>
Expand Down Expand Up @@ -82,7 +83,7 @@ struct FileFooter {
struct Stream {
StreamKind kind = INVALID_STREAM_KIND;
std::optional<uint32_t> column_id; // ORC column id (different from column index in the table!)
uint64_t length = 0; // the number of bytes in the file
uint64_t length = 0; // the number of bytes in the stream

// Returns index of the column in the table, if any
// Stream of the 'column 0' does not have a corresponding column in the table
@@ -609,6 +610,23 @@ class metadata {
mutable std::vector<std::string> column_names;
};

/**
* @brief `column_device_view` and additional, ORC specific, information on the column.
*/
struct orc_column_device_view {
column_device_view cudf_column;
thrust::optional<uint32_t> parent_index;
};

/**
* @brief Range of rows within a single rowgroup.
*/
struct rowgroup_rows {
size_type begin;
size_type end;
constexpr auto size() const noexcept { return end - begin; }
};

} // namespace orc
} // namespace io
} // namespace cudf