Fixes issue with null struct columns in ORC reader #8819

Merged 17 commits on Jul 22, 2021
14 changes: 9 additions & 5 deletions cpp/src/io/orc/orc.h
@@ -539,10 +539,7 @@ class OrcDecompressor {
};

/**
- * @brief Stores orc id for each column and its adjacent number of children
- * in case of struct or number of children in case of list column.
- * If list column has struct column, then all child columns of that struct are treated as child
- * column of list.
+ * @brief Stores orc id for each column and number of children in that column.
*
* @code{.pseudo}
* Consider following data where a struct has two members and a list column
@@ -559,11 +556,18 @@ class OrcDecompressor {
*
*/
struct orc_column_meta {
// orc_column_meta(uint32_t _id, uint32_t _num_children) : id(_id), num_children(_num_children){};
uint32_t id; // orc id for the column
uint32_t num_children; // number of children at the same level of nesting in case of struct
};

/**
* @brief Stores column's validity map and null count
*/
struct column_validity_info {
uint32_t* valid_map_base;
uint32_t null_count;
};

/**
* @brief A helper class for ORC file metadata. Provides some additional
* convenience methods for initializing and accessing metadata.
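The `column_validity_info` struct added above pairs a validity map with a null count. As a minimal host-side sketch of how the two fields relate, assuming the map is a packed bitmask of 32-bit words in which a set bit marks a valid row and bit 0 of word 0 is row 0 (an assumption, not something this diff states), the null count is the number of unset bits among the first `num_rows` rows. The helpers `is_valid`, `count_nulls`, and `make_validity_info` are hypothetical names used only for illustration and are not part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the struct added to orc.h in this PR.
struct column_validity_info {
  uint32_t* valid_map_base;
  uint32_t null_count;
};

// Hypothetical helper: true when row `i` has its validity bit set.
// Assumes 32 rows per word, least significant bit first.
inline bool is_valid(const uint32_t* valid_map, std::size_t i)
{
  assert(valid_map != nullptr);
  return ((valid_map[i / 32] >> (i % 32)) & 1u) != 0;
}

// Hypothetical helper: counts unset bits among the first `num_rows` rows.
inline uint32_t count_nulls(const uint32_t* valid_map, std::size_t num_rows)
{
  uint32_t nulls = 0;
  for (std::size_t i = 0; i < num_rows; ++i) {
    if (!is_valid(valid_map, i)) { ++nulls; }
  }
  return nulls;
}

// Hypothetical helper: bundles a host-side mask and its null count.
// The returned struct points into `words`, so `words` must outlive it.
inline column_validity_info make_validity_info(std::vector<uint32_t>& words,
                                               std::size_t num_rows)
{
  return column_validity_info{words.data(), count_nulls(words.data(), num_rows)};
}
```

In the reader itself the map would live in device memory; the sketch keeps everything on the host so that the bit layout is easy to follow.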
10 changes: 10 additions & 0 deletions cpp/src/io/orc/orc_gpu.h
@@ -24,6 +24,7 @@
#include <cudf/utilities/span.hpp>
#include <io/statistics/statistics.cuh>
#include <io/utilities/column_buffer.hpp>
#include "orc.h"
#include "orc_common.h"

#include <rmm/cuda_stream_view.hpp>
@@ -85,6 +86,14 @@ struct DictionaryEntry {
*/
constexpr int orc_decimal2float64_scale = 0x80;

/**
* Struct to store parent column mask and null count
*/
struct ParentColumnData {
uint32_t* valid_map_base;
uint32_t null_count;
};

/**
* @brief Struct to describe per stripe's column information
*/
@@ -109,6 +118,7 @@ struct ColumnDesc {
uint8_t dtype_len; // data type length (for types that can be mapped to different sizes)
int32_t decimal_scale; // number of fractional decimal digits for decimal type
int32_t ts_clock_rate; // output timestamp clock frequency (0=default, 1000=ms, 1000000000=ns)
column_validity_info parent_validity_info; // consists of parent column valid_map and null count
};

/**
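The new `parent_validity_info` member gives each column description access to its parent's validity map and null count. One plausible use, sketched below on the host, is expanding a child column's validity to the parent's row count. The sketch assumes that the child stores one entry per valid parent row (so a null struct row has no child entry at all) and that a set bit means valid; neither assumption is confirmed by the diff shown here, and this is not the kernel code from the PR. The names `expanded_validity` and `expand_child_validity` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the struct added to orc.h in this PR.
struct column_validity_info {
  uint32_t* valid_map_base;
  uint32_t null_count;
};

// Hypothetical result type: an owning mask plus its null count.
struct expanded_validity {
  std::vector<uint32_t> valid_map;
  uint32_t null_count;
};

// Hypothetical helper: builds a child mask with one bit per parent row.
// `child_present` holds one flag per valid parent row (assumed layout).
expanded_validity expand_child_validity(const column_validity_info& parent,
                                        const std::vector<bool>& child_present,
                                        std::size_t num_rows)
{
  expanded_validity out{std::vector<uint32_t>((num_rows + 31) / 32, 0u), 0u};
  std::size_t child_idx = 0;
  for (std::size_t row = 0; row < num_rows; ++row) {
    // A null base pointer is treated here as "the parent has no nulls".
    bool const parent_valid =
      parent.valid_map_base == nullptr ||
      ((parent.valid_map_base[row / 32] >> (row % 32)) & 1u) != 0;
    bool valid = false;
    if (parent_valid) {
      // Precondition: one child entry exists for every valid parent row.
      assert(child_idx < child_present.size());
      valid = child_present[child_idx];
      ++child_idx;
    }
    if (valid) {
      out.valid_map[row / 32] |= (1u << (row % 32));
    } else {
      ++out.null_count;  // null because the parent or the child itself is null
    }
  }
  return out;
}
```

With a parent mask of `0x5` over four rows (rows 0 and 2 valid) and child flags `{true, false}`, only row 0 ends up valid in the expanded mask, and the three remaining rows are counted as null.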