Parquet writer column_size() should return a size_t (#12870)
Fixes #12867.

Bug introduced in #12685. The total number of bytes in a column was computed and returned as a 32-bit `size_type` rather than a 64-bit `size_t`, which overflowed for tables with many millions of rows.
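
The overflow can be illustrated with a minimal standalone sketch. It assumes `size_type` is a signed 32-bit integer (as `cudf::size_type` is) and a platform where `size_t` is 64 bits wide. The helpers `total_bytes` and `fits_in_size_type` and the row count used are hypothetical and are not part of cuDF.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Stand-in for cudf::size_type (assumed here to be a signed 32-bit integer).
using size_type = std::int32_t;

// The sketch only makes sense where size_t is at least 64 bits wide.
static_assert(sizeof(std::size_t) >= 8, "sketch assumes a 64-bit size_t");

// Hypothetical helper: byte count of a fixed-width column, computed in 64 bits.
constexpr std::size_t total_bytes(size_type num_rows, std::size_t bytes_per_row)
{
  return static_cast<std::size_t>(num_rows) * bytes_per_row;
}

// Hypothetical helper: true when a byte count is representable as a size_type.
constexpr bool fits_in_size_type(std::size_t bytes)
{
  return bytes <= static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}

// The row count itself fits in size_type, but the byte count does not:
// 600,000,000 rows of 4-byte values need 2,400,000,000 bytes.
static_assert(total_bytes(600000000, 4) == 2400000000ULL, "computed in 64 bits");
static_assert(!fits_in_size_type(total_bytes(600000000, 4)), "exceeds 32-bit range");
```

A row count can be well within the range of `size_type` while the corresponding byte count is not, which is why the return type and the accumulator need to be `size_t`.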

Authors:
  - Ed Seidl (https://github.com/etseidl)
  - Vukasin Milovanovic (https://github.com/vuule)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Karthikeyan (https://github.com/karthikeyann)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #12870
etseidl authored Mar 1, 2023
1 parent 195e2f7 commit 40e56c9
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions cpp/src/io/parquet/writer_impl.cu
@@ -87,7 +87,7 @@ parquet::Compression to_parquet_compression(compression_type compression)
}
}

-size_type column_size(column_view const& column, rmm::cuda_stream_view stream)
+size_t column_size(column_view const& column, rmm::cuda_stream_view stream)
{
if (column.size() == 0) { return 0; }

@@ -99,7 +99,7 @@ size_type column_size(column_view const& column, rmm::cuda_stream_view stream)
cudf::detail::get_value<size_type>(scol.offsets(), 0, stream);
} else if (column.type().id() == type_id::STRUCT) {
auto const scol = structs_column_view(column);
-size_type ret = 0;
+size_t ret = 0;
for (int i = 0; i < scol.num_children(); i++) {
ret += column_size(scol.get_sliced_child(i), stream);
}