Process parquet bools with microkernels #17157

Merged
merged 66 commits into from
Nov 7, 2024
Changes from 65 commits
Commits
b5ec22e
work in progress
pmattione-nvidia Aug 12, 2024
2ca9618
Further work in list code
pmattione-nvidia Aug 16, 2024
4b5f91a
Tests working
pmattione-nvidia Aug 27, 2024
ead17b8
Revert page_decode changes
pmattione-nvidia Aug 28, 2024
cc32409
Merge branch 'branch-24.10' into parquet_list_kernel
pmattione-nvidia Aug 28, 2024
0dccec5
Add debugging
pmattione-nvidia Sep 5, 2024
e239e79
Tests working
pmattione-nvidia Sep 7, 2024
8f25453
Merge branch 'branch-24.10' into parquet_list_kernel
pmattione-nvidia Sep 7, 2024
24c9ab1
compile fixes
pmattione-nvidia Sep 9, 2024
342c2f4
No need to decode def levels if not nullable
pmattione-nvidia Sep 10, 2024
50bbc94
Manual block scan
pmattione-nvidia Sep 10, 2024
5390661
Optimize parquet reader block scans, simplify and consolidate non-nul…
pmattione-nvidia Sep 18, 2024
3ef7b0d
tweak syncing
pmattione-nvidia Sep 18, 2024
7882879
small tweaks
pmattione-nvidia Sep 18, 2024
8852839
Merge branch 'branch-24.10' into parquet_list_kernel
pmattione-nvidia Sep 18, 2024
e285fbf
Add skipping to rle_stream, use for lists (chunked reads)
pmattione-nvidia Sep 23, 2024
254f3e9
tweak scan interface for linked lists
pmattione-nvidia Sep 24, 2024
18d989c
Merge branch 'branch-24.12' into mukernels_fixedwidth_optimize
pmattione-nvidia Sep 25, 2024
8ea1e0e
style fixes
pmattione-nvidia Sep 25, 2024
326b386
Merge branch 'mukernels_fixedwidth_optimize' of https://github.com/pm…
pmattione-nvidia Sep 25, 2024
41cb982
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Sep 26, 2024
6e70554
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Sep 26, 2024
9ad4415
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Sep 26, 2024
3a1fc95
Unroll block-count loop
pmattione-nvidia Sep 26, 2024
0babf46
Merge branch 'mukernels_fixedwidth_optimize' of https://github.com/pm…
pmattione-nvidia Sep 26, 2024
5ab9829
more style fixes
pmattione-nvidia Sep 26, 2024
310d50c
Merge branch 'branch-24.12' into mukernels_fixedwidth_optimize
pmattione-nvidia Sep 27, 2024
4471022
Disable manual block scan for non-lists
pmattione-nvidia Oct 2, 2024
c0ed2cb
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Oct 4, 2024
c2139ef
Merge branch 'mukernels_fixedwidth_optimize' of https://github.com/pm…
pmattione-nvidia Oct 4, 2024
b898cba
Style fixes
pmattione-nvidia Oct 4, 2024
b0ee9fc
renaming
pmattione-nvidia Oct 7, 2024
36d026e
Merge branch 'mukernels_fixedwidth_optimize' into parquet_list_kernel
pmattione-nvidia Oct 7, 2024
4b7d1df
minor tweaks
pmattione-nvidia Oct 7, 2024
b36b3b2
delete some debug printing
pmattione-nvidia Oct 7, 2024
5b15704
Remove more prints
pmattione-nvidia Oct 7, 2024
e84af82
cleanup
pmattione-nvidia Oct 8, 2024
f200748
cleanup comments
pmattione-nvidia Oct 8, 2024
3fc76ee
style changes
pmattione-nvidia Oct 8, 2024
8c58d2b
Merge branch 'rapidsai:branch-24.12' into parquet_list_kernel
pmattione-nvidia Oct 8, 2024
ae8e193
Merge branch 'parquet_list_kernel' of https://github.com/pmattione-nv…
pmattione-nvidia Oct 8, 2024
edc56bd
constify variables
pmattione-nvidia Oct 11, 2024
e51406c
revert cmakelists change
pmattione-nvidia Oct 11, 2024
0237e5c
Merge branch 'branch-24.12' into parquet_list_kernel
pmattione-nvidia Oct 11, 2024
07ffbf2
Update cpp/src/io/parquet/rle_stream.cuh
pmattione-nvidia Oct 18, 2024
32fe8b9
refactor rle_stream
pmattione-nvidia Oct 18, 2024
d0ba422
first pass at bool decode, is working
pmattione-nvidia Oct 22, 2024
031ac6b
Use divide function
pmattione-nvidia Oct 23, 2024
50a2283
Add bool to parquet benchmark tests
pmattione-nvidia Oct 23, 2024
a82ae40
Merge branch 'branch-24.12' into parquet_list_kernel
pmattione-nvidia Oct 23, 2024
86b6074
style fixes
pmattione-nvidia Oct 23, 2024
99b9f0c
Merge branch 'branch-24.12' into mukernels_bools
pmattione-nvidia Oct 23, 2024
db15506
Merge remote-tracking branch 'origin/parquet_list_kernel' into mukern…
pmattione-nvidia Oct 24, 2024
d914303
bool list working
pmattione-nvidia Oct 24, 2024
4576f89
style fixes
pmattione-nvidia Oct 24, 2024
e6b98c5
more style fixes
pmattione-nvidia Oct 24, 2024
c9154ef
remove extra encoding
pmattione-nvidia Oct 24, 2024
86ade66
Merge branch 'branch-24.12' into mukernels_bools
pmattione-nvidia Oct 29, 2024
c039805
fix merge issues
pmattione-nvidia Oct 29, 2024
388cdbe
Reduce kernel boilerplate with switch
pmattione-nvidia Oct 30, 2024
b877ba3
Nuke more boilerplate code
pmattione-nvidia Oct 30, 2024
8984cce
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Oct 31, 2024
a0a5060
fix style
pmattione-nvidia Oct 31, 2024
9e46ddd
Update cpp/src/io/parquet/decode_fixed.cu
pmattione-nvidia Nov 6, 2024
5840ceb
Merge branch 'branch-24.12' into mukernels_bools
pmattione-nvidia Nov 6, 2024
4cccdbd
Merge branch 'branch-24.12' into mukernels_bools
pmattione-nvidia Nov 7, 2024
2 changes: 2 additions & 0 deletions cpp/benchmarks/io/nvbench_helpers.hpp
@@ -28,6 +28,7 @@ enum class data_type : int32_t {
INTEGRAL = static_cast<int32_t>(type_group_id::INTEGRAL),
INTEGRAL_SIGNED = static_cast<int32_t>(type_group_id::INTEGRAL_SIGNED),
FLOAT = static_cast<int32_t>(type_group_id::FLOATING_POINT),
+ BOOL8 = static_cast<int32_t>(cudf::type_id::BOOL8),
DECIMAL = static_cast<int32_t>(type_group_id::FIXED_POINT),
TIMESTAMP = static_cast<int32_t>(type_group_id::TIMESTAMP),
DURATION = static_cast<int32_t>(type_group_id::DURATION),
@@ -44,6 +45,7 @@ NVBENCH_DECLARE_ENUM_TYPE_STRINGS(
case data_type::INTEGRAL: return "INTEGRAL";
case data_type::INTEGRAL_SIGNED: return "INTEGRAL_SIGNED";
case data_type::FLOAT: return "FLOAT";
+ case data_type::BOOL8: return "BOOL8";
case data_type::DECIMAL: return "DECIMAL";
case data_type::TIMESTAMP: return "TIMESTAMP";
case data_type::DURATION: return "DURATION";
2 changes: 2 additions & 0 deletions cpp/benchmarks/io/parquet/parquet_reader_input.cpp
@@ -114,6 +114,7 @@ void BM_parquet_read_io_compression(nvbench::state& state)
{
auto const d_type = get_type_or_group({static_cast<int32_t>(data_type::INTEGRAL),
static_cast<int32_t>(data_type::FLOAT),
+ static_cast<int32_t>(data_type::BOOL8),
static_cast<int32_t>(data_type::DECIMAL),
static_cast<int32_t>(data_type::TIMESTAMP),
static_cast<int32_t>(data_type::DURATION),
@@ -298,6 +299,7 @@ void BM_parquet_read_wide_tables_mixed(nvbench::state& state)

using d_type_list = nvbench::enum_type_list<data_type::INTEGRAL,
data_type::FLOAT,
+ data_type::BOOL8,
data_type::DECIMAL,
data_type::TIMESTAMP,
data_type::DURATION,
3 changes: 2 additions & 1 deletion cpp/benchmarks/io/parquet/parquet_reader_options.cpp
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2022-2023, NVIDIA CORPORATION.
+ * Copyright (c) 2022-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -66,6 +66,7 @@ void BM_parquet_read_options(nvbench::state& state,
auto const data_types =
dtypes_for_column_selection(get_type_or_group({static_cast<int32_t>(data_type::INTEGRAL),
static_cast<int32_t>(data_type::FLOAT),
+ static_cast<int32_t>(data_type::BOOL8),
static_cast<int32_t>(data_type::DECIMAL),
static_cast<int32_t>(data_type::TIMESTAMP),
static_cast<int32_t>(data_type::DURATION),
3 changes: 3 additions & 0 deletions cpp/benchmarks/io/parquet/parquet_writer.cpp
@@ -89,6 +89,7 @@ void BM_parq_write_io_compression(
{
auto const data_types = get_type_or_group({static_cast<int32_t>(data_type::INTEGRAL),
static_cast<int32_t>(data_type::FLOAT),
+ static_cast<int32_t>(data_type::BOOL8),
static_cast<int32_t>(data_type::DECIMAL),
static_cast<int32_t>(data_type::TIMESTAMP),
static_cast<int32_t>(data_type::DURATION),
@@ -143,6 +144,7 @@ void BM_parq_write_varying_options(

auto const data_types = get_type_or_group({static_cast<int32_t>(data_type::INTEGRAL_SIGNED),
static_cast<int32_t>(data_type::FLOAT),
+ static_cast<int32_t>(data_type::BOOL8),
static_cast<int32_t>(data_type::DECIMAL),
static_cast<int32_t>(data_type::TIMESTAMP),
static_cast<int32_t>(data_type::DURATION),
@@ -181,6 +183,7 @@ void BM_parq_write_varying_options(

using d_type_list = nvbench::enum_type_list<data_type::INTEGRAL,
data_type::FLOAT,
+ data_type::BOOL8,
data_type::DECIMAL,
data_type::TIMESTAMP,
data_type::DURATION,