Parquet decimal128 support #9706
Closed
Commits
138 commits
c8a171c Initial changes
afe6ec6 More changes
43b615a Small cleanup
ebedcad Small cleanup
1d2e0b4 Removal of device_storage_type_id, formatting and more
2ea39fe Formatting
606d6e3 `cudf::round` support for `__int128_t`
ee70203 Enable tests & fixes
fd6157b Missing changes
d4506af Scan, column_wrapper, orc, etc
791e91c Binop changes
ad5fe35 detail::to_string
7cc9db1 Aggregation changes
5dd6874 Small fix in fixed_point.hpp
a89f958 Enable quantile
a16a2b8 Comment update
e89a9ba REDUCTION_TEST working changes
7ef28bf ROLLING_TEST changes
7fd4ac4 Initial changes for STRINGS_TEST
016c35a STRINGS changes
dbd0504 Clean up
9c764e6 Merge remote-tracking branch 'upstream/branch-21.10' into decimal128
bf34d20 std::is_same_v
103a4db is_integral & is_arithmetic
575fca7 Clean up
8549753 Fixes / cleanup
22de55a DECIMAL128 custom reduction tests
5b69c0c Another REDUCTION test
95667c8 numeric_limits / temporary cleanup
825ab86 More changes, 10+ files
f6c0938 Merge branch 'branch-21.10' into decimal128
321761c Fix for TRANSFORM_TEST
02b0044 Rename FixedPointTestBothReps
95a107c test group_by for only decimal32/64
0d8aa36 Using cuda::std:: for utility functions
73b3682 cudf::fill(_in_place) fix for decimal128
bcd1836 Remove TODOs
84f394b Initial string conversion changes
754156a Merge branch 'branch-21.10' into decimal128
7031551 Final string changes
ea97b9d Enhance casting tests for decimal128
b98290c Merge branch 'branch-21.10' into decimal128
363e0ed Merge branch 'branch-21.10' into decimal128
655ccee Merge conflict fixes
2a894bd Missed STRINGS fixes
d881321 Enhance STRINGS_TEST
1380a0c Enhance ROUND tests
b5d4493 Fix FIXED_POINT_TESTs
8715196 Enhance GROUPBY_TEST for decimal128
7952e90 Delete commented out code
3115666 Merge branch 'branch-21.10' into decimal128
932747e Merge branch 'branch-21.10' into decimal128
10d58a3 Support hash groupby decimal128 (by making is sort) - initial change
60ce655 has_atomic_support
28aca7d TEMPORARY - will revert later
4b52596 Merge branch 'branch-21.10' into decimal128
2951b2f Merge branch 'branch-21.10' into decimal128
b515a93 Merge branch 'branch-21.10' into decimal128
fe446a4 Block group_by mean for decimal types
39d2573 Merge branch 'branch-21.10' into decimal128
efd0b62 Revert non-comprehensive fix
c52769a Merge branch 'branch-21.10' into decimal128
5622a84 binary op changes
5ebd1bb add checks to jit binary op
cb4e389 Final changes for binary ops
4c81f57 Add more binop tests
58b23cd Temporary fix for chrono groupby min_tests
1f3284f decimal128 comparision tests
7713bc4 Enhance decimal128 comparison tests
2de00b8 small cleanup
ea36188 cleanup
c7c0d9d Merge branch 'decimal128' of https://gitlab-master.nvidia.com/choekst…
3bf389b Merge branch 'branch-21.12' into decimal128
d093ae8 Fix rounding issues with DECIMAL128 (revans2)
4d82d30 Merge branch 'decimal128_round' into 'decimal128'
7eedaea Use numeric::detail::abs in round.cu
892df4f Merge branch 'branch-21.12' into decimal128 (codereport)
a810927 Add cuda:: and if constexpr check (codereport)
9286b43 Clang format :) (codereport)
4ad26f4 Cleanup (codereport)
3892e73 Cleanup (codereport)
8e9bd90 Missing clang-format (codereport)
41cc23a digits10 (codereport)
921ff12 Clean up (codereport)
a5e4187 IO changes (codereport)
d87c9d4 Fix and partial test updates (codereport)
3b9a611 Clean up (codereport)
5bab167 Update libcudacxx (codereport)
a4c03e5 Fixing OrcWriterTestDecimal.Decimal64 test (codereport)
976fb74 Fix rest of ORC_TEST (codereport)
c9c7250 ORC changes for decimal128 (codereport)
46bd2d8 ORC fixes for decima128 (codereport)
8a86d76 Binary op changes / GROUPBY_TEST working (codereport)
e54d3fa Test for blog (codereport)
85c52ad Merge branch 'branch-21.12' into decimal128 (codereport)
92694b8 Merge conflict fix (codereport)
44d0573 Temporary fix (codereport)
99a82ee Update CONTRIBUTING.md (codereport)
c061a54 Merge branch 'branch-21.12' into decimal128 (codereport)
99ad08b Temporary (codereport)
abcc4db Merge branch 'branch-21.12' into decimal128 (codereport)
95a2402 Sum Aggregation uses same type for accumulator (codereport)
5ecd793 ORC changes (codereport)
f55e050 Full ORC fix (codereport)
216385a clang-format (codereport)
67858b6 Merge branch 'branch-21.12' into decimal128 (codereport)
7ba47c7 Reapply temporary fix (codereport)
1034057 Perf improvement for rescale (codereport)
d6e9ee8 default to dec64;make1128 slectable;fix tests;add options test (vuule)
4411d8e use paths for decimal types API; iron out generated column names (vuule)
61b3677 small clean up (vuule)
fb04067 Merge branch 'branch-21.12' into decimal128 (codereport)
7c01f21 ROLLING_TEST fix (codereport)
63a0004 clang-format (codereport)
d3c589c Update meta.yaml (codereport)
27a2e58 Cmake formatting (codereport)
9e2184f Cleaning up has_atomic_support (codereport)
8634dea Cleanup (codereport)
4b5dbe2 Use has_atomic_support (codereport)
420abcc Merge branch 'branch-21.12' into decimal128 (codereport)
860bcbb Fix silent failure (codereport)
89004c7 docs cleanup (codereport)
12e5b20 Cleanup (codereport)
46368f3 Merge branch 'branch-21.12' into decimal128 (codereport)
3ef6a09 Additional decimal128 string tests (codereport)
287cfaf Merge branch 'branch-21.12' into decimal128 (codereport)
ec8e74a count_digits (codereport)
e365080 final string changes (codereport)
0b4fd80 Merge branch 'branch-21.12' into decimal128 (codereport)
a0d5d0c use enable_if (codereport)
dd37950 clang-format (codereport)
fc4c1d1 Fix fix (codereport)
08da157 Cleanup (codereport)
c23038b Merge branch 'branch-21.12' into decimal128 (codereport)
201a091 is_chrono min/max identity (codereport)
f0afd8d Use exp10 (codereport)
95ee95c clang-format (codereport)
0b7c32e Writer changes (devavret)
@@ -463,6 +463,58 @@ TEST_F(ParquetWriterTest, MultiColumn)
  cudf::test::expect_metadata_equal(expected_metadata, result.metadata);
}

TEST_F(ParquetWriterTest, DecimalColumns)
{
  constexpr auto num_rows = 5;

  // auto col0_data = random_values<bool>(num_rows);
  auto col6_vals = random_values<int32_t>(num_rows);
  auto col7_vals = random_values<int64_t>(num_rows);
  auto col6_data = cudf::detail::make_counting_transform_iterator(0, [col6_vals](auto i) {
    return numeric::decimal32{col6_vals[i], numeric::scale_type{5}};
  });
  auto col7_data = cudf::detail::make_counting_transform_iterator(0, [col6_vals](auto i) {
    return numeric::decimal64{col6_vals[i], numeric::scale_type{5}};
  });
  auto col8_data = cudf::detail::make_counting_transform_iterator(0, [col6_vals](auto i) {
    return numeric::decimal128{i * 10000, numeric::scale_type{2}};
  });
  auto validity = cudf::detail::make_counting_transform_iterator(0, [](auto i) { return true; });

  // column_wrapper<bool> col0{
  // col0_data.begin(), col0_data.end(), validity};
  column_wrapper<numeric::decimal32> col6{col6_data, col6_data + num_rows, validity};
  column_wrapper<numeric::decimal64> col7{col7_data, col7_data + num_rows, validity};
  column_wrapper<numeric::decimal128> col8{col8_data, col8_data + num_rows, validity};

  std::vector<std::unique_ptr<column>> cols;
  // cols.push_back(col0.release());
  cols.push_back(col6.release());
  cols.push_back(col7.release());
  cols.push_back(col8.release());
  auto expected = std::make_unique<table>(std::move(cols));
  EXPECT_EQ(3, expected->num_columns());

  cudf_io::table_input_metadata expected_metadata(*expected);
  // expected_metadata.column_metadata[0].set_name( "bools");
  expected_metadata.column_metadata[0].set_name("decimal32s").set_decimal_precision(10);
  expected_metadata.column_metadata[1].set_name("decimal64s").set_decimal_precision(10);
  expected_metadata.column_metadata[2].set_name("decimal128s").set_decimal_precision(10);

  auto filepath = ("MultiColumn.parquet");

  cudf_io::parquet_writer_options out_opts =
    cudf_io::parquet_writer_options::builder(cudf_io::sink_info{filepath}, expected->view())
      .metadata(&expected_metadata);
  cudf_io::write_parquet(out_opts);

  cudf_io::parquet_reader_options in_opts =
    cudf_io::parquet_reader_options::builder(cudf_io::source_info{filepath});
  auto result = cudf_io::read_parquet(in_opts);

  CUDF_TEST_EXPECT_TABLES_EQUAL(expected->view(), result.tbl->view());
  cudf::test::expect_metadata_equal(expected_metadata, result.metadata);
}

TEST_F(ParquetWriterTest, MultiColumnWithNulls)
{
  constexpr auto num_rows = 100;

Review comment on `auto filepath = ("MultiColumn.parquet");`: This test fails because the reader doesn't support reading decimal128, but the file created by this test can be read with pyarrow to confirm that the writing is correct.
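For readers who want to check the written decimal128 values by hand, the sketch below shows the byte layout the Parquet format specifies for a DECIMAL stored in a FIXED_LEN_BYTE_ARRAY: the unscaled integer as big-endian two's complement. The writer code from this PR is not shown on this page, so whether it emits a 16-byte array is an assumption here. The helper names `to_big_endian_bytes` and `from_big_endian_bytes` are hypothetical, and `__int128` is a GCC/Clang extension, not cudf's `numeric::decimal128`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of cudf or this PR.
// Encodes a 128-bit unscaled decimal value as 16 bytes, big-endian two's complement.
std::array<std::uint8_t, 16> to_big_endian_bytes(__int128 unscaled)
{
  std::array<std::uint8_t, 16> out{};
  // Work on the unsigned bit pattern so that shifting is well defined.
  auto bits = static_cast<unsigned __int128>(unscaled);
  for (int i = 15; i >= 0; --i) {
    out[i] = static_cast<std::uint8_t>(bits & 0xFF);  // least significant byte goes last
    bits >>= 8;
  }
  return out;
}

// Hypothetical inverse: rebuilds the unscaled value from its 16-byte encoding.
__int128 from_big_endian_bytes(std::array<std::uint8_t, 16> const& bytes)
{
  unsigned __int128 bits = 0;
  for (auto b : bytes) { bits = (bits << 8) | b; }
  return static_cast<__int128>(bits);
}

// Small self-check: a value round-trips through the byte encoding.
inline void round_trip_check()
{
  __int128 const unscaled = -10000;
  assert(from_big_endian_bytes(to_big_endian_bytes(unscaled)) == unscaled);
}
```

A negative value fills the leading bytes with `0xFF` (sign extension), which is a quick visual check when inspecting a file written by the test.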
Review comment: This should probably also check whether the column type is decimal128.