Add decimal128 support to Parquet reader and writer #9765

Merged
19 commits merged on Dec 8, 2021


@vuule (Contributor) commented Nov 23, 2021

Closes #9566
Depends on #9804

- Read decimal columns as 128-bit decimals when the input width requires it.
- Write decimal128 columns as FIXED_LEN_BYTE_ARRAY.
- Read FIXED_LEN_BYTE_ARRAY columns using the smallest viable decimal size (previously this defaulted to decimal64, even when 32 bits were sufficient).
- Remove the strict_decimal_types option from the Parquet reader; decimal columns can now always be read using the exact decimal type.
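The width-related items above can be illustrated with a small host-side C++ sketch. It assumes a GCC or Clang toolchain (`__int128` is a compiler extension, not standard C++), and every name in it (`decimal_width`, `smallest_decimal_for_flba`, `encode_be128`, `decode_be`) is hypothetical, not libcudf API. The byte-length thresholds are one plausible reading of "smallest viable decimal size"; the actual reader may derive the type differently (for example from the column's declared precision). The byte layout, big-endian two's complement of the unscaled value, is what the Parquet format specifies for DECIMAL values stored as FIXED_LEN_BYTE_ARRAY.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for cudf's decimal32/decimal64/decimal128 type ids.
enum class decimal_width { dec32, dec64, dec128 };

// Assumed rule: pick the narrowest decimal whose storage can hold every value
// of a FIXED_LEN_BYTE_ARRAY column with a fixed length of `type_length` bytes.
std::optional<decimal_width> smallest_decimal_for_flba(int type_length) {
  if (type_length <= 0) return std::nullopt;
  if (type_length <= 4) return decimal_width::dec32;
  if (type_length <= 8) return decimal_width::dec64;
  if (type_length <= 16) return decimal_width::dec128;
  return std::nullopt;  // wider than 128 bits: no cudf decimal type fits
}

// Encode a 128-bit unscaled decimal value as 16 bytes of big-endian
// two's complement (most significant byte first).
std::array<std::uint8_t, 16> encode_be128(__int128 unscaled) {
  std::array<std::uint8_t, 16> out{};
  // Converting to unsigned keeps the two's complement bit pattern.
  auto bits = static_cast<unsigned __int128>(unscaled);
  for (int i = 15; i >= 0; --i) {
    out[i] = static_cast<std::uint8_t>(bits & 0xFF);  // lowest byte goes last
    bits >>= 8;
  }
  return out;
}

// Decode 1..16 big-endian two's complement bytes into __int128,
// sign-extending when the array is shorter than 16 bytes.
__int128 decode_be(const std::uint8_t* bytes, std::size_t len) {
  assert(len >= 1 && len <= 16);
  unsigned __int128 bits = 0;
  if (bytes[0] & 0x80) bits = ~bits;  // negative value: start from all ones
  for (std::size_t i = 0; i < len; ++i) {
    bits = (bits << 8) | bytes[i];
  }
  return static_cast<__int128>(bits);
}
```

Under this rule a 5-byte FIXED_LEN_BYTE_ARRAY column maps to a 64-bit decimal rather than always defaulting to one, and `decode_be` sign-extends shorter arrays so a single routine covers any length from 1 to 16 bytes.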

@vuule added the feature request (New feature or request), cuIO (cuIO issue), and breaking (Breaking change) labels on Nov 23, 2021
@vuule self-assigned this on Nov 23, 2021
@github-actions bot added the Java (Affects Java cuDF API.), Python (Affects Python cuDF API.), and libcudf (Affects libcudf (C++/CUDA) code.) labels on Nov 23, 2021
@codecov bot commented Nov 24, 2021

Codecov Report

Merging #9765 (32c8549) into branch-22.02 (967a333) will decrease coverage by 0.05%.
The diff coverage is 5.48%.

❗ Current head 32c8549 differs from pull request most recent head e2c5291. Consider uploading reports for the commit e2c5291 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.02    #9765      +/-   ##
================================================
- Coverage         10.49%   10.43%   -0.06%     
================================================
  Files               119      119              
  Lines             20305    20449     +144     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18315     +140     
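The report gives two figures for the change (0.05% in the summary, -0.06% in the diff table); both can be reconciled from the Hits and Lines rows. A quick check, assuming coverage is simply hits divided by lines (the listed percentages are consistent with that):

$$
\frac{2130}{20305} \approx 10.490\%, \qquad \frac{2134}{20449} \approx 10.436\%, \qquad 10.490\% - 10.436\% \approx 0.054\%
$$

The exact drop rounds to 0.05%. The table's -0.06% is instead the difference of the displayed values 10.49% and 10.43%; since 10.436% would round to 10.44%, the displayed percentages appear to be truncated to two decimals rather than rounded.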
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/__init__.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/_base_index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/column.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/string.py | 0.00% <ø> (ø) |
| python/cudf/cudf/core/dataframe.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/groupby/groupby.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/index.py | 0.00% <ø> (ø) |
| python/cudf/cudf/core/indexed_frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/multiindex.py | 0.00% <0.00%> (ø) |

... and 11 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a72f19e...e2c5291.

@vuule marked this pull request as ready for review on November 24, 2021 23:11
@vuule requested review from a team as code owners on November 24, 2021 23:11
@vuule requested a review from devavret on November 29, 2021 19:03
@vuule requested a review from mythrocks on November 30, 2021 20:12
@mythrocks (Contributor) left a comment

Thank you for accommodating the changes. LGTM!

@vuule added the 5 - DO NOT MERGE label (Hold off on merging; see PR for details) on Nov 30, 2021
@nvdbaranec (Contributor) left a comment

Looks great. I felt the need to add one change, just to be annoying.

Review thread on cpp/src/io/parquet/page_enc.cu (outdated, resolved)
vuule and others added 2 commits December 2, 2021 13:30
@rapids-bot pushed a commit that referenced this pull request on Dec 7, 2021
This PR adds a `decimal128` type validation to the Parquet reader. It is put in place to unblock the libcudf changes in #9765, and the validation will soon be removed once the Python side of the `decimal128` changes is merged (currently blocked by a libcudf `from_arrow` bug).

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #9804
@github-actions bot removed the Python (Affects Python cuDF API.) label on Dec 7, 2021
@vuule added the 5 - Ready to Merge label (Testing and reviews complete, ready to merge) and removed the 5 - DO NOT MERGE label (Hold off on merging; see PR for details) on Dec 7, 2021
@vuule (Contributor, Author) commented Dec 8, 2021

@gpucibot merge

@rapids-bot merged commit ea3aff2 into rapidsai:branch-22.02 on Dec 8, 2021
@vuule deleted the fea-parquet-dec128 branch on December 8, 2021 02:43
Labels
- 5 - Ready to Merge (Testing and reviews complete, ready to merge)
- breaking (Breaking change)
- cuIO (cuIO issue)
- feature request (New feature or request)
- Java (Affects Java cuDF API.)
- libcudf (Affects libcudf (C++/CUDA) code.)
Development

Successfully merging this pull request may close these issues.

[FEA] Decimal128 support in Parquet reader/writer
7 participants