Add support for reading Decimal128 Parquet columns #2821

bmcdonald3 · 2023-10-23T22:46:58Z

Previously, when reading a file with a Decimal128 column, Arkouda would silently ignore the column and the data would be dropped. This PR adds the ability to read Decimal128 Parquet columns. The values are cast to floating-point values in C++ and stored in a 64-bit `real` Chapel array, so some precision can be lost in the conversion.

Longer term, we would like to add support for arbitrary-precision values to Chapel. That would require a new type, but it would be able to display Decimal128 values more accurately.
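To make the precision loss concrete, here is a minimal sketch of a Decimal128-to-double conversion. It is not the code from this PR: `Decimal128Sketch` and `toDouble` are hypothetical names, and the layout (a 128-bit two's-complement unscaled integer split into two 64-bit halves, plus a decimal scale) is an assumption made for illustration.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a Decimal128 value (not an Arrow or Arkouda type):
// value = unscaled * 10^-scale, where unscaled = high * 2^64 + low.
struct Decimal128Sketch {
    int64_t high;   // upper 64 bits, carries the sign
    uint64_t low;   // lower 64 bits
    int32_t scale;  // number of digits after the decimal point
};

// Convert to a 64-bit floating-point value. A double has a 53-bit
// significand, so unscaled values above 2^53 may be rounded.
double toDouble(const Decimal128Sketch& d) {
    uint64_t hi = static_cast<uint64_t>(d.high);
    uint64_t lo = d.low;
    const bool negative = d.high < 0;
    if (negative) {
        // Take the 128-bit two's-complement magnitude first, so small
        // negative values are not lost when the halves are added.
        lo = ~lo + 1;
        hi = ~hi + (lo == 0 ? 1 : 0);
    }
    const double magnitude =
        std::ldexp(static_cast<double>(hi), 64) + static_cast<double>(lo);
    const double value = magnitude / std::pow(10.0, d.scale);
    return negative ? -value : value;
}
```

For example, the unscaled value 9007199254740993 (2^53 + 1) with scale 0 comes back as 9007199254740992.0 under the default round-to-nearest mode, which is the kind of loss described above.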


bmcdonald3 · 2023-10-23T23:10:00Z

Marking as draft: I am realizing that some additional checks need to be done in the C++ code, and I am unsure how to do them at the moment.

bradcray · 2023-10-23T23:15:32Z

> we would like to add support for arbitrary-precision values into Chapel

Capturing some discussion Ben and I were having off-repo: Another option would be to provide support for fixed-size fixed-point values in Chapel, which would be more space-efficient than arbitrary-precision values since no heap storage would be required.
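As a rough illustration of the fixed-size fixed-point idea, here is a sketch in C++ rather than Chapel. `FixedPoint` is a hypothetical type, not a proposed Chapel design, and it uses a 64-bit integer for brevity where a Decimal128 equivalent would need 128 bits. The point is that the whole value lives inline, with the scale fixed at compile time.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical fixed-point value: an integer count of 10^-Scale units.
template <int Scale>
struct FixedPoint {
    int64_t units;  // stored inline; no heap allocation

    // Addition is exact integer addition (overflow is not checked here).
    FixedPoint operator+(FixedPoint other) const {
        return {units + other.units};
    }

    // Render as decimal text with exactly Scale fractional digits.
    std::string toString() const {
        const bool negative = units < 0;
        // Use the unsigned magnitude so INT64_MIN does not overflow.
        const uint64_t magnitude = negative
            ? 0 - static_cast<uint64_t>(units)
            : static_cast<uint64_t>(units);
        std::string digits = std::to_string(magnitude);
        if (Scale > 0) {
            const std::size_t scale = static_cast<std::size_t>(Scale);
            if (digits.size() <= scale) {
                // Left-pad so at least one digit precedes the point.
                digits = std::string(scale + 1 - digits.size(), '0') + digits;
            }
            digits.insert(digits.size() - scale, 1, '.');
        }
        return negative ? "-" + digits : digits;
    }
};
```

With `FixedPoint<2>`, adding `{10}` and `{20}` gives a value that prints as `0.30`, and `sizeof(FixedPoint<2>)` equals `sizeof(int64_t)`, so an array of such values needs no per-element heap storage.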

stress-tess

Looks good, just want to verify that big vs little endian doesn't matter before approving
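For context on the endianness question: the Parquet format stores the unscaled value of a DECIMAL in a FIXED_LEN_BYTE_ARRAY or BINARY column as a big-endian two's-complement integer. The sketch below shows one way to decode such bytes independently of the host byte order. It is only an illustration: `Int128Parts` and `decodeBigEndian` are hypothetical names, and this is not the code under review in src/ArrowFunctions.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 128-bit result split into two 64-bit halves.
struct Int128Parts {
    int64_t high;   // upper 64 bits, carries the sign
    uint64_t low;   // lower 64 bits
};

// Decode a big-endian two's-complement integer of 1 to 16 bytes.
// Only shifts and ORs are used, so the result does not depend on
// whether the host is big- or little-endian.
Int128Parts decodeBigEndian(const uint8_t* bytes, std::size_t length) {
    // Sign-extend: start from all ones if the top bit of the first byte is set.
    const bool negative = length > 0 && (bytes[0] & 0x80) != 0;
    uint64_t hi = negative ? ~uint64_t{0} : 0;
    uint64_t lo = hi;
    for (std::size_t i = 0; i < length; ++i) {
        // Shift the 128-bit accumulator left by one byte, then append.
        hi = (hi << 8) | (lo >> 56);
        lo = (lo << 8) | bytes[i];
    }
    return {static_cast<int64_t>(hi), lo};
}
```

For instance, the two bytes `0x30 0x39` decode to 12345, and `0xFF 0x85` decode to -123.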

src/ArrowFunctions.cpp

bmcdonald3 marked this pull request as draft October 23, 2023 23:09

bmcdonald3 added 2 commits October 25, 2023 09:24

Add lookup table for precision values to calculate byte length

4672685
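The table added in this commit is not shown in the conversation, so the following is only a sketch of how a precision-to-byte-length table could be built. It assumes the table holds the minimum number of bytes a two's-complement integer needs to represent every value of a given decimal precision. All names are hypothetical, and a writer is free to use a longer fixed length than this minimum.

```cpp
#include <array>
#include <cmath>

// Decimal128 supports precisions from 1 to 38 decimal digits.
constexpr int kMaxDecimal128Precision = 38;

// Smallest n such that an n-byte signed integer can hold 10^precision - 1,
// i.e. the smallest n with 2^(8n - 1) >= 10^precision.
int minBytesForPrecision(int precision) {
    const double bitsNeeded = precision * std::log2(10.0) + 1.0;  // +1 sign bit
    return static_cast<int>(std::ceil(bitsNeeded / 8.0));
}

// Build the lookup table once; index 0 is unused and left as 0.
std::array<int, kMaxDecimal128Precision + 1> buildByteLengthTable() {
    std::array<int, kMaxDecimal128Precision + 1> table{};
    for (int p = 1; p <= kMaxDecimal128Precision; ++p) {
        table[p] = minBytesForPrecision(p);
    }
    return table;
}
```

Under this assumption, precision 9 maps to 4 bytes, precision 18 to 8 bytes, and precision 38 to the full 16 bytes.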

Add test

02dd1a5

bmcdonald3 marked this pull request as ready for review October 25, 2023 16:31

Add comment

80f4fdc

stress-tess requested review from stress-tess and jaketrookman October 25, 2023 21:24

stress-tess reviewed Oct 26, 2023


Remove unnecessary batch size check based off Pierce feedback

26e5a33

jaketrookman approved these changes Oct 26, 2023


Merge branch 'master' into decimal-parquet

13370eb

stress-tess approved these changes Oct 26, 2023


stress-tess enabled auto-merge October 26, 2023 16:49

stress-tess added this pull request to the merge queue Oct 26, 2023

Merged via the queue into Bears-R-Us:master with commit 3366c72 Oct 26, 2023
