
Add support for reading Decimal128 Parquet columns #2821

Merged: 6 commits, Oct 26, 2023

Conversation

bmcdonald3
Contributor

Previously, when reading a file with a Decimal128 column, Arkouda silently ignored the column and dropped its data. This PR adds the ability to read Decimal128 Parquet columns. The values are cast to C++ floating-point values and stored in a 64-bit real Chapel array, so some precision can be lost in the conversion.

Longer term, we would like to add support for arbitrary-precision values to Chapel. That would require a new type, but it would represent Decimal128 values more accurately.
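
To make the precision loss concrete, here is a minimal sketch using only Python's standard `decimal` module. It is not the code in this PR: the helper name `real_conversion_error` and the sample value are made up for illustration. It assumes the conversion behaves like a correctly rounded cast to a 64-bit IEEE 754 double, which is what Python's `float` is.

```python
from decimal import Decimal


def real_conversion_error(text: str) -> Decimal:
    """Return |value - value converted to a 64-bit float| as an exact Decimal.

    Hypothetical helper for illustration only; it mimics storing a
    Decimal128 value in a 64-bit real.
    """
    original = Decimal(text)
    as_real = float(original)         # rounds to the nearest 64-bit double
    round_trip = Decimal(as_real)     # exact decimal expansion of that double
    return abs(round_trip - original)


# A value with 38 significant digits, the maximum Decimal128 precision.
wide_error = real_conversion_error("12345678901234567890.123456789012345678")

# A value that a double represents exactly.
exact_error = real_conversion_error("0.5")
```

A double carries roughly 15 to 17 significant decimal digits, so `wide_error` is nonzero while `exact_error` is zero.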

@bmcdonald3 bmcdonald3 marked this pull request as draft October 23, 2023 23:09
@bmcdonald3
Contributor Author

Marking as draft: I realized that some additional checks are needed in the C++ code, and I am not yet sure how to do them.

@bradcray
Contributor

we would like to add support for arbitrary-precision values into Chapel

Capturing some discussion Ben and I were having off-repo: Another option would be to provide support for fixed-size fixed-point values in Chapel, which would be more space-efficient than arbitrary-precision values since no heap storage would be required.
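
As a rough illustration of the fixed-size fixed-point idea, the sketch below stores a value as an integer limited to the signed 128-bit range plus a fixed scale (the number of fractional digits). The `Fixed128` class and its methods are hypothetical and written in Python only for illustration; this is not a proposed Chapel design, and it handles finite values only.

```python
from dataclasses import dataclass
from decimal import Decimal

INT128_MIN = -(2**127)
INT128_MAX = 2**127 - 1


@dataclass(frozen=True)
class Fixed128:
    """Hypothetical fixed-point value: unscaled * 10**(-scale)."""

    unscaled: int  # must fit in a signed 128-bit integer
    scale: int     # number of digits after the decimal point

    def __post_init__(self):
        if not INT128_MIN <= self.unscaled <= INT128_MAX:
            raise OverflowError("unscaled value does not fit in 128 bits")

    @classmethod
    def parse(cls, text: str, scale: int) -> "Fixed128":
        # Work on the exact digit tuple so no rounding can occur.
        sign, digits, exponent = Decimal(text).as_tuple()
        shift = exponent + scale
        if shift < 0:
            raise ValueError("more fractional digits than the scale allows")
        magnitude = int("".join(map(str, digits))) * 10**shift
        return cls(-magnitude if sign else magnitude, scale)

    def __add__(self, other: "Fixed128") -> "Fixed128":
        if self.scale != other.scale:
            raise ValueError("scales must match")
        # Plain integer addition: exact, with overflow checked on construction.
        return Fixed128(self.unscaled + other.unscaled, self.scale)

    def to_decimal(self) -> Decimal:
        sign = 1 if self.unscaled < 0 else 0
        digits = tuple(int(c) for c in str(abs(self.unscaled)))
        return Decimal((sign, digits, -self.scale))


a = Fixed128.parse("12345678901234567890.123456789012345678", 18)
b = Fixed128.parse("0.000000000000000001", 18)
total = (a + b).to_decimal()
```

Because the value always fits in a fixed 16 bytes, an array of such values needs no per-element heap allocation, which is the space advantage mentioned above.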

@bmcdonald3 bmcdonald3 marked this pull request as ready for review October 25, 2023 16:31
Member

@stress-tess stress-tess left a comment


Looks good. I just want to verify that big- vs. little-endian byte order doesn't matter before approving.
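
For context on the byte-order question: the Parquet format stores a DECIMAL backed by a fixed-length byte array as a big-endian two's-complement unscaled integer. The sketch below is a standard-library Python illustration, not the code under review, and `decode_unscaled` is a hypothetical helper. It shows how reading the same 16 bytes with the wrong byte order produces a different value. Whether this affects the PR depends on whether the C++ code reads raw bytes or values that the Arrow library has already decoded, which this sketch does not assume either way.

```python
def decode_unscaled(raw: bytes, byteorder: str = "big") -> int:
    """Interpret raw bytes as a signed two's-complement integer.

    Hypothetical helper for illustration only.
    """
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


# 16-byte big-endian encoding of the unscaled value -12345.
raw = (-12345).to_bytes(16, byteorder="big", signed=True)

correct = decode_unscaled(raw, "big")
swapped = decode_unscaled(raw, "little")
```

Here `correct` recovers the original unscaled value, while `swapped` does not.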

src/ArrowFunctions.cpp (review thread: outdated, resolved)
src/ArrowFunctions.cpp (review thread: resolved)
@stress-tess stress-tess enabled auto-merge October 26, 2023 16:49
@stress-tess stress-tess added this pull request to the merge queue Oct 26, 2023
Merged via the queue into Bears-R-Us:master with commit 3366c72 Oct 26, 2023
4 participants