Decimal type for large precisions seems to be written incorrectly to parquet #508

Closed · jorgecarleitao opened this issue Jun 29, 2021 · 0 comments · Fixed by #3164

jorgecarleitao commented Jun 29, 2021

The computation of the number of bytes required for a given decimal precision seems incorrect, and writing decimals with precision larger than 18 appears to crash.

Found while working on the corresponding implementation in arrow2.

See parquet's definitions for details, but decimal_length_from_precision seems incorrect because it is not the inverse of the maximum number of digits representable by a given size of parquet's FixedSizeBytes.
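
For reference, the forward direction (the maximum precision that fits in a given byte length) follows directly from the formula in parquet's logical types; a minimal sketch, with a purely illustrative helper name:

fn max_precision_for_length(n: usize) -> usize {
    // maximum precision of an n-byte fixed-length value, per parquet's logical types:
    //   max_precision = floor(log_10(2^(8*n - 1) - 1))
    (2.0_f64.powi(8 * n as i32 - 1) - 1.0).log10().floor() as usize
}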

IMO this is the correct version:

fn decimal_length_from_precision(precision: usize) -> usize {
    // Parquet's logical types define the maximum precision of an n-byte
    // fixed-length value as:
    //   precision = floor(log_10(2^(8*n - 1) - 1))
    // Inverting for the smallest n that can hold `precision` digits:
    //   10^precision <= 2^(8*n - 1) - 1
    //   10^precision + 1 <= 2^(8*n - 1)
    //   log2(10^precision + 1) <= 8*n - 1
    //   (log2(10^precision + 1) + 1) / 8 <= n
    (((10.0_f64.powi(precision as i32) + 1.0).log2() + 1.0) / 8.0).ceil() as usize
}

(at least with this definition, arrow2 writes all variants of the decimal type in the generated_decimal file and round-trips them with pyarrow)
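
A quick sanity check of the inverse relationship, as a sketch assuming the two functions above:

fn main() {
    for precision in 1..=38 {
        let n = decimal_length_from_precision(precision);
        // the chosen length must be able to hold `precision` digits...
        assert!(max_precision_for_length(n) >= precision);
        // ...and must be the smallest length that does
        assert!(n == 1 || max_precision_for_length(n - 1) < precision);
        println!("precision {:>2} -> {} bytes", precision, n);
    }
}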
