Decimal type for large precisions seems to be written incorrectly to parquet #508

Closed · jorgecarleitao opened this issue Jun 29, 2021 · 0 comments · Fixed by #3164

jorgecarleitao commented Jun 29, 2021

The computation of the number of bytes required for a given decimal precision seems incorrect, and writing decimals with precision larger than 18 appears to crash.

Found while working on the corresponding implementation in arrow2.

See parquet's definitions for details, but decimal_length_from_precision seems incorrect because it is not the inverse of the maximum number of digits representable by a given size of parquet's FixedSizeBytes.
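
For reference, the forward direction (the maximum precision that fits in a given byte length) follows directly from the formula in parquet's logical types; a minimal sketch, with a purely illustrative helper name:

fn max_precision_for_length(n: usize) -> usize {
    // maximum precision of an n-byte fixed-length value, per parquet's logical types:
    //   max_precision = floor(log_10(2^(8*n - 1) - 1))
    (2.0_f64.powi(8 * n as i32 - 1) - 1.0).log10().floor() as usize
}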

IMO this is the correct version:

fn decimal_length_from_precision(precision: usize) -> usize {
    // Parquet's logical types define the maximum precision of an n-byte
    // fixed-length value as:
    //   precision = floor(log_10(2^(8*n - 1) - 1))
    // Inverting for the smallest n that can hold `precision` digits:
    //   10^precision <= 2^(8*n - 1) - 1
    //   10^precision + 1 <= 2^(8*n - 1)
    //   log2(10^precision + 1) <= 8*n - 1
    //   (log2(10^precision + 1) + 1) / 8 <= n
    (((10.0_f64.powi(precision as i32) + 1.0).log2() + 1.0) / 8.0).ceil() as usize
}

(at least with this definition, arrow2 writes all variants of the decimal type in the generated_decimal file and round-trips them with pyarrow)
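
A quick sanity check of the inverse relationship, as a sketch assuming the two functions above:

fn main() {
    for precision in 1..=38 {
        let n = decimal_length_from_precision(precision);
        // the chosen length must be able to hold `precision` digits...
        assert!(max_precision_for_length(n) >= precision);
        // ...and must be the smallest length that does
        assert!(n == 1 || max_precision_for_length(n - 1) < precision);
        println!("precision {:>2} -> {} bytes", precision, n);
    }
}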
