Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parquet record API: timestamp as signed integer #3437

Merged
merged 1 commit into from
Jan 4, 2023
Merged

Parquet record API: timestamp as signed integer #3437

merged 1 commit into from
Jan 4, 2023

Conversation

ByteBaker
Copy link
Contributor

@ByteBaker ByteBaker commented Jan 3, 2023

  • Use signed integers to store Date, TimestampMillis and TimestampMicros in enum Field
  • remove timezone from string representation of Date

Which issue does this PR close?

Closes #3430 .

Rationale for this change

The Field type in parquet::record interpreted Date/Time values as unsigned integers, representing number of seconds from Unix epoch. This was wrong because values before epoch do exist. When encountered with a Row that contains such a value, calling to_json_value panicked.

What changes are included in this PR?

This commit changes the implementation of Field to now use signed integers to read timestamp. It also removes timezone from the string representation of Field::Date, which made no sense otherwise.

Are there any user-facing changes?

None. All test cases pass.

- Use signed integers to store 'Date', 'TimestampMillis' and 'TimestampMicros' in 'enum Field'
- remove timezone from string representation of Date
@github-actions github-actions bot added the parquet Changes to the parquet crate label Jan 3, 2023
@ByteBaker ByteBaker changed the title parquet record API: timestamp as signed integer Parquet record API: timestamp as signed integer Jan 3, 2023
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@tustvold tustvold added the api-change Changes to the arrow API label Jan 3, 2023
static NUM_SECONDS_IN_DAY: i64 = 60 * 60 * 24;
let dt = Utc
.timestamp_opt(value as i64 * NUM_SECONDS_IN_DAY, 0)
.unwrap();
format!("{}", dt.format("%Y-%m-%d %:z"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the %:z removed? Is it because it is always zero?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a timezone mean for a date?

@tustvold tustvold merged commit 65ff80e into apache:master Jan 4, 2023
@ursabot
Copy link

ursabot commented Jan 4, 2023

Benchmark runs are scheduled for baseline = b82b35f and contender = 65ff80e. 65ff80e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet Record API: Cannot convert date before Unix epoch to json
4 participants