parquet (feature): Encode complex objects as JSON columns #3343

Merged: 4 commits, Jan 18, 2024

@xerial (Member) commented Jan 18, 2024

Previously, ParquetWriter used MessagePack to embed complex object data in a column. JSON is a better choice for compatibility with the rest of the Parquet ecosystem (e.g., DuckDB).
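To illustrate the idea, here is a minimal sketch of encoding a nested value as a JSON string before storing it in a single column. It is not the code from this PR: `JsonColumnSketch`, `quote`, and `toJson` are hypothetical names, the encoder is hand-rolled on the Scala standard library only, and it does not use airframe's codecs or the Parquet writer API.

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper for illustration only; not part of airframe-parquet.
object JsonColumnSketch {

  // Quote a string as a JSON string literal, escaping special characters.
  def quote(s: String): String = {
    val sb = new StringBuilder
    sb.append('"')
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"')
    sb.toString
  }

  // Render nested maps, sequences, and primitives as compact JSON text.
  // Unknown types fall back to their quoted toString form.
  def toJson(v: Any): String = v match {
    case null         => "null"
    case s: String    => quote(s)
    case b: Boolean   => b.toString
    case i: Int       => i.toString
    case l: Long      => l.toString
    case d: Double    => d.toString
    case m: Map[_, _] =>
      m.map { case (k, x) => quote(k.toString) + ":" + toJson(x) }.mkString("{", ",", "}")
    case xs: Iterable[_] => xs.map(toJson).mkString("[", ",", "]")
    case other           => quote(other.toString)
  }

  def main(args: Array[String]): Unit = {
    // A nested value that does not map onto a single primitive column type.
    val address = ListMap[String, Any](
      "city"  -> "Tokyo",
      "tags"  -> Seq("home", "billing"),
      "floor" -> 3
    )
    // The row keeps primitives as-is and stores the nested value as JSON text.
    val row = ListMap[String, Any]("id" -> 1L, "address" -> toJson(address))
    println(row("address"))
  }
}
```

Running `main` prints `{"city":"Tokyo","tags":["home","billing"],"floor":3}`. Because the column holds plain JSON text rather than MessagePack bytes, a reader with JSON support can inspect the nested fields without a MessagePack decoder.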

@xerial marked this pull request as ready for review January 18, 2024 08:27

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (1ee3560) 82.55% compared to head (46f75cd) 82.57%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3343      +/-   ##
==========================================
+ Coverage   82.55%   82.57%   +0.01%     
==========================================
  Files         355      355              
  Lines       14855    14881      +26     
  Branches     2434     2462      +28     
==========================================
+ Hits        12264    12288      +24     
- Misses       2591     2593       +2     
Files Coverage Δ
...a/wvlet/airframe/parquet/ParquetRecordReader.scala 94.82% <100.00%> (+0.82%) ⬆️
...ala/wvlet/airframe/parquet/ParquetWriteCodec.scala 91.89% <100.00%> (+0.98%) ⬆️
...n/scala/wvlet/airframe/parquet/ParquetSchema.scala 91.07% <90.90%> (-0.42%) ⬇️
.../main/scala/wvlet/airframe/msgpack/spi/Value.scala 82.31% <81.81%> (-0.09%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@xerial changed the title from "parquet (feature): Use JSON type for complex nested objects by default" to "parquet (feature): Encode complex objects as JSON columns" Jan 18, 2024
@xerial merged commit 3e4be7f into main Jan 18, 2024
17 checks passed
@xerial deleted the parquet-json branch January 18, 2024 08:43