[EPIC] A collection of items to improve speed of parquet metadata decoding #5853
Labels
enhancement
Any new improvement worthy of an entry in the changelog
parquet
Changes to the parquet crate
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There have been several recent assertions that Parquet is not suitable for handling wide tables with thousands of columns.
The rationale usually goes like this: wide tables have “large” metadata, which takes a “long time” to decode, often longer than reading the data itself.
This has led to several proposals for new file formats, such as BtrBlocks, Lance V2, and Nimble, as well as recent discussions on the parquet mailing list.
However, there are several ways we can improve the performance of the existing thrift decoding in parquet-rs, and this ticket collects ideas for how to do so.