Separate Parquet -> Arrow Schema Conversion From ArrayBuilder #1655
Labels
enhancement: Any new improvement worthy of an entry in the changelog
parquet: Changes to the parquet crate
tustvold added the enhancement label on May 5, 2022
This was referenced May 6, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue on May 9, 2022:
Don't treat embedded arrow schema as authoritative (apache#1663) Fix projection of nested parquet files (apache#1652) (apache#1654)
tustvold added a commit to tustvold/arrow-rs that referenced this issue on May 9, 2022:
Don't treat embedded arrow schema as authoritative (apache#1663) Fix projection of nested parquet files (apache#1652) (apache#1654) Fix schema inference for repeated fields (apache#1681) Support reading alternative list representations from parquet (apache#1680)
tustvold added a commit to tustvold/arrow-rs that referenced this issue on May 9, 2022:
Don't treat embedded arrow schema as authoritative (apache#1663) Fix projection of nested parquet files (apache#1652) (apache#1654) Fix schema inference for repeated fields (apache#1681) Support reading alternative list representations from parquet (apache#1680) Consistent handling of unsupported arrow types in parquet (apache#1666)
tustvold added a commit to tustvold/arrow-rs that referenced this issue on May 11, 2022:
Don't treat embedded arrow schema as authoritative (apache#1663) Fix projection of nested parquet files (apache#1652) (apache#1654) Fix schema inference for repeated fields (apache#1681) Support reading alternative list representations from parquet (apache#1680)
alamb pushed a commit that referenced this issue on May 13, 2022:
* Separate parquet -> arrow conversion logic (#1655)
  Don't treat embedded arrow schema as authoritative (#1663)
  Fix projection of nested parquet files (#1652) (#1654)
  Fix schema inference for repeated fields (#1681)
  Support reading alternative list representations from parquet (#1680)
* Add more tests
* Pass pointers by reference
* More docs
* Fix lint
* Review feedback
* Review feedback
* Fix test failures related to #1697
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently, ArrayBuilderContext has multiple responsibilities.
The result is not only immensely confusing but also:
parquet_to_arrow_schema_by_columns
Describe the solution you'd like
Create an ArrowSchemaConverter which takes a FileMetaData and an optional column projection and returns a ParquetField, where
This can then easily be used to generate the Schema or ArrayReader for the projected columns, replacing the existing logic.
As FileMetaData can easily be created, this should be significantly easier to test than the current logic.
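A minimal, self-contained sketch of the idea, under heavy assumptions: SchemaNode, ArrowType, ArrowField, ParquetField, convert and to_parquet_fields are hypothetical, simplified stand-ins, not the parquet or arrow crates' types and not the eventual ArrowSchemaConverter. The shape of ParquetField here is an assumption. It walks a parquet schema tree once, applies a leaf-column projection, and records the max definition/repetition levels next to the arrow type, so a single result could drive both schema and reader construction. Because the input is a plain value, it can be unit tested without reading a file.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-ins for a parquet schema tree (NOT the parquet crate's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition { Required, Optional, Repeated }
#[derive(Debug, Clone, Copy, PartialEq)]
enum Physical { Int32, Int64, ByteArray }
#[derive(Debug, Clone)]
enum SchemaNode {
    Primitive { name: String, repetition: Repetition, physical: Physical },
    Group { name: String, repetition: Repetition, children: Vec<SchemaNode> },
}

// Hypothetical, simplified arrow types (NOT the arrow crate's DataType/Field).
#[derive(Debug, Clone, PartialEq)]
enum ArrowType { Int32, Int64, Binary, Struct(Vec<ArrowField>), List(Box<ArrowField>) }
#[derive(Debug, Clone, PartialEq)]
struct ArrowField { name: String, data_type: ArrowType, nullable: bool }

// Hypothetical ParquetField: the arrow field plus what a reader would need.
#[derive(Debug, Clone, PartialEq)]
struct ParquetField {
    arrow: ArrowField,
    max_def_level: i16,
    max_rep_level: i16,
    leaves: Vec<usize>,          // leaf column indices (file order) read by this field
    children: Vec<ParquetField>, // converted children of a group; empty for primitives
}

// Depth-first walk. `next_leaf` counts every leaf, projected or not, so that
// projection indices always refer to the file's leaf column order.
fn convert(
    node: &SchemaNode,
    def: i16,
    rep: i16,
    projection: Option<&HashSet<usize>>,
    next_leaf: &mut usize,
) -> Option<ParquetField> {
    let repetition = match node {
        SchemaNode::Primitive { repetition, .. } | SchemaNode::Group { repetition, .. } => *repetition,
    };
    // Optional and repeated fields add a definition level; repeated also adds a repetition level.
    let (def, rep) = match repetition {
        Repetition::Required => (def, rep),
        Repetition::Optional => (def + 1, rep),
        Repetition::Repeated => (def + 1, rep + 1),
    };
    let (name, data_type, leaves, children) = match node {
        SchemaNode::Primitive { name, physical, .. } => {
            let idx = *next_leaf;
            *next_leaf += 1;
            if projection.map_or(false, |p| !p.contains(&idx)) {
                return None; // leaf not selected by the projection
            }
            let data_type = match physical {
                Physical::Int32 => ArrowType::Int32,
                Physical::Int64 => ArrowType::Int64,
                Physical::ByteArray => ArrowType::Binary,
            };
            (name.clone(), data_type, vec![idx], Vec::new())
        }
        SchemaNode::Group { name, children, .. } => {
            // Visit every child (to keep leaf numbering intact) and keep only the survivors.
            let mut kept: Vec<ParquetField> = Vec::new();
            for child in children {
                if let Some(field) = convert(child, def, rep, projection, next_leaf) {
                    kept.push(field);
                }
            }
            if kept.is_empty() {
                return None; // no projected leaves below this group
            }
            let fields: Vec<ArrowField> = kept.iter().map(|f| f.arrow.clone()).collect();
            let leaves: Vec<usize> = kept.iter().flat_map(|f| f.leaves.clone()).collect();
            (name.clone(), ArrowType::Struct(fields), leaves, kept)
        }
    };
    let arrow = match repetition {
        // Simplification: a bare repeated field becomes a non-nullable list of non-nullable items.
        Repetition::Repeated => {
            let item = ArrowField { name: "element".to_string(), data_type, nullable: false };
            ArrowField { name, data_type: ArrowType::List(Box::new(item)), nullable: false }
        }
        r => ArrowField { name, data_type, nullable: r == Repetition::Optional },
    };
    Some(ParquetField { arrow, max_def_level: def, max_rep_level: rep, leaves, children })
}

// The root message's children are the top-level fields.
fn to_parquet_fields(root: &[SchemaNode], projection: Option<&HashSet<usize>>) -> Vec<ParquetField> {
    let mut next_leaf = 0;
    root.iter().filter_map(|n| convert(n, 0, 0, projection, &mut next_leaf)).collect()
}
```

For example, with leaves numbered id = 0, address.city = 1, address.zip = 2 and tags = 3, projecting {2, 3} yields an address struct containing only zip (max definition level 1) and tags as a list with max repetition level 1. The sketch ignores logical types and the LIST/MAP annotations, treating any repeated field as a list.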
Describe alternatives you've considered
Some of the bugs can be worked around manually, but the code is getting increasingly difficult to reason about, and I think it has reached the point where we need to spend some time refactoring it.
Additional context
#1654
#1652
#1459