This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Support Parquet read of pages with Encoding::PlainDictionary and non-optional values #668

Closed
mdrach opened this issue Dec 9, 2021 · 2 comments
Labels
feature: A new feature
no-changelog: Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

@mdrach
Contributor

mdrach commented Dec 9, 2021

I'm reading a Parquet file generated by Snowflake and am hitting an error here:

(Encoding::Plain, _, false) => read_required(page.buffer(), additional, size, values),

Decoding "PlainDictionary"-encoded, dictionary-encoded required V1 pages is not yet implemented for FixedSizeBinary

Since PlainDictionary encoding with optional values is implemented, would it be simple to implement the non-optional version?
(Encoding::PlainDictionary, Some(dict), true) => ...

(I actually took a stab at this here but am missing a lot of context.)
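
For readers following along, here is a minimal, self-contained sketch of what a required (non-optional) dictionary-decoding arm could look like. It is not arrow2's actual code: the `Encoding` enum, `decode`, and `read_dict_required` are hypothetical stand-ins, and the sketch assumes the page's dictionary indices have already been decoded into a `u32` slice (real Parquet pages store them RLE/bit-packed). Because the column is required, there are no definition levels to consult, so each index maps to exactly one `size`-byte dictionary entry.

```rust
/// Hypothetical stand-in for the parquet `Encoding` enum (only the variants used here).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Plain,
    PlainDictionary,
}

/// Hypothetical helper: appends one `size`-byte dictionary entry to `values`
/// for every index. Assumes `dict` is the flat byte buffer of the dictionary
/// page and `indices` are already decoded.
fn read_dict_required(
    dict: &[u8],
    indices: &[u32],
    size: usize,
    values: &mut Vec<u8>,
) -> Result<(), String> {
    values.reserve(indices.len() * size);
    for &index in indices {
        let start = index as usize * size;
        // `get` returns None when the entry would fall outside the dictionary.
        let entry = dict
            .get(start..start + size)
            .ok_or_else(|| format!("dictionary index {} out of bounds", index))?;
        values.extend_from_slice(entry);
    }
    Ok(())
}

/// Hypothetical dispatch mirroring the `(encoding, dictionary, is_optional)` match
/// discussed above; only the required dictionary case is handled in this sketch.
fn decode(
    encoding: Encoding,
    dict: Option<&[u8]>,
    is_optional: bool,
    indices: &[u32],
    size: usize,
    values: &mut Vec<u8>,
) -> Result<(), String> {
    match (encoding, dict, is_optional) {
        (Encoding::PlainDictionary, Some(dict), false) => {
            read_dict_required(dict, indices, size, values)
        }
        _ => Err(format!(
            "{:?} with is_optional={} is not covered by this sketch",
            encoding, is_optional
        )),
    }
}
```

With a dictionary of three 2-byte entries, calling `decode(Encoding::PlainDictionary, Some(&dict[..]), false, &[2, 0, 2], 2, &mut values)` appends the third, first, and third entries to `values` in that order. An out-of-range index returns an error rather than panicking.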

@jorgecarleitao
Owner

jorgecarleitao commented Dec 9, 2021

Hey. Thanks! Yes, that looks about right. I would just add an integration test against pyarrow here. What I usually do is:

  1. add a new column here for the type
  2. add the corresponding data here
  3. add a test like this: https://github.com/jorgecarleitao/arrow2/blob/main/tests/it/io/parquet/read.rs#L139 targeting that column

so that we prove that we can read the type when it is written by pyarrow (which uses the C++ Parquet implementation).

@jorgecarleitao added the feature and no-changelog labels on Dec 31, 2021
@jorgecarleitao
Owner

Closed by #683


2 participants