Parquet writing examples/macro/guidance #58

lyuben-todorov · 2021-10-11T11:10:41Z

Hi, I have a long-standing question about the state of Parquet support in Rust.

My end goal is to write Parquet files containing my data from Rust efficiently (no JSON, no pointless conversions, etc.). The logical approach would be to find, or write, a derive macro that generates Parquet writers for structs. However, the parquet_derive crate lacks many features (e.g. nested structures) and, according to the ASF Slack, isn't actively developed at the moment. I tried implementing a derive macro for an Arrow RecordBatch writer (using the arrow crate), but I quickly ran into problems with the arrow crate itself. That experience is making me think that writing Parquet in anything other than Java was never meant to be, even though in theory that shouldn't be the case.

I'm asking you, the maintainer, as someone with more experience with the Parquet format and ecosystem: are my goals achievable? If so, could you give me some guidance on what would need to be done and the best way to approach it, and perhaps a code example of converting data (a Vec) into a Parquet file?


jorgecarleitao · 2021-10-11T15:23:59Z

Thanks a lot for reaching out! Some questions:

1. Is the data columnar in nature (or is it, e.g., a stream of rows)?
2. Can you lay out your data according to the Arrow format?
3. Is the data flat (i.e. no nested structures)?
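To make questions 1 and 2 concrete, here is a minimal std-only Rust sketch of transposing a Vec of row structs into per-column buffers shaped like Arrow's layout: one contiguous values buffer per primitive column, an offsets buffer plus a byte buffer for strings, and a validity mask for nullable fields. It makes no arrow2 calls; the Row and Columns types and the to_columns function are hypothetical, and validity is simplified to a Vec<bool> instead of Arrow's bit-packed bitmap.

```rust
/// Hypothetical row type standing in for "my data" (not from the thread).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    name: String,
    score: Option<f64>,
}

/// Column buffers laid out roughly the way Arrow stores them.
/// Simplification: validity is a Vec<bool>, not a bit-packed bitmap.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    id: Vec<i64>,
    // Utf8-style column: name_offsets[i]..name_offsets[i + 1] indexes into name_bytes.
    name_offsets: Vec<i32>,
    name_bytes: Vec<u8>,
    // Nullable f64 column: a placeholder 0.0 is stored for null slots,
    // and score_validity records which slots hold real values.
    score_values: Vec<f64>,
    score_validity: Vec<bool>,
}

/// Transposes row-oriented data into the column buffers above.
fn to_columns(rows: &[Row]) -> Columns {
    let mut cols = Columns {
        id: Vec::with_capacity(rows.len()),
        name_offsets: Vec::with_capacity(rows.len() + 1),
        name_bytes: Vec::new(),
        score_values: Vec::with_capacity(rows.len()),
        score_validity: Vec::with_capacity(rows.len()),
    };
    // Offsets always start at 0 and have one more entry than there are rows.
    cols.name_offsets.push(0);
    for row in rows {
        cols.id.push(row.id);
        cols.name_bytes.extend_from_slice(row.name.as_bytes());
        cols.name_offsets.push(cols.name_bytes.len() as i32);
        match row.score {
            Some(v) => {
                cols.score_values.push(v);
                cols.score_validity.push(true);
            }
            None => {
                cols.score_values.push(0.0);
                cols.score_validity.push(false);
            }
        }
    }
    cols
}

fn main() {
    let rows = vec![
        Row { id: 1, name: "ab".to_string(), score: Some(0.5) },
        Row { id: 2, name: String::new(), score: None },
        Row { id: 3, name: "xyz".to_string(), score: Some(2.0) },
    ];
    let cols = to_columns(&rows);
    println!("{:?}", cols.name_offsets); // [0, 2, 2, 5]
}
```

If your data can already be produced in this shape, the buffers map fairly directly onto Arrow arrays; if it arrives one row at a time, some transposition step like this has to happen somewhere.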

If all are yes, then I would try using arrow2 directly.

If the answer to 1 or 2 is no, I would try arrow2-derive to build a RecordBatch.

If the answer to 3 is no, we do not yet support that in arrow2 (it is on the roadmap; see e.g. jorgecarleitao/arrow2#504).
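For the stream-of-rows case, the core job of a struct-to-columns derive macro is to buffer rows and emit columnar batches of a bounded size. The std-only sketch below is a hypothetical illustration of that pattern; it is not arrow2-derive's actual generated code, and the Row, Batch and BatchBuilder names and the batch_size parameter are assumptions made for the example.

```rust
/// Hypothetical row type; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct Row {
    id: i64,
    value: Option<f64>,
}

/// One columnar batch, roughly the data a RecordBatch would hold.
#[derive(Debug, Default, PartialEq)]
struct Batch {
    id: Vec<i64>,
    value: Vec<Option<f64>>,
}

impl Batch {
    fn len(&self) -> usize {
        self.id.len()
    }

    fn is_empty(&self) -> bool {
        self.id.is_empty()
    }
}

/// Buffers incoming rows and emits a batch every `batch_size` rows.
struct BatchBuilder {
    batch_size: usize,
    current: Batch,
}

impl BatchBuilder {
    fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, current: Batch::default() }
    }

    /// Appends one row; returns a full batch once `batch_size` rows are buffered.
    fn push(&mut self, row: Row) -> Option<Batch> {
        self.current.id.push(row.id);
        self.current.value.push(row.value);
        if self.current.len() == self.batch_size {
            // Hand back the full batch and start a fresh, empty one.
            Some(std::mem::take(&mut self.current))
        } else {
            None
        }
    }

    /// Flushes any remaining rows as a final, possibly shorter, batch.
    fn finish(self) -> Option<Batch> {
        if self.current.is_empty() {
            None
        } else {
            Some(self.current)
        }
    }
}

fn main() {
    let mut builder = BatchBuilder::new(2);
    let mut batches = Vec::new();
    for i in 0..5 {
        let row = Row { id: i, value: if i % 2 == 0 { Some(i as f64) } else { None } };
        if let Some(batch) = builder.push(row) {
            batches.push(batch);
        }
    }
    if let Some(batch) = builder.finish() {
        batches.push(batch);
    }
    let sizes: Vec<usize> = batches.iter().map(Batch::len).collect();
    println!("{:?}", sizes); // [2, 2, 1]
}
```

Each emitted batch would then be converted into Arrow arrays and handed to a Parquet writer; bounding the batch size keeps memory usage predictable when the input is an unbounded stream.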

Repository owner locked and limited conversation to collaborators Oct 11, 2021

jorgecarleitao closed this as completed Oct 11, 2021

jorgecarleitao added the no-changelog and question (Further information is requested) labels and removed the no-changelog label Oct 18, 2021

This issue was moved to a discussion.
