Given that we serialize the Arrow schema and store it in the Parquet metadata, it becomes easier to write intervals as FixedLenBinary. On the read side, we take guidance from the Arrow schema on which IntervalUnit to use.
The problem comes if we read an interval without an Arrow schema. I think it'd be the same with the Duration type.
Given that the Duration type is not composite, how about we store it as an INT32 or INT64 depending on the resolution, and then rely on the ARROW:schema metadata to round-trip it correctly? CC @emkornfield, as you've recently worked on this part of the C++ implementation.
Micah Kornfield / @emkornfield:
For duration, I like int64 + the Arrow schema for round-tripping. We might want to add some extra metadata to separately indicate that it is a duration (I need to review the Parquet specification to see what is allowed in this area).
Jorge Leitão / @jorgecarleitao:
I do not think extra metadata is needed: storing them as i64 and loading them using the Arrow schema seems reasonable, since the schema contains the time unit, which is sufficient to guarantee a roundtrip.
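To make that idea concrete, here is a minimal user-level sketch in Python, assuming pyarrow allows casting between duration and int64; it only illustrates the proposal, not the actual reader/writer implementation (file name is hypothetical):

```python
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

# A duration column with a known time unit (here: milliseconds).
durations = pa.array(
    [datetime.timedelta(seconds=30), datetime.timedelta(minutes=5)],
    type=pa.duration("ms"),
)
unit = durations.type.unit  # "ms", carried by the serialized Arrow schema

# Write side: store the raw counts as int64 (no extra Parquet metadata needed).
pq.write_table(pa.table({"d": durations.cast(pa.int64())}), "durations.parquet")

# Read side: the Arrow schema stored in the Parquet metadata says the column
# is duration[ms], so the int64 values can simply be cast back.
restored = pq.read_table("durations.parquet").column("d").cast(pa.duration(unit))
assert restored.combine_chunks().equals(durations)
```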
P: @jorisvandenbossche / @jorgecarleitao / @emkornfield: any planned movement on this issue? Coming from the Pandas side, it's quite inconvenient having to special-case types handled by Pandas but not by Arrow/Parquet.
Currently this is not supported:
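A minimal sketch of the failing case (file name is hypothetical; the exact error message depends on the pyarrow version):

```python
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"td": pa.array([datetime.timedelta(days=1), datetime.timedelta(hours=2)])})
print(table.schema)  # td: duration[us]

# Raises, since the Parquet writer has no mapping for the duration type,
# e.g. ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion
pq.write_table(table, "durations.parquet")
```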
There is no direct mapping to a Parquet logical type. There is an INTERVAL type, but that more closely matches Arrow's interval types (YEAR_MONTH or DAY_TIME).
However, those duration values could be stored as plain integers and, based on the serialized Arrow schema, restored when reading back in.
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche
PRs and other links:
Note: This issue was originally created as ARROW-6780. Please see the migration documentation for further details.