[Parquet][C++] Support month_day_nano_interval type in Parquet #36799
Comments
I don't know if Parquet supports month/day/nano. It looks like it supports month/day/milli. CC @emkornfield, who might know more details.
Hi @westonpace |
You can:
Currently, we don't support writing arrow
I reported this error.
Yeah, Parquet does not have an analogous type. Besides the millisecond vs. nanosecond difference, the Parquet type stores each of its integers as unsigned. I think we have three options to try to fix this:
|
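To make the mismatch concrete: per the Parquet format's LogicalTypes.md, INTERVAL is a 12-byte fixed-length array holding three little-endian unsigned 32-bit integers (months, days, milliseconds). The sketch below uses a hypothetical helper, `encode_parquet_interval`, which is not part of pyarrow or any Parquet library. It attempts the conversion and rejects values that the Parquet layout cannot represent, using the sample values from the issue description.

```python
import struct

NANOS_PER_MILLI = 1_000_000
UINT32_MAX = 2**32 - 1


def encode_parquet_interval(months: int, days: int, nanos: int) -> bytes:
    """Pack (months, days, nanos) into Parquet's 12-byte INTERVAL layout.

    Hypothetical helper for illustration only. Raises ValueError when the
    value has sub-millisecond precision or a component does not fit in an
    unsigned 32-bit integer (which includes every negative component).
    """
    if nanos % NANOS_PER_MILLI != 0:
        raise ValueError(f"{nanos} ns is not a whole number of milliseconds")
    millis = nanos // NANOS_PER_MILLI
    for name, value in (("months", months), ("days", days), ("millis", millis)):
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{name}={value} does not fit in an unsigned 32-bit integer")
    # "<III" = three little-endian unsigned 32-bit integers.
    return struct.pack("<III", months, days, millis)


# The five sample values from the issue description.
samples = [
    (1, 15, -30),
    (0, 0, 0),
    (13, 25, 1000),
    (13, 25, 1_000_000),
    (13, 25, 1_000_000_000),
]
for sample in samples:
    try:
        print(sample, "->", encode_parquet_interval(*sample).hex())
    except ValueError as exc:
        print(sample, "-> not representable:", exc)
```

Under these assumptions, the samples with -30 ns and 1000 ns cannot be stored without losing information, which is why a direct mapping onto the existing Parquet type is not straightforward.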
@emkornfield I do not think Rust has established a convention yet (I don't think the Rust Parquet writer supports writing monthdaynano intervals yet). Maybe @tustvold has an opinion on what the mapping should be.
This is the document I found: pyarrow has the type month_day_nano_interval
By the way, do we have a document that maps Arrow types to their corresponding Parquet types? That would make things clearer.
I don't think we have it documented anywhere other than the Rust code itself, for example:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md is the canonical mapping. One thing worth highlighting here is that the Parquet schema is authoritative; the embedded Arrow schema is just a hint that provides additional information (see apache/arrow-rs#1663). As such, we would need a way to represent nanosecond intervals in Parquet natively before we could add support to arrow. The upstream ticket for this is apache/parquet-format#43.
Oh, by the way, we can refer to this doc for the Parquet-to-Arrow mapping: https://github.com/apache/arrow/blob/main/docs/source/cpp/parquet.rst#logical-types
Describe the usage question you have. Please include as many useful details as possible.
I want to generate a Parquet file that includes a column of type month_day_nano_interval.
This is my Python code:
import pyarrow as pa
import pyarrow.parquet as pq
# Define schema
schema = pa.schema([
    ('itv', pa.month_day_nano_interval())
])
# Prepare data
itv = pa.array([(1, 15, -30),
                (0, 0, 0),
                (13, 25, 1000),
                (13, 25, 1000000),
                (13, 25, 1000000000)
               ],
               type=pa.month_day_nano_interval())
# Generate Parquet data
batch = pa.RecordBatch.from_arrays([itv], schema=schema)
table = pa.Table.from_batches([batch])
# Write Parquet file pqtpitvl.parquet
pq.write_table(table, 'pqtpitvl.parquet')
It failed with the following error:
pyarrow.lib.ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: month_day_nano_interval
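One possible workaround while no native mapping exists is to store the three interval components as ordinary integer columns. This is an assumption of this note, not something the thread proposes. The sketch below uses hypothetical helpers (`split_intervals`, `write_split_intervals`) and assumes the input contains no nulls. The column widths mirror Arrow's month_day_nano_interval layout: 32-bit months and days, 64-bit nanoseconds.

```python
from typing import Iterable, List, NamedTuple, Tuple


class IntervalColumns(NamedTuple):
    months: List[int]
    days: List[int]
    nanoseconds: List[int]


def split_intervals(values: Iterable[Tuple[int, int, int]]) -> IntervalColumns:
    """Split (months, days, nanoseconds) tuples into three parallel lists.

    Hypothetical helper; assumes every value is a non-null 3-tuple.
    """
    months, days, nanos = [], [], []
    for m, d, n in values:
        months.append(m)
        days.append(d)
        nanos.append(n)
    return IntervalColumns(months, days, nanos)


def write_split_intervals(values, path: str, prefix: str = "itv") -> None:
    """Write the components as three plain integer columns in a Parquet file."""
    # Imported lazily so split_intervals stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    cols = split_intervals(values)
    table = pa.table({
        f"{prefix}_months": pa.array(cols.months, type=pa.int32()),
        f"{prefix}_days": pa.array(cols.days, type=pa.int32()),
        f"{prefix}_nanoseconds": pa.array(cols.nanoseconds, type=pa.int64()),
    })
    pq.write_table(table, path)
```

A reader of such a file has to know the column naming convention to reassemble the intervals, so this only sidesteps the error; it does not give the data interval semantics in Parquet.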