[FEA] Ability to control field ID used for columns during Parquet write #10376
Labels
cuIO: cuIO issue
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.
Spark: Functionality that helps Spark RAPIDS
Is your feature request related to a problem? Please describe.
Parquet columns include a `field_id` identifier, and Spark 3.3 recently added improved support for this feature. See SPARK-38094 for details. To support this Spark feature in the RAPIDS Accelerator, we need the ability to specify the ID to use for a column's `field_id` when the Parquet file is written and field IDs have been specified in the schema to write.
Describe the solution you'd like
`column_in_metadata` could have two additional fields and a new set method.
During the Parquet write, if a `column_in_metadata` indicates that it has a field ID set, the specified field ID is used when encoding the column metadata in the Parquet footer; otherwise the field ID is left unspecified.
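A minimal sketch of the behavior described above, not the actual libcudf API. The names `column_meta_sketch`, `schema_element_sketch`, and `to_schema_element` are hypothetical stand-ins for illustration. The sketch also assumes a single `std::optional` in place of the two additional fields mentioned above, since an empty optional already expresses "no field ID set".

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for cudf's column_in_metadata; NOT the real class.
class column_meta_sketch {
 public:
  explicit column_meta_sketch(std::string name) : _name(std::move(name)) {}

  // New setter: records the field ID and returns *this so calls can be chained.
  column_meta_sketch& set_field_id(std::int32_t field_id) {
    _field_id = field_id;
    return *this;
  }

  bool is_field_id_set() const { return _field_id.has_value(); }
  std::int32_t get_field_id() const { return _field_id.value(); }
  const std::string& name() const { return _name; }

 private:
  std::string _name;
  // Empty means "leave field_id unspecified in the footer".
  std::optional<std::int32_t> _field_id;
};

// Hypothetical footer schema entry. In the Parquet format, the SchemaElement
// field_id is optional, which the std::optional mirrors here.
struct schema_element_sketch {
  std::string name;
  std::optional<std::int32_t> field_id;
};

// Copies the field ID into the footer entry only when the caller set one.
schema_element_sketch to_schema_element(const column_meta_sketch& col) {
  schema_element_sketch elem{col.name(), std::nullopt};
  if (col.is_field_id_set()) { elem.field_id = col.get_field_id(); }
  return elem;
}
```

With this shape, a caller that never calls `set_field_id` gets the current behavior unchanged, and a Spark-side caller can set IDs per column from the write schema.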
Additional context
See the `field_id` in the Parquet Schema Definition.