Add Schema::project and RecordBatch::project functions to project / select a subset of columns #1014
Labels: enhancement (Any new improvement worthy of an entry in the changelog), good first issue (Good for newcomers)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is common to "project" (i.e. pick a subset of) the columns of a schema (and then also of a RecordBatch) for processing.
There are many instances of projection in DataFusion, for example:
https://github.com/apache/arrow-datafusion/blob/299ab7d1c37c707fcd500d3428abbdbe4dc5399b/datafusion/src/datasource/empty.rs#L65-L71
https://github.com/apache/arrow-datafusion/blob/0facd4d483e8c289ee4e3a89487d0cd1ede1a110/datafusion/src/physical_plan/file_format/mod.rs#L83-L93
Many (most) of these instances don't preserve metadata, leading to bugs like apache/datafusion#1361.
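To illustrate the failure mode, here is a minimal Rust sketch. It uses simplified, hypothetical stand-ins for arrow's Field and Schema (not the real arrow-rs types). A hand-rolled projection that rebuilds the schema from the selected fields through a constructor that starts with empty metadata silently drops the parent schema's metadata.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for arrow's Field and Schema (not the real arrow types).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

impl Schema {
    /// Plain constructor: the new schema always starts with empty metadata.
    pub fn new(fields: Vec<Field>) -> Self {
        Schema { fields, metadata: HashMap::new() }
    }
}

/// The ad-hoc projection pattern: pick fields by index and build a fresh schema.
/// Because `Schema::new` starts with empty metadata, the parent's metadata is lost.
/// Panics if an index is out of range.
pub fn naive_project(schema: &Schema, indices: &[usize]) -> Schema {
    let fields = indices.iter().map(|&i| schema.fields[i].clone()).collect();
    Schema::new(fields)
}

fn main() {
    let mut schema = Schema::new(vec![
        Field { name: "a".to_string(), data_type: "Int32".to_string() },
        Field { name: "b".to_string(), data_type: "Utf8".to_string() },
    ]);
    schema.metadata.insert("source".to_string(), "example".to_string());

    let projected = naive_project(&schema, &[1]);
    assert_eq!(projected.fields.len(), 1);
    // The fields were selected correctly, but the metadata did not survive.
    assert!(projected.metadata.is_empty());
    println!("fields: {:?}, metadata: {:?}", projected.fields, projected.metadata);
}
```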
Describe the solution you'd like
Add projection functions to the Schema and RecordBatch structs in the arrow-rs crate that properly handle metadata. Proposed signatures:
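A minimal Rust sketch of the idea follows. It assumes simplified, hypothetical stand-ins for arrow's Field, Schema and RecordBatch, with a column modeled as a shared Vec<i64>; the real arrow-rs types and exact signatures may differ. Schema::project selects fields by index and copies the parent's metadata, and RecordBatch::project applies the same indices to both its schema and its columns. Out-of-range indices come back as an error (a plain String here) rather than a panic.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins; the real arrow-rs types are richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: HashMap<String, String>,
}

/// A column is modeled as a shared vector of i64 values (hypothetical).
pub type Column = Arc<Vec<i64>>;

#[derive(Debug, Clone)]
pub struct RecordBatch {
    pub schema: Arc<Schema>,
    pub columns: Vec<Column>,
}

impl Schema {
    /// Returns a schema with only the fields at `indices`, in that order,
    /// carrying the parent schema's metadata over to the result.
    pub fn project(&self, indices: &[usize]) -> Result<Schema, String> {
        let fields = indices
            .iter()
            .map(|&i| {
                self.fields.get(i).cloned().ok_or_else(|| {
                    format!("project index {} out of bounds, max field {}", i, self.fields.len())
                })
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Schema { fields, metadata: self.metadata.clone() })
    }
}

impl RecordBatch {
    /// Projects the schema and the columns with the same indices, so the batch
    /// keeps its metadata and its columns stay aligned with its fields.
    pub fn project(&self, indices: &[usize]) -> Result<RecordBatch, String> {
        let schema = self.schema.project(indices)?;
        let columns = indices
            .iter()
            .map(|&i| {
                // Cloning an Arc only bumps a reference count; no data is copied.
                self.columns.get(i).cloned().ok_or_else(|| format!("column {} missing", i))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(RecordBatch { schema: Arc::new(schema), columns })
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("source".to_string(), "example".to_string());
    let schema = Schema {
        fields: vec![
            Field { name: "a".to_string(), data_type: "Int64".to_string() },
            Field { name: "b".to_string(), data_type: "Int64".to_string() },
            Field { name: "c".to_string(), data_type: "Int64".to_string() },
        ],
        metadata,
    };
    let batch = RecordBatch {
        schema: Arc::new(schema),
        columns: vec![Arc::new(vec![1, 2]), Arc::new(vec![3, 4]), Arc::new(vec![5, 6])],
    };

    // Select columns "c" and "a", in that order.
    let projected = batch.project(&[2, 0]).expect("indices are in bounds");
    let names: Vec<&str> = projected.schema.fields.iter().map(|f| f.name.as_str()).collect();
    assert_eq!(names, vec!["c", "a"]);
    assert_eq!(projected.schema.metadata.get("source").map(String::as_str), Some("example"));
    assert_eq!(*projected.columns[0], vec![5, 6]);

    // An out-of-range index is reported as an error instead of panicking.
    assert!(batch.project(&[3]).is_err());
    println!("{:?}", names);
}
```

Routing RecordBatch::project through Schema::project keeps the metadata handling in one place, so callers no longer rebuild schemas by hand at each projection site.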
Describe alternatives you've considered
Additional context
@hntd187 added this feature in DataFusion in apache/datafusion#1378