-
Notifications
You must be signed in to change notification settings - Fork 850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push gather down to Parquet Encoder #2109
Conversation
let array = arrow::compute::cast(column, &ArrowDataType::Date32)?; | ||
arrow::compute::cast(&array, &ArrowDataType::Int32)? | ||
} else { | ||
arrow::compute::cast(column, &ArrowDataType::Int32)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if statement was somewhat redundant, I suspect it dates from a refactor at some point
@@ -77,6 +81,9 @@ pub trait ColumnValueEncoder { | |||
/// Write the corresponding values to this [`ColumnValueEncoder`] | |||
fn write(&mut self, values: &Self::Values, offset: usize, len: usize) -> Result<()>; | |||
|
|||
/// Write the corresponding values to this [`ColumnValueEncoder`] | |||
fn write_gather(&mut self, values: &Self::Values, indices: &[usize]) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally sold on this name, suggestions welcome
I intend to run the benchmarks shortly and report back |
Codecov Report
@@ Coverage Diff @@
## master #2109 +/- ##
==========================================
- Coverage 83.73% 83.70% -0.03%
==========================================
Files 225 225
Lines 59412 59442 +30
==========================================
+ Hits 49748 49758 +10
- Misses 9664 9684 +20
|
Interestingly the performance gain is extremely minor, this might suggest the bottleneck is elsewhere 🤔
Edit: See #2123 |
Marking as a draft whilst I think a bit more on this |
Going to roll this into the optimized byte array PR |
Which issue does this PR close?
Part of #1764
Rationale for this change
Data is unnecessarily copied prior to writing it out
What changes are included in this PR?
Previously the
take
operation necessary to handle nulls, lists, etc... was implemented prior to writing the data. This PR pushes it down into the encoder, avoiding an unnecessary copyAre there any user-facing changes?
No