remove accidental quadratic null_count #991
@@ -234,12 +234,6 @@ impl<T: NativeType> MutablePrimitiveArray<T> {
         let len = self.len();
         if let Some(validity) = self.validity.as_mut() {
             validity.extend_constant(len - validity.len(), true);
         } else {
There was no validity and we extend with `true`, so we know that `null_count == 0`. No need to compute it.
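To make that reasoning concrete, here is a minimal sketch. It is not arrow2's actual code: `Builder`, its `Vec<bool>` validity mask and all method names are hypothetical stand-ins for `MutablePrimitiveArray` and its bitmap. The point it illustrates is that when there is no validity mask and only valid values are appended, the mask can stay `None` and the null count is known to be zero without scanning anything.

```rust
/// Hypothetical stand-in for a mutable primitive array (not arrow2's type).
pub struct Builder {
    values: Vec<i32>,
    /// `None` means "every slot so far is valid".
    validity: Option<Vec<bool>>,
}

impl Builder {
    pub fn new() -> Self {
        Self { values: Vec::new(), validity: None }
    }

    /// Appends a null slot, creating the validity mask lazily on first use.
    pub fn push_null(&mut self) {
        let len = self.values.len();
        self.values.push(0);
        // All slots before this one were valid, so back-fill with `true`.
        let validity = self.validity.get_or_insert_with(|| vec![true; len]);
        validity.push(false);
    }

    /// Appends values that are all valid.
    pub fn extend_from_slice(&mut self, items: &[i32]) {
        self.values.extend_from_slice(items);
        let len = self.values.len();
        if let Some(validity) = self.validity.as_mut() {
            // A mask already exists: pad it with `true` for the new slots.
            let missing = len - validity.len();
            validity.extend(std::iter::repeat(true).take(missing));
        }
        // No `else` branch: without a mask, appending valid values cannot
        // introduce nulls, so there is nothing to count or allocate.
    }

    pub fn has_validity(&self) -> bool {
        self.validity.is_some()
    }

    /// Counts nulls by scanning the mask; zero when no mask exists.
    pub fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |v| v.iter().filter(|valid| !**valid).count())
    }
}
```

With this shape, repeated `extend_from_slice` calls on a builder that has never seen a null do no per-call work beyond copying the values.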
oof, that is an amazing find - a lot of code uses this.
Yes, almost all polars expressions in the groupby context. So I am really excited about finding this one. :D
Codecov Report
@@           Coverage Diff           @@
##             main     #991   +/-   ##
=======================================
  Coverage   71.38%   71.39%
=======================================
  Files         357      357
  Lines       19791    19787    -4
=======================================
- Hits        14128    14126    -2
+ Misses       5663     5661    -2
- Greatly improves performance of the list builders by: jorgecarleitao/arrow2#991
- List builders now also support nested dtypes like List and Struct
- Python DataFrame and Series constructor now support better nested dtype construction
* list namespace to polars-ops
* Improve list builders, iteration and construction
  - Greatly improves performance of the list builders by: jorgecarleitao/arrow2#991
  - List builders now also support nested dtypes like List and Struct
  - Python DataFrame and Series constructor now support better nested dtype construction
* fix tests and fix struct::agg_list
I noticed that the list builders in polars were extremely slow compared to the `concat` kernels, so I took a look at why. The flamegraph showed that the builder code was dominated by `null_counts`. The underlying code did a full bit count at every `extend_slice` call, leading to O(n^2) behavior. Removing the redundant count improved the result drastically.
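As a rough illustration of where the O(n^2) comes from, the sketch below counts how many validity slots get inspected when a full recount runs after every append, versus when it is skipped. The function names and the `Vec<bool>` mask are assumptions made for the example, not arrow2 APIs, and it measures work done rather than wall-clock time.

```rust
/// Appends `chunks` chunks of `chunk_len` valid slots and recounts nulls over
/// the whole mask after each append. Returns the total slots inspected.
pub fn slots_scanned_with_recount(chunks: usize, chunk_len: usize) -> usize {
    let mut validity: Vec<bool> = Vec::new();
    let mut scanned = 0;
    for _ in 0..chunks {
        validity.extend(std::iter::repeat(true).take(chunk_len));
        // Full scan of everything appended so far: this is the quadratic part.
        let nulls = validity.iter().filter(|valid| !**valid).count();
        scanned += validity.len();
        assert_eq!(nulls, 0);
    }
    scanned
}

/// Same appends, but without the recount: only the new slots are touched.
pub fn slots_touched_without_recount(chunks: usize, chunk_len: usize) -> usize {
    let mut validity: Vec<bool> = Vec::new();
    let mut touched = 0;
    for _ in 0..chunks {
        validity.extend(std::iter::repeat(true).take(chunk_len));
        touched += chunk_len;
    }
    touched
}
```

For `chunks` appends of `chunk_len` slots, the first function inspects `chunk_len * chunks * (chunks + 1) / 2` slots in total, while the second touches `chunk_len * chunks`.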