SegArray optimization & bug fix #3021

brandon-neth · 2024-03-07T17:21:33Z

This PR should improve dataframe benchmark performance by removing two sources of redundant computation for SegArray columns. It also fixes a bug in the benchmark that caused only the result of the final run to be saved.

Both redundant computations are in the SegArray initializer, which in my profiling contributed the most to the dataframe benchmark's execution time. The first was the number of non-empty segments, which was computed twice: once when setting the _non_empty field and again when counting the non-empty segments. The second was the segment lengths, which were computed in the gen_ranges function and then again in the SegArray initializer. I added an optional argument to gen_ranges that makes it return the lengths, so they can be passed straight to the initializer instead of being recalculated during initialization.
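
To illustrate the idea, here is a minimal pure-Python sketch of the two changes. It is not the Arkouda code: `gen_ranges_sketch`, `SegArraySketch`, the `return_lengths` flag name, and the `_non_empty_count` attribute are hypothetical stand-ins that use plain lists, and the sketch assumes every end is at least its start.

```python
from itertools import accumulate


def gen_ranges_sketch(starts, ends, return_lengths=False):
    """Toy stand-in: concatenate range(s, e) for each (start, end) pair.

    Returns (segments, values), plus lengths when return_lengths is True.
    Assumes ends[i] >= starts[i] for every i.
    """
    # The lengths are needed here anyway to build the segment offsets.
    lengths = [e - s for s, e in zip(starts, ends)]
    # Segment offsets are the exclusive prefix sum of the lengths.
    segments = [0] + list(accumulate(lengths))[:-1] if lengths else []
    values = [v for s, e in zip(starts, ends) for v in range(s, e)]
    if return_lengths:
        return segments, values, lengths
    return segments, values


class SegArraySketch:
    """Toy stand-in for a segmented array built from offsets and flat values."""

    def __init__(self, segments, values, lengths=None):
        self.segments = segments
        self.values = values
        if lengths is None:
            # Fallback path: derive the lengths from consecutive offsets.
            bounds = list(segments) + [len(values)]
            lengths = [bounds[i + 1] - bounds[i] for i in range(len(segments))]
        self.lengths = lengths
        # Build the non-empty mask once and count from it,
        # instead of evaluating "length > 0" a second time.
        self._non_empty = [n > 0 for n in lengths]
        self._non_empty_count = sum(self._non_empty)


# Usage: the lengths computed inside gen_ranges_sketch are reused as-is.
segs, vals, lens = gen_ranges_sketch([0, 5, 7], [3, 5, 9], return_lengths=True)
sa = SegArraySketch(segs, vals, lengths=lens)
```

Keeping `lengths` optional in the initializer means callers that only have offsets and values still work; they take the fallback path.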

bmcdonald3

Cool, good catch! Just curious, do you have any performance runs you can put in the description so we have a sense of what kind of improvement to expect? An example of that would be something like #2290

stress-tess

nice work!! I would also like to see the improved times, but I'll go ahead and merge in the meantime

brandon-neth · 2024-03-11T20:02:56Z

Sure, these results are from my Mac. Average of 5 runs for 10k-element-long columns.

Original version: 0.9077 seconds
With non-empty fix: 0.7495 seconds
Reusing lengths: 0.6762 seconds
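
For reference, the relative improvement implied by these three timings can be worked out with a few lines of Python. This is only arithmetic on the numbers above; the variable and function names are illustrative.

```python
# Average wall-clock times in seconds, copied from the results above.
original = 0.9077
non_empty_fix = 0.7495
reusing_lengths = 0.6762


def reduction(before, after):
    """Fractional reduction in runtime going from `before` to `after`."""
    return (before - after) / before


first_step = reduction(original, non_empty_fix)    # non-empty fix alone
overall = reduction(original, reusing_lengths)     # both changes together
speedup = original / reusing_lengths               # overall speedup factor

print(f"non-empty fix: {first_step:.1%} less time")
print(f"both changes:  {overall:.1%} less time ({speedup:.2f}x)")
```

On these numbers the non-empty fix accounts for roughly a 17% reduction, and both changes together for roughly a quarter of the original runtime.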

bug fix to dataframe performance benchmark and optimization to segarr…

5a1ee39

…ay initialization --- Signed-off-by: Brandon Neth <[email protected]>

bmcdonald3 approved these changes Mar 8, 2024

stress-tess approved these changes Mar 11, 2024

stress-tess added this pull request to the merge queue Mar 11, 2024

Merged via the queue into Bears-R-Us:master with commit 31e2c82 Mar 11, 2024
13 checks passed
