Leverage Block#mayHaveNull() in HashSemiJoinOperator #8639

pettyjamesm · 2021-07-22T14:52:33Z

Comparable changes to prestodb/presto#16459

Consults Block#mayHaveNull() outside of the main processing loop in HashSemiJoinOperator so that Block#isNull checks can be skipped inside the loop when none are necessary.

Also adds:

LazyBlock#mayHaveNull() implementation to make the Block#mayHaveNull() checks like this one more effective
Page#getLoadedPage(int... columns) helper methods to support simultaneous column extraction and LazyBlock unwrapping

raunaqmorarka · 2021-07-22T15:09:05Z

Could you please add JMH benchmark results before and after the change showing the improvement ?

pettyjamesm · 2021-07-22T17:45:49Z

Could you please add JMH benchmark results before and after the change showing the improvement ?

It's going to take some time to put that together since no JMH benchmark exists for HashSemiJoinOperator. I tried using SqlSemiJoinInPredicateBenchmark but unfortunately that query is converted to an inner-join by the planner, so I modified that benchmark to use the query string: "SELECT orderkey FROM lineitem WHERE orderkey < 0 OR orderkey IN (SELECT orderkey FROM orders WHERE orderkey % 2 = 0)" and ran it locally with 100 warm-up iterations and 25 measured ones. Run to run variability is high and this benchmark is much less scientific than a JMH based approach would be, but the output looks something like this:

Before:

peak_memory:1410102,elapsed_millis:102,input_rows_per_second:735943,output_rows_per_second:294181,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:102147725,cpu_nanos:99034000,user_nanos:98101000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:127,input_rows_per_second:591470,output_rows_per_second:236430,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:127098501,cpu_nanos:109192000,user_nanos:107764000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:139,input_rows_per_second:538891,output_rows_per_second:215413,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:139499266,cpu_nanos:106468000,user_nanos:105621000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:95,input_rows_per_second:789199,output_rows_per_second:315469,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:95254783,cpu_nanos:92582000,user_nanos:91866000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:103,input_rows_per_second:730367,output_rows_per_second:291952,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:102927612,cpu_nanos:97680000,user_nanos:96275000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:114,input_rows_per_second:657643,output_rows_per_second:262882,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:114309571,cpu_nanos:106110000,user_nanos:104536000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
                    sql_semijoin_in ::  106.720 cpu ms :: 1.34MB peak memory :: in 75.2K,      0B,    704K/s,      0B/s :: out 30.1K,   264KB,    282K/s,  2.42MB/s

After:

peak_memory:1410102,elapsed_millis:78,input_rows_per_second:963263,output_rows_per_second:385049,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:78041971,cpu_nanos:77661000,user_nanos:77446000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:77,input_rows_per_second:970159,output_rows_per_second:387805,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:77487229,cpu_nanos:77326000,user_nanos:77172000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:86,input_rows_per_second:878736,output_rows_per_second:351260,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:85548977,cpu_nanos:84220000,user_nanos:83745000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:87,input_rows_per_second:864962,output_rows_per_second:345754,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:86911267,cpu_nanos:86610000,user_nanos:86282000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:81,input_rows_per_second:925078,output_rows_per_second:369785,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:81263382,cpu_nanos:81105000,user_nanos:80900000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
peak_memory:1410102,elapsed_millis:82,input_rows_per_second:911462,output_rows_per_second:364342,input_megabytes:0,input_megabytes_per_second:0,wall_nanos:82477310,cpu_nanos:82269000,user_nanos:81981000,input_rows:75175,input_bytes:0,output_rows:30050,output_bytes:270450
                    sql_semijoin_in ::   82.769 cpu ms :: 1.34MB peak memory :: in 75.2K,      0B,    908K/s,      0B/s :: out 30.1K,   264KB,    363K/s,  3.12MB/s

Logically, these changes here are extending the pattern of using mayHaveNull() to unswitch primary loop isNull checks like was done in #8437

raunaqmorarka · 2021-07-22T18:16:16Z

I think it would be nice to add a JMH benchmark if one doesn't exist while making such optimizations. The linked PR doesn't have JMH numbers either.
Maybe we could go one step further and create 2 separate even more streamlined for loops for the never-null and may-have-null cases. Having a JMH benchmark would help to guide to what extent we should make optimizations like that.

pettyjamesm · 2021-07-22T22:33:22Z

Maybe we could go one step further and create 2 separate even more streamlined for loops for the never-null and may-have-null cases. Having a JMH benchmark would help to guide to what extent we should make optimizations like that.

I chose to leave the loop unswitching decision up to the C2 JIT instead of manually unswitching the loops here because a manually unswitched version would probably want variants for #mayHaveNull() and probeHashChannel >= 0 (and maybe channelSet.hasNull() and channelSet.isEmpty() too?) which would be at least 4 different loop variants. Having done that exercise, I can say the even with just 2 different patterns, the resulting code is very difficult to read and reason about. While so while C2 may be more fickle about unswitching on both loop invariants than a manual approach, it seemed like the right compromise in this context.

I think it would be nice to add a JMH benchmark if one doesn't exist while making such optimizations. The linked PR doesn't have JMH numbers either.

I agree, but on my end its been a question of available time and the effort required to produce meaningful benchmarks. In the case of the PartitionedOutputOperator PR, I chose not to attach numbers for the existing JMH benchmarks because it would have required a lot of changes to better simulate real-world conditions, and without those changes the existing benchmark numbers would be misleading at best. For example, the existing benchmarks don't exercise null channel or any-row replication and don't vary block or PartitionFunction implementation classes, which allows the JIT to optimistically inline things deeply in ways that a real worker JVM instance would not.

core/trino-main/src/main/java/io/trino/operator/HashSemiJoinOperator.java

lukasz-stec · 2021-07-23T08:57:53Z

core/trino-main/src/main/java/io/trino/operator/HashSemiJoinOperator.java


            // update hashing strategy to use probe cursor
            for (int position = 0; position < inputPage.getPositionCount(); position++) {
-                if (probeJoinPage.getBlock(0).isNull(position)) {
+                if (probeJoinNulls != null && probeJoinNulls.isNull(position)) {


How about using a boolean variable like probeJoinCanBeNull instead of null in probeJoinNulls?
The code would be more explicit.

You could, but then you'd need a boolean probeJoinCanBeNull and Block probeJoinNulls which doesn't seem like an improvement in clarity.

pettyjamesm · 2021-07-27T15:02:24Z

Renamed getLoadedColumns to getLoadedPage and added an inline comment about why channelSet was added as a local variable instead of referring to the instance field directly based on review feedback on the PrestoDB PR. The changes should be ready for final review / merge.

cla-bot bot added the cla-signed label Jul 22, 2021

pettyjamesm requested a review from martint July 22, 2021 14:52

findepi requested review from skrzypo987 and lukasz-stec July 22, 2021 20:37

skrzypo987 reviewed Jul 23, 2021

View reviewed changes

core/trino-main/src/main/java/io/trino/operator/HashSemiJoinOperator.java Outdated Show resolved Hide resolved

lukasz-stec reviewed Jul 23, 2021

View reviewed changes

pettyjamesm force-pushed the improve-hashsemijoinoperator branch from 4b90f8e to e1deae9 Compare July 26, 2021 14:25

martint approved these changes Jul 26, 2021

View reviewed changes

pettyjamesm added 3 commits July 27, 2021 09:53

Implement LazyBlock#mayHaveNull()

5890c79

Add Page#getLoadedPage(int... columns) helper methods

4bf1467

Consult Block#mayHaveNull() outside of HashSemiJoinOperator loop

f4cb08a

pettyjamesm force-pushed the improve-hashsemijoinoperator branch from e1deae9 to f4cb08a Compare July 27, 2021 13:56

martint merged commit 41ab5e6 into trinodb:master Jul 28, 2021

martint added this to the 360 milestone Jul 28, 2021

martint mentioned this pull request Jul 28, 2021

Release notes for 360 #8455

Closed

11 tasks

pettyjamesm deleted the improve-hashsemijoinoperator branch July 28, 2021 16:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Leverage Block#mayHaveNull() in HashSemiJoinOperator #8639

Leverage Block#mayHaveNull() in HashSemiJoinOperator #8639

pettyjamesm commented Jul 22, 2021 •

edited

Loading

raunaqmorarka commented Jul 22, 2021

pettyjamesm commented Jul 22, 2021

raunaqmorarka commented Jul 22, 2021

pettyjamesm commented Jul 22, 2021 •

edited

Loading

lukasz-stec Jul 23, 2021

pettyjamesm Jul 23, 2021 •

edited

Loading

pettyjamesm commented Jul 27, 2021

Leverage Block#mayHaveNull() in HashSemiJoinOperator #8639

Leverage Block#mayHaveNull() in HashSemiJoinOperator #8639

Conversation

pettyjamesm commented Jul 22, 2021 • edited Loading

raunaqmorarka commented Jul 22, 2021

pettyjamesm commented Jul 22, 2021

raunaqmorarka commented Jul 22, 2021

pettyjamesm commented Jul 22, 2021 • edited Loading

lukasz-stec Jul 23, 2021

Choose a reason for hiding this comment

pettyjamesm Jul 23, 2021 • edited Loading

Choose a reason for hiding this comment

pettyjamesm commented Jul 27, 2021

pettyjamesm commented Jul 22, 2021 •

edited

Loading

pettyjamesm commented Jul 22, 2021 •

edited

Loading

pettyjamesm Jul 23, 2021 •

edited

Loading