
Fix degenerate conditional nested loop join detection [databricks] #11268

Merged: 2 commits into NVIDIA:branch-24.08 on Jul 30, 2024

Conversation

jlowe (Member) commented Jul 29, 2024

Fixes #11266. The failure on Databricks occurred because the aggregate was pushed through the join, which resulted in non-empty output from the join. The fix from #11244 was flawed in that it only detected an unconditional join when the condition was always true and the join output was empty (i.e., a row-count-only join), but this last condition is unnecessary. A nested loop join is unconditional whenever its join condition is always true, regardless of the output being produced by the join.
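
Below is a minimal sketch of the detection idea described above; the helper name isUnconditionalNestedLoopJoin is hypothetical and this is not the spark-rapids implementation, only an illustration that the decision depends solely on the join condition and never on the join's output:

    // Sketch only: a nested loop join is effectively unconditional when it has no
    // condition or its condition is the literal TRUE. Nothing about the join's
    // output (such as it being empty) participates in the decision.
    import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
    import org.apache.spark.sql.types.BooleanType

    def isUnconditionalNestedLoopJoin(condition: Option[Expression]): Boolean =
      condition match {
        case None                             => true  // no join condition at all
        case Some(Literal(true, BooleanType)) => true  // condition folds to TRUE
        case _                                => false // genuinely conditional join
      }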

jlowe added the bug (Something isn't working) label on Jul 29, 2024
jlowe self-assigned this on Jul 29, 2024

jlowe (Member Author) commented Jul 29, 2024

build

kuhushukla previously approved these changes on Jul 29, 2024
revans2 previously approved these changes on Jul 29, 2024
jlowe dismissed the stale reviews from revans2 and kuhushukla via 03d3f4a on Jul 29, 2024 at 18:40

jlowe (Member Author) commented Jul 29, 2024

CI failed in test_right_broadcast_nested_loop_join_without_condition_empty, which exposed that we were not properly handling empty build-side batches in unconditional outer nested loop joins. Previously we hacked around this by adding an always-true condition; this change adds proper support for that case and removes the hack.
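
To make the intended semantics concrete, here is a small, self-contained Scala illustration (plain collections rather than the GPU batch code) of why an unconditional left outer nested loop join with an empty build side must still emit every stream-side row, with nulls standing in for the build-side columns:

    // Illustration only: left outer nested loop join with an always-true condition.
    // An empty build side yields one output row per stream row, with None playing
    // the role of the null build-side columns.
    def leftOuterNestedLoop[A, B](stream: Seq[A], build: Seq[B]): Seq[(A, Option[B])] =
      if (build.isEmpty) stream.map(s => (s, Option.empty[B]))
      else stream.flatMap(s => build.map(b => (s, Some(b): Option[B])))

    // leftOuterNestedLoop(Seq("x", "y"), Seq.empty[Int])
    //   == Seq(("x", None), ("y", None))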

jlowe (Member Author) commented Jul 29, 2024

build

Review thread on the diff:

    joinTime = joinTime)

    localJoinType match {
      case LeftOuter if spillableBuiltBatch.numRows == 0 =>

A collaborator commented on this diff:

Do we need to worry about a full outer join?

jlowe (Member Author) replied on Jul 29, 2024:

In the future, yes, but we do not currently support FullOuter joins for broadcast nested loop joins. Support for that is tracked by #3269.

jlowe (Member Author) commented Jul 29, 2024

CI failure appears to be related to rapidsai/cudf#16426.

jlowe (Member Author) commented Jul 29, 2024

build

pxLi (Collaborator) commented Jul 30, 2024

Some regex cases (https://github.com/NVIDIA/spark-rapids/blob/branch-24.08/jenkins/spark-premerge-build.sh#L223-L224) seem to be hanging forever in the scala213 CI, for example:

test_regexp_replace[DATAGEN_SEED=1722297411, TZ=UTC]

The executor stopped producing any further logs:

[INFO] 2024-07-29 23:57:06,351 org.sparkproject.jetty.util.log initialized - Logging initialized @13791ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-07-29 23:57:11 INFO     Running test 'src/main/python/regexp_test.py::test_split_regexp_disabled_fallback[DATAGEN_SEED=1722297411, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(ProjectExec,StringSplit)]'
[WARN] 2024-07-29 23:57:17,340 com.nvidia.spark.rapids.GpuOverrides logWarning - 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> split(a#16, [:], 2) AS split(a, [:], 2)#18 could run on GPU
    !Expression <StringSplit> split(a#16, [:], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [:] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [o:], 5) AS split(a, [o:], 5)#19 could run on GPU
    !Expression <StringSplit> split(a#16, [o:], 5) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o:] could run on GPU
      @Expression <Literal> 5 could run on GPU
  @Expression <Alias> split(a#16, [^:], 2) AS split(a, [^:], 2)#20 could run on GPU
    !Expression <StringSplit> split(a#16, [^:], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [^:] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [^o], 55) AS split(a, [^o], 55)#21 could run on GPU
    !Expression <StringSplit> split(a#16, [^o], 55) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [^o] could run on GPU
      @Expression <Literal> 55 could run on GPU
  @Expression <Alias> split(a#16, [o]{1,2}, 999) AS split(a, [o]{1,2}, 999)#22 could run on GPU
    !Expression <StringSplit> split(a#16, [o]{1,2}, 999) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o]{1,2} could run on GPU
      @Expression <Literal> 999 could run on GPU
  @Expression <Alias> split(a#16, [bf], 2) AS split(a, [bf], 2)#23 could run on GPU
    !Expression <StringSplit> split(a#16, [bf], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [bf] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [o], 5) AS split(a, [o], 5)#24 could run on GPU
    !Expression <StringSplit> split(a#16, [o], 5) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o] could run on GPU
      @Expression <Literal> 5 could run on GPU
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#16 could run on GPU

2024-07-29 23:57:17 INFO     Running test 'src/main/python/regexp_test.py::test_split_escaped_chars_in_character_class[DATAGEN_SEED=1722297411, TZ=UTC]'
[WARN] 2024-07-29 23:57:18,967 com.nvidia.spark.rapids.GpuOverrides logWarning - 
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#48 could run on GPU

2024-07-29 23:57:19 INFO     Running test 'src/main/python/regexp_test.py::test_regexp_replace[DATAGEN_SEED=1722297411, TZ=UTC]'
[WARN] 2024-07-29 23:57:20,955 com.nvidia.spark.rapids.GpuOverrides logWarning - 
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#76 could run on GPU

pxLi (Collaborator) commented Jul 30, 2024

Since the above is the last remaining CI case and it cannot be reproduced locally, I filed #11270 to track it.

I'm going to merge this to unblock other changes. Thanks!

pxLi merged commit 413c01e into NVIDIA:branch-24.08 on Jul 30, 2024
42 of 43 checks passed
jlowe deleted the fix-join-constant-keys-db branch on Jul 30, 2024 at 13:30

Labels: bug (Something isn't working)

Linked issue that may be closed by merging this pull request:
[BUG] test_broadcast_hash_join_constant_keys failed in databricks runtimes