
Fix degenerate conditional nested loop join detection [databricks] #11268

Merged: 2 commits into NVIDIA:branch-24.08 on Jul 30, 2024

Conversation

jlowe (Member) commented Jul 29, 2024

Fixes #11266. The failure on Databricks occurred because the aggregate was pushed through the join, which resulted in non-empty output from the join. The fix from #11244 was flawed in that it only detected an unconditional join when the condition was always true and the join output was empty (i.e., a row-count-only join), but this last condition is unnecessary. A nested loop join is unconditional whenever its join condition is always true, regardless of the output being produced by the join.
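
Below is a minimal sketch of the detection idea described above; the helper name isUnconditionalNestedLoopJoin is hypothetical and this is not the spark-rapids implementation, only an illustration that the decision depends solely on the join condition and never on the join's output:

    // Sketch only: a nested loop join is effectively unconditional when it has no
    // condition or its condition is the literal TRUE. Nothing about the join's
    // output (such as it being empty) participates in the decision.
    import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
    import org.apache.spark.sql.types.BooleanType

    def isUnconditionalNestedLoopJoin(condition: Option[Expression]): Boolean =
      condition match {
        case None                             => true  // no join condition at all
        case Some(Literal(true, BooleanType)) => true  // condition folds to TRUE
        case _                                => false // genuinely conditional join
      }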

jlowe added the bug (Something isn't working) label on Jul 29, 2024
jlowe self-assigned this on Jul 29, 2024

jlowe (Member Author) commented Jul 29, 2024

build

kuhushukla previously approved these changes on Jul 29, 2024
revans2 previously approved these changes on Jul 29, 2024
jlowe dismissed the stale reviews from revans2 and kuhushukla via 03d3f4a on Jul 29, 2024 at 18:40

jlowe (Member Author) commented Jul 29, 2024

CI failed in test_right_broadcast_nested_loop_join_without_condition_empty, which exposed that we were not properly handling empty build-side batches in unconditional outer nested loop joins. Previously we hacked around this by adding an always-true condition; this change adds proper support for that case and removes the hack.
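
To make the intended semantics concrete, here is a small, self-contained Scala illustration (plain collections rather than the GPU batch code) of why an unconditional left outer nested loop join with an empty build side must still emit every stream-side row, with nulls standing in for the build-side columns:

    // Illustration only: left outer nested loop join with an always-true condition.
    // An empty build side yields one output row per stream row, with None playing
    // the role of the null build-side columns.
    def leftOuterNestedLoop[A, B](stream: Seq[A], build: Seq[B]): Seq[(A, Option[B])] =
      if (build.isEmpty) stream.map(s => (s, Option.empty[B]))
      else stream.flatMap(s => build.map(b => (s, Some(b): Option[B])))

    // leftOuterNestedLoop(Seq("x", "y"), Seq.empty[Int])
    //   == Seq(("x", None), ("y", None))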

jlowe (Member Author) commented Jul 29, 2024

build

Review thread on the diff:

    joinTime = joinTime)

    localJoinType match {
      case LeftOuter if spillableBuiltBatch.numRows == 0 =>

A collaborator commented on this diff:

Do we need to worry about a full outer join?

jlowe (Member Author) replied on Jul 29, 2024:

In the future, yes, but we do not currently support FullOuter joins for broadcast nested loop joins. Support for that is tracked by #3269.

jlowe (Member Author) commented Jul 29, 2024

CI failure appears to be related to rapidsai/cudf#16426.

jlowe (Member Author) commented Jul 29, 2024

build

pxLi (Collaborator) commented Jul 30, 2024

Some regex cases (https://github.com/NVIDIA/spark-rapids/blob/branch-24.08/jenkins/spark-premerge-build.sh#L223-L224) seem to be hanging forever in the scala213 CI, for example:

test_regexp_replace[DATAGEN_SEED=1722297411, TZ=UTC]

The executor stopped producing any further logs:

[INFO] 2024-07-29 23:57:06,351 org.sparkproject.jetty.util.log initialized - Logging initialized @13791ms to org.sparkproject.jetty.util.log.Slf4jLog
2024-07-29 23:57:11 INFO     Running test 'src/main/python/regexp_test.py::test_split_regexp_disabled_fallback[DATAGEN_SEED=1722297411, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(ProjectExec,StringSplit)]'
[WARN] 2024-07-29 23:57:17,340 com.nvidia.spark.rapids.GpuOverrides logWarning - 
!Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
  @Expression <Alias> split(a#16, [:], 2) AS split(a, [:], 2)#18 could run on GPU
    !Expression <StringSplit> split(a#16, [:], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [:] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [o:], 5) AS split(a, [o:], 5)#19 could run on GPU
    !Expression <StringSplit> split(a#16, [o:], 5) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o:] could run on GPU
      @Expression <Literal> 5 could run on GPU
  @Expression <Alias> split(a#16, [^:], 2) AS split(a, [^:], 2)#20 could run on GPU
    !Expression <StringSplit> split(a#16, [^:], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [^:] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [^o], 55) AS split(a, [^o], 55)#21 could run on GPU
    !Expression <StringSplit> split(a#16, [^o], 55) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [^o] could run on GPU
      @Expression <Literal> 55 could run on GPU
  @Expression <Alias> split(a#16, [o]{1,2}, 999) AS split(a, [o]{1,2}, 999)#22 could run on GPU
    !Expression <StringSplit> split(a#16, [o]{1,2}, 999) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o]{1,2} could run on GPU
      @Expression <Literal> 999 could run on GPU
  @Expression <Alias> split(a#16, [bf], 2) AS split(a, [bf], 2)#23 could run on GPU
    !Expression <StringSplit> split(a#16, [bf], 2) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [bf] could run on GPU
      @Expression <Literal> 2 could run on GPU
  @Expression <Alias> split(a#16, [o], 5) AS split(a, [o], 5)#24 could run on GPU
    !Expression <StringSplit> split(a#16, [o], 5) cannot run on GPU because regular expression support is disabled. Set spark.rapids.sql.regexp.enabled=true to enable it
      @Expression <AttributeReference> a#16 could run on GPU
      @Expression <Literal> [o] could run on GPU
      @Expression <Literal> 5 could run on GPU
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#16 could run on GPU

2024-07-29 23:57:17 INFO     Running test 'src/main/python/regexp_test.py::test_split_escaped_chars_in_character_class[DATAGEN_SEED=1722297411, TZ=UTC]'
[WARN] 2024-07-29 23:57:18,967 com.nvidia.spark.rapids.GpuOverrides logWarning - 
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#48 could run on GPU

2024-07-29 23:57:19 INFO     Running test 'src/main/python/regexp_test.py::test_regexp_replace[DATAGEN_SEED=1722297411, TZ=UTC]'
[WARN] 2024-07-29 23:57:20,955 com.nvidia.spark.rapids.GpuOverrides logWarning - 
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec
    @Expression <AttributeReference> a#76 could run on GPU

pxLi (Collaborator) commented Jul 30, 2024

Since the above is the last remaining CI case and it cannot be reproduced locally, I filed #11270 to track it.

I'm going to merge this to unblock other changes. Thanks!

pxLi merged commit 413c01e into NVIDIA:branch-24.08 on Jul 30, 2024
42 of 43 checks passed
jlowe deleted the fix-join-constant-keys-db branch on Jul 30, 2024 at 13:30

Labels: bug (Something isn't working)

Linked issue that may be closed by merging this pull request:
[BUG] test_broadcast_hash_join_constant_keys failed in databricks runtimes