Experimental support for BloomFilterAggregate expression in a reduction context [databricks] #8892

jlowe · 2023-07-31T22:13:50Z

Relates to #7803. Depends on #8775. Closes #8955.

Implements GPU support for BloomFilterAggregate in a reduction context. This expression is used by Bloom filter optimized joins, which are available in Spark 3.3.0 and enabled by default in Spark 3.4.0.
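
For readers unfamiliar with the term, a reduction is an aggregation with no grouping keys: every input row folds into a single result, here one Bloom filter. The following is a minimal CPU-only sketch of that idea. `ToyBloomFilter`, `ReductionSketch`, and the hashing scheme are hypothetical and chosen for illustration; this is not the plugin's GPU implementation, and it does not reproduce Spark's hash functions or serialized format.

```scala
import java.util.{BitSet => JBitSet}

// Toy Bloom filter over Long values. Hypothetical class for illustration only:
// the hashing here is NOT the scheme used by Spark or spark-rapids.
final class ToyBloomFilter(val numBits: Int, val numHashes: Int) {
  require(numBits > 0 && numHashes > 0, "numBits and numHashes must be positive")

  private val bits = new JBitSet(numBits)

  // Derive the i-th bit position from two base hashes (double hashing).
  private def position(value: Long, i: Int): Int = {
    val h1 = java.lang.Long.hashCode(value)
    val h2 = java.lang.Long.hashCode(value * 31L + 17L) | 1
    Math.floorMod(h1 + i * h2, numBits)
  }

  // Set every bit position derived from the value.
  def put(value: Long): ToyBloomFilter = {
    (0 until numHashes).foreach(i => bits.set(position(value, i)))
    this
  }

  // True if the value may have been added; false means definitely not added.
  def mightContain(value: Long): Boolean =
    (0 until numHashes).forall(i => bits.get(position(value, i)))
}

object ReductionSketch {
  // Reduction: no grouping keys, so all rows fold into one filter.
  def buildFilter(values: Iterator[Long], numBits: Int, numHashes: Int): ToyBloomFilter =
    values.foldLeft(new ToyBloomFilter(numBits, numHashes))(_.put(_))
}
```

In a Bloom filter join, a filter like this is built from one side's join keys and then probed on the other side to discard rows that cannot match. A Bloom filter never reports a false negative, so every value that was added always passes `mightContain`.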

Signed-off-by: Jason Lowe <[email protected]>

jlowe · 2023-08-02T20:29:49Z

build

jlowe · 2023-08-03T16:28:09Z

build

jlowe · 2023-08-03T18:40:25Z

build

jlowe · 2023-08-04T14:03:57Z

build

abellina

I missed a couple of things last time that I am mostly curious about. The latest changes in the integration tests LGTM.

tests/src/test/spark330/scala/com/nvidia/spark/rapids/BloomFilterAggregateQuerySuite.scala

jlowe · 2023-08-07T14:15:49Z

build

jlowe · 2023-08-07T21:54:21Z

Converting to draft, as this needs the fixes from #8944 and must be upmerged to pick up the new tests there.

revans2

Generally looks good.

revans2 · 2023-08-08T18:31:02Z

sql-plugin/src/main/spark330/scala/com/nvidia/spark/rapids/shims/BloomFilterShims.scala

+          (ReductionAggExprContext,
+            ContextChecks(TypeSig.BINARY, TypeSig.BINARY,
+              Seq(ParamCheck("child", TypeSig.LONG, TypeSig.LONG),
+                ParamCheck("estimatedItems", TypeSig.lit(TypeEnum.LONG), TypeSig.LONG),


nit: Technically, Spark also checks that estimatedItems and estimatedBits are literals (strictly, foldable), greater than 0, and no larger than the configured maximum. We could mark those as lit on the Spark side too, so the docs show that we fully support them instead of carrying a comment that we do not.
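
The three conditions named in this comment (constant argument, greater than 0, within the configured limit) can be sketched as a small validation function. Everything below is hypothetical: `Arg`, `LongLiteral`, `ColumnRef`, and `LiteralCheckSketch` are stand-ins invented for illustration, not Spark or spark-rapids types, and `maxAllowed` stands in for whatever limit the relevant Spark config supplies.

```scala
// Hypothetical stand-ins for an expression argument; not Spark or plugin types.
sealed trait Arg
final case class LongLiteral(value: Long) extends Arg
final case class ColumnRef(name: String) extends Arg

object LiteralCheckSketch {
  // Returns the validated value, or a message describing which check failed.
  // maxAllowed is an assumed parameter standing in for the configured limit.
  def check(name: String, arg: Arg, maxAllowed: Long): Either[String, Long] = arg match {
    case LongLiteral(v) if v <= 0 =>
      Left(s"$name must be greater than 0, got $v")
    case LongLiteral(v) if v > maxAllowed =>
      Left(s"$name must be at most $maxAllowed, got $v")
    case LongLiteral(v) =>
      Right(v)
    case ColumnRef(column) =>
      Left(s"$name must be a constant, got column $column")
  }
}
```

Because the CPU planner is described as rejecting non-constant arguments already, tagging the Spark-side signature as a literal would only change what the generated support docs report, not which queries run.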

jlowe · 2023-08-08T19:46:49Z

build

jlowe · 2023-08-08T23:43:25Z

build

jlowe added 7 commits July 21, 2023 15:46

Support BloomFilterMightContain expression

780156e

Signed-off-by: Jason Lowe <[email protected]>

Fix null scalar handling, add null tests

e9f4529

scalastyle fixes

24a1831

Fix overrides

0c7d711

Update to new spark-rapids-jni BloomFilter API

b7b4140

Merge branch 'branch-23.08' into might-contain

5fd7aff

Support BloomFilterAggregate expression in a reduction context

fc5e391

Signed-off-by: Jason Lowe <[email protected]>

jlowe self-assigned this Jul 31, 2023

jlowe marked this pull request as draft July 31, 2023 22:14

Merge branch 'branch-23.08' into bloom-filter-agg

26f3f5a

jlowe marked this pull request as ready for review August 2, 2023 20:29

abellina previously approved these changes Aug 2, 2023

Fix tests to skip on Databricks and check for specific classes

9202839

jlowe dismissed abellina’s stale review via 9202839 August 3, 2023 16:27

abellina reviewed Aug 4, 2023

tests/src/test/spark330/scala/com/nvidia/spark/rapids/BloomFilterAggregateQuerySuite.scala (outdated, resolved)

tests/src/test/spark330/scala/com/nvidia/spark/rapids/BloomFilterAggregateQuerySuite.scala (outdated, resolved)

abellina previously approved these changes Aug 4, 2023

Reduce test case combinations, focus most tests on CPU/GPU interop

2e5f43f

jlowe dismissed abellina’s stale review via 2e5f43f August 7, 2023 14:15

jlowe marked this pull request as draft August 7, 2023 21:53

revans2 reviewed Aug 8, 2023

sameerz added the performance A performance related task/issue label Aug 8, 2023

jlowe mentioned this pull request Aug 8, 2023

[BUG] Bloom filter join tests can fail with multiple join columns #8955

Closed

jlowe added 2 commits August 8, 2023 14:36

Merge branch 'branch-23.08' into bloom-filter-agg

25169d3

Disable Bloom filter join expressions by default

b6d9b69

jlowe changed the title ~~Support BloomFilterAggregate expression in a reduction context [databricks]~~ Experimental support for BloomFilterAggregate expression in a reduction context [databricks] Aug 8, 2023

Add literal tags for Spark type

707da45

jlowe marked this pull request as ready for review August 8, 2023 19:46

jlowe added 2 commits August 8, 2023 18:36

Fix two-column Bloom filter joins

5c35556

Add batch size limit test

b4af8ad

jlowe linked an issue Aug 8, 2023 that may be closed by this pull request

[BUG] Bloom filter join tests can fail with multiple join columns #8955

Closed

revans2 approved these changes Aug 9, 2023

jlowe merged commit c58f3c1 into NVIDIA:branch-23.08 Aug 9, 2023

jlowe deleted the bloom-filter-agg branch August 9, 2023 13:42

jlowe mentioned this pull request Aug 9, 2023

[FEA] Enable Bloom filter join acceleration by default #8965

Closed
