Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental support for BloomFilterAggregate expression in a reduction context [databricks] #8892

Merged
merged 15 commits into from
Aug 9, 2023

Conversation

jlowe
Copy link
Member

@jlowe jlowe commented Jul 31, 2023

Relates to #7803. Depends on #8775. Closes #8955.

Implements GPU support for BloomFilterAggregate in a reduction context. This is used by Bloom filter optimized joins which are available in Spark 3.3.0 and enabled by default in Spark 3.4.0.

@jlowe jlowe self-assigned this Jul 31, 2023
@jlowe jlowe marked this pull request as draft July 31, 2023 22:14
@jlowe jlowe marked this pull request as ready for review August 2, 2023 20:29
@jlowe
Copy link
Member Author

jlowe commented Aug 2, 2023

build

abellina
abellina previously approved these changes Aug 2, 2023
@jlowe
Copy link
Member Author

jlowe commented Aug 3, 2023

build

2 similar comments
@jlowe
Copy link
Member Author

jlowe commented Aug 3, 2023

build

@jlowe
Copy link
Member Author

jlowe commented Aug 4, 2023

build

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed a couple of things last time, I am mostly curious about. The last changes in the integration tests LGTM.

abellina
abellina previously approved these changes Aug 4, 2023
@jlowe
Copy link
Member Author

jlowe commented Aug 7, 2023

build

@jlowe jlowe marked this pull request as draft August 7, 2023 21:53
@jlowe
Copy link
Member Author

jlowe commented Aug 7, 2023

Converting to draft as it needs the fixes from #8944 and the need to upmerge to the new tests there.

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good.

(ReductionAggExprContext,
ContextChecks(TypeSig.BINARY, TypeSig.BINARY,
Seq(ParamCheck("child", TypeSig.LONG, TypeSig.LONG),
ParamCheck("estimatedItems", TypeSig.lit(TypeEnum.LONG), TypeSig.LONG),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Technically Spark is also checking to be sure that estimatedItems and estimatedBits are literals, actually foldable and > 0 and <= the config. So we could mark those as lit too and then the docs just show that we fully support it instead of adding a comment that we do not.

@sameerz sameerz added the performance A performance related task/issue label Aug 8, 2023
@jlowe jlowe changed the title Support BloomFilterAggregate expression in a reduction context [databricks] Experimental support for BloomFilterAggregate expression in a reduction context [databricks] Aug 8, 2023
@jlowe jlowe marked this pull request as ready for review August 8, 2023 19:46
@jlowe
Copy link
Member Author

jlowe commented Aug 8, 2023

build

@jlowe jlowe linked an issue Aug 8, 2023 that may be closed by this pull request
@jlowe
Copy link
Member Author

jlowe commented Aug 8, 2023

build

@jlowe jlowe merged commit c58f3c1 into NVIDIA:branch-23.08 Aug 9, 2023
@jlowe jlowe deleted the bloom-filter-agg branch August 9, 2023 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Bloom filter join tests can fail with multiple join columns [FEA] Accelerate Bloom filtered joins
4 participants