Optimize filters to remove redundant IsNotNull checks #938
Comments
The |
I tested a prototype of optimizing this filter and saw a 7% improvement in filter time for this query. It seems worth implementing.
This might work OK for TPC-H, but TPC-DS data has nulls, so perhaps the null check is still required there?
Yes, the TPC-H data in this case is known not to contain nulls, as shown in the Parquet schema below, so the
|
What is the problem the feature request solves?
I am comparing native query plans between Comet and Ballista for TPC-H q1 and noticed a significant difference between the filter expressions and performance:
Comet (total filter time 7.2 seconds):
Ballista (total filter time 3.3 seconds):
The differences are:
We can likely improve Comet performance by eliding the redundant IsNotNull and And. I am not sure if there is a difference with the date versus int literal, but we should check.
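As a rough illustration of the rewrite proposed above, here is a minimal sketch in Python using a toy expression tree. None of these classes or functions are Comet, Spark, or DataFusion APIs; the names (`Column`, `IsNotNull`, `LtEq`, `And`, `elide_redundant_is_not_null`) and the `l_shipdate` column and date literal are assumptions made for the example. The sketch drops an `IsNotNull(col)` conjunct when the column is declared non-nullable, or when another top-level conjunct is a comparison on the same column. Such a comparison evaluates to NULL for a null input, and a filter discards NULL rows anyway.

```python
from dataclasses import dataclass
from functools import reduce


# Toy expression nodes (hypothetical, for illustration only).
@dataclass(frozen=True)
class Column:
    name: str
    nullable: bool = True


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class IsNotNull:
    child: object


@dataclass(frozen=True)
class LtEq:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def split_conjuncts(expr):
    """Flatten nested And nodes into a list of top-level conjuncts."""
    if isinstance(expr, And):
        return split_conjuncts(expr.left) + split_conjuncts(expr.right)
    return [expr]


def null_rejected_columns(conjuncts):
    """Names of columns used directly in a comparison conjunct.

    A comparison yields NULL when either operand is NULL, so in a
    filter it already rejects rows where such a column is null.
    """
    rejected = set()
    for conjunct in conjuncts:
        if isinstance(conjunct, LtEq):
            for side in (conjunct.left, conjunct.right):
                if isinstance(side, Column):
                    rejected.add(side.name)
    return rejected


def elide_redundant_is_not_null(predicate):
    """Remove IsNotNull conjuncts that cannot change the filter result."""
    conjuncts = split_conjuncts(predicate)
    rejected = null_rejected_columns(conjuncts)
    kept = []
    for conjunct in conjuncts:
        if isinstance(conjunct, IsNotNull) and isinstance(conjunct.child, Column):
            col = conjunct.child
            if not col.nullable or col.name in rejected:
                continue  # redundant: schema or another conjunct covers it
        kept.append(conjunct)
    if not kept:
        return Literal(True)  # every conjunct was redundant
    return reduce(And, kept)  # rebuild a left-deep And chain


# Illustrative predicate; column name and date literal are assumptions.
shipdate = Column("l_shipdate")
predicate = And(IsNotNull(shipdate), LtEq(shipdate, Literal("1998-09-02")))
simplified = elide_redundant_is_not_null(predicate)
```

In this sketch the second rule does not depend on the data being free of nulls, because it only removes a check that the comparison already implies. It is only valid for top-level conjuncts of a filter predicate, not for expressions nested under `Or` or `Not`.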
Describe the potential solution
No response
Additional context
No response