Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look into IS DISTINCT FROM as an alternative to our current null compensation scheme #29624

Open
roji opened this issue Nov 19, 2022 · 10 comments · May be fixed by #34048
Open

Look into IS DISTINCT FROM as an alternative to our current null compensation scheme #29624

roji opened this issue Nov 19, 2022 · 10 comments · May be fixed by #34048

Comments

@roji
Copy link
Member

roji commented Nov 19, 2022

SQL Server 2022 has support for IS [NOT] DISTINCT FROM (docs); this is an equality check that treats two nulls as equal (similar to how DISTINCT works in SQL queries).

PostgreSQL has had this for a long time (docs), but last time I checked it didn't utilize indexes. If the SQL Server version does, then we can use this as a terser alternative to the equality operator, which doesn't require null compensation.

@roji
Copy link
Member Author

roji commented Dec 5, 2023

Note that especially for complex/expensive expressions, this has the potential to eliminate the duplicate inherent in null compensation, i.e. instead of WHERE complex1 = complex2 OR complex1 IS NULL AND complex2 IS NULL (where complex1 and complex2 are evaluated several times), you get WHERE complex1 IS NOT DISTINCT FROM complex2.

@roji roji added the area-perf label Dec 5, 2023
@clement911
Copy link
Contributor

Hi @roji I was watching your video on 2nd level caching and my understand is that it is only exists because different SQL needs to be generated depending on whether a parameter is NULL or not.

If the sql server provider can leverage IS DISTINCT FROM, the generated sql would be the same regardless. Does it mean that the sql server provider could then bypass the 2nd level cache?

@roji
Copy link
Member Author

roji commented Feb 29, 2024

@clement911 unfortunately not - equality is just one of many SQL expression types for which we ned to perform nullability-related rewriting. Take a look at SqlNullabilityProcessor to get an idea of the type of things this involves.

IS DISTINCT FROM can still potentially cut down a lot of null compensation, which may significantly improve performance. But note also that we need to carefully check how it performs (especially with regards to index usage) compared to normal equality.

@ranma42
Copy link
Contributor

ranma42 commented May 27, 2024

Modern versions of SQLite have the standard SQL syntax (i.e. IS [NOT] DISTINCT FROM) in addition to the IS operator that is also available on older ones.
According to the documentation, IS constraints can be used for indexing.

MySQL has the <=> operator. I have not been able to find documentation on whether it makes good use of indexes.

Related: #10514

@roji
Copy link
Member Author

roji commented May 28, 2024

@ranma42 I may be missing it, but I can't see mention of index usage specifically for IS DISTINCT FROM in the SQLite docs.

I definitely believe IS DISTINCT FROM is a promising improvement, and we can add support for it in relational and allow specific providers to opt into it where that makes sense. But careful investigation needs to happen for each databases to ensure whether indexes are used etc.

@ranma42
Copy link
Contributor

ranma42 commented May 28, 2024

@ranma42 I may be missing it, but I can't see mention of index usage specifically for IS DISTINCT FROM in the SQLite docs.

I think indexes are not used for IS DISTINCT FROM, just like they are not used for <>.
Conversely, IS is explicitly mentioned as supported:

To be usable by an index a term must usually be of one of the following forms:
column = expression
column IS expression
column > expression
column >= expression
column < expression
column <= expression
expression = column
expression IS column
[...]

also the docs states that

The IS NOT DISTINCT FROM operator is an alternative spelling for the IS operator.

which makes me believe they are treated the same way internally.
I would still recomend emitting the IS syntax, as it supports a wider range of versions:

  • IS NOT DISTINCT FROM was introduced in 3.39.0
  • the IS operator drives indexes since 3.8.11

I definitely believe IS DISTINCT FROM is a promising improvement, and we can add support for it in relational and allow specific providers to opt into it where that makes sense. But careful investigation needs to happen for each databases to ensure whether indexes are used etc.

Yes, a "strict equality" is definitely an interesting option; its support (both in term of syntax and performance) seems to be very different across providers, so that is something each provider might have to decide how to handle (and in some cases different choices even within the same provider, as SQL Server < 2022 does not seem to support IS DISTINCT FROM).

Another option worth investigating is CTE, which would be more general, but I am afraid it would also involve more extensive changes.

@ranma42
Copy link
Contributor

ranma42 commented Jun 21, 2024

https://github.com/ranma42/efcore/tree/is-not-distinct-from is a WIP branch that explores what happens to the test suite baselines when using IS to RewriteNullSemantics of [Not]Equal. Some care is already taken to avoid regressing good rewrites, but other parts mentioned above are still missing. The main issue (I think) is that it is hardcoded to always be active; instead it should be:

  • opt-in (only providers that support it and are aware of the performance implications activate it)
  • in sqlserver / sqlite, constrained by the version
  • maybe even activated with complex policies (for example: use IS NOT DISTINCT FROM on complex expressions, replace with SQL = and null comparison on simple columns/parameters)

@roji
Copy link
Member Author

roji commented Jun 21, 2024

the IS operator drives indexes since 3.8.11

I missed this original comment - this indeed seems to mean that we should seriously consider switching to IS in SQLite instead of compensating with additional IS NULL checks... For other databases, do you happen to know for sure whether SQL Server uses indexes for IS NOT DISTINCT FROM? I've never actually checked this. Similarly, for PG I checked a very long time ago, but things may very well have changed (see npgsql/efcore.pg#28 from 2016...).

One additional thought is that we should consider the same change for the negated case (i.e. translate <> to IS DISTINCT FROM). This seems more straightforward, as <> generally doesn't uses indexes as you mentioned (but we should still confirm this across databases).

@ranma42 can you please submit your branch as a draft PR rather than as a branch? This way we can have a conversation around it more easily (it's totally fine if it's still an experiment etc.).

@ErikEJ
Copy link
Contributor

ErikEJ commented Jun 21, 2024

Looks like IS DISTINCT FROM enables index seeks: https://sqlperformance.com/2022/08/sql-server-2022/additional-t-sql-improvements-in-sql-server-2022

@ranma42 ranma42 linked a pull request Jun 21, 2024 that will close this issue
@roji roji changed the title Look into SQL Server 2022 IS DISTINCT FROM Look into IS DISTINCT FROM as an alternative to our current null compensation scheme Jun 21, 2024
@roji
Copy link
Member Author

roji commented Jun 21, 2024

Looks like IS DISTINCT FROM enables index seeks [...]

Perfect, thanks @ErikEJ! So it looks like IS DISTINCT FROM should be a good general choice replacement to compensation on SQL Server 2022! And as @ranma42 mentioned in a recent conversation, compared to the current compensation it also has the advantage of not duplicating the expressions (which may be expensive scalar subqueries etc.). I'll reach out internally just to confirm that there aren't any gotchas, but otherwise I think we can definitely consider doing this. Maybe, as this is quite a fundamental query change, we'd want to preemptively include some sort of opt-out, just in case there's trouble.

FYI I've opened npgsql/efcore.pg#3206 to track this on the EFCore.PG side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants