Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Revisit Enumerable.Contains command caching #4605

Closed
mikary opened this issue Feb 19, 2016 · 6 comments
Closed

Query: Revisit Enumerable.Contains command caching #4605

mikary opened this issue Feb 19, 2016 · 6 comments
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Milestone

Comments

@mikary
Copy link
Contributor

mikary commented Feb 19, 2016

Different SQL is generated for Enumerable.Contains depending on the contents of the enumerable. The second level (command) caching of these commands does not currently have any kind of eviction policy and may grow out of control.

Possible solutions to the issue include:

  • Introduce an eviction policy for the command cache
  • Only add commands to the cache if the same cache key is encountered more than once
  • Do not enter Enumerable.Contains commands into the cache
  • Parameterize the Enumerable (TVP) in the query
  • Support partial command construction / SQL generation
@rowanmiller
Copy link
Contributor

@mikary we should go with the "Do not enter Enumerable.Contains commands into the cache" option

@tuespetre
Copy link
Contributor

@rowanmiller please no, I haven't compared performance between EF6 and EF7 yet but using EF6 for a moderately bulky query that includes a contains expression that is never more than ten items, about ~80-90ms is spent in code-land before the query is made. When I drop down to Dapper, maybe 10ms is spent. I believe this is more due to the 'bulk' of the query (aggregation- and join-heavy) needing to be recompiled each time merely because the Contains disqualifies it from being cached.

Some thoughts:

  • Partial command construction would be nice, if the whole query save for the 'Contains' could be compiled, and the 'Contains' would be left as 'insertion points' for later compilation. This would probably be the cleanest solution for the user of the library.

  • If that is too complex/expensive to engineer, perhaps the team would consider an approach like Dapper -- that is, instead of compiling the values from the enumerable directly into the query text, the compiled query could always just accept the IEnumerable parameter and 'expand' its placeholder in the query text.

    Edit: I see that DefaultQuerySqlGenerator contains the VisitIn and VisitNotIn implementations. Is there something that could be done there to inject a placeholder instead of the literal values, so that the query can be cached, and then before executing the cached query, perform some sort of expansion?

I'm not sure what discussions have been had around this before, at least here on GitHub.

@divega
Copy link
Contributor

divega commented Mar 3, 2016

@tuespetre Keep in mind that we have two stages at which we cache the outcome of EF Core's query compilation. At a very high level:

  1. The input query expression is first compiled into a "query plan" from which SQL can be easily generated. The first level of the query cache can match incoming query expressions against existing query plans.
  2. Multiple SQL strings can be generated for the same query plan and different combinations of parameters (e.g. we usually generate different SQL for parameter usage depending on whether the value of a parameter is null). The second level of query cache can match combinations of parameter values against already generated SQL strings.

The proposal described above is to force query plans for queries that use Enumerable.Contains not to remember the SQL they generated to avoid preserving in memory lots of variations of the SQL that have very little chances of being reused.

Note that although in principle performance wise this approach might not be as good as generating SQL with a placeholder, we expect it to be comparable because a lot of the heavy lifting in query compilation actually happens in the generation of the query plan (which is still going to be cached) as opposed to during the SQL generation.

@tuespetre
Copy link
Contributor

@divega thanks for the explanation! Is that different than what EF6 had done then?

@divega
Copy link
Contributor

divega commented Mar 3, 2016

Yes, it is different in that EF6 only had one level of query caching and so queries that used Enumerate.Contains() would always have to be recompiled from scratch. In EF Core with this proposal only the SQL generation will happen every time, but SQL generation isn't necessarily the most expensive part of compiling the LINQ query into SQL.

@tuespetre
Copy link
Contributor

Ok, well that is awesome and I have nothing else to says besides "good job team"! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Projects
None yet
Development

No branches or pull requests

5 participants