Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with Query Cache in Doctrine ORM Causing Cache Memory Overflow #11112

Closed
alebedev80 opened this issue Dec 10, 2023 · 26 comments
Closed

Issue with Query Cache in Doctrine ORM Causing Cache Memory Overflow #11112

alebedev80 opened this issue Dec 10, 2023 · 26 comments

Comments

@alebedev80
Copy link

alebedev80 commented Dec 10, 2023

Bug Report

Q A
BC Break no
Version 2.17.1

Summary

I am experiencing an issue with Doctrine ORM's query caching mechanism, leading to Redis memory overflow. This is primarily due to the way 'limit' and 'offset' values are handled in cached queries.

Current Behavior

In my production environment, Doctrine ORM caches each query including 'limit' and 'offset' as literal values in the SQL statement, rather than as parameters. As a result, every paginated request results in a new cache entry, with the only difference being the varying 'limit' and 'offset' values. This leads to an excessive number of cache entries in Redis.

How to Reproduce

  • Configure Doctrine ORM with query caching, using Redis as the cache storage.
  • Run a query that includes 'limit' and 'offset' clauses.
  • Increment the 'offset' value (simulate pagination) and re-execute the query.
  • Observe the creation of multiple cache entries in Redis for each query variation.

See test code here

Expected Behavior

The expected behavior is for Doctrine ORM to treat 'limit' and 'offset' as parameters rather than literals in the query. This approach would allow for a single cache entry to be applicable to multiple paginated requests of the same query, thereby reducing the number of cache entries and preventing cache memory overflow.

Additional Information
Redis version: 7
PHP version: 8.2
DB: MySQL 8
Environment: Production
Addressing this issue is critical for optimizing query caching efficiency in Doctrine ORM, especially for applications that heavily rely on pagination.

@alebedev80
Copy link
Author

Hey somebody!

@greg0ire
Copy link
Member

Can you detail the "how to reproduce" section with code?

@alebedev80
Copy link
Author

alebedev80 commented Dec 20, 2023

@greg0ire
Copy link
Member

Looked into it thanks to your reproducer, and this is where the ORM does the hardcoding (actually, you can see it delegates that job to DBAL):

$sql = $this->platform->modifyLimitQuery($sql, $limit, $offset);

Now we need to figure out why it's done like this.

@greg0ire
Copy link
Member

Maybe it was just not possible to use parameter binding at the time, at least for some platforms: doctrine/dbal#3459 (comment)

@alebedev80
Copy link
Author

@beberlei please help

@greg0ire
Copy link
Member

greg0ire commented Dec 20, 2023

I think our SECURITY.md should mention https://www.doctrine-project.org/policies/security.html

I'm AFK and will be for a week, so can't help much here. The next step would be to look into implementing a DBAL API that allows to use placeholders with limit and offset, and then leverage it in ORM.

@alebedev80
Copy link
Author

I think our SECURITY.md should mention https://www.doctrine-project.org/policies/security.html

Ooops!

"If you think that you have found a security issue in Doctrine, please don't use the issue tracker in GitHub and don't publish it publicly. Instead, all security issues must be sent to..."

The next step would be to look into implementing a DBAL API that allows to use placeholders with limit and offset, and then leverage it in ORM.

So what i should to do? Create issue in doctrine/dbal?

@greg0ire
Copy link
Member

Yeah, creating an issue in the DBAL seems good. And let me delete some of the last messages

@doctrine doctrine deleted a comment from alebedev80 Dec 21, 2023
@doctrine doctrine deleted a comment from alebedev80 Dec 21, 2023
@doctrine doctrine deleted a comment from alebedev80 Dec 21, 2023
@beberlei
Copy link
Member

Its confusing, the firstResult+maxResults are in Query::getQueryCacheId, but no clue why, they are not relevant I believe. The last change on that line was just coding styles. https://github.com/doctrine/orm/blame/2.17.x/lib/Doctrine/ORM/Query.php#L819

Could you paste the output of your test.php to the README.md of that repo? then i dont have to execute this myself.

@greg0ire
Copy link
Member

greg0ire commented Dec 31, 2023

@beberlei it's like that since 04832e2, prompted by #1859

@alebedev80
Copy link
Author

@beberlei please review README.md

@alebedev80
Copy link
Author

@beberlei some news?

@mpdude
Copy link
Contributor

mpdude commented Jan 19, 2024

One challenge I see is that \Doctrine\ORM\Query\SqlWalker::walkSelectStatement returns a string, and thus we cannot make it use parameters... who would know about these parameters, who would bind their values? The parameter names and/or values would need to be returned alongside the SQL string and be dealt with by the calling code.

Maybe one way out would be to add new methods that allow to set parameter names that shall be used in LIMIT and OFFSET clauses, and then have the same in DBAL to create the platform-specific SQL accordingly?

When choosing parameter names in an automated fashion, we need to take care that they are unique in case people do subqueries/subselects, to avoid parameter name collisions. At the same time, the names must not be random, otherwise they'd mess up the query cache as well. So, like predictable, unique parameter names used in queries that may not fully be constructed at the time the name is needed?

@mpdude
Copy link
Contributor

mpdude commented Jan 19, 2024

Regarding the question why getQueryCacheId puts the offset/limit into the cache key... that's because the cache key/id is used to bypass query -> SQL compilation. When the resulting SQL contains the offset/limit as literals, we need to consider them in the lookup key.

@mpdude
Copy link
Contributor

mpdude commented Jan 19, 2024

As I've said in the DBAL issue I don't believe this can easily be fixed or even prepared in DBAL, and then we'd also need similar changes here in ORM.

So I'd like to suggest another route:

Would it be possible to put (through a dedicated, additional interface?) the output walker into a dedicated "mode" where it uses a (random) placeholder string instead of the SQL expression for limit/offset handling? Then keep the placeholder name together with the unfinished SQL statement in the DQL->SQL cache, and fill in platform-specific limit/offset expressions at a later time, after the parserResult has been taken from the cache?

Cache key computation would probably need to know if this is going to work and only conditionally include offset/limit.

Things like \Doctrine\ORM\Query::getSQL need to be kept in mind, of course those need to return the full query as previously.

Update: Due to the way \Doctrine\DBAL\Platforms\AbstractPlatform::doModifyLimitQuery works (possibly constructing new SQL queries with the old query being a subquery), it cannot be a placeholder, but rather must be a full, deferred call to that method; and that means also other things (like the lock mode specific SQL) might need to be called after it.

@mpdude
Copy link
Contributor

mpdude commented Jan 19, 2024

What cache are we talking about, the query (DQL->SQL) cache or the result cache?

@alebedev80
Copy link
Author

alebedev80 commented Jan 19, 2024

What cache are we talking about, the query (DQL->SQL) cache or the result cache?

@mpdude the query (DQL->SQL) cache

@beberlei
Copy link
Member

The query cache does not need this theoretically, because we can call modifiyLimit... on the resulting sql coming from the cache.

This should not even be a problem with paginator, because that has query hints that change the cache id

@mpdude
Copy link
Contributor

mpdude commented Jan 20, 2024

Do we consider the ParserResult class or the AbstractSqlExecutor and subclasses as part of our public API?

The SqlWalker for sure is, it is subclassed in a documentation example.

mpdude added a commit to mpdude/doctrine2 that referenced this issue Jan 27, 2024
mpdude added a commit to mpdude/doctrine2 that referenced this issue Jan 27, 2024
mpdude added a commit to mpdude/doctrine2 that referenced this issue Jan 28, 2024
mpdude added a commit to mpdude/doctrine2 that referenced this issue Jan 29, 2024
@alebedev80
Copy link
Author

Hey! how is it going?

@mpdude
Copy link
Contributor

mpdude commented Mar 22, 2024

I need to find spare time to think through Benjamin‘s review 😉

@alebedev80
Copy link
Author

@mpdude so you are only who is an able to fix it?

@alebedev80
Copy link
Author

Half year passed. do you plan to fix it?

@mpdude
Copy link
Contributor

mpdude commented May 14, 2024

I‘d love to fix it since I already put much effort into it and I think it’s feasible. Also I still have this item on my personal to-do list.

However, unless being paid for it, I’d have to do this in my spare free time, which is rather limited at the moment 😔. I’ll see what I can do, but no promises given.

beberlei pushed a commit that referenced this issue Oct 10, 2024
…1188)

* Add a test covering the #11112 issue

* Add new OutputWalker and SqlFinalizer interfaces

* Add a SingleSelectSqlFinalizer that can take care of adding offset/limit as well as locking mode statements to a given SQL query.

Add a FinalizedSelectExecutor that executes given, finalized SQL statements.

* In SqlWalker, split SQL query generation into the two parts that shall happen before and after the finalization phase.

Move the part that generates "pre-finalization" SQL into a dedicated method. Use a side channel in SingleSelectSqlFinalizer to access the "finalization" logic and avoid duplication.

* Fix CS violations

* Skip the GH11112 test while applying refactorings

* Avoid a Psalm complaint due to invalid (?) docblock syntax

* Establish alternate code path - queries can obtain the sql executor through the finalizer, parser knows about output walkers yielding finalizers

* Remove a possibly premature comment

* Re-enable the #11112 test

* Fix CS

* Make RootTypeWalker inherit from SqlOutputWalker so it becomes finalizer-aware

* Update QueryCacheTest, since first/max results no longer need extra cache entries

* Fix ParserResultSerializationTest by forcing the parser to produce a ParserResult of the old kind (with the executor already constructed)

* Fix WhereInWalkerTest

* Update lib/Doctrine/ORM/Query/Exec/PreparedExecutorFinalizer.php

Co-authored-by: Grégoire Paris <[email protected]>

* Fix tests

* Fix a Psalm complaint

* Fix a test

* Fix CS

* Make the NullSqlWalker an instance of SqlOutputWalker

* Avoid multiple cache entries caused by LimitSubqueryOutputWalker

* Fix Psalm complaints

* Fix static analysis complaints

* Remove experimental code that I committed accidentally

* Remove unnecessary baseline entry

* Make AddUnknownQueryComponentWalker subclass SqlOutputWalker

That way, we have no remaining classes in the codebase subclassing SqlWalker but not SqlOutputWalker

* Use more expressive exception classes

* Add a deprecation message

* Move SqlExecutor creation to ParserResult, to minimize public methods available on it

* Avoid keeping the SqlExecutor in the Query, since it must be generated just in time (e. g. in case Query parameters change)

* Address PHPStan complaints

* Fix tests

* Small refactorings

* Add an upgrade notice

* Small refactorings

* Update the Psalm baseline

* Add a missing namespace import

* Update Psalm baseline

* Fix CS

* Fix Psalm baseline

---------

Co-authored-by: Grégoire Paris <[email protected]>
@mpdude
Copy link
Contributor

mpdude commented Oct 10, 2024

#11188 has just been merged into 2.20.x, and I think it fixes this issue here.

@mpdude mpdude closed this as completed Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants