Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve subresources DB query performance by removing a subquery #3396

Merged

Conversation

clemherreman
Copy link
Contributor

@clemherreman clemherreman commented Feb 18, 2020

Q A
Bug fix? yes (kinda)
New feature? no
BC breaks? no
Deprecations? no
Tickets none
License MIT
Doc PR N/A

Abstract

This PR try to change DQL queries generated to fetch subresource, to improve performance on large tables.

Before

The generated SQL to query a subresource uses a WHERE ... IN ( <subquery> ), in order to be recursive when subresources also have subresources (introduced by #1608).

The subquery, on large table (at work 4M+ lines), takes a significant amount of time (2-5s), even with the proper SQL indexes (at least on MySQL).

SELECT
    DISTINCT c0_.idpurchase AS idpurchase_0,
    c0_.idpurchase AS idpurchase_1
FROM purchase c0_
LEFT JOIN customer c1_ ON c0_.idcustomer = c1_.idcustomer
-- Some other LEFT JOINs
WHERE c0_.idcustomer IN 
(  -- here please notice the usage of a subquery
    SELECT c16_.idcustomer
    FROM customer c16_
    WHERE c16_.idcustomer = :identifier
)

After

This PRs tries to optimize for the common case, having only one level of subresources, by replacing the subquery via a direct WHERE <identifier> = <identifier-alias> only on the last recursion level. This means it shouldn't break subresources that have subresources, while still improving the query for the nominal cases.

SELECT
    DISTINCT c0_.idpurchase AS idpurchase_0,
    c0_.idpurchase AS idpurchase_1
FROM purchase c0_
LEFT JOIN customer c1_ ON c0_.idcustomer = c1_.idcustomer
-- Some other LEFT JOINs
WHERE c0_.idcustomer = :identifier

This significantly reduces the SQL query time, going from 2.45s to 15ms on my e-commerce app.

BC & missing things.

I broke the unit tests, mainly the mocking, but also some tests fails as they compare the previously generated query, with the new one (which makes sense). All the Behat tests are green locally though.

I feel out of my depth here, and if someone would help me writing those, ensuring no BC is done, is would be great.

On my side, I run our rather large behat test suite against my fork, and not tests complained about wrong results, so I am fairly confident I haven't introduced some BC.

TODO

  • Fix Behat tests
  • Fix PHPUnit tests

@soyuka
Copy link
Member

soyuka commented Feb 18, 2020

All the Behat tests are green locally though.

That's good, I've covered the edge cases of this in the behat test suite. The queries you show give the same results, therefore no BC break will outcome of this change. You can change the broken phpunit tests by fixing the relevant queries.
I like the change 👍

@clemherreman clemherreman changed the title Code Pull requests 0 Actions Projects 0 Wiki Security Insights Settings Improve subresources DB query performance by removing a subquery Improve subresources DB query performance by removing a subquery Feb 18, 2020
@teohhanhui
Copy link
Contributor

Thanks for doing this! 🎉

@clemherreman clemherreman force-pushed the improve-sql-queries-of-subresources branch from 1e56eaf to d6e3f49 Compare February 19, 2020 10:53
…HERE <identifier = ..." to improve performance
@clemherreman clemherreman force-pushed the improve-sql-queries-of-subresources branch from d6e3f49 to df4ad61 Compare February 19, 2020 13:14
@clemherreman
Copy link
Contributor Author

clemherreman commented Feb 19, 2020

Fixed the PHPUnit tests, and rebased against latest 2.5.

If the comment I added after @dkarlovi feedback is enough, I'd say this is ready to be merged.

Note: Behat tests are still failing, but because of either ccode coverage generation failing (?) or call to Dummy::hasRole() failing in the serialization. Both errors seems unrelated to that pull request, isn't it?

@clemherreman clemherreman force-pushed the improve-sql-queries-of-subresources branch from 5503a74 to fa7b217 Compare February 19, 2020 14:53
@teohhanhui
Copy link
Contributor

Behat tests are still failing, but because of either ccode coverage generation failing (?) or call to Dummy::hasRole() failing in the serialization. Both errors seems unrelated to that pull request, isn't it?

Yeah, those are unrelated. Please ignore those failures. 😄

Copy link
Contributor

@teohhanhui teohhanhui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some tests for this? (How?)

Or are the existing tests sufficient?

@clemherreman
Copy link
Contributor Author

@teohhanhui Behat tests already covers the correctness of the query results (thanks to @alanpoulain)

Is there a point in checking the generated DQL itself? I doubt so. This is a perf improvement on the query level.

clemherreman and others added 2 commits February 19, 2020 18:29
Co-Authored-By: Alan Poulain <[email protected]>
Co-Authored-By: Teoh Han Hui <[email protected]>
@teohhanhui teohhanhui merged commit 083ad62 into api-platform:2.5 Feb 19, 2020
@teohhanhui
Copy link
Contributor

Thanks @clemherreman! 🎉

@clemherreman clemherreman deleted the improve-sql-queries-of-subresources branch February 19, 2020 20:11
@mxmp210
Copy link

mxmp210 commented Jun 5, 2020

Please have a look at #1542, As it breaks custom identifiers, feel free to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants