Limit case duplicate merging comparison based on creation date and archived status [3] #11465
Comments
I have tested the effect of my proposed changes by adjusting the SQL accordingly and analyzing the result. Before the changes, the query took 15 seconds to execute (https://explain.dalibo.com/plan/2c1f66374421cc85). With the changes, it is much slower and takes 58 seconds (https://explain.dalibo.com/plan/e5659984276g047d). The reason seems to be that the original query uses a Materialize node, while the new query only uses index scans whose row estimates are 150x too low. I am not sure how to influence the query planner to get better results.
Findings with @stefanspiska: increasing the limit to 200 led PostgreSQL to decide that using Materialize is the better option, taking 34 seconds (still worse). Forcing PostgreSQL to order the persons joined for the comparison cases by id leads to a merge join in combination with Materialize and is a lot faster: 8 seconds. The same can be achieved with a JOIN LATERAL. I have updated the implementation details accordingly.
Please check the query plans for the initial situation and for the variant that uses a subquery in the second person join. The queries were run on a local performance DB with 631k cases. Apart from the join subquery, everything else is identical.
Given these test results, I suggest not implementing the join subquery.
…te and archived status - fix tests
…te and archived status - changes after review
…te and archived status - changes after review
…ase_duplicate_merging_on_creation_and_archive #11465 - Limit case duplicate merging comparison based on creation da…
…te and archived status - removed disease index
…ase_duplicate_merging_on_creation_and_archive #11465 - Limit case duplicate merging comparison based on creation da…
…te and archived status - alignement fix
…ase_duplicate_merging_fix_alignement #11465 - Limit case duplicate merging comparison based on creation da…
Validated on test-de with version 1.82.0-SNAPSHOT (2864773).
One of the changes that improved the performance of the case duplicate merging query was the removal of the disease index from the "cases" table. This change may also affect other queries throughout SORMAS. Below is an evaluation of query performance in different situations and for different relevant users.

### Test impact of disease index - performance db (seconds)
Problem Description
In #9054 we improved the performance of the case duplicate merging query. This doesn't mean that duplicate detection is fast in every case: the whole process still executes a lot of comparison logic. In the example, the query had to compare 372 of the 1500 cases with each of the ~85k cases, resulting in about 31 million comparisons. Because the number of comparisons is the product of both sides, it grows quickly whenever either side grows.
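The growth is multiplicative: every checked case is compared against every candidate case, so the work is the product of the two set sizes. A minimal Python sketch of that arithmetic, using the figures from the example (the ~85k is rounded, so the result is approximate); the function name is illustrative only:

```python
def comparison_count(cases_to_check: int, candidate_cases: int) -> int:
    """Pairwise comparisons when each checked case is compared against
    every candidate case (effectively a cross join)."""
    return cases_to_check * candidate_cases


# Figures from the example: 372 cases compared against ~85k cases.
baseline = comparison_count(372, 85_000)
print(f"{baseline:,}")  # 31,620,000

# Doubling either side doubles the work.
print(comparison_count(2 * 372, 85_000) == 2 * baseline)  # True
```

This is why reducing either side of the comparison, rather than speeding up each individual comparison, is the lever with the largest effect.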
Proposed Change
Decrease the number of comparisons by doing the following:
Acceptance Criteria
Implementation Details
The overall resulting query should look like this (when called as the admin user), with the modified sections highlighted. Note that this is the variant in which only non-archived cases are queried.
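As a rough illustration of limiting the comparison set by archived status and creation date, here is a hypothetical Python sketch; it is not the SORMAS query. It assumes archived cases are excluded entirely and that two cases are only compared when their creation dates lie within a fixed window. The `Case` type, `candidate_pairs` function and the 30-day default are placeholders, not values from this issue. Sorting by creation date lets the inner loop stop as soon as the window is exceeded, so far-apart cases are never visited.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Case:
    # Hypothetical minimal case record for illustration only.
    uuid: str
    creation_date: date
    archived: bool


def candidate_pairs(cases, max_gap_days=30):
    """Yield pairs of non-archived cases whose creation dates are at most
    max_gap_days apart (hypothetical pre-filter, placeholder window)."""
    gap = timedelta(days=max_gap_days)
    # Drop archived cases, then sort so the window check can end the inner loop early.
    active = sorted((c for c in cases if not c.archived),
                    key=lambda c: c.creation_date)
    for i, a in enumerate(active):
        for j in range(i + 1, len(active)):
            b = active[j]
            if b.creation_date - a.creation_date > gap:
                break  # all later cases were created even further away
            yield a, b


cases = [
    Case("A", date(2023, 1, 1), False),
    Case("B", date(2023, 1, 10), False),
    Case("C", date(2023, 6, 1), False),
    Case("D", date(2023, 1, 5), True),  # archived, never compared
]
pairs = [(a.uuid, b.uuid) for a, b in candidate_pairs(cases)]
print(pairs)  # [('A', 'B')]
```

Of the six possible pairs among the four cases, only one remains a candidate for the detailed comparison.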
Additional Information