Improve performance of some archiving queries #18797
Merged
Description:
We had an issue where a custom reports query took longer than 45 minutes to execute every single time, after which archiving was aborted.
I noticed there wasn't that much data on the instance (log_visit 5.5M, log_link_visit_action 15M, log_action 300K). Without a segment, the query usually runs very fast. With the segment applied directly in the query, it runs fast as well. Once we join it with the temporary table, however, it is VERY slow and aborts after 45 minutes. After adding the primary key to the temporary table, the query executed fast (in one minute).
I don't know why I didn't add the primary key initially. I assume I thought it wouldn't be needed, since the index would contain the same information as the table, and that skipping it would be faster. In this case, however, it's clearly faster to have that index. I haven't tested any other queries, but I would assume some other archive queries with segments get faster as well.
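To make the change concrete, here is a minimal sketch of joining a log table against a temporary segment table created with and without a primary key. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so it does not reproduce the slowdown described above: SQLite's planner behaves differently and may build an automatic index on its own. The shortened table name `logtmpsegment`, the `idvisit` column, and the sample data are assumptions made for illustration, not the actual schema or the code changed in this pull request.

```python
import sqlite3


def create_segment_table(conn, with_primary_key):
    """Create a temporary table of matching visit ids, with or without a primary key."""
    conn.execute("DROP TABLE IF EXISTS logtmpsegment")
    key = " PRIMARY KEY" if with_primary_key else ""
    conn.execute(f"CREATE TEMPORARY TABLE logtmpsegment (idvisit INTEGER NOT NULL{key})")
    # Assumption for the demo: the segment matched every second visit.
    conn.executemany(
        "INSERT INTO logtmpsegment (idvisit) VALUES (?)",
        ((idvisit,) for idvisit in range(0, 1000, 2)),
    )


JOIN_QUERY = (
    "SELECT COUNT(*) FROM log_link_visit_action AS a "
    "INNER JOIN logtmpsegment AS s ON s.idvisit = a.idvisit"
)


def count_and_plan(conn):
    """Run the join and return its row count plus the planner's description."""
    count = conn.execute(JOIN_QUERY).fetchall()[0][0]
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_QUERY).fetchall()]
    return count, plan


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified log table: three actions per visit.
conn.execute("CREATE TABLE log_link_visit_action (idvisit INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO log_link_visit_action (idvisit) VALUES (?)",
    ((i % 1000,) for i in range(3000)),
)

for with_primary_key in (False, True):
    create_segment_table(conn, with_primary_key)
    count, plan = count_and_plan(conn)
    print(f"primary key: {with_primary_key}, joined rows: {count}, plan: {plan}")
```

Both variants return the same joined rows; only the access path reported by the planner can differ. On MySQL the equivalent check would be running `EXPLAIN` on the real archiving query against the real temporary table.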
Queries I used to reproduce
Explain when the primary key is there (query fast)
Explain when the primary key is not there (query slow)
Notice the 56497021392 rows.
Performance change for insert query
The insert into logtmpsegment010ccc0a9e16636173df1de1bfc6a263 showed no big performance difference whether the index was there or not for this amount of data. It was inserting only 160K visits, though. The performance difference was around 15%: without the index the query usually took around 1.1 seconds vs. around 1.3 seconds with the index. This is probably why we didn't add it initially, to save this 15% of time.
On a different table where 450K visits were inserted, the query weirdly always took 3.1 seconds, no matter whether the index was there or not. On a result set of 2M visits (rows), the index slowed it down from 9.3 seconds to 9.8 seconds.
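As a quick check of the numbers above, the relative cost of the index on the insert can be computed from the reported timings. This is only a sketch: the helper name `relative_slowdown` is made up for illustration, and the inputs are the rounded averages quoted in this description.

```python
def relative_slowdown(without_index_s, with_index_s):
    """Return the extra time with the index as a fraction of the no-index time."""
    return (with_index_s - without_index_s) / without_index_s


# (seconds without index, seconds with index), as reported above.
measurements = {
    "160K visits": (1.1, 1.3),
    "450K visits": (3.1, 3.1),
    "2M visits": (9.3, 9.8),
}

for label, (without_s, with_s) in measurements.items():
    print(f"{label}: {relative_slowdown(without_s, with_s):.1%} slower with index")
```

Measured against the no-index baseline, the 160K case comes out at roughly 18%; measured against the with-index time it is roughly 15%. The 2M case is roughly 5%, so the relative overhead did not grow with the row count in these runs.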
I executed each query many times to get an average.
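A repeated measurement like this can be sketched with a small helper. The function name `average_runtime` and the callable passed to it are hypothetical; in practice the callable would execute the insert or archiving query against the database.

```python
import statistics
import time


def average_runtime(run_query, runs=10):
    """Call run_query `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Placeholder workload standing in for a real query.
print(f"{average_runtime(lambda: sum(range(100_000)), runs=5):.6f} s on average")
```

Averaging smooths out noise from caching and concurrent load, though the first run against a cold cache may still be worth looking at separately.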
Generally, with segments the number of rows is often smaller since the segment filters out many visits, so I don't think this is a real issue.
Review