-
-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dev/core#2575 remove / improve indexes #20245
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(Standard links)
|
seamuslee001
reviewed
May 7, 2021
'UI_id' => [ | ||
'name' => 'UI_id', | ||
'field' => [ | ||
0 => 'id', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UM what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - so that is the duplicate index - duplicates the primary key
This removes a couple of indexes on the id field as they duplicate the primary key as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575 In addition a couple of other indexes are removed as duplicates but the originals are fixed to be more meaningful. It's important that in a combined key the field with the lowest cardinality is the first as the combined key is searched for a complete match so on something like line_item we see the key is civicrm_contribution1 civicrm_contribution10 civicrm_contribution1111 civicrm_membership1 In this case there are only 3 possible variants for the first chunk of characters so we get a much more effective key by reversing them. (arguably the entity_table actually adds no value due to it's low cardinality). The original post https://civicrm.stackexchange.com/questions/39409/redundant-indexes also suggests that is_deleted_sort_name_id duplicates the id index That is not the case - this query would only ever be used if is_deleted is a filter. If it IS used then the id index would NOT be used as only one index is used per table. However, note that the id part of the filter would only be used if is_deleted and sort_name were too - so the id part is of little value as the last parameter & it's likely mysql would chose to use the primary key instead. However, I would argue the index as a whole is of little value. The cardinality of is_deleted is super low and so preceding sort_name with it is probably inferior to just leaving the sort_name index to be used. However, this would need testing on a site with enough data to get results. On the queue index - it's hard to imagine that having more than the queue name in the index adds any value at all. However, this is also such a small table that probably the index has almost no size and the query probably takes almost no time to run so the time we would spend testing index changes on this table we would likely never get back. Note changing these indexes will only affect new sites - or existing sites if they run the System.updateIndexes api. We do have a ticket open https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to allow sites to define their own indexes which would work in conjunction with the above api call (since indexes are not 1 size fits all - for example we have removed indexes on things like contribution_status_id due to low cardinality). We don't pro-actively run index updates on upgrade due to not wanting to alter custom index config until 2279 is resolved and also because some old DBs hit issues that we never diagnosed (and stopped trying to once we stopped pro-actively notifying them about them)
All these look good to me MOP |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
dev/core#2575 remove / improve indexes
This removes a couple of indexes on the id field as they duplicate the primary key
as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575
In addition a couple of other indexes are removed as duplicates but the
originals are fixed to be more meaningful.
Before
Indexes on new installs (or installs that run the UpdateIndices api)
After
Indexes on new installs (or installs that run the UpdateIndices api)
Technical Details
It's important that in a combined
key the field with the lowest cardinality is the first as the combined key is
searched for a complete match so on something like line_item we see the
key is
civicrm_contribution1
civicrm_contribution10
civicrm_contribution1111
civicrm_membership1
In this case there are only 3 possible variants for the first chunk of
characters so we get a much more effective key by reversing them.
(arguably the entity_table actually adds no value due to it's low
cardinality).
This PR picks off the index changes that I think could be done without any performance
testing as they seem clear cut. I left out the changes that would require someone to do
performance testing (& those that affect extensions)
The original post
https://civicrm.stackexchange.com/questions/39409/redundant-indexes
also suggests that
is_deleted_sort_name_id duplicates the id index
That is not the case - this index would only ever be
used if is_deleted is a filter. If it IS used then the primary key
index would NOT be used as only one index is used per table. So it's not
a duplicate per se (an index is only a duplicate if the first fields
in the index match - ie first_name_last_name would make first_name
redundant
but not for last_name)
However, note that the id part of the filter would only be used if
is_deleted and sort_name were too - so the id part is of little value
as the last parameter & it's likely mysql would chose to use the
primary key instead.
However, I would argue the index as a whole is of little value. The
cardinality of is_deleted is super low and so preceding sort_name
with it is probably inferior to just leaving the
sort_name index to be used, unless lots of searches on is_deleted=1 are done
on a site where that has a small result set.
I would suggest the way forward with this index would be to do some
performance testing around this index on a medium+ sized site.
On the queue index - it's hard to imagine that having more than the
queue name in the index adds any value at all. However, this is also such a small
table that probably the index has almost no size and the query probably takes
almost no time to run so the time we would spend testing index changes on
this table we would likely never get back.
Note changing these indexes will only affect new sites - or existing sites if they
run the System.updateIndexes api. We do have a ticket open
https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to
allow sites to define their own indexes which would work in conjunction with the
above api call (since indexes are not 1 size fits all - for example we
have removed indexes on things like contribution_status_id due to
low cardinality). We don't pro-actively run index updates on upgrade
due to not wanting to alter custom index config until 2279 is resolved
and also because some old DBs hit issues that we never diagnosed (and stopped
trying to once we stopped pro-actively notifying them about them)
Comments