dev/core#2575 remove / improve indexes #20245

eileenmcnaughton · 2021-05-07T00:25:52Z

Overview

dev/core#2575 remove / improve indexes

This removes a couple of indexes on the id field as they duplicate the primary key
as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575

In addition a couple of other indexes are removed as duplicates but the
originals are fixed to be more meaningful.

Before

Indexes on new installs (or installs that run the UpdateIndices api)

Entity file - unique index on 'UI_entity_table_entity_id_file_id' AND index on entity_table_entity_id
Financial type 'UI_id' duplicates primary key
Line item - index on entity_table, entity_id duplicates combined index

After

Indexes on new installs (or installs that run the UpdateIndices api)

- Entity file - unique index on 'UI_entity_id_entity_table_file_id' (order changed), index on entity_table_entity_id removed
Financial type 'UI_id' removed
Line item - entity_table, entity_id index removed, unique index is re-ordered to bring lower cardinality to the front

Technical Details

It's important that in a combined
key the field with the lowest cardinality is the first as the combined key is
searched for a complete match so on something like line_item we see the
key is

civicrm_contribution1
civicrm_contribution10
civicrm_contribution1111
civicrm_membership1

In this case there are only 3 possible variants for the first chunk of
characters so we get a much more effective key by reversing them.
(arguably the entity_table actually adds no value due to it's low
cardinality).

This PR picks off the index changes that I think could be done without any performance
testing as they seem clear cut. I left out the changes that would require someone to do
performance testing (& those that affect extensions)

The original post
https://civicrm.stackexchange.com/questions/39409/redundant-indexes
also suggests that

is_deleted_sort_name_id duplicates the id index

That is not the case - this index would only ever be
used if is_deleted is a filter. If it IS used then the primary key
index would NOT be used as only one index is used per table. So it's not
a duplicate per se (an index is only a duplicate if the first fields
in the index match - ie first_name_last_name would make first_name
redundant
but not for last_name)

However, note that the id part of the filter would only be used if
is_deleted and sort_name were too - so the id part is of little value
as the last parameter & it's likely mysql would chose to use the
primary key instead.

However, I would argue the index as a whole is of little value. The
cardinality of is_deleted is super low and so preceding sort_name
with it is probably inferior to just leaving the
sort_name index to be used, unless lots of searches on is_deleted=1 are done
on a site where that has a small result set.

I would suggest the way forward with this index would be to do some
performance testing around this index on a medium+ sized site.

On the queue index - it's hard to imagine that having more than the
queue name in the index adds any value at all. However, this is also such a small
table that probably the index has almost no size and the query probably takes
almost no time to run so the time we would spend testing index changes on
this table we would likely never get back.

Note changing these indexes will only affect new sites - or existing sites if they
run the System.updateIndexes api. We do have a ticket open
https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to
allow sites to define their own indexes which would work in conjunction with the
above api call (since indexes are not 1 size fits all - for example we
have removed indexes on things like contribution_status_id due to
low cardinality). We don't pro-actively run index updates on upgrade
due to not wanting to alter custom index config until 2279 is resolved
and also because some old DBs hit issues that we never diagnosed (and stopped
trying to once we stopped pro-actively notifying them about them)

Comments

civibot · 2021-05-07T00:25:54Z

(Standard links)

If this is your first pull-request for CiviCRM, please browse CONTRIBUTING.md for information about the development and testing processes.
If you are reviewing this pull-request, you may wish to consult the test sites and the Review Standards (long template, short template).

seamuslee001 · 2021-05-07T00:31:30Z

CRM/Financial/DAO/FinancialType.php

-      'UI_id' => [
-        'name' => 'UI_id',
-        'field' => [
-          0 => 'id',


Yeah - so that is the duplicate index - duplicates the primary key

This removes a couple of indexes on the id field as they duplicate the primary key as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575 In addition a couple of other indexes are removed as duplicates but the originals are fixed to be more meaningful. It's important that in a combined key the field with the lowest cardinality is the first as the combined key is searched for a complete match so on something like line_item we see the key is civicrm_contribution1 civicrm_contribution10 civicrm_contribution1111 civicrm_membership1 In this case there are only 3 possible variants for the first chunk of characters so we get a much more effective key by reversing them. (arguably the entity_table actually adds no value due to it's low cardinality). The original post https://civicrm.stackexchange.com/questions/39409/redundant-indexes also suggests that is_deleted_sort_name_id duplicates the id index That is not the case - this query would only ever be used if is_deleted is a filter. If it IS used then the id index would NOT be used as only one index is used per table. However, note that the id part of the filter would only be used if is_deleted and sort_name were too - so the id part is of little value as the last parameter & it's likely mysql would chose to use the primary key instead. However, I would argue the index as a whole is of little value. The cardinality of is_deleted is super low and so preceding sort_name with it is probably inferior to just leaving the sort_name index to be used. However, this would need testing on a site with enough data to get results. On the queue index - it's hard to imagine that having more than the queue name in the index adds any value at all. However, this is also such a small table that probably the index has almost no size and the query probably takes almost no time to run so the time we would spend testing index changes on this table we would likely never get back. Note changing these indexes will only affect new sites - or existing sites if they run the System.updateIndexes api. We do have a ticket open https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to allow sites to define their own indexes which would work in conjunction with the above api call (since indexes are not 1 size fits all - for example we have removed indexes on things like contribution_status_id due to low cardinality). We don't pro-actively run index updates on upgrade due to not wanting to alter custom index config until 2279 is resolved and also because some old DBs hit issues that we never diagnosed (and stopped trying to once we stopped pro-actively notifying them about them)

seamuslee001 · 2021-05-07T01:11:05Z

All these look good to me MOP

civibot bot added the master label May 7, 2021

seamuslee001 reviewed May 7, 2021

View reviewed changes

eileenmcnaughton force-pushed the ft branch from e58e66e to 90dcf8f Compare May 7, 2021 00:33

seamuslee001 added the merge on pass label May 7, 2021

seamuslee001 merged commit 8b4e18c into civicrm:master May 7, 2021

seamuslee001 deleted the ft branch May 7, 2021 02:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dev/core#2575 remove / improve indexes #20245

dev/core#2575 remove / improve indexes #20245

eileenmcnaughton commented May 7, 2021 •

edited

Loading

civibot bot commented May 7, 2021

seamuslee001 May 7, 2021

eileenmcnaughton May 7, 2021

seamuslee001 commented May 7, 2021

dev/core#2575 remove / improve indexes #20245

dev/core#2575 remove / improve indexes #20245

Conversation

eileenmcnaughton commented May 7, 2021 • edited Loading

Overview

Before

After

Technical Details

Comments

civibot bot commented May 7, 2021

seamuslee001 May 7, 2021

Choose a reason for hiding this comment

eileenmcnaughton May 7, 2021

Choose a reason for hiding this comment

seamuslee001 commented May 7, 2021

eileenmcnaughton commented May 7, 2021 •

edited

Loading