Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev/core#2575 remove / improve indexes #20245

Merged
merged 1 commit into from
May 7, 2021

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented May 7, 2021

Overview

dev/core#2575 remove / improve indexes

This removes a couple of indexes on the id field as they duplicate the primary key
as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575

In addition a couple of other indexes are removed as duplicates but the
originals are fixed to be more meaningful.

Before

Indexes on new installs (or installs that run the UpdateIndices api)

  • Entity file - unique index on 'UI_entity_table_entity_id_file_id' AND index on entity_table_entity_id
  • Financial type 'UI_id' duplicates primary key
  • Line item - index on entity_table, entity_id duplicates combined index

After

Indexes on new installs (or installs that run the UpdateIndices api)

    • Entity file - unique index on 'UI_entity_id_entity_table_file_id' (order changed), index on entity_table_entity_id removed
  • Financial type 'UI_id' removed
  • Line item - entity_table, entity_id index removed, unique index is re-ordered to bring lower cardinality to the front

Technical Details

It's important that in a combined
key the field with the lowest cardinality is the first as the combined key is
searched for a complete match so on something like line_item we see the
key is

civicrm_contribution1
civicrm_contribution10
civicrm_contribution1111
civicrm_membership1

In this case there are only 3 possible variants for the first chunk of
characters so we get a much more effective key by reversing them.
(arguably the entity_table actually adds no value due to it's low
cardinality).

This PR picks off the index changes that I think could be done without any performance
testing as they seem clear cut. I left out the changes that would require someone to do
performance testing (& those that affect extensions)

The original post
https://civicrm.stackexchange.com/questions/39409/redundant-indexes
also suggests that

is_deleted_sort_name_id duplicates the id index

That is not the case - this index would only ever be
used if is_deleted is a filter. If it IS used then the primary key
index would NOT be used as only one index is used per table. So it's not
a duplicate per se (an index is only a duplicate if the first fields
in the index match - ie first_name_last_name would make first_name
redundant
but not for last_name)

However, note that the id part of the filter would only be used if
is_deleted and sort_name were too - so the id part is of little value
as the last parameter & it's likely mysql would chose to use the
primary key instead.

However, I would argue the index as a whole is of little value. The
cardinality of is_deleted is super low and so preceding sort_name
with it is probably inferior to just leaving the
sort_name index to be used, unless lots of searches on is_deleted=1 are done
on a site where that has a small result set.

I would suggest the way forward with this index would be to do some
performance testing around this index on a medium+ sized site.

On the queue index - it's hard to imagine that having more than the
queue name in the index adds any value at all. However, this is also such a small
table that probably the index has almost no size and the query probably takes
almost no time to run so the time we would spend testing index changes on
this table we would likely never get back.

Note changing these indexes will only affect new sites - or existing sites if they
run the System.updateIndexes api. We do have a ticket open
https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to
allow sites to define their own indexes which would work in conjunction with the
above api call (since indexes are not 1 size fits all - for example we
have removed indexes on things like contribution_status_id due to
low cardinality). We don't pro-actively run index updates on upgrade
due to not wanting to alter custom index config until 2279 is resolved
and also because some old DBs hit issues that we never diagnosed (and stopped
trying to once we stopped pro-actively notifying them about them)

Comments

@civibot
Copy link

civibot bot commented May 7, 2021

(Standard links)

@civibot civibot bot added the master label May 7, 2021
'UI_id' => [
'name' => 'UI_id',
'field' => [
0 => 'id',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UM what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - so that is the duplicate index - duplicates the primary key

This removes a couple of indexes on the id field as they duplicate the primary key
as pointed out in https://lab.civicrm.org/dev/core/-/issues/2575

In addition a couple of other indexes are removed as duplicates but the
originals are fixed to be more meaningful. It's important that in a combined
key the field with the lowest cardinality is the first as the combined key is
searched for a complete match so on something like line_item we see the
key is

civicrm_contribution1
civicrm_contribution10
civicrm_contribution1111
civicrm_membership1

In this case there are only 3 possible variants for the first chunk of
characters so we get a much more effective key by reversing them.
(arguably the entity_table actually adds no value due to it's low
cardinality).

The original post
https://civicrm.stackexchange.com/questions/39409/redundant-indexes
also suggests that

is_deleted_sort_name_id duplicates the id index

That is not the case - this query would only ever be
used if is_deleted is a filter. If it IS used then the id
index would NOT be used as only one index is used per table.

However, note that the id part of the filter would only be used if
is_deleted and sort_name were too - so the id part is of little value
as the last parameter & it's likely mysql would chose to use the
primary key instead.

However, I would argue the index as a whole is of little value. The
cardinality of is_deleted is super low and so preceding sort_name
with it is probably inferior to just leaving the
sort_name index to be used. However, this would need testing on
a site with enough data to get results.

On the queue index - it's hard to imagine that having more than the
queue name in the index adds any value at all. However, this is also such a small
table that probably the index has almost no size and the query probably takes
almost no time to run so the time we would spend testing index changes on
this table we would likely never get back.

Note changing these indexes will only affect new sites - or existing sites if they
run the System.updateIndexes api. We do have a ticket open
https://lab.civicrm.org/dev/core/-/issues/2279 with a proposal to
allow sites to define their own indexes which would work in conjunction with the
above api call (since indexes are not 1 size fits all - for example we
have removed indexes on things like contribution_status_id due to
low cardinality). We don't pro-actively run index updates on upgrade
due to not wanting to alter custom index config until 2279 is resolved
and also because some old DBs hit issues that we never diagnosed (and stopped
trying to once we stopped pro-actively notifying them about them)
@seamuslee001
Copy link
Contributor

All these look good to me MOP

@seamuslee001 seamuslee001 merged commit 8b4e18c into civicrm:master May 7, 2021
@seamuslee001 seamuslee001 deleted the ft branch May 7, 2021 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants