Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of default GP_Translation->for_translation() query #376

Closed
ocean90 opened this issue Mar 31, 2016 · 9 comments
Closed
Assignees
Labels
[Priority] High [Status] In Progress Tracking issues with work in progress [Type] Enhancement A suggestion for improvement. [Type] Performance
Milestone

Comments

@ocean90
Copy link
Member

ocean90 commented Mar 31, 2016

The query for https://translate.wordpress.org/projects/wp/dev/admin/de/formal takes between 8 and 10 seconds:

SELECT SQL_CALC_FOUND_ROWS t.*, o.*, t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added
            FROM translate_originals as o
            LEFT JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND t.status != "rejected" AND t.status != "old" AND (t.status = 'current' OR t.status = 'waiting' OR t.status = 'fuzzy')
            WHERE o.project_id = 78 AND o.status LIKE '+%'  ORDER BY t.date_added DESC LIMIT 20 OFFSET 0;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE o range project_id_status project_id_status 262 NULL 1741 100.00 Using where; Using temporary; Using filesort
1 SIMPLE t ref original_id,translation_set_id,translation_set_id_status original_id 5 translate.o.id 55 100.00

The query for https://translate.wordpress.org/projects/wp/dev/admin/de/formal?filters%5Bstatus%5D=untranslated&sort%5Bby%5D=priority&sort%5Bhow%5D=desc takes between 8 and 15 seconds:

SELECT SQL_CALC_FOUND_ROWS t.*, o.*, t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added
            FROM translate_originals as o
            LEFT JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND t.status != "rejected" AND t.status != "old"
            WHERE o.project_id = 78 AND o.status LIKE '+%' AND t.translation_0 IS NULL ORDER BY o.priority DESC, o.date_added DESC LIMIT 20 OFFSET 0;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE o range project_id_status project_id_status 262 NULL 1741 100.00 Using where; Using filesort
1 SIMPLE t ref original_id,translation_set_id,translation_set_id_status original_id 5 translate.o.id 55 100.00 Using where; Not exists

The query for https://translate.wordpress.org/projects/wp/dev/admin/de/formal?filters%5Btranslated%5D=yes&filters%5Bstatus%5D=current takes between 500ms and 2 seconds:

SELECT SQL_CALC_FOUND_ROWS t.*, o.*, t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added
            FROM translate_originals as o
            INNER JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND (t.status = 'current')
            WHERE o.project_id = 78 AND o.status LIKE '+%'  ORDER BY t.date_added DESC LIMIT 20 OFFSET 0;
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t ref original_id,translation_set_id,translation_set_id_status translation_set_id_status 27 const,const 1214 100.00 Using where; Using filesort
1 SIMPLE o eq_ref PRIMARY,project_id_status PRIMARY 4 translate.t.original_id 1 100.00 Using where
@ocean90 ocean90 added [Type] Enhancement A suggestion for improvement. [Priority] High labels Mar 31, 2016
@ocean90
Copy link
Member Author

ocean90 commented Mar 31, 2016

@toolstack
Copy link
Contributor

I don't think we can use an inner join in those cases as it will return different results if no translation row exists.

For example, when you create a new translation set, no rows are added to the translations table, so the inner join would return 0 rows (there are no rows common between the originals and translations tables), where as the left join would return all the rows from the originals table.

@akirk
Copy link
Member

akirk commented Apr 1, 2016

I think a compound index on the translations table for (original_id, translation_set_id, status) should speed up things. Right now only the original_id index is being used and the filtering of the other fields has to be done without an index.

@ocean90 ocean90 modified the milestone: 2.1 Apr 1, 2016
@akirk
Copy link
Member

akirk commented Apr 1, 2016

I now took a closer look at the tables and there are a couple of things to optimize. Usually multiple single key indices indicate that somebody misunderstood something about MySQL keys (no offense to anyone, just an observation). It might make sense to remove some of those indices (for example date_added on translations, unless there is a query somewhere that displays all translations in the table sorted or filtered by date_added).

For example the keys in the translations table

  KEY `translation_set_id` (`translation_set_id`),
  KEY `translation_set_id_status` (`translation_set_id`,`status`),

are redundant because translation_set_id_status can also be used when just translation_set_id is being queried.

The best way to approach this is to optimize the keys based on the most common queries. I'll assume the queries you mention are the most important ones.

Query 1: optimize by adding an index (translation_set_id, date_added) to the translations table
screen shot 2016-04-01 at 16 04 16
Explanation: MySQL needs to fetch all rows from table translations with the given translation_set_id and sort them by date_added. Normally you'd also include the field status between translation_set_id and date_added but it cannot be used in this case because there are multiple possible values for it.

One interesting aspect to this query: MySQL opts to first query the translations table and join the originals table later, using the primary key (because it needs a 1:1 match anyway) and then filters out the wrong stati while fetching the rows.

Query 2: this is weird one: this is the definition translation_0 TEXT NOT NULL yet the query contains a t.translation_0 IS NULL. So this can be removed from the query. Also the o.status LIKE '+%' really hurts, especially as there is actually just one possible value: +active. So this should be changed to o.status = '+active'.

So to optimize now: add a key (original_id, translation_set_id, status) to translations and a key (project_id, status, priority, date_added) to originals

screen shot 2016-04-01 at 16 35 42

Query 3: this is already running quite fast, the key from Query 1 can be used here to improve things a little too. I would also recommend to change this to o.status = '+active'

@ocean90
Copy link
Member Author

ocean90 commented Apr 1, 2016

Thanks for your comprehensive analysis, @akirk!

It might make sense to remove some of those indices (for example date_added on translations, unless there is a query somewhere that displays all translations in the table sorted or filtered by date_added).

I took a look at when those were introduced:

Query 2: this is weird one: this is the definition translation_0 TEXT NOT NULL yet the query contains a t.translation_0 IS NULL.

Isn't that because all originals are mapped with a translation and then the rows without a translation (all fields are NULL) are selected?

@akirk
Copy link
Member

akirk commented Apr 1, 2016

Isn't that because all originals are mapped with a translation and then the rows without a translation (all fields are NULL) are selected?

Ah, you are right. My bad, I didn't consider that the row could come in as NULL from the join.

@ocean90
Copy link
Member Author

ocean90 commented Apr 6, 2016

Proposed indexes in SQL:

ALTER TABLE translate_translations ADD INDEX test1(translation_set_id, date_added);
ALTER TABLE translate_translations ADD INDEX test2(original_id, translation_set_id, status);
ALTER TABLE translate_originals ADD INDEX test3(project_id, status, priority, date_added);

@ocean90
Copy link
Member Author

ocean90 commented Apr 6, 2016

Looking at the index for (original_id, translation_set_id, status) and noticed that the current schema allows an empty ID for the original and translation set. That doesn't make sense IMO. But would it enhance the index generation if we change it?

@akirk
Copy link
Member

akirk commented Apr 11, 2016

Here are my findings, running the queries on a big test database on a quite fast machine (SSD, Core i7 3GHz, 16GB RAM):

  • Index test1 turns out not useful for these queries.
  • o.status ='+active' instead of o.status LIKE '+%' already improves a lot
  • not having a production system I ran the queries multiple times -- the visible effects (14x, 5x improvement don't have to be the same on a production system)

Query 1

mysql> EXPLAIN SELECT SQL_CALC_FOUND_ROWS t., o., t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added FROM translate_originals as o LEFT JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND t.status != "rejected" AND t.status != "old" AND (t.status = 'current' OR t.status = 'waiting' OR t.status = 'fuzzy') WHERE o.project_id = 78 AND o.status ='+active' ORDER BY t.date_added DESC LIMIT 20 OFFSET 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: o
partitions: NULL
type: ref
possible_keys: project_id_status,test3
key: project_id_status
key_len: 262
ref: const,const
rows: 1948
filtered: 100.00
Extra: Using temporary; Using filesort <- because of ORDER BY t.date_added (no big deal, small result size)
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ref
possible_keys: original_id,translation_set_id,translation_set_id_status,test1,test2
key: test2 <- new key being used
key_len: 10
ref: translate.o.id,const
rows: 1
filtered: 100.00
Extra: Using where
2 rows in set, 1 warning (0.00 sec)

Real query: 20 rows in set (0.05/0.05/0.05/0.06/0.04 sec)
Old query: 20 rows in set (3.46/0.74/0.75/0.81/0.73 sec)

Query 2
mysql> EXPLAIN SELECT SQL_CALC_FOUND_ROWS t., o., t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added FROM translate_originals as o LEFT JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND t.status != "rejected" AND t.status != "old" WHERE o.project_id = 14 AND o.status ='+active' AND t.translation_0 IS NULL ORDER BY o.priority DESC, o.date_added DESC LIMIT 20 OFFSET 0 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: o
partitions: NULL
type: ref
possible_keys: project_id_status,test3
key: test3 <- new key being used
key_len: 262
ref: const,const
rows: 1548
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ref
possible_keys: original_id,translation_set_id,translation_set_id_status,test1,test2
key: test2 <- new key being used
key_len: 10
ref: translate.o.id,const
rows: 1
filtered: 10.00
Extra: Using where; Not exists
2 rows in set, 1 warning (0.00 sec)

Real query: 20 rows in set (0.06/0.07/0.07/0.07/0.06 sec)
Old query: 20 rows in set (0.95/0.26/0.27/0.26/0.25 sec)

Query 3
mysql> EXPLAIN SELECT SQL_CALC_FOUND_ROWS t., o., t.id as id, o.id as original_id, t.status as translation_status, o.status as original_status, t.date_added as translation_added, o.date_added as original_added FROM translate_originals as o INNER JOIN translate_translations AS t ON o.id = t.original_id AND t.translation_set_id = 3898 AND (t.status = 'current') WHERE o.project_id = 78 AND o.status = '+active' ORDER BY t.date_added DESC LIMIT 20 OFFSET 0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ref
possible_keys: original_id,translation_set_id,translation_set_id_status,test1,test2
key: translation_set_id_status
key_len: 27
ref: const,const
rows: 1285
filtered: 100.00
Extra: Using index condition; Using where; Using filesort <- because of ORDER BY t.date_added (no big deal, small result size)
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: o
partitions: NULL
type: eq_ref
possible_keys: PRIMARY,project_id_status,test3
key: PRIMARY <- best key to use
key_len: 4
ref: translate.t.original_id
rows: 1
filtered: 5.00
Extra: Using where
2 rows in set, 1 warning (0.00 sec)

Real query: 20 rows in set (0.04/0.04/0.04/0.03/0.04 sec)
Old query: 20 rows in set (0.05/0.05/0.04/0.05/0.05 sec)

ocean90 added a commit that referenced this issue Apr 20, 2016
There are only two status for originals: `+active` and `-original`.  There is no need to perform a fuzzy search which affects the performance of the GP_Translation->for_translation()` query.

See #376.
ocean90 added a commit that referenced this issue Apr 20, 2016
ocean90 added a commit that referenced this issue Apr 20, 2016
ocean90 added a commit that referenced this issue Apr 25, 2016
toolstack pushed a commit that referenced this issue Apr 28, 2016
yoavf pushed a commit to Automattic/GlotPress-WP that referenced this issue May 17, 2016
There are only two status for originals: `+active` and `-original`.  There is no need to perform a fuzzy search which affects the performance of the GP_Translation->for_translation()` query.

See GlotPress#376.
toolstack pushed a commit that referenced this issue May 17, 2016
ocean90 added a commit that referenced this issue May 22, 2016
@ocean90 ocean90 self-assigned this May 22, 2016
@ocean90 ocean90 added the [Status] In Progress Tracking issues with work in progress label May 31, 2016
yoavf pushed a commit to Automattic/GlotPress-WP that referenced this issue Jun 5, 2016
yoavf pushed a commit to Automattic/GlotPress-WP that referenced this issue Jun 5, 2016
ocean90 added a commit that referenced this issue Jun 7, 2016
toolstack pushed a commit that referenced this issue Jun 20, 2016
@ocean90 ocean90 closed this as completed Jun 29, 2016
toolstack pushed a commit that referenced this issue Jul 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Priority] High [Status] In Progress Tracking issues with work in progress [Type] Enhancement A suggestion for improvement. [Type] Performance
Projects
None yet
Development

No branches or pull requests

3 participants