diff --git a/TOC.md b/TOC.md index 7fbb82843c359..e617086ff575e 100644 --- a/TOC.md +++ b/TOC.md @@ -370,6 +370,7 @@ - [SQL Plan Management](/sql-plan-management.md) - [The Blocklist of Optimization Rules and Expression Pushdown](/blocklist-control-plan.md) - [Optimizer Fix Controls](/optimizer-fix-controls.md) + - [Index Advisor](/index-advisor.md) - Tutorials - [Multiple Availability Zones in One Region Deployment](/multi-data-centers-in-one-city-deployment.md) - [Three Availability Zones in Two Regions Deployment](/three-data-centers-in-two-cities-deployment.md) diff --git a/index-advisor.md b/index-advisor.md new file mode 100644 index 0000000000000..a3d0a401094e6 --- /dev/null +++ b/index-advisor.md @@ -0,0 +1,213 @@ +--- +title: Index Advisor +summary: TiDB Index Advisor. +--- + +# Index Advisor Overview + +TiDB's Index Advisor helps users optimize their workload by recommending indexes to improve query performance. The new SQL instruction, `RECOMMEND INDEX`, allows users to generate index recommendations for a single query or an entire workload. To avoid the resource-intensive process of physically creating indexes for evaluation, TiDB supports hypothetical indexes—logical indexes that are not materialized. The syntax and usage of Hypo Indexes are detailed in the Appendix. + +The Index Advisor analyzes queries to identify indexable columns from relevant clauses (for example, `WHERE`, `GROUP BY`, `ORDER BY`) and generates index candidates. Using the Hypo Index feature, it estimates the performance benefits of these candidates and employs a genetic search algorithm to select the optimal set of indexes. This algorithm begins with single-column indexes and iteratively explores multi-column indexes, leveraging a `What-If` analysis to evaluate potential indexes based on their impact on optimizer plan costs. Indexes are recommended if they reduce the overall cost compared to executing queries without them. + +In addition to recommending new indexes, TiDB also offers a feature to suggest dropping inactive indexes, ensuring efficient index management. + +# Recommend Index command + +SQL command `RECOMMEND INDEX` is introduced for index advisor tasks. Sub command `RUN` explores historical workloads and saves the recommendations in system tables. With option `FOR`, the command targets particular SQL statement even if it was not executed in the past. The command also accepts extra options for advance control. + +```sql +Recommend Index Run [ For ] [] +``` + +## Single Query Option + +Below is an example of a single query, assuming 5,000 rows in table `t` (we omit the insert statements for brevity): + +```sql +mysql> CREATE TABLE t(a int, b int, c int); +mysql> RECOMMEND INDEX RUN for "select a, b from t where a=1 and b=1"; ++----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+ +| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement | ++----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+ +| test | t | idx_a_b | a,b | 19872 | Column [a b] appear in Equal or Range Predicate clause(s) in query: select `a` , `b` from `test` . `t` where `a` = ? and `b` = ? | [{"Query":"SELECT `a`,`b` FROM `test`.`t` WHERE `a` = 1 AND `b` = 1","Improvement":0.999994}] | CREATE INDEX idx_a_b ON t(a,b); | ++----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+ +``` + +The index advisor considers single column indexes on `a` and `b` seperately and end up combining them in a single index which provides the best performance for the above singel query. + +Below, we show explain result for two cases: (1) query without indexes and (2) same query with the two column indexes using hypo (`what if`) index. The index advisor internally attempts both cases and pick the one with the least cost. Note that the search space also includes hypo indexes on `a` and `b` seperatley which does not provide lower cost than the two column iondex on both columns. For space limitation, we do not show these plans. + +```sql +mysql> explain format='verbose' select a, b from t where a=1 and b=1; ++-------------------------+---------+------------+-----------+---------------+----------------------------------+ +| id | estRows | estCost | task | access object | operator info | ++-------------------------+---------+------------+-----------+---------------+----------------------------------+ +| TableReader_7 | 0.01 | 196066.71 | root | | data:Selection_6 | +| └─Selection_6 | 0.01 | 2941000.00 | cop[tikv] | | eq(test.t.a, 1), eq(test.t.b, 1) | +| └─TableFullScan_5 | 5000.00 | 2442000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++-------------------------+---------+------------+-----------+---------------+----------------------------------+ + +mysql> explain format='verbose' select /*+ HYPO_INDEX(t, idx_ab, a, b) */ a, b from t where a=1 and b=1; ++------------------------+---------+---------+-----------+-----------------------------+-------------------------------------------------+ +| id | estRows | estCost | task | access object | operator info | ++------------------------+---------+---------+-----------+-----------------------------+-------------------------------------------------+ +| IndexReader_6 | 0.05 | 1.10 | root | | index:IndexRangeScan_5 | +| └─IndexRangeScan_5 | 0.05 | 10.18 | cop[tikv] | table:t, index:idx_ab(a, b) | range:[1 1,1 1], keep order:false, stats:pseudo | ++------------------------+---------+---------+-----------+-----------------------------+-------------------------------------------------+ +``` + +## Workload Option + +We illustrate this option through an example below, assuming 5,000 rows in `t1` and `t2`: + +```sql +mysql> CREATE TABLE t1 (a int, b int, c int, d int); +mysql> CREATE TABLE t2 (a int, b int, c int, d int); + +-- run some queires in this workload +mysql> select a, b from t1 where a=1 and b<=5; +mysql> select d from t1 order by d limit 10; +mysql> select * from t1, t2 where t1.a=1 and t1.d=t2.d; + +mysql> RECOMMEND INDEX RUN; ++----------+-------+------------+---------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+ +| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement | ++----------+-------+------------+---------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+ +| test | t1 | idx_a_b | a,b | 19872 | Column [a b] appear in Equal or Range Predicate clause(s) in query: select `a` , `b` from `test` . `t1` where `a` = ? and `b` <= ? | [{"Query":"SELECT `a`,`b` FROM `test`.`t1` WHERE `a` = 1 AND `b` \u003c= 5","Improvement":0.998214},{"Query":"SELECT * FROM (`test`.`t1`) JOIN `test`.`t2` WHERE `t1`.`a` = 1 AND `t1`.`d` = `t2`.`d`","Improvement":0.336837}] | CREATE INDEX idx_a_b ON t1(a,b); | +| test | t1 | idx_d | d | 9936 | Column [d] appear in Equal or Range Predicate clause(s) in query: select `d` from `test` . `t1` order by `d` limit ? | [{"Query":"SELECT `d` FROM `test`.`t1` ORDER BY `d` LIMIT 10","Improvement":0.999433}] | CREATE INDEX idx_d ON t1(d); | +| test | t2 | idx_d | d | 9936 | Column [d] appear in Equal or Range Predicate clause(s) in query: select * from ( `test` . `t1` ) join `test` . `t2` where `t1` . `a` = ? and `t1` . `d` = `t2` . `d` | [{"Query":"SELECT * FROM (`test`.`t1`) JOIN `test`.`t2` WHERE `t1`.`a` = 1 AND `t1`.`d` = `t2`.`d`","Improvement":0.638567}] | CREATE INDEX idx_d ON t2(d); | ++----------+-------+------------+---------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+ +``` + +In this case, the index advisor identifies optimal indexes for an entire workload rather than focusing on a single query. The workload queries are sourced from the TiDB system table `information_schema.statements_summary`. + +This table can contain a vast number of queries, ranging from tens to hundreds of thousands, which can impact the performance of the index advisor. To address this, the index advisor prioritizes the most important queries in the workload based on their frequency, as frequent queries have a greater impact on overall workload performance. By default, the index advisor selects the top 1,000 queries, a configurable value controlled by the parameter `max_num_query` (see below). + +The results of the `RECOMMEND INDEX` commands are stored in the `mysql.index_advisor_results` table. Users can query this table to view the recommended indexes. Below is an example of the contents of this system table after executing the two `RECOMMEND INDEX` commands mentioned above. + +```sql +mysql> select * from mysql.index_advisor_results; ++----+---------------------+---------------------+-------------+------------+------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-------+ +| id | created_at | updated_at | schema_name | table_name | index_name | index_columns | index_details | top_impacted_queries | workload_impact | extra | ++----+---------------------+---------------------+-------------+------------+------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-------+ +| 1 | 2024-12-10 11:44:45 | 2024-12-10 11:44:45 | test | t1 | idx_a_b | a,b | {"IndexSize": 0, "Reason": "Column [a b] appear in Equal or Range Predicate clause(s) in query: select `a` , `b` from `test` . `t1` where `a` = ? and `b` <= ?"} | [{"Improvement": 0.998214, "Query": "SELECT `a`,`b` FROM `test`.`t1` WHERE `a` = 1 AND `b` <= 5"}, {"Improvement": 0.337273, "Query": "SELECT * FROM (`test`.`t1`) JOIN `test`.`t2` WHERE `t1`.`a` = 1 AND `t1`.`d` = `t2`.`d`"}] | {"WorkloadImprovement": 0.395235} | NULL | +| 2 | 2024-12-10 11:44:45 | 2024-12-10 11:44:45 | test | t1 | idx_d | d | {"IndexSize": 0, "Reason": "Column [d] appear in Equal or Range Predicate clause(s) in query: select `d` from `test` . `t1` order by `d` limit ?"} | [{"Improvement": 0.999715, "Query": "SELECT `d` FROM `test`.`t1` ORDER BY `d` LIMIT 10"}] | {"WorkloadImprovement": 0.225116} | NULL | +| 3 | 2024-12-10 11:44:45 | 2024-12-10 11:44:45 | test | t2 | idx_d | d | {"IndexSize": 0, "Reason": "Column [d] appear in Equal or Range Predicate clause(s) in query: select * from ( `test` . `t1` ) join `test` . `t2` where `t1` . `a` = ? and `t1` . `d` = `t2` . `d`"} | [{"Improvement": 0.639393, "Query": "SELECT * FROM (`test`.`t1`) JOIN `test`.`t2` WHERE `t1`.`a` = 1 AND `t1`.`d` = `t2`.`d`"}] | {"WorkloadImprovement": 0.365871} | NULL | ++----+---------------------+---------------------+-------------+------------+------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------+-------+ +``` + +## Index Advisor Options + +The `RECOMMEND INDEX` syntax supports configuring and displaying options related to the command, as shown below: + +```sql +Recommend Index Set