Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display row width in results of explain statements #20676

Open
qw4990 opened this issue Oct 27, 2020 · 6 comments
Open

display row width in results of explain statements #20676

qw4990 opened this issue Oct 27, 2020 · 6 comments
Labels
feature/discussing This feature request is discussing among product managers good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@qw4990
Copy link
Contributor

qw4990 commented Oct 27, 2020

Feature Request

Is your feature request related to a problem? Please describe:

Since one part of our cost model is estRows * rowWidth * scanFactor, the parameter rowWidth has an impact on the cost, but now TiDB only displays estRows when processing explain, which may confuse users in some cases, for example:

create table t (a int, b int, c int, d varchar(128), primary key(a), index(b, c));
insert into t values (1, 1, 1, space(128));
insert into t values (2, 2, 2, space(128));
insert into t values (3, 3, 3, space(128));
insert into t values (4, 4, 4, space(128));
analyze table t;

mysql> explain select count(*) from t where a<=1 and b>=2;
+-----------------------------+---------+-----------+------------------------+----------------------------------+
| id                          | estRows | task      | access object          | operator info                    |
+-----------------------------+---------+-----------+------------------------+----------------------------------+
| StreamAgg_10                | 1.00    | root      |                        | funcs:count(1)->Column#5         |
| └─IndexReader_22            | 0.75    | root      |                        | index:Selection_21               |
|   └─Selection_21            | 0.75    | cop[tikv] |                        | le(test.t.a, 1)                  |
|     └─IndexRangeScan_20     | 3.00    | cop[tikv] | table:t, index:b(b, c) | range:[2,+inf], keep order:false |
+-----------------------------+---------+-----------+------------------------+----------------------------------+

mysql> explain select count(*) from t use index(primary) where a<=1 and b>=2;
+-----------------------------+---------+-----------+---------------+----------------------------------+
| id                          | estRows | task      | access object | operator info                    |
+-----------------------------+---------+-----------+---------------+----------------------------------+
| StreamAgg_10                | 1.00    | root      |               | funcs:count(1)->Column#5         |
| └─TableReader_17            | 0.75    | root      |               | data:Selection_16                |
|   └─Selection_16            | 0.75    | cop[tikv] |               | ge(test.t.b, 2)                  |
|     └─TableRangeScan_15     | 1.00    | cop[tikv] | table:t       | range:[-inf,1], keep order:false |
+-----------------------------+---------+-----------+---------------+----------------------------------+

As shown above, PK has a better selectivity with estRows=1 than IDX with estRows=3, but the optimizer chooses IDX.
This may confuse our users if they don't know the impact of rowWidth on cost, and also make it harder for optimizer maintainers to investigate index selection problems.

Describe the feature you'd like:

Like PG, display rowWidth in results of explain:

EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

We can display this field in operator info:

mysql> explain select count(*) from t use index(primary) where a<=1 and b>=2;
+-----------------------------+---------+-----------+---------------+----------------------------------+
| id                          | estRows | task      | access object | operator info                    |
+-----------------------------+---------+-----------+---------------+----------------------------------+
| StreamAgg_10                | 1.00    | root      |               | funcs:count(1)->Column#5         |
| └─TableReader_17            | 0.75    | root      |               | data:Selection_16                |
|   └─Selection_16            | 0.75    | cop[tikv] |               | ge(test.t.b, 2)                  |
|     └─TableRangeScan_15     | 1.00    | cop[tikv] | table:t       | range:[-inf,1], keep order:false, row width:2333 |
+-----------------------------+---------+-----------+---------------+----------------------------------+
@qw4990 qw4990 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature. labels Oct 27, 2020
@qw4990 qw4990 added the feature/discussing This feature request is discussing among product managers label Oct 27, 2020
@qw4990
Copy link
Contributor Author

qw4990 commented Oct 27, 2020

@SunRunAway @eurekaka @zz-jason What're your opinions?

@SunRunAway
Copy link
Contributor

Another option is to put into estRows

@qw4990
Copy link
Contributor Author

qw4990 commented Oct 30, 2020

Another option is to put into estRows
Like this?

mysql> explain select count(*) from t use index(primary) where a<=1 and b>=2;
+-----------------------------+---------------+-----------+---------------+----------------------------------+
| id                          | estRows,width | task      | access object | operator info                    |
+-----------------------------+---------------+-----------+---------------+----------------------------------+
| StreamAgg_10                | 1.00, 7       | root      |               | funcs:count(1)->Column#5         |
| └─TableReader_17            | 0.75, 456     | root      |               | data:Selection_16                |
|   └─Selection_16            | 0.75, 1234    | cop[tikv] |               | ge(test.t.b, 2)                  |
|     └─TableRangeScan_15     | 1.00, 2333    | cop[tikv] | table:t       | range:[-inf,1], keep order:false |
+-----------------------------+---------------+-----------+---------------+----------------------------------+

@ghost
Copy link

ghost commented Oct 30, 2020

I think the cost is more important than the rows or width? i.e. these are all just dimensions that factor into the cost.

May I suggest renaming the column to cost?

mysql> explain select count(*) from t use index(primary) where a<=1 and b>=2;
+-----------------------------+------------------+-----------+---------------+----------------------------------+
| id                          | cost             | task      | access object | operator info                    |
+-----------------------------+------------------+-----------+---------------+----------------------------------+
| StreamAgg_10                | 1232 [1.00; 7]   | root      |               | funcs:count(1)->Column#5         |
| └─TableReader_17            | 231 [0.75; 456]  | root      |               | data:Selection_16                |
|   └─Selection_16            | 32 [0.75; 1234]  | cop[tikv] |               | ge(test.t.b, 2)                  |
|     └─TableRangeScan_15     | 343 [1.00; 2333] | cop[tikv] | table:t       | range:[-inf,1], keep order:false |
+-----------------------------+------------------+-----------+---------------+----------------------------------+

(Many other DBs show cost in their explain output).

@qw4990
Copy link
Contributor Author

qw4990 commented Oct 30, 2020

I think the cost is more important than the rows or width? i.e. these are all just dimensions that factor into the cost.

May I suggest renaming the column to cost?

mysql> explain select count(*) from t use index(primary) where a<=1 and b>=2;
+-----------------------------+------------------+-----------+---------------+----------------------------------+
| id                          | cost             | task      | access object | operator info                    |
+-----------------------------+------------------+-----------+---------------+----------------------------------+
| StreamAgg_10                | 1232 [1.00; 7]   | root      |               | funcs:count(1)->Column#5         |
| └─TableReader_17            | 231 [0.75; 456]  | root      |               | data:Selection_16                |
|   └─Selection_16            | 32 [0.75; 1234]  | cop[tikv] |               | ge(test.t.b, 2)                  |
|     └─TableRangeScan_15     | 343 [1.00; 2333] | cop[tikv] | table:t       | range:[-inf,1], keep order:false |
+-----------------------------+------------------+-----------+---------------+----------------------------------+

(Many other DBs show cost in their explain output).

I think this is better! What do you think? @SunRunAway @eurekaka @zz-jason

@miaoqingli
Copy link

I agree that TiDB need the cost that can directly display how much cost the execution plan used.
For example, there are Cost (%CPU) and bytes in Oracle.
image
There are width and cost in another database named DM.
image
In general, cost is considered to be the number which was caculated by CBO. Such as a plan which need table scan 10 rows, the column width is 0.5, if assuming table scan 1 rows is 1 cpu , then the total cost is 1010.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/discussing This feature request is discussing among product managers good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants