Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YCSB workloade is too slow for scan operation #9067

Closed
siddontang opened this issue Jan 15, 2019 · 6 comments · Fixed by #9839 or #9315
Closed

YCSB workloade is too slow for scan operation #9067

siddontang opened this issue Jan 15, 2019 · 6 comments · Fixed by #9839 or #9315
Assignees
Labels
sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@siddontang
Copy link
Member

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    I start a cluster with 1 PD + 1 TiKV + 1 TiDB, run go-ycsb with workloade like:
./bin/go-ycsb run tidb -p recordcount=1000000 -p operationcount=1000000 -p threadcount=100 -p fieldcount=5 -p fieldlength=16 -p maxscanlength=10 -p mysql.port=4000 -P workloads/workloade

The workloade is little changed with 100% scan:

workload=core

readallfields=true

readproportion=0
updateproportion=0
scanproportion=1
insertproportion=0

requestdistribution=zipfian

maxscanlength=100

scanlengthdistribution=uniform
  1. What did you expect to see,

Scanning at most 10 items has better performance.

  1. What did you see instead?

The performance is very pool.

From TiDB log

2019/01/15 16:10:46.446 coprocessor.go:675: [info] [TIME_COP_PROCESS] resp_time:813.877886ms txn_start_ts:405678285283917833 region_id:2 store_addr:127.0.0.1:20160 kv_process_ms:637 scan_total_write:293207 scan_processed_write:293206 scan_total_data:293206 scan_processed_data:293206 scan_total_lock:1 scan_processed_lock:0
2019/01/15 16:10:46.447 adapter.go:402: [warning]

[SLOW_QUERY] cost_time:0.814203953s process_time:0.776s wait_time:0.523s request_count:3 total_keys:358762 processed_keys:358759 succ:true con:430 user:[email protected] txn_start_ts:405678285283917833 database:test table_ids:[31],sql:SELECT field0,field1,field2,field3,field4 FROM usertable WHERE YCSB_KEY >= ? LIMIT ? [arguments: user6438141073078409495, 9]

And TiKV log:

2019/01/15 16:10:43.482 INFO tracker.rs:175: [region 34] [slow-query] execute takes 1.769s, wait takes 316ms, peer: Some("ipv4:127.0.0.1:38990"), start_ts: Some(405678284182913026), table_id: Some(31), tag: select (desc: Some(false)) [keys: 963286, hit: 963284, ranges: 1 (Some(start: "t\200\000\000\000\000\000\000\037_r\200\000\000\000\000\003\263\226" end: "t\200\000\000\000\000\000\000\037_r\200\000\000\000\000\013\r\000")), perf: PerfStatisticsDelta(PerfStatisticsFields { internal_key_skipped_count: 963283, internal_delete_skipped_count: 0, block_cache_hit_count: 1237, block_read_count: 0, block_read_byte: 0 })]

Seem Scan scans too many data, and the limit expression may not be pushed to TiKV.

The schema is

MySQL [test]> desc usertable;
+----------+-------------+------+------+---------+-------+
| Field    | Type        | Null | Key  | Default | Extra |
+----------+-------------+------+------+---------+-------+
| YCSB_KEY | varchar(64) | NO   | PRI  | NULL    |       |
| FIELD0   | varchar(16) | YES  |      | NULL    |       |
| FIELD1   | varchar(16) | YES  |      | NULL    |       |
| FIELD2   | varchar(16) | YES  |      | NULL    |       |
| FIELD3   | varchar(16) | YES  |      | NULL    |       |
| FIELD4   | varchar(16) | YES  |      | NULL    |       |
+----------+-------------+------+------+---------+-------+
  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
| Release Version: v2.1.0-rc.3-476-g3835bef-dirty
Git Commit Hash: 3835bef7ae67ac297bc307d87941d525ecc68308
Git Branch: master
UTC Build Time: 2019-01-15 07:27:34
GoVersion: go version go1.11.4 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
Check Table Before Drop: false |
@siddontang siddontang added the type/bug The issue is confirmed as a bug. label Jan 15, 2019
@eurekaka
Copy link
Contributor

If the table is empty, the plan looks like:

MySQL [test]> explain SELECT field0,field1,field2,field3,field4 FROM usertable WHERE YCSB_KEY >= 'user6438141073078409495' LIMIT 9;
+----------------------------+-------+------+-------------------------------------------------------------------------------------------------------------------+
| id                         | count | task | operator info                                                                                                     |
+----------------------------+-------+------+-------------------------------------------------------------------------------------------------------------------+
| Projection_7               | 9.00  | root | test.usertable.field0, test.usertable.field1, test.usertable.field2, test.usertable.field3, test.usertable.field4 |
| └─Limit_9                  | 9.00  | root | offset:0, count:9                                                                                                 |
|   └─IndexLookUp_19         | 9.00  | root |                                                                                                                   |
|     ├─Limit_18             | 9.00  | cop  | offset:0, count:9                                                                                                 |
|     │ └─IndexScan_16       | 9.00  | cop  | table:usertable, index:YCSB_KEY, range:["user6438141073078409495",+inf], keep order:false, stats:pseudo           |
|     └─TableScan_17         | 9.00  | cop  | table:usertable, keep order:false                                                                                 |
+----------------------------+-------+------+-------------------------------------------------------------------------------------------------------------------+

so the limit is pushed down to TiKV if table is empty.
I have run ./bin/go-ycsb run tidb -p recordcount=1000000 -p operationcount=1000000 -p threadcount=100 -p fieldcount=5 -p fieldlength=16 -p maxscanlength=10 -p mysql.port=4000 -P workloads/workloade and checked the table usertable at the same time, looks like the table is empty during the workload:

MySQL [test]> select count(*) from usertable;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

so the limit should have been pushed down in this workload, am I missing anything here? is the table supposed to be empty in the test workload?

@siddontang
Copy link
Member Author

No, you need to load data at first.

@breeswish has already told me that now TiDB doesn't support this query, we need to use index hint forcibly.

@eurekaka
Copy link
Contributor

@breeswish Could you please elaborate more on this problem here? thanks.

@alivxxx
Copy link
Contributor

alivxxx commented Jan 16, 2019

@eurekaka From @breeswish's info:

  • The YCSB_KEY is increasing in the table;
  • There are 1000000 records in the table;
  • From processed_keys:358759, we can verify that they are about 65% keys that satisfy the condition YCSB_KEY >= 'user6438141073078409495'.

So the planner thinks it is better to choose the table scan, as it assumes the keys are randomly spreaded in table, so scans 9/0.65 rows is enough. It is a known issue and will be fixed in the later.

@breezewish
Copy link
Member

Hi @eurekaka This is the actual slow log:

2019/01/15 16:40:39.404 INFO tracker.rs:175: [region 2] [slow-query] execute takes 1.022s, wait takes 259ms, peer: Some("ipv4:127.0.0.1:40816"), start_ts: Some(405678755182018573), table_id: Some(31), tag: select (desc: Some(false)) [keys: 586414, hit: 586412, ranges: 1 (Some(start: "t\200\000\000\000\000\000\000\037_r\200\000\000\000\000\013\r\000" end: "t\200\000\000\000\000\000\000\037_r\377\377\377\377\377\377\377\377\000")), perf: PerfStatisticsDelta(PerfStatisticsFields { internal_key_skipped_count: 586411, internal_delete_skipped_count: 0, block_cache_hit_count: 867, block_read_count: 0, block_read_byte: 0 })]
2019/01/15 16:40:39.404 adapter.go:402: [warning] [SLOW_QUERY] cost_time:1.2815635969999999s process_time:1.022s wait_time:0.899s request_count:3 total_keys:293219 processed_keys:293216 succ:true con:79 user:[email protected] txn_start_ts:405678755182018573 database:test table_ids:[31],sql:SELECT field0,field1,field2,field3,field4 FROM usertable WHERE YCSB_KEY >= ? LIMIT ? [arguments: user6107727503211566506, 5]

The SQL is:

SELECT field0,field1,field2,field3,field4 FROM usertable WHERE YCSB_KEY >= "user6107727503211566506" LIMIT 5;

The plan is:

MySQL [test]> explain SELECT field0,field1,field2,field3,field4 FROM usertable WHERE YCSB_KEY >= "user6107727503211566506" LIMIT 5;
+----------------------------+-------+------+-------------------------------------------------------------------------------------------------------------------+
| id                         | count | task | operator info                                                                                                     |
+----------------------------+-------+------+-------------------------------------------------------------------------------------------------------------------+
| Projection_7               | 5.00  | root | test.usertable.field0, test.usertable.field1, test.usertable.field2, test.usertable.field3, test.usertable.field4 |
| └─Limit_8                  | 5.00  | root | offset:0, count:5                                                                                                 |
|   └─TableReader_15         | 5.00  | root | data:Limit_14                                                                                                     |
|     └─Limit_14             | 5.00  | cop  | offset:0, count:5                                                                                                 |
|       └─Selection_12       | 5.00  | cop  | ge(test.usertable.ycsb_key, "user6107727503211566506")                                                            |
|         └─TableScan_11     | 7.76  | cop  | table:usertable, range:[-inf,+inf], keep order:false                                                              |
+----------------------------+-------+------+--------------------------------------------------------------

Notice that selectivity is high:

MySQL [test]> select count(*) from usertable  WHERE YCSB_KEY >= "user6107727503211566506" LIMIT 5;
+----------+
| count(*) |
+----------+
|   652883 |
+----------+
1 row in set (0.47 sec)
MySQL [test]> select count(*) from usertable;
+----------+
| count(*) |
+----------+
|  1017429 |
+----------+
1 row in set (0.38 sec)

I have already talked with @lamxTyler about this issue, you can ask him for further information.

@zz-jason zz-jason added the sig/planner SIG: Planner label Jan 16, 2019
@eurekaka
Copy link
Contributor

@breeswish @lamxTyler thanks for the clarification, this can be solved after we know the order correlation between column and table/index. Keep this issue opened here for later revisit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
5 participants