schema: latency spike with new index backfill on tpc-c 10k (with partitioning) #36925

Closed
awoods187 opened this issue Apr 17, 2019 · 7 comments
Labels: A-schema-changes, C-bug, no-issue-activity, S-2, T-disaster-recovery, X-stale

Comments

awoods187 (Contributor) commented Apr 17, 2019

Describe the problem

I ran tpc-c 10k (with partitioning set) and passed with 97% efficiency:

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  900.0s   124949.7  97.2%    268.7    251.7    520.1    604.0    838.9   3623.9
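
(For reference, the efc column is tpmC as a fraction of the TPC-C theoretical maximum of 12.86 tpmC per warehouse: 124949.7 / (10000 * 12.86) ≈ 97.2%.)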

Then I ran a schema change:
CREATE UNIQUE INDEX ON tpcc.order (o_entry_d, o_w_id, o_d_id, o_carrier_id, o_id);

I immediately saw a latency spike + QPS drop:
image

I tried running again after the schema change finished:
image

To Reproduce

set -euxo pipefail
export CLUSTER=andy-10k
# Provision a 16-node AWS cluster (15 CRDB nodes + 1 workload node).
roachprod create $CLUSTER -n 16 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod run $CLUSTER --  "sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /mnt/data1/; mount | grep /mnt/data1"
roachprod stage $CLUSTER:1-15 cockroach c65b71a27e4d0941bf9427b5dec1ff7f096bba7b
roachprod stage $CLUSTER:16 workload
roachprod start $CLUSTER:1-15 --racks=5
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"
# Split and scatter the partitioned ranges before the measured run (duration=1s, so essentially no load).
roachprod run $CLUSTER:16 "./workload run tpcc --warehouses=10000 --partitions=5 --duration=1s --split --scatter {pgurl:1}"
roachprod sql $CLUSTER:1
# Temporarily raise the snapshot rebalance rate so rebalancing finishes faster, then drop it back down.
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate='64MiB'
# Wait for the replication queue to drain to 0.
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate='8MiB'
roachprod run $CLUSTER:16 "./workload run tpcc --ramp=10m --warehouses=10000 --active-warehouses=10000 --partitions=5 --duration=15m --split --scatter {pgurl:1-15}"

Environment:
v19.1.0-rc.3-11-gc65b71a

Debug.zip
https://drive.google.com/open?id=17ZYGOYU7Dpo7Yta6KhHTAf0nUFyRZSgg

Extract from the latency spike until now:
extract.txt

Epic CRDB-8816

Jira issue: CRDB-4469

awoods187 added the C-bug and S-2 labels on Apr 17, 2019
awoods187 (Contributor, Author) commented:
I dropped the index and set the cluster back up to run tpc-c. I confirmed that it passed again. Then, I ran tpc-c and killed one node:
image
We see a momentary spike of latency but then it comes gracefully back down.
Here's a six hour view to compare the behavior:
image

cc @bdarnell what do you think about this? The cluster is definitely worse with a schema change as opposed to with a dead node.

bdarnell (Contributor) commented:
So the schema change was running from about 20:20 to 21:10, and then from 21:30 to 22:00 the schema change was no longer running and you were just seeing the impact of the new index? This seems to show that the schema change isn't at fault, but the new schema is. You're adding a new unpartitioned, time-ordered index (o_entry_d is a timestamp) to a partitioned table. This will destroy all the parallelism that the partitioning is trying to create, so it's unsurprising that there's a high cost.
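
One way to see that concentration directly would be to look at which ranges and lease holders back the new index; a sketch, assuming the 19.1-era EXPERIMENTAL_RANGES statement and a hypothetical auto-generated index name (check SHOW INDEXES FROM tpcc."order" for the real one; order is quoted since it is a reserved word):

-- Index name is hypothetical; use the one reported by SHOW INDEXES.
SHOW EXPERIMENTAL_RANGES FROM INDEX tpcc."order"@order_o_entry_d_o_w_id_o_d_id_o_carrier_id_o_id_key;

If only a handful of ranges (and lease holders) serve the whole index, that is consistent with every new order landing on the hot tail of a timestamp-leading, unpartitioned index.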

awoods187 (Contributor, Author) commented Apr 18, 2019

Ahh, the old partitioning-the-index UX surprise. I want to run an experiment where the schema change is done with no load, and then run the load to see what happens. But the index is hung for some reason.
image
goroutines.txt
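
To tell whether the backfill is stuck or just slow, the jobs table can be queried; a sketch, assuming the 19.1-era SHOW JOBS columns (the exact column set varies by version):

SELECT job_id, status, running_status, fraction_completed
  FROM [SHOW JOBS]
 WHERE job_type = 'SCHEMA CHANGE'
 ORDER BY created DESC
 LIMIT 5;

A fraction_completed that stops advancing across several samples would point at a genuinely hung backfill rather than a slow one.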

Next up, if the index completes, I'll partition the index and run.

But I think the schema change would have to be run first and then partitioned after it completes, so it would have this large initial latency spike regardless of whether partitioning solves the sustained latency problem.
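
The two-step sequence described above would look roughly like the following; a sketch in which the index name o_reorder_idx is made up and the partition bounds mirror the table's existing five-way split. The partition columns must be a prefix of the index columns, which is why the key is reordered to lead with o_w_id rather than o_entry_d:

CREATE UNIQUE INDEX o_reorder_idx ON tpcc."order" (o_w_id, o_d_id, o_carrier_id, o_id);
ALTER INDEX tpcc."order"@o_reorder_idx PARTITION BY RANGE (o_w_id) (
    PARTITION p4_0 VALUES FROM (0) TO (2000),
    PARTITION p4_1 VALUES FROM (2000) TO (4000),
    PARTITION p4_2 VALUES FROM (4000) TO (6000),
    PARTITION p4_3 VALUES FROM (6000) TO (8000),
    PARTITION p4_4 VALUES FROM (8000) TO (10000)
);

Either way, the backfill for the new index runs before the partitions exist, so the initial spike would be expected regardless.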

awoods187 (Contributor, Author) commented Apr 23, 2019

Alright, I set up a new cluster to test this out with a different schema change:

CREATE UNIQUE INDEX ON tpcc.order (o_w_id ASC, o_d_id ASC, o_carrier_id ASC, o_id ASC)
    PARTITION BY RANGE (o_w_id) (
        PARTITION p4_0 VALUES FROM (0) TO (2000),
        PARTITION p4_1 VALUES FROM (2000) TO (4000),
        PARTITION p4_2 VALUES FROM (4000) TO (6000),
        PARTITION p4_3 VALUES FROM (6000) TO (8000),
        PARTITION p4_4 VALUES FROM (8000) TO (10000));

Immediately, the latency spiked to 10+ s and stayed there while QPS dropped by about a third. Interestingly, the latency on some nodes remained at ~200 ms.
image
During this same time period, CPU went down during the initial schema change, followed by an increase on some nodes with a rather large spread.
image

The QPS stayed low and choppy until I initiated a cancel of the schema change job a little after 00:10. The cancel and rollback job ran for 16+ minutes, completing a little before 00:30.
image
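
For reference, a cancel along these lines would do it; a sketch, using the fact that CANCEL JOBS accepts a subquery so the job id does not have to be copied by hand:

CANCEL JOBS (
    SELECT job_id FROM [SHOW JOBS]
    WHERE job_type = 'SCHEMA CHANGE' AND status = 'running'
);

The rollback of the partially built index then has to run to completion, which matches the 16+ minutes observed before QPS recovered.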

Similarly, the CPU pattern continued until I canceled the schema change. Once I did, the CPU spread became much tighter until the rollback completed:
image
image

awoods187 (Contributor, Author) commented:
I killed a node so that we could compare the behavior of the schema change to the cluster with a dead node:
image
image

awoods187 (Contributor, Author) commented:
Debug.zip https://drive.google.com/open?id=1Ql9bhbQ-PrXKOvp9JMauBOoLXtJn1pao
Merged logs for the schema change duration from debug.zip
extract.txt.zip

github-actions commented:
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
