
sql: large latency spikes when creating a large storing index while running tpcc #45888

Closed
rohany opened this issue Mar 9, 2020 · 8 comments

Labels: C-bug (code not up to spec/doc; solution expected to change code/behavior), no-issue-activity, S-3-ux-surprise (issue leaves users wondering whether CRDB is behaving properly; likely to hurt reputation/adoption), T-disaster-recovery, X-stale

Comments

@rohany (Contributor) commented Mar 9, 2020

Investigation into #44501 and #44504 has uncovered that the core of the problem seems to be creating the index that stores all of the columns.

To reproduce:

roachprod create $CLUSTER -n 4 --clouds=aws  --aws-machine-type-ssd=c5d.4xlarge
roachprod stage $CLUSTER:1-3 cockroach
roachprod stage $CLUSTER:4 workload
roachprod start $CLUSTER:1-3
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=2500 --db=tpcc --checks=false"
roachprod run $CLUSTER:4 "./workload run tpcc --ramp=5m --warehouses=2500 --active-warehouses=2000 --split --scatter {pgurl:1-3}"

After the ramp period, run the following in another shell:

roachprod sql $CLUSTER:3
> use tpcc;
> create unique index on customer (c_w_id, c_d_id, c_id) storing (c_first, c_middle, c_last, c_street_1, c_street_2, c_city, c_state, c_zip, c_phone, c_since, c_credit, c_credit_lim, c_discount, c_balance, c_ytd_payment, c_payment_cnt, c_delivery_cnt, c_data);

After some time, large p99 latency spikes can be witnessed, sometimes going up to multiple seconds.
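Not part of the original report, but one way to watch the backfill's progress from another SQL shell is the jobs table; this is a sketch and the exact columns may vary by version:

> SELECT job_id, status, fraction_completed
    FROM [SHOW JOBS]
   WHERE job_type = 'SCHEMA CHANGE' AND status = 'running';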

Epic CRDB-8816

Jira issue: CRDB-5120

@rohany self-assigned this Mar 9, 2020
@awoods187 added the C-bug and S-3-ux-surprise labels Mar 9, 2020
@rohany (Contributor, Author) commented Mar 9, 2020

Investigation and discussion with @ajwerner seem to have led to some understanding of this. The SQL latency spikes observed correlate directly with spikes in transaction restarts. The intuition is the following: when we create a second primary key, we introduce a large amount of new artificial contention, because every txn that updates a row in the primary key also has to update the corresponding row in the index being built.
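For illustration (with hypothetical key values, not from the original report): while the new storing index is in DELETE_AND_WRITE_ONLY, a single-row payment-style update like the one below has to write both the primary index entry and the new secondary index entry for the row, roughly doubling its contention footprint.

-- Hypothetical single-row update; during the backfill it must also maintain
-- the row's entry in the index being built, since c_balance and
-- c_ytd_payment are stored columns of that index.
> UPDATE customer
     SET c_balance = c_balance - 10.0,
         c_ytd_payment = c_ytd_payment + 10.0
   WHERE c_w_id = 1 AND c_d_id = 1 AND c_id = 1;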

Due to this, it seems that running tpcc with 2k warehouses and a large storing index will overload a 3-node cluster. I see fewer latency spikes at 1k warehouses, and only one spike at 500 warehouses.

However, it is unclear what users should do if they have a workload running and need to change their primary key.

@ajwerner (Contributor) commented

This feels very much the same as #36925 (schema: latency spike with new index backfill on tpc-c 10k (with partitioning)).

Interestingly, this relates to some other discussions about the inner workings of schema changes that perform backfills into new indexes. Here are some thoughts that relate to several recent conversations. The thrust of this is (a) adding a new

Today we have two forms of backfill: the "index backfill" and the "column backfill". When we perform the column backfill, we do that work in normal transactions because we're writing to a live index. For the index backfill, we're writing to an index that hasn't been written to (sort of). Rather, we put the new index in DELETE_AND_WRITE_ONLY mode, where foreground writes go to the new index in parallel with the AddSSTable requests from the backfill. This is safe because the MVCC timestamps in the AddSSTables precede all foreground writes.

Unfortunately there's no access to this fanciness for the column backfill. The upshot of the column backfill is that it doesn't create a second index that foreground traffic needs to write to. This means that transactions which used to hit the 1PC optimization still do, and for transactions that are contended, the contention footprint is not doubled.
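For concreteness, a rough (assumed, not from this thread) mapping of statements to backfill type: adding a column with a default value goes through the column backfill inside the existing primary index, while creating a new index goes through the index backfill described above. (c_note here is a hypothetical column used only for illustration.)

> ALTER TABLE customer ADD COLUMN c_note STRING DEFAULT '';  -- column backfill
> CREATE INDEX ON customer (c_last);                         -- index backfill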

As we can see in this specific issue, that contention increase can be a big deal.

For the upcoming work on #9851 (changing data types), we've got what seem like two bad options:

  1. We could do it all with column backfills. That would require two passes, one to add the new column type and then another to remove the old one. Furthermore, column backfills are quite a bit less efficient.

  2. We could do it with an index backfill: create a new index with the new column type, then swap out the old one. This is bad because, in the meantime, clients will need to write to both, and for workloads that rely on the 1PC optimization, losing it can make or break the workload. (A rough sketch follows this list.)
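Very roughly, option 2 would look something like the sketch below (hypothetical statements and column names; the real mechanism would live inside the schema changer rather than literal SQL). The window in which writes must maintain both the old and new data is where the contention and lost 1PC show up.

-- Hypothetical sketch of option 2: change the type of customer.c_data by
-- backfilling a shadow column/index and then swapping it in.
> ALTER TABLE customer ADD COLUMN c_data_new STRING;
> -- An index backfill builds the replacement index that contains c_data_new;
> -- while that index is in DELETE_AND_WRITE_ONLY, every write to customer
> -- must update both the old and the new index.
> ALTER TABLE customer DROP COLUMN c_data;
> ALTER TABLE customer RENAME COLUMN c_data_new TO c_data;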

The concrete proposal here is to buy into #36850 and give ourselves a third option.

@dt (Member) commented Mar 12, 2020

Intuition for this is the following: when we create a second primary key, we introduce a large amount of new artificial contention, as all txns that update rows in the primary key also have to go and update the row in the index being built.

...

This feels very much the same as #36925

I'm having trouble squaring these in my understanding: if it were just the fact that we have a secondary index, then wouldn't the latency linger post-backfill? My understanding from #36925 was that it was only _during_ the backfill that we see issues?

@dt (Member) commented Mar 12, 2020

I mentioned this in person last week but I think this investigation would be well served by a trace of a query which sees the latency spike, run before, during and after the backfill that shows exactly where we are spending the extra time.
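For reference, one way to capture such a trace from a SQL session is session tracing (a sketch; the /debug endpoints or an external tracer are alternatives):

> SET tracing = on;
> -- run the statement that exhibits the latency spike, e.g. one of the
> -- TPC-C updates against customer
> SET tracing = off;
> SELECT * FROM [SHOW TRACE FOR SESSION];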

@ajwerner (Contributor) commented

I mentioned this in person last week but I think this investigation would be well served by a trace of a query which sees the latency spike, run before, during and after the backfill that shows exactly where we are spending the extra time.

That's reasonable. You're right that if adding the index is going to cause problems, keeping the backfill out of the way is not going to solve anything. When I entered the conversation, my mindset had been on the primary key changes, where we're going to remove the old index after the backfill.

@rohany (Contributor, Author) commented Mar 12, 2020

I mentioned this in person last week but I think this investigation would be well served by a trace of a query which sees the latency spike, run before, during and after the backfill that shows exactly where we are spending the extra time.

I was having trouble collecting traces with jaeger due to memory issues, but the traces I collected from the /debug/ endpoint mostly just showed a large number of retries (visible on the admin UI as well). The traces were sql.txn traces, and they showed that the txns got pushed for a long time -- some of the largest offenders were pushed for up to 60 seconds.

At least for primary key changes, the old primary index is dropped, so after the backfill everything calms down. I'll try again while keeping the storing index around.

@awoods187 (Contributor) commented

What's our thinking now that we've completed additional investigations here?

@github-actions (bot) commented

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions closed this as not planned (won't fix, can't repro, duplicate, or stale) Sep 26, 2023