storage: consistency checker queue continued to increase after workload stopped #28377

Closed
robert-s-lee opened this issue Aug 8, 2018 · 3 comments

@robert-s-lee
Contributor

Read through #21824; I want to confirm the expected behavior.

BUG REPORT

Created test case per #28376

  • Which version of CockroachDB are you using?

v2.1.0-alpha.20180730 on macOS

  • What did you do?

for i in `seq 10000`; do
  cockroach sql --insecure --database test -e "create database if not exists db${i}; use db${i}; create table if not exists test${i} (id int primary key); insert into test${i} values (1),(2),(3);"
done

Stopped the test after creating 8,000 databases.

  • What did you expect to see?

That, even on a small MacBook, the database would remain operational.

  • What did you see instead?

Checked for abnormal metrics and saw that the consistency checker queue continued to increase after the workload stopped (a small sketch for polling this metric outside the Admin UI follows the log excerpt below).

[two screenshots, 2018-08-08 ~9:10 AM: consistency checker queue metric graphs]

I180808 13:11:07.180262 175 storage/replica_proposal.go:222  [n1,s1,r22/1:/Table/5{1-2}] new range lease repl=(n1,s1):1 seq=4 start=1533733867.175762791,0 epo=1 pro=1533733867.175800343,0 following repl=(n3,s3):2 seq=3 start=1533647315.171325321,1 epo=1 pro=1533647316.120851806,0
I180808 13:11:07.464286 185 storage/replica_proposal.go:222  [n1,s1,r11463/1:/Table/1145{7-8}] new range lease repl=(n1,s1):1 seq=5 start=1533733867.461558025,0 epo=1 pro=1533733867.461561871,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:08.520167 142 storage/replica_proposal.go:222  [n1,s1,r5279/1:/Table/527{3-4}] new range lease repl=(n1,s1):1 seq=5 start=1533733868.516460494,0 epo=1 pro=1533733868.516464838,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:09.537130 145 storage/replica_proposal.go:222  [n1,s1,r5310/1:/Table/530{4-5}] new range lease repl=(n1,s1):1 seq=5 start=1533733869.533432123,0 epo=1 pro=1533733869.533435833,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:10.636281 183 storage/replica_proposal.go:222  [n1,s1,r12234/1:/Table/1222{8-9}] new range lease repl=(n1,s1):1 seq=5 start=1533733870.632955126,0 epo=1 pro=1533733870.632959801,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:15.481832 229 server/status/runtime.go:425  [n1] runtime stats: 606 MiB RSS, 173 goroutines, 311 MiB/1.2 GiB/1.8 GiB GO alloc/idle/total, 226 MiB/270 MiB CGO alloc/total, 190.00cgo/sec, 0.13/0.03 %(u/s)time, 0.00 %gc (1x)
I180808 13:11:16.042702 177 storage/replica_proposal.go:222  [n1,s1,r11195/1:/Table/111{89-90}] new range lease repl=(n1,s1):1 seq=5 start=1533733876.040034707,0 epo=1 pro=1533733876.040038989,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:18.227249 180 storage/replica_proposal.go:222  [n1,s1,r6809/1:/Table/680{3-4}] new range lease repl=(n1,s1):1 seq=5 start=1533733878.223059716,0 epo=1 pro=1533733878.223064142,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:20.962705 522080 rpc/nodedialer/nodedialer.go:90  [n1] connection to n2 established
I180808 13:11:22.432859 140 storage/replica_proposal.go:222  [n1,s1,r13805/1:/Table/13{799-800}] new range lease repl=(n1,s1):1 seq=5 start=1533733882.428899568,0 epo=1 pro=1533733882.428905510,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
I180808 13:11:23.440705 141 storage/replica_proposal.go:222  [n1,s1,r14739/1:/Table/1473{3-4}] new range lease repl=(n1,s1):1 seq=5 start=1533733883.436914841,0 epo=1 pro=1533733883.436919395,0 following repl=(n3,s3):2 seq=4 start=1533725275.949570639,0 epo=2 pro=1533725275.949596737,0
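
A minimal sketch for polling this metric outside the Admin UI, assuming an insecure node on the default HTTP port 8080 and assuming the pending count is exported at the Prometheus endpoint `/_status/vars` under the name `queue_consistency_pending` (both the port and the metric name are assumptions here, not taken from the report):

```go
// Sketch: poll the consistency checker queue's pending count from the
// node's Prometheus metrics endpoint once a minute.
// Assumptions: insecure node, default HTTP port 8080, metric exported
// at /_status/vars as `queue_consistency_pending`.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	const url = "http://localhost:8080/_status/vars" // Prometheus-format metrics endpoint
	for {
		resp, err := http.Get(url)
		if err != nil {
			fmt.Println("fetch failed:", err)
		} else {
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
				line := scanner.Text()
				// Keep only the consistency checker queue's pending-count line.
				if strings.HasPrefix(line, "queue_consistency_pending") {
					fmt.Println(time.Now().Format(time.RFC3339), line)
				}
			}
			resp.Body.Close()
		}
		time.Sleep(time.Minute)
	}
}
```

Its output is one timestamped line per minute with the current pending count, which should trace the same curve as the graphs above.
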
  • What was the impact?
@robert-s-lee
Contributor Author

The consistency checker queue looks to be declining slowly. The decline seems to match the absence of leaseholder transfers (from the Range Operations graph). What is the relationship between the two metrics? Is this normal? Is there any downside to having a high consistency checker queue?

[two screenshots, 2018-08-08 ~10:24 AM: consistency checker queue and Range Operations graphs]

@tbg
Member

tbg commented Aug 8, 2018

I haven't checked the specific numbers, but the consistency checker works differently from most other queues. It tries to spread its load out over its full checking cycle (~24h), yet replicas are queued as soon as they become eligible for a check (after ~24h). So when lots of ranges are created around the same time, they all become eligible at roughly the same time, and the bump should only disperse after ~1 day. If you're seeing this in a cluster that has had an approximately constant number of ranges for multiple days, there's a problem; otherwise it's expected. From your description ("created 8k databases") it seems expected.
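
To make that shape concrete, here is a rough simulation of the behavior described above. It is not the actual storage-package code, just a toy model under assumptions stated in the comments: an ~24h check interval, a queue paced to cover every range once per cycle, and 8,000 ranges created at the same instant.

```go
// Toy model of the consistency checker queue behavior described above:
// each range becomes eligible for a check ~24h after its last check, but
// the queue paces itself so the work is spread over a full cycle.
// NOT the real implementation; just an illustration of why a batch of
// ranges created together produces a pending spike that drains over ~1 day.
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		numRanges = 8000           // ranges created at roughly the same time
		cycle     = 24 * time.Hour // assumed consistency check interval
		step      = time.Hour      // simulation resolution
	)

	// Pacing: cover all ranges once per cycle, i.e. ~numRanges/cycle checks
	// per unit of time.
	checksPerStep := int(int64(numRanges) * int64(step) / int64(cycle))

	// nextEligible[i] is when range i next becomes eligible for a check.
	// All ranges were created (and thus last "checked") at t=0.
	nextEligible := make([]time.Duration, numRanges)
	for i := range nextEligible {
		nextEligible[i] = cycle
	}

	for t := time.Duration(0); t <= 3*cycle; t += step {
		// Pending = eligible but not yet re-checked.
		var pending []int
		for i, e := range nextEligible {
			if e <= t {
				pending = append(pending, i)
			}
		}
		if int(t.Hours())%6 == 0 {
			fmt.Printf("t=%3.0fh pending=%d\n", t.Hours(), len(pending))
		}

		// The paced queue checks at most checksPerStep ranges this step;
		// each checked range becomes eligible again a full cycle later.
		for n := 0; n < checksPerStep && n < len(pending); n++ {
			nextEligible[pending[n]] = t + cycle
		}
	}
}
```

In this toy model the pending count jumps to 8,000 one cycle after creation, drains at roughly 330 ranges per hour over the following ~24h, and then settles at a small steady state, matching the bump-then-disperse shape described above.
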

@robert-s-lee
Contributor Author

@tschottdorf thank you for the info. Makes sense.
