
roachtest: clearrange/checks=false failed #34897

Closed
cockroach-teamcity opened this issue Feb 14, 2019 · 83 comments
Labels: C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone: 2.2

@cockroach-teamcity (Member)

SHA: https://github.com/cockroachdb/cockroach/commits/acba091f04f3d8ecabf51009bf394951fbd3643c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1137872&tab=buildLog

The test failed on master:
	cluster.go:1226,clearrange.go:57,clearrange.go:37,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1137872-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Feb 14, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Feb 14, 2019
@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/d7c56dcb87c8f187e50303c8e32a64836c42515c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1139797&tab=buildLog

The test failed on master:
	cluster.go:1226,clearrange.go:57,clearrange.go:37,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1139797-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/4fc4d63ddddeb507564bfe53c6cd79549ff2fd27

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1141650&tab=buildLog

The test failed on master:
	cluster.go:1226,clearrange.go:88,clearrange.go:37,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1141650-clearrange-checks-false:1 -- ./cockroach sql --insecure -e "DROP DATABASE IF EXISTS tinybank" returned:
		stderr:
		
		stdout:
		Error: cannot dial server.
		Is the server running?
		If the server is running, check --host client-side and --advertise server-side.
		
		read tcp 127.0.0.1:39356->127.0.0.1:26257: i/o timeout
		Failed running "sql"
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/8e9a1e310e3e8e37f091b7ca8bd204084ad9e2e5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1142461&tab=buildLog

The test failed on master:
	test.go:1200: test timed out (6h30m0s)
	cluster.go:1585,clearrange.go:192,clearrange.go:37,test.go:1212: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/bd80a74f882a583d6bb2a04dfdb57b49254bc7ba

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1143393&tab=buildLog

The test failed on master:
	cluster.go:1226,clearrange.go:88,clearrange.go:37,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1143393-clearrange-checks-false:1 -- ./cockroach sql --insecure -e "DROP DATABASE IF EXISTS tinybank" returned:
		stderr:
		
		stdout:
		Error: cannot dial server.
		Is the server running?
		If the server is running, check --host client-side and --advertise server-side.
		
		read tcp 127.0.0.1:59582->127.0.0.1:26257: i/o timeout
		Failed running "sql"
		Error:  exit status 1
		: exit status 1

@tbg (Member) commented Feb 19, 2019

This second-to-last one looks like it was really a test timeout: the test, ironically, got stuck importing a tiny table:

11:28:31 cluster.go:253: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1142461-clearrange-checks-false:1 -- ./cockroach sql --insecure -e "DROP DATABASE IF EXISTS tinybank"
DROP DATABASE
11:28:36 cluster.go:253: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1142461-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --db=tinybank --payload-bytes=100 --ranges=10 --rows=800 --seed=1
7: 31137
5: 7054
1: 19076
10: 20811
8: 15433
9: 17108
2: 26324
4: 11799
3: 5286
6: 18016
14:13:59 cluster.go:908: fetching debug zip

14:13:59 is very close to the test timeout of 6h30m (the test run started at 7:42:56).

@tbg (Member) commented Feb 19, 2019

The last one is probably similar, except that in this case the ./cockroach sql invocation didn't get a response from the server quickly enough and failed the test right there.

(Note that this is all before even dropping a table)

@tbg (Member) commented Feb 19, 2019

In the last one, n10 took ~15s to start. It's unclear where all that time went; for example, there's a 5s gap very early (:27-:32) that I can't pin on anything in the code (perhaps it's the reporting, which falls in the gap between the log messages, among other things that shouldn't take that long?).

Starting the Store also seems to eat 5s. Anyway, the bottom line here is that the node takes a while to start. I wonder why c.Start already returns while that's still going on. We use --background, perhaps that ought to return a little later. @bdarnell, do you have any opinions here? We currently signal readiness relatively early:

cockroach/pkg/server/server.go, lines 1375 to 1380 (at 475958d):

    if len(bootstrappedEngines) > 0 {
        // The cluster was already initialized.
        doBootstrap = false
        if s.cfg.ReadyFn != nil {
            s.cfg.ReadyFn(false /*waitForInit*/)
        }

but that's probably for a good reason. Intuitively I would've thought readiness would be signaled with this message

teamcity-1143393-clearrange-checks-false-0004> I190219 10:19:43.065628 25 cli/start.go:695 node startup completed:
CockroachDB node starting at 2019-02-19 10:19:43.065405278 +0000 UTC (took 15.7s)
build: CCL v2.2.0-alpha.20190211-218-gbd80a74 @ 2019/02/19 06:46:22 (go1.11.4)
webui: http://teamcity-1143393-clearrange-checks-false-0004:26258
...

Anyway, the more interesting failure is the one in which the RESTORE just gets stuck basically forever.

I190219 10:19:27.353927 1 util/log/clog.go:1197  [config] file created at: 2019/02/19 10:19:27
I190219 10:19:27.353927 1 util/log/clog.go:1197  [config] running on machine: teamcity-1143393-clearrange-checks-false-0004
I190219 10:19:27.353927 1 util/log/clog.go:1197  [config] binary: CockroachDB CCL v2.2.0-alpha.20190211-218-gbd80a74 (x86_64-unknown-linux-gnu, built 2019/02/19 06:46:22, go1.11.4)
I190219 10:19:27.353927 1 util/log/clog.go:1197  [config] arguments: [./cockroach start --insecure --store=path=/mnt/data1/cockroach --log-dir=/root/logs --cache=25% --max-sql-memory=25% --port=26257 --http-port=26258 --locality=cloud=gce
,region=us-east1,zone=us-east1-b --join=35.196.17.241:26257]
I190219 10:19:27.353927 1 util/log/clog.go:1197  line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid file:line msg utf8=✓
I190219 10:19:27.353926 1 cli/start.go:1031  logging to directory /root/logs
W190219 10:19:27.355571 1 cli/start.go:1066  RUNNING IN INSECURE MODE!

- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.

Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v2.2/secure-a-cluster.html
I190219 10:19:27.355730 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 15 GiB, using system memory
I190219 10:19:27.355742 1 cli/start.go:1080  CockroachDB CCL v2.2.0-alpha.20190211-218-gbd80a74 (x86_64-unknown-linux-gnu, built 2019/02/19 06:46:22, go1.11.4)
I190219 10:19:32.448960 1 server/status/recorder.go:610  available memory from cgroups (8.0 EiB) exceeds system memory 15 GiB, using system memory
I190219 10:19:32.448980 1 server/config.go:382  system total memory: 15 GiB
I190219 10:19:32.449021 1 server/config.go:384  server configuration:
max offset             500000000
cache size             3.7 GiB
SQL memory pool size   3.7 GiB
scan interval          10m0s
scan min idle time     10ms
scan max idle time     1s
event log enabled      true
I190219 10:19:32.449040 1 cli/start.go:920  using local environment variables: COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=1, COCKROACH_ENABLE_RPC_COMPRESSION=false, COCKROACH_BACKGROUND_RESTART=1
I190219 10:19:32.449051 1 cli/start.go:927  process identity: uid 0 euid 0 gid 0 egid 0
I190219 10:19:32.449060 1 cli/start.go:552  starting cockroach node
I190219 10:19:32.454990 25 storage/engine/rocksdb.go:573  opening rocksdb instance at "/mnt/data1/cockroach/cockroach-temp154990427"
I190219 10:19:32.495350 25 server/server.go:871  [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190219 10:19:32.499004 25 storage/engine/rocksdb.go:573  opening rocksdb instance at "/mnt/data1/cockroach"
I190219 10:19:32.727661 25 server/config.go:490  [n?] 1 storage engine initialized
I190219 10:19:32.727678 25 server/config.go:493  [n?] RocksDB cache size: 3.7 GiB
I190219 10:19:32.727691 25 server/config.go:493  [n?] store 0: RocksDB, max size 0 B, max open file limit 60536
W190219 10:19:32.739146 25 gossip/gossip.go:1496  [n?] no incoming or outgoing connections
W190219 10:19:32.739273 25 cli/start.go:894  neither --listen-addr nor --advertise-addr was specified.
The server will advertise "teamcity-1143393-clearrange-checks-false-0004" to other nodes, is this routable?

Consider using:
- for local-only servers:  --listen-addr=localhost
- for multi-node clusters: --advertise-addr=<host/IP addr>
I190219 10:19:32.741693 25 server/server.go:921  [n?] Sleeping till wall time 1550571572741660520 to catches up to 1550571572995315950 to ensure monotonicity. Delta: 253.65543ms
I190219 10:19:32.743194 30 gossip/client.go:128  [n?] started gossip client to 35.196.17.241:26257
I190219 10:19:32.997672 25 gossip/gossip.go:392  [n10] NodeDescriptor set to node_id:10 address:<network_field:"tcp" address_field:"teamcity-1143393-clearrange-checks-false-0004:26257" > attrs:<> locality:<tiers:<key:"cloud" value:"gce" > tiers:<key:"region" value:"us-east1" > tiers:<key:"zone" value:"us-east1-b" > > ServerVersion:<major_val:2 minor_val:1 patch:0 unstable:8 > build_tag:"v2.2.0-alpha.20190211-218-gbd80a74" started_at:1550571572996001692
I190219 10:19:33.010052 30 gossip/client.go:133  [n10] closing client to n1 (35.196.17.241:26257): received forward from n1 to 2 (teamcity-1143393-clearrange-checks-false-0006:26257)
I190219 10:19:33.013728 121 gossip/client.go:128  [n10] started gossip client to teamcity-1143393-clearrange-checks-false-0006:26257
I190219 10:19:33.740412 110 gossip/client.go:128  [n10] started gossip client to teamcity-1143393-clearrange-checks-false-0006:26257
I190219 10:19:33.741262 110 gossip/client.go:133  [n10] closing client to n2 (teamcity-1143393-clearrange-checks-false-0006:26257): rpc error: code = Unknown desc = duplicate connection from node at teamcity-1143393-clearrange-checks-false-0004:26257
I190219 10:19:34.756232 236 gossip/client.go:128  [n10] started gossip client to teamcity-1143393-clearrange-checks-false-0008:26257
I190219 10:19:37.391596 57 gossip/gossip.go:1510  [n10] node has connected to cluster via gossip
W190219 10:19:38.998130 47 storage/closedts/provider/provider.go:144  [ct-closer] unable to move closed timestamp forward: not live
I190219 10:19:43.033326 25 server/node.go:461  [n10] initialized store [n10,s10]: disk (capacity=314 GiB, available=143 GiB, used=173 GiB, logicalBytes=195 GiB), ranges=6842, leases=0, queries=0.00, writes=0.00, bytesPerReplica={p10=18708296.00 p25=33600168.00 p50=33600168.00 p75=33600168.00 p90=33607164.00 pMax=33607164.00}, writesPerReplica={p10=0.00 p25=0.00 p50=0.00 p75=0.00 p90=0.00 pMax=0.00}
I190219 10:19:43.033688 25 storage/stores.go:243  [n10] read 9 node addresses from persistent storage
I190219 10:19:43.033990 25 server/node.go:699  [n10] connecting to gossip network to verify cluster ID...
I190219 10:19:43.034012 25 server/node.go:719  [n10] node connected via gossip and verified as part of cluster "63dac28d-1f60-4c52-88d9-a4285e33e98e"
I190219 10:19:43.034233 25 server/node.go:542  [n10] node=10: started with [<no-attributes>=/mnt/data1/cockroach] engine(s) and attributes []
I190219 10:19:43.034498 25 server/status/recorder.go:610  [n10] available memory from cgroups (8.0 EiB) exceeds system memory 15 GiB, using system memory
I190219 10:19:43.035901 25 server/server.go:1577  [n10] starting http server at [::]:26258 (use: teamcity-1143393-clearrange-checks-false-0004:26258)
I190219 10:19:43.035924 25 server/server.go:1579  [n10] starting grpc/postgres server at [::]:26257
I190219 10:19:43.035941 25 server/server.go:1580  [n10] advertising CockroachDB node at teamcity-1143393-clearrange-checks-false-0004:26257
I190219 10:19:43.065268 25 server/server.go:1647  [n10] done ensuring all necessary migrations have run
I190219 10:19:43.065296 25 server/server.go:1650  [n10] serving sql connections
I190219 10:19:43.065585 25 cli/start.go:687  [config] clusterID: 63dac28d-1f60-4c52-88d9-a4285e33e98e
I190219 10:19:43.065628 25 cli/start.go:695  node startup completed:
CockroachDB node starting at 2019-02-19 10:19:43.065405278 +0000 UTC (took 15.7s)

tbg added a commit to tbg/cockroach that referenced this issue Feb 19, 2019
@bdarnell (Contributor)

> We use --background, perhaps that ought to return a little later. @bdarnell, do you have any opinions here? We currently signal readiness relatively early:

Yeah, this should probably be later. It should at least be moved below the ensureClockMonotonicity call later in the same block, and probably even later. I think one reason this is where it is is that we need (needed?) two different paths for initialized and uninitialized clusters, but I think we just stuffed it in the one if alreadyBootstrapped block we already had instead of thinking about where it really belongs.
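
For illustration only, here is a tiny self-contained Go sketch of the ordering being suggested (the types and names are made up, not the actual server.go code): fire the readiness callback that lets --background return only after startup has fully completed.

package main

import (
    "fmt"
    "time"
)

// server is a stand-in for the real server; readyFn plays the role of cfg.ReadyFn.
type server struct {
    readyFn func(waitForInit bool)
}

func (s *server) start() error {
    // Stand-in for opening engines, ensuring clock monotonicity, starting the
    // node, and running migrations (the ~15s observed in the logs above).
    time.Sleep(2 * time.Second)

    // Signal readiness last, so --background does not detach while the node is
    // still initializing its stores and replicas.
    if s.readyFn != nil {
        s.readyFn(false /* waitForInit */)
    }
    return nil
}

func main() {
    s := &server{readyFn: func(bool) { fmt.Println("node startup completed; safe to detach") }}
    if err := s.start(); err != nil {
        panic(err)
    }
}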

@tbg (Member) commented Feb 19, 2019 via email

tbg added a commit to tbg/cockroach that referenced this issue Feb 20, 2019
craig bot pushed a commit that referenced this issue Feb 20, 2019
35050: roachtest: misc clearrange improvements r=petermattis a=tbg

Touches #34897.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/46d1ab4c06edfd37d875114972c55b44105acd83

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1147903&tab=buildLog

The test failed on master:
	test.go:1199: test timed out (6h30m0s)
	cluster.go:1226,clearrange.go:58,clearrange.go:37,test.go:1211: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1147903-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		: signal: killed

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/d0a93d286ee42ceb94f986b6a06a1afd52cf914e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149020&tab=buildLog

The test failed on master:
	test.go:1199: test timed out (6h30m0s)
	cluster.go:1585,clearrange.go:189,clearrange.go:37,test.go:1211: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/eaad50f0ea64985a0d7da05abb00cc9f321c5fa9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149743&tab=buildLog

The test failed on master:
	test.go:1199: test timed out (6h30m0s)
	cluster.go:1585,clearrange.go:189,clearrange.go:37,test.go:1211: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/032c4980720abc1bdd71e4428e4111e6e6383297

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1158877&tab=buildLog

The test failed on master:
	test.go:1199: test timed out (6h30m0s)
	cluster.go:1585,clearrange.go:189,clearrange.go:37,test.go:1211: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/d888b76df319571afe4d5816f1a1f0f53905653f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1159641&tab=buildLog

The test failed on master:
	cluster.go:1226,clearrange.go:58,clearrange.go:37,test.go:1211: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1159641-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190303 07:40:23.870359 1 ccl/workloadccl/cliccl/fixtures.go:321  starting import of 1 tables
		I190303 09:31:58.723951 68 ccl/workloadccl/fixture.go:498  imported bank (1h51m35s, 0 rows, 0 index entries, 0 B)
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/de1793532332fb64fca27cafe92d2481d900a5a0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1160394&tab=buildLog

The test failed on master:
	cluster.go:1585,clearrange.go:189,clearrange.go:37,test.go:1211: pq: count-leases: AS OF SYSTEM TIME: cannot specify timestamp in the future

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/959dcf7de0f94cfcfa0062387b109adebd1f11da

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1163702&tab=buildLog

The test failed on master:
	cluster.go:1603,clearrange.go:189,clearrange.go:37,test.go:1214: pq: count-leases: AS OF SYSTEM TIME: cannot specify timestamp in the future

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/38bb1e7905b89f911bd74be4f5830217ffb7b343

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1168752&tab=buildLog

The test failed on master:
	test.go:1202: test timed out (6h30m0s)
	cluster.go:1603,clearrange.go:189,clearrange.go:37,test.go:1214: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/5f7de3c35348e847dc0d4ab1ba8d76574e1706c2

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1169980&tab=buildLog

The test failed on master:
	test.go:1202: test timed out (6h30m0s)
	cluster.go:1603,clearrange.go:189,clearrange.go:37,test.go:1214: context canceled

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/9d058d53c8a82fceb2205f1827c26f1bf36c32ba

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1172386&tab=buildLog

The test failed on master:
	cluster.go:1603,clearrange.go:189,clearrange.go:37,test.go:1214: unexpected node event: 7: dead

@tbg (Member) commented Apr 11, 2019

> Does COCKROACH_ENGINE_MAX_SYNC_DURATION also apply to writes to the temp storage engine?

No, it does not. This only applies to the engines that are used as stores. This means that if you're seeing the disk checker crash because of activity on the temp engines, then it's because the temp engines starve IO for the main engine.

> What difference should it make to the regular store, which is barely taking any writes in the early stages of import, if the temp store is flushing a lot?

I think @petermattis had a past investigation in which CRDB performed poorly because some other process was filling up the OS' write buffer and so even small syncs done by RocksDB could take a long time. Perhaps something similar is going on here.

> EDIT: one other thing to try is adding more flush threads. It looks like right now the temp store and real store are sharing a single flush thread.

Interesting, how can that be? They're entirely different RocksDB instances. How does that work?

> with an unrelated transient issue after 3h11m.

Just curious, what was the transient issue? We're really trying to get rid of transient issues.

@petermattis (Collaborator)

> I think @petermattis had a past investigation in which CRDB performed poorly because some other process was filling up the OS' write buffer and so even small syncs done by RocksDB could take a long time. Perhaps something similar is going on here.

@tbg's memory is good. Linux's default behavior is to allow too large an amount of dirty file data to accumulate before flushing. When the flushing does occur it can cause significant disk IO starvation for everything on the system. I can't recall off hand what to look for, but vmstat or some file in /proc will indicate the "dirty" data size. In normal CockroachDB usage, the amount stays low because we're frequently syncing the WAL and sstables (as they are written). I'd be mildly surprised if this is coming up here due to the temp-store, but definitely worth checking.
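
For reference, here is a minimal Go sketch (not from the test or CockroachDB) that polls the Dirty and Writeback counters in /proc/meminfo once a second; running it next to the import should show whether dirty data piles up before the big flushes:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
    "time"
)

// Print the kernel's Dirty and Writeback counters from /proc/meminfo once a second.
func main() {
    for {
        f, err := os.Open("/proc/meminfo")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            line := sc.Text()
            if strings.HasPrefix(line, "Dirty:") || strings.HasPrefix(line, "Writeback:") {
                fmt.Println(line)
            }
        }
        f.Close()
        fmt.Println("---")
        time.Sleep(time.Second)
    }
}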

> EDIT: one other thing to try is adding more flush threads. It looks like right now the temp store and real store are sharing a single flush thread.

> Interesting, how can that be? They're entirely different RocksDB instances. How does that work?

The thread pool could be shared between RocksDB instances, but I'm pretty sure we don't share it. We do share the block cache.

@tbg (Member) commented Apr 11, 2019

> We do share the block cache.

Where? Am I looking at the wrong place?

// NewTempEngine creates a new engine for DistSQL processors to use when the
// working set is larger than can be stored in memory.
func NewTempEngine(
    tempStorage base.TempStorageConfig, storeSpec base.StoreSpec,
) (MapProvidingEngine, error) {
    if tempStorage.InMemory {
        // TODO(arjun): Limit the size of the store once #16750 is addressed.
        // Technically we do not pass any attributes to temporary store.
        return NewInMem(roachpb.Attributes{} /* attrs */, 0 /* cacheSize */), nil
    }
    rocksDBCfg := RocksDBConfig{
        Attrs: roachpb.Attributes{},
        Dir:   tempStorage.Path,
        // MaxSizeBytes doesn't matter for temp storage - it's not
        // enforced in any way.
        MaxSizeBytes:    0,
        MaxOpenFiles:    128, // TODO(arjun): Revisit this.
        UseFileRegistry: storeSpec.UseFileRegistry,
        ExtraOptions:    storeSpec.ExtraOptions,
    }
    rocksDBCache := NewRocksDBCache(0)
    rocksdb, err := NewRocksDB(rocksDBCfg, rocksDBCache)
    if err != nil {
        return nil, err
    }
    return rocksdb, nil
}

@petermattis (Collaborator)

> Where? Am I looking at the wrong place?

I'm a lying liar and you caught me! More accurately, I misremembered. We share the block cache between all of the stores on a node. I wonder if we should use that same shared block cache for the temp store. The current 0-size block cache might make random access DiskMap usage significantly slower.

@ajkr (Contributor) commented Apr 11, 2019

> @tbg's memory is good. Linux's default behavior is to allow too large an amount of dirty file data to accumulate before flushing. When the flushing does occur it can cause significant disk IO starvation for everything on the system. I can't recall off hand what to look for, but vmstat or some file in /proc will indicate the "dirty" data size. In normal CockroachDB usage, the amount stays low because we're frequently syncing the WAL and sstables (as they are written). I'd be mildly surprised if this is coming up here due to the temp-store, but definitely worth checking.

I have some low level strace data showing the tiny WAL write syncs getting stuck behind huge temp store sstable syncs. For example:

[pid 12888] 05:29:15 fdatasync(37</mnt/data1/cockroach/cockroach-temp128692840/001118.sst> <unfinished ...>
--
[pid 12892] 05:29:15 fdatasync(150</mnt/data1/cockroach/cockroach-temp128692840/001119.sst> <unfinished ...>
[pid 12881] 05:29:15 fdatasync(17</mnt/data1/cockroach/000003.log> <unfinished ...>
[pid 12888] 05:29:16 <... fdatasync resumed> ) = 0 <1.638572>
[pid 12892] 05:29:16 <... fdatasync resumed> ) = 0 <1.468135>
[pid 12881] 05:29:16 <... fdatasync resumed> ) = 0 <1.164935>

> In normal CockroachDB usage, the amount stays low because we're frequently syncing the WAL and sstables (as they are written). I'd be mildly surprised if this is coming up here due to the temp-store, but definitely worth checking.

bytes_per_sync behaves differently from how I'd expect -- it only requests asynchronous writeback each time bytes_per_sync bytes are written to an sstable. It does this using sync_file_range with the SYNC_FILE_RANGE_WRITE flag. So it's possible that no writeback has happened by the time the sstable is finished and fdatasync is called, and then it all happens at once.

That option might make more sense together with a rate limiter on background writes. That'd cause us to pause occasionally when generating large files, allowing some of the async writebacks to finish. I don't know what rate limit to pick, though. If we had some estimate of the device's max throughput, that'd be enough info for us to use the auto-tuned rate limiter.
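
For context, a small Linux-only Go sketch (my own illustration, not CockroachDB code) of the two sync_file_range modes being discussed, using golang.org/x/sys/unix:

package main

import (
    "log"
    "os"

    "golang.org/x/sys/unix"
)

func main() {
    f, err := os.OpenFile("/tmp/sync-range-demo", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    buf := make([]byte, 1<<20) // stand-in for 1 MiB of sstable data
    if _, err := f.Write(buf); err != nil {
        log.Fatal(err)
    }
    fd := int(f.Fd())

    // What bytes_per_sync does today: only *request* asynchronous writeback of
    // the range and return immediately; the pages may still be dirty when a
    // later fdatasync runs.
    if err := unix.SyncFileRange(fd, 0, int64(len(buf)), unix.SYNC_FILE_RANGE_WRITE); err != nil {
        log.Fatal(err)
    }

    // The stricter alternative: also wait for writeback of the range to finish,
    // which spreads the I/O out instead of leaving it all for the final fdatasync.
    flags := unix.SYNC_FILE_RANGE_WAIT_BEFORE | unix.SYNC_FILE_RANGE_WRITE | unix.SYNC_FILE_RANGE_WAIT_AFTER
    if err := unix.SyncFileRange(fd, 0, int64(len(buf)), flags); err != nil {
        log.Fatal(err)
    }
}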

@ajkr (Contributor) commented Apr 11, 2019

> Interesting, how can that be? They're entirely different RocksDB instances. How does that work?

In RocksDB, the Env object owns the thread pools. The Env we use is Env::Default(), which is a static object, so it is the same for everyone in the same process.

> Just curious, what was the transient issue? We're really trying to get rid of transient issues.

I think this one was caused by my home Internet connection.

@petermattis (Collaborator)

> bytes_per_sync behaves differently from how I'd expect -- it only requests asynchronous writeback each time bytes_per_sync bytes are written to an sstable. It does this using sync_file_range with the SYNC_FILE_RANGE_WRITE flag. So it's possible that no writeback has happened by the time the sstable is finished and fdatasync is called, and then it all happens at once.

Oh, interesting. I had assumed the file syncing was synchronous. I wonder if we should be passing SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER.

@petermattis (Collaborator)

> In RocksDB, the Env object owns the thread pools. The Env we use is Env::Default(), which is a static object, so it is the same for everyone in the same process.

Globals!!! I'm pretty sure that sharing is unintentional. I think the temp-store should have a separate thread pool with 1 high priority and 1 low priority thread. And sharing the thread pool across all stores on a node is definitely not what was intended. Yikes!

@tbg (Member) commented Apr 11, 2019

Yikes indeed. I'm excited by the better place we'll be in once this investigation is resolved, though.

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/134478e4dde16919eb86efb81fb22d8cce8a9701

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1236263&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,clearrange.go:58,clearrange.go:37,test.go:1237: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1236263-clearrange-checks-false:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned:
		stderr:
		
		stdout:
		I190411 18:37:18.507963 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		I190411 19:09:55.396960 52 ccl/workloadccl/fixture.go:516  imported bank (32m37s, 0 rows, 0 index entries, 0 B)
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@ajkr (Contributor) commented Apr 12, 2019

OK finished experimenting with these three changes:

(1) facebook/rocksdb#4460
(2) wal_bytes_per_sync = 0
(3) launching minimum of two flush threads and two compaction threads

All experiments were using COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=3s.

With the three changes combined, the import succeeded in 3h30m11s. I suspected, though, that (3) might be the most important, so I tried a few more experiments.

  • Stripped out (2) since that looked least likely to make a difference. The import failed with an unrelated transient issue after 3h11m.
  • Stripped out both (1) and (2). The import succeeded in 2h53m45s.
  • Stripped out (1), (2), and (3) (for sanity). The import failed as expected after 10m55s.

So (3) appears to be the major differentiator. But why? What difference should it make to the regular store, which is barely taking any writes in the early stages of import, if the temp store is flushing a lot? Also, why does the import finish slower when we do fewer WAL writes to the temp store (this seems to agree with @petermattis's benchmark result)? Lots of unknowns here.

Following up on this series, we tried a fourth thing today.

(4) cockroachdb/rocksdb#29

I only ran one experiment, which contained changes (1), (2), and (4). As usual it was run with COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=3s. The import succeeded in 2h35m4s. That's the best we've seen yet.

@petermattis (Collaborator)

Was the (1), (2), and (4) experiment run on ZFS? That's an excellent improvement. I think it would be worthwhile to make sure the result is stable and run the test multiple times (you can do this in parallel using roachtest run -c and specify a manually created cluster).

I believe you mentioned elsewhere that ext4 is able to complete the import portion of this test significantly faster. Is that result reproducible?

@cockroach-teamcity (Member Author)

SHA: https://github.com/cockroachdb/cockroach/commits/509c5b130fb1ad0042beb74e083817aa68e4fc92

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=clearrange/checks=false PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237002&tab=buildLog

The test failed on release-19.1:
	test.go:1225: test timed out (6h30m0s)
	cluster.go:1688,clearrange.go:189,clearrange.go:37,test.go:1237: context canceled

@ajkr (Contributor) commented Apr 12, 2019

> Was the (1), (2), and (4) experiment run on ZFS? That's an excellent improvement. I think it would be worthwhile to make sure the result is stable and run the test multiple times (you can do this in parallel using roachtest run -c and specify a manually created cluster).

> I believe you mentioned elsewhere that ext4 is able to complete the import portion of this test significantly faster. Is that result reproducible?

That one was on ZFS. OK, I ran a few more experiments to answer your questions. Summary of all our results from yesterday and today:

  • ZFS + (1) + (2) + (4): 2h35m4s, 2h32m43s, 2h27m51s
  • ZFS + (4): 2h38m38s
  • ext4 + (1) + (2) + (4): 1h40m34s, 1h41m39s
  • ext4 + (4): 1h40m59s

So the results are pretty stable, and it looks like (1) and (2) don't make a big difference, but they also don't hurt.

I realized also we don't really have a baseline since these are all run with COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=3s which always fails on master. Will get one.

@petermattis (Collaborator)

Glad to see that the import times are stable. This is great progress and I appreciate your diligence in testing here. I'm surprised at how large the difference is between ZFS and ext4. Looks like (4) is the important change, though I'm totally fine with (1) and (2) as well.

@ajkr (Contributor) commented Apr 12, 2019

> This is great progress and I appreciate your diligence in testing here.

Thanks, hope I didn't run up our GCE bill too much :).

> I realized also we don't really have a baseline since these are all run with COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=3s which always fails on master. Will get one.

Got a baseline using COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=false COCKROACH_ENGINE_MAX_SYNC_DURATION=3s on master branch. It was 3h8m15s on ZFS.

@cockroach-teamcity (Member Author): [comment minimized]

ajkr added a commit to ajkr/cockroach that referenced this issue Apr 27, 2019
Includes the following changes, all of which have landed upstream.

- cockroachdb/rocksdb#27: "ldb: set `total_order_seek` for scans"
- cockroachdb/rocksdb#28: "Fix cockroachdb#3840: only `SyncClosedLogs` for multiple CFs"
- cockroachdb/rocksdb#29: "Optionally wait on bytes_per_sync to smooth I/O"
- cockroachdb/rocksdb#30: "Option string/map/file can set env from object registry"

Also made the RocksDB changes that we decided in cockroachdb#34897:

- Do not sync WAL before installing flush result. This is achieved by backporting cockroachdb/rocksdb#28; no configuration change is necessary.
- Do not sync WAL ever for temp stores. This is achieved by setting `wal_bytes_per_sync = 0`.
- Limit size of final syncs when generating SSTs. This is achieved by backporting cockroachdb/rocksdb#29 and turning it on with `strict_bytes_per_sync = true`.

Release note: None
craig bot pushed a commit that referenced this issue Apr 27, 2019
37172: c-deps: bump rocksdb for multiple backported PRs r=ajkr a=ajkr

Includes the following changes, all of which have landed upstream.

- cockroachdb/rocksdb#27: "ldb: set `total_order_seek` for scans"
- cockroachdb/rocksdb#28: "Fix #3840: only `SyncClosedLogs` for multiple CFs"
- cockroachdb/rocksdb#29: "Optionally wait on bytes_per_sync to smooth I/O"
- cockroachdb/rocksdb#30: "Option string/map/file can set env from object registry"

Also made the RocksDB changes that we decided in #34897:

- Do not sync WAL before installing flush result. This is achieved by backporting cockroachdb/rocksdb#28; no configuration change is necessary.
- Do not sync WAL ever for temp stores. This is achieved by setting `wal_bytes_per_sync = 0`.
- Limit size of final syncs when generating SSTs. This is achieved by backporting cockroachdb/rocksdb#29 and turning it on with `strict_bytes_per_sync = true`.

Release note: None

Co-authored-by: Andrew Kryczka <[email protected]>
@ajkr (Contributor) commented May 8, 2019

Failing on zfs snapshot, that's strange. Well, I plan to play with these tests over the next week or so, and will investigate it then if I happen to see it.

@ajkr ajkr closed this as completed May 8, 2019
ajwerner pushed a commit to ajwerner/cockroach that referenced this issue May 15, 2019
Includes the following changes, all of which have landed upstream.

- cockroachdb/rocksdb#27: "ldb: set `total_order_seek` for scans"
- cockroachdb/rocksdb#28: "Fix cockroachdb#3840: only `SyncClosedLogs` for multiple CFs"
- cockroachdb/rocksdb#29: "Optionally wait on bytes_per_sync to smooth I/O"
- cockroachdb/rocksdb#30: "Option string/map/file can set env from object registry"

Also made the RocksDB changes that we decided in cockroachdb#34897:

- Do not sync WAL before installing flush result. This is achieved by backporting cockroachdb/rocksdb#28; no configuration change is necessary.
- Do not sync WAL ever for temp stores. This is achieved by setting `wal_bytes_per_sync = 0`.
- Limit size of final syncs when generating SSTs. This is achieved by backporting cockroachdb/rocksdb#29 and turning it on with `strict_bytes_per_sync = true`.

Release note: None