Hung import during tpc-c #34499

Closed

awoods187 opened this issue Feb 1, 2019 · 8 comments
Assignees
petermattis

Labels
C-investigation: Further steps needed to qualify. C-label will change.
S-2: Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@awoods187
Contributor

Describe the problem

Node crashed during import but the import is hung instead of failed.

[two screenshots]

To Reproduce
export CLUSTER=andy-nolbs
roachprod create $CLUSTER -n 16 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod run $CLUSTER:1-15 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
roachprod stage $CLUSTER:1-15 cockroach
roachprod stage $CLUSTER:16 workload
roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h
Turn off load-based splitting before loading fixtures:
roachprod sql $CLUSTER:1
SET CLUSTER SETTING kv.range_split.by_load_enabled = false
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"
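
For reference, the load-based-splitting step above can also be driven from Go. A minimal sketch, assuming an insecure cluster reachable at localhost:26257 and the github.com/lib/pq driver (both assumptions, not part of the original repro):

package main

// Sketch only: applies the same cluster setting as the roachprod sql step
// above. The connection string is an assumption for a local insecure node.
import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec("SET CLUSTER SETTING kv.range_split.by_load_enabled = false"); err != nil {
		log.Fatal(err)
	}
}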

Environment:
v2.2.0-alpha.20181217-988-g63f8ee7

@awoods187
Contributor Author

Grabbed the goroutine dump from the stuck node:

https://gist.github.com/awoods187/4e2b4716a0a4bfcdd02e840de527a5f4
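
For reference, one way to capture a dump like this is the standard Go pprof endpoint the node serves; a sketch, assuming the admin HTTP port (8080 by default) is reachable without authentication on this test cluster:

package main

// Sketch: fetch a full-stack goroutine dump (same text format as the gist
// above) from the node's pprof endpoint and write it to a local file.
// The URL/port is an assumption; adjust for the deployment.
import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:8080/debug/pprof/goroutine?debug=2")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote goroutines.txt")
}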

@jordanlewis
Member

There are tons and tons of goroutines stuck like this, which is why we grabbed the goroutine dump:

goroutine 154 [semacquire, 155 minutes]:
sync.runtime_Semacquire(0xc00b4cff78)
	/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00b4cff70)
	/usr/local/go/src/sync/waitgroup.go:130 +0x64
github.com/cockroachdb/cockroach/pkg/storage/engine.(*rocksDBBatch).Commit(0xc00b4cfd40, 0xed3e66e01, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:1905 +0x593
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc02911b200, 0x38b0780, 0xc021e136e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:688 +0x5ae
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x38b0780, 0xc021e136e0, 0xc02911b200, 0x38b0780)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3802 +0x12c
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc001078600, 0x38b0780, 0xc021e136e0, 0xc013db3c00, 0xc007d3bed0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3449 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc001078600, 0x38b0780, 0xc0008226c0, 0xd5f)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3790 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc0009d2780, 0x38b0780, 0xc0008226c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x38b0780, 0xc0008226c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0003223d0, 0xc0001fd680, 0xc0003223c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198 +0xe1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:191 +0xa8

Seems like a deadlock somewhere.
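
For context, the shape of these stacks matches a commit-grouping pattern: one leader goroutine applies the merged batch through cgo while every other committer parks on a sync.WaitGroup until the leader finishes. A rough sketch of that pattern (illustrative only, not the actual rocksdb.go code) shows why a single stuck cgo call strands everyone else in WaitGroup.Wait:

package main

// Illustrative sketch of leader/follower batch commits. If the leader's
// call into RocksDB never returns, every follower stays parked in
// wg.Wait(), which is the "[semacquire, 155 minutes]" shape in the dump.
import "sync"

type batch struct {
	repr []byte
	err  error
	wg   sync.WaitGroup // followers block here until the leader commits
}

type commitPipeline struct {
	mu      sync.Mutex
	pending []*batch
}

// applyBatchRepr stands in for the cgo call (dbApplyBatchRepr). If the C++
// side stalls, this never returns and the whole pipeline wedges.
func applyBatchRepr(repr []byte) error { return nil }

func (p *commitPipeline) commit(b *batch) error {
	b.wg.Add(1)

	p.mu.Lock()
	p.pending = append(p.pending, b)
	leader := len(p.pending) == 1
	p.mu.Unlock()

	if !leader {
		b.wg.Wait() // follower: wait for the leader to commit the group
		return b.err
	}

	// Leader: take ownership of whatever accumulated and commit it all in
	// one call into the storage engine.
	p.mu.Lock()
	group := p.pending
	p.pending = nil
	p.mu.Unlock()

	var merged []byte
	for _, g := range group {
		merged = append(merged, g.repr...)
	}
	err := applyBatchRepr(merged)
	for _, g := range group {
		g.err = err
		g.wg.Done() // release the followers (and the leader itself)
	}
	return b.err
}

func main() {
	p := &commitPipeline{}
	var done sync.WaitGroup
	for i := 0; i < 4; i++ {
		done.Add(1)
		go func(i int) {
			defer done.Done()
			_ = p.commit(&batch{repr: []byte{byte(i)}})
		}(i)
	}
	done.Wait()
}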

@tbg
Member

tbg commented Feb 1, 2019

Cc @petermattis. I'm also planning to take a look, but it won't be before next week.

@tbg
Member

tbg commented Feb 5, 2019

Andy, can you check that you're not simply running out of IOPS quota or the like on AWS?
Though this shouldn't be the case, as the disk stall mechanism is supposed to kill the nodes.

The goroutine dump unfortunately doesn't help much if the problem is really that something is stuck at the Rocks level. We'd need to catch this in action and then jump on it.
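
To illustrate why (a toy sketch, not CockroachDB code): a goroutine blocked inside a C call is reported by the Go runtime in state "syscall", and the C++ frames it is actually executing are invisible from Go, so the interesting state can only be recovered from the C++ side with a debugger or a core dump:

package main

/*
#include <unistd.h>
// Stand-in for a storage-engine call that never returns.
static void block_in_c(void) { for (;;) pause(); }
*/
import "C"

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// This goroutine enters C and never comes back to Go.
	go func() { C.block_in_c() }()

	time.Sleep(200 * time.Millisecond)

	// Dump all goroutine stacks, like the dump linked above. The stuck
	// goroutine is reported in state "syscall"; whatever it is doing on
	// the C side does not appear in the trace.
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)
	fmt.Printf("%s\n", buf[:n])
}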

@awoods187
Contributor Author

I don't think I ran out of anything here, because this was SSD on AWS (on c5d.4xlarge nodes), which scales IOPS accordingly. Similarly, even with restore amplification, we are nowhere close to hitting capacity. I'm not sure what else to check, but this setup worked fine on multiple other runs (with variations of turning merging and load-based splitting on and off).

@petermattis petermattis self-assigned this Feb 5, 2019
@petermattis
Collaborator

Seems to be stuckage deep inside RocksDB:

1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:957                     _Cfunc_DBPut(#5294, #1178, 0x6, 0, 0, #3647, 0x1d, 0, 0)
    engine     rocksdb.go:2589                         dbPut.func1(*DBEngine(#5294), MVCCKey(#1178), []byte(0x0 len=0 cap=824926292608), 0x1d, 0x0, 0x0)
    engine     rocksdb.go:2589                         dbPut(*DBEngine(#5294), MVCCKey(#1178), []byte(0x6 len=0 cap=0), #3647, 0x1d, 0x1d, 0x1d, ...)
    engine     rocksdb.go:746                          (*RocksDB).Put(*RocksDB(#1076), MVCCKey(#1178), []byte(0x6 len=0 cap=0), #3647, 0x1d, 0x1d, #637, ...)
    engine     mvcc.go:991                             (*putBuffer).putMeta(*putBuffer(#953), Writer(#5317), MVCCKey(#1178), *MVCCMetadata(0x6), 0x0, 0x0, #953, #1179, ...)
    engine     mvcc.go:1312                            mvccPutInternal(Context(#88), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), ...)
    engine     mvcc.go:1102                            mvccPutUsingIter(Context(#88), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), ...)
    engine     mvcc.go:1030                            MVCCPut(Context(#88), ReadWriter(#5316), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), Value(0x0), ...)
    engine     mvcc.go:641                             MVCCPutProto(Context(#88), ReadWriter(#5316), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), *Transaction(0x0), ...)
    storage    store.go:1837                           (*Store).WriteLastUpTimestamp(*Store(#1220), Context(#88), Timestamp(#5374), #1348, #816)
    server     server.go:1587                          (*Server).Start.func18.1(*Server(#1220), Context(0x6eb4f4))
    storage    stores.go:130                           (*Stores).VisitStores.func1(*Stores(0xe), func(#1220))
    syncutil   int_map.go:338                          (*IntMap).Range(*IntMap(#830), func(#3930))
    storage    stores.go:129                           (*Stores).VisitStores(*Stores(#829), func(#3931), #88)
    server     server.go:1586                          (*Server).Start.func18(*Server(#88), Context(#4398))
    storage    node_liveness.go:847                    (*NodeLiveness).updateLivenessAttempt(*NodeLiveness(#1015), Context(#88), livenessUpdate(0xe), *Liveness(#5377), func(0x0), 0x0, #4786, ...)
    storage    node_liveness.go:770                    (*NodeLiveness).updateLiveness(*NodeLiveness(#1015), Context(#88), livenessUpdate(0xe), *Liveness(#5377), func(0x0), 0x0, #4786, ...)
    storage    node_liveness.go:564                    (*NodeLiveness).heartbeatInternal(*NodeLiveness(#1015), Context(#88), *Liveness(#4786), bool(#5429), 0x0)
    storage    node_liveness.go:447                    (*NodeLiveness).StartHeartbeat.func1.1(*NodeLiveness(#1015), Context(#25), *Stopper(#5429), HeartbeatCallback(0x0), #213, #3932, #87, #4873)
    storage    node_liveness.go:458                    (*NodeLiveness).StartHeartbeat.func1(*NodeLiveness(#88), Context(#1264))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#543), Context(#231))
1: syscall [155 minutes] [Created by stop.(*Stopper).RunAsyncTask @ stopper.go:318]
    engine     _cgo_gotypes.go:957                     _Cfunc_DBPut(#5294, #4792, 0xa, 0, 0, #4793, 0x1d, 0, 0)
    engine     rocksdb.go:2589                         dbPut.func1(*DBEngine(#5294), MVCCKey(#4792), []byte(0x0 len=0 cap=825493939392), 0x1d, 0x0, 0x0)
    engine     rocksdb.go:2589                         dbPut(*DBEngine(#5294), MVCCKey(#4792), []byte(0x20 len=0 cap=0), #4793, 0x1d, 0x1d, 0x1d, ...)
    engine     rocksdb.go:746                          (*RocksDB).Put(*RocksDB(#1076), MVCCKey(#4792), []byte(0x20 len=0 cap=0), #4793, 0x1d, 0x1d, #4615, ...)
    engine     mvcc.go:991                             (*putBuffer).putMeta(*putBuffer(#1693), Writer(#5317), MVCCKey(#4792), *MVCCMetadata(0x20), 0x0, 0x0, #1693, #5086, ...)
    engine     mvcc.go:1312                            mvccPutInternal(Context(#87), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), ...)
    engine     mvcc.go:1102                            mvccPutUsingIter(Context(#87), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), ...)
    engine     mvcc.go:1030                            MVCCPut(Context(#87), ReadWriter(#5316), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), Value(0x0), ...)
    engine     mvcc.go:641                             MVCCPutProto(Context(#87), ReadWriter(#5316), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), *Transaction(0x0), ...)
    storage    replica.go:812                          (*Replica).setLastReplicaGCTimestamp(*Replica(#4436), Context(#87), Timestamp(#5404), #5247, 0x1)
    storage    replica_gc_queue.go:234                 (*replicaGCQueue).process(*replicaGCQueue(#301), Context(#87), *Replica(#4436), *SystemConfig(0x0), 0x0, 0x0)
    storage    queue.go:751                            (*baseQueue).processReplica(*baseQueue(#833), Context(#87), *Replica(#4436), 0x0, 0x0)
    storage    queue.go:645                            (*baseQueue).processLoop.func1.2(*baseQueue(#88), *Stopper(#4950))
    stop       stopper.go:323                          (*Stopper).RunAsyncTask.func1(*Stopper(#231), Context(#88), string(#4733, len=45), func(0x0), #4886)
1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:276                     _Cfunc_DBApplyBatchRepr(#5294, #4488, 0x32, 0, 0, 0)
    engine     rocksdb.go:2604                         dbApplyBatchRepr.func1(*DBEngine(#5294), MVCCKey(#4488), []byte(0xb24a00 len=0 cap=0))
    engine     rocksdb.go:2604                         dbApplyBatchRepr(*DBEngine(#5294), MVCCKey(#4488), []byte(0x400 len=0 cap=0), 0x0)
    engine     rocksdb.go:1928                         (*rocksDBBatch).commitInternal(*rocksDBBatch(#3761), bool(0x0), #3830)
    engine     rocksdb.go:1859                         (*rocksDBBatch).Commit(*rocksDBBatch(#3761), bool(#139), 0x0)
    storage    replica_raft.go:688                     (*Replica).handleRaftReadyRaftMuLocked(*Replica(#2414), Context(#88), IncomingSnapshot(0x0), 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    storage    store.go:3802                           (*Store).processRequestQueue.func1(*Store(#88), Context(#4045), RangeID(#88))
    storage    store.go:3449                           (*Store).withReplicaForRequest(*Store(#1220), Context(#88), *RaftMessageRequest(#4134), func(#4000))
    storage    store.go:3790                           (*Store).processRequestQueue(*Store(#1220), Context(#88), RangeID(0x2))
    storage    scheduler.go:225                        (*raftScheduler).worker(*raftScheduler(#998), Context(#88))
    storage    scheduler.go:165                        (*raftScheduler).Start.func2(*raftScheduler(#88), Context(#908))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#563), Context(#231))

The goroutine above is the one blocking all other RocksDB batch commits.

1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:690                     _Cfunc_DBIngestExternalFiles(#5294, #3837, 0x1, 0x1, 0, 0)
    engine     rocksdb.go:2923                         (*RocksDB).IngestExternalFiles.func2(*RocksDB(#3837), Context(0x1), []string(#5294 len=825018663544 cap=1), bool(0x1), 0x0)
    engine     rocksdb.go:2923                         (*RocksDB).IngestExternalFiles(*RocksDB(#1076), Context(#88), []string(#3798 len=1 cap=1), bool(0x0), 0x0)
    storage    replica_proposal.go:410                 addSSTablePreApply(Context(#88), *Settings(#828), Engine(#104), sideloadStorage(#92), 6, 73, ReplicatedEvalResult_AddSSTable(#5274), ...)
    storage    replica_raft.go:1809                    (*Replica).processRaftCommand(*Replica(#3460), Context(#88), CmdIDKey(#5238), 6, 73, RaftCommand(#136), 0x5, ...)
    storage    replica_raft.go:785                     (*Replica).handleRaftReadyRaftMuLocked(*Replica(#3460), Context(#88), IncomingSnapshot(0x0), 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    storage    store.go:3802                           (*Store).processRequestQueue.func1(*Store(#88), Context(#3389), RangeID(#88))
    storage    store.go:3449                           (*Store).withReplicaForRequest(*Store(#1220), Context(#88), *RaftMessageRequest(#4936), func(#4086))
    storage    store.go:3790                           (*Store).processRequestQueue(*Store(#1220), Context(#88), RangeID(0x2735))
    storage    scheduler.go:225                        (*raftScheduler).worker(*raftScheduler(#998), Context(#88))
    storage    scheduler.go:165                        (*raftScheduler).Start.func2(*raftScheduler(#88), Context(#891))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#396), Context(#231))

So 4 different goroutines stuck in RocksDB.

@awoods187 How reproducible is this? I think I'd need to inspect a running cluster to see what is going on. Ideally I'd get a core dump which I can spelunk with a debugger.

@petermattis petermattis added the C-investigation Further steps needed to qualify. C-label will change. label Feb 5, 2019
@awoods187
Contributor Author

I've only seen it once, so it's not very reproducible. I can let you know if I see it again, or we can try on that particular SHA and setup a few times.

@tbg
Member

tbg commented Feb 12, 2019

This is certainly similar to #34212 (though in principle they could be disjoint deadlocks), down to the goroutines involved. A core dump is available, but from what I've seen the deadlock is "obvious". I'm going to close this in favor of #34212.

@tbg tbg closed this as completed Feb 12, 2019
@awoods187 awoods187 added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Mar 8, 2019