Hung import during tpc-c #34499

Closed

awoods187 opened this issue Feb 1, 2019 · 8 comments
Assignees
petermattis

Labels
C-investigation: Further steps needed to qualify. C-label will change.
S-2: Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors

Comments

@awoods187
Contributor

Describe the problem

Node crashed during import but the import is hung instead of failed.

[two screenshots]

To Reproduce
export CLUSTER=andy-nolbs
roachprod create $CLUSTER -n 16 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
roachprod run $CLUSTER:1-15 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
roachprod stage $CLUSTER:1-15 cockroach
roachprod stage $CLUSTER:16 workload
roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h
Turn off load-based splitting before loading fixtures:
roachprod sql $CLUSTER:1
SET CLUSTER SETTING kv.range_split.by_load_enabled = false
roachprod adminurl --open $CLUSTER:1
roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=10000 --db=tpcc"
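
For reference, the load-based-splitting step above can also be driven from Go. A minimal sketch, assuming an insecure cluster reachable at localhost:26257 and the github.com/lib/pq driver (both assumptions, not part of the original repro):

package main

// Sketch only: applies the same cluster setting as the roachprod sql step
// above. The connection string is an assumption for a local insecure node.
import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec("SET CLUSTER SETTING kv.range_split.by_load_enabled = false"); err != nil {
		log.Fatal(err)
	}
}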

Environment:
v2.2.0-alpha.20181217-988-g63f8ee7

@awoods187
Contributor Author

Grabbed the goroutine dump from the stuck node:

https://gist.github.com/awoods187/4e2b4716a0a4bfcdd02e840de527a5f4
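
For reference, one way to capture a dump like this is the standard Go pprof endpoint the node serves; a sketch, assuming the admin HTTP port (8080 by default) is reachable without authentication on this test cluster:

package main

// Sketch: fetch a full-stack goroutine dump (same text format as the gist
// above) from the node's pprof endpoint and write it to a local file.
// The URL/port is an assumption; adjust for the deployment.
import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:8080/debug/pprof/goroutine?debug=2")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote goroutines.txt")
}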

@jordanlewis
Member

There are tons and tons of goroutines stuck like this, which is why we grabbed the goroutine dump:

goroutine 154 [semacquire, 155 minutes]:
sync.runtime_Semacquire(0xc00b4cff78)
	/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00b4cff70)
	/usr/local/go/src/sync/waitgroup.go:130 +0x64
github.com/cockroachdb/cockroach/pkg/storage/engine.(*rocksDBBatch).Commit(0xc00b4cfd40, 0xed3e66e01, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:1905 +0x593
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc02911b200, 0x38b0780, 0xc021e136e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:688 +0x5ae
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x38b0780, 0xc021e136e0, 0xc02911b200, 0x38b0780)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3802 +0x12c
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc001078600, 0x38b0780, 0xc021e136e0, 0xc013db3c00, 0xc007d3bed0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3449 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc001078600, 0x38b0780, 0xc0008226c0, 0xd5f)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3790 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc0009d2780, 0x38b0780, 0xc0008226c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21a
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x38b0780, 0xc0008226c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc0003223d0, 0xc0001fd680, 0xc0003223c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198 +0xe1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:191 +0xa8

Seems like a deadlock somewhere.
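
For context, the shape of these stacks matches a commit-grouping pattern: one leader goroutine applies the merged batch through cgo while every other committer parks on a sync.WaitGroup until the leader finishes. A rough sketch of that pattern (illustrative only, not the actual rocksdb.go code) shows why a single stuck cgo call strands everyone else in WaitGroup.Wait:

package main

// Illustrative sketch of leader/follower batch commits. If the leader's
// call into RocksDB never returns, every follower stays parked in
// wg.Wait(), which is the "[semacquire, 155 minutes]" shape in the dump.
import "sync"

type batch struct {
	repr []byte
	err  error
	wg   sync.WaitGroup // followers block here until the leader commits
}

type commitPipeline struct {
	mu      sync.Mutex
	pending []*batch
}

// applyBatchRepr stands in for the cgo call (dbApplyBatchRepr). If the C++
// side stalls, this never returns and the whole pipeline wedges.
func applyBatchRepr(repr []byte) error { return nil }

func (p *commitPipeline) commit(b *batch) error {
	b.wg.Add(1)

	p.mu.Lock()
	p.pending = append(p.pending, b)
	leader := len(p.pending) == 1
	p.mu.Unlock()

	if !leader {
		b.wg.Wait() // follower: wait for the leader to commit the group
		return b.err
	}

	// Leader: take ownership of whatever accumulated and commit it all in
	// one call into the storage engine.
	p.mu.Lock()
	group := p.pending
	p.pending = nil
	p.mu.Unlock()

	var merged []byte
	for _, g := range group {
		merged = append(merged, g.repr...)
	}
	err := applyBatchRepr(merged)
	for _, g := range group {
		g.err = err
		g.wg.Done() // release the followers (and the leader itself)
	}
	return b.err
}

func main() {
	p := &commitPipeline{}
	var done sync.WaitGroup
	for i := 0; i < 4; i++ {
		done.Add(1)
		go func(i int) {
			defer done.Done()
			_ = p.commit(&batch{repr: []byte{byte(i)}})
		}(i)
	}
	done.Wait()
}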

@tbg
Member

tbg commented Feb 1, 2019

Cc @petermattis. I'm also planning to take a look, but it won't be before next week.

@tbg
Member

tbg commented Feb 5, 2019

Andy, can you check that you're not simply running out of IOPS quota or the like on AWS?
Though this shouldn't be the case, as the disk stall mechanism is supposed to kill the nodes.

The goroutine dump unfortunately doesn't help much if the problem is really that something is stuck at the Rocks level. We'd need to catch this in action and then jump on it.
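
To illustrate why (a toy sketch, not CockroachDB code): a goroutine blocked inside a C call is reported by the Go runtime in state "syscall", and the C++ frames it is actually executing are invisible from Go, so the interesting state can only be recovered from the C++ side with a debugger or a core dump:

package main

/*
#include <unistd.h>
// Stand-in for a storage-engine call that never returns.
static void block_in_c(void) { for (;;) pause(); }
*/
import "C"

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// This goroutine enters C and never comes back to Go.
	go func() { C.block_in_c() }()

	time.Sleep(200 * time.Millisecond)

	// Dump all goroutine stacks, like the dump linked above. The stuck
	// goroutine is reported in state "syscall"; whatever it is doing on
	// the C side does not appear in the trace.
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true)
	fmt.Printf("%s\n", buf[:n])
}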

@awoods187
Contributor Author

I don't think I ran out of anything here, because this was SSD on AWS (on c5d.4xlarge nodes), which scales IOPS accordingly. Similarly, even with restore amplification, we are nowhere close to hitting capacity. I'm not sure what else to check, but this setup worked fine on multiple other runs (with variations of turning merging and load-based splitting on and off).

@petermattis petermattis self-assigned this Feb 5, 2019
@petermattis
Collaborator

Seems to be stuckage deep inside RocksDB:

1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:957                     _Cfunc_DBPut(#5294, #1178, 0x6, 0, 0, #3647, 0x1d, 0, 0)
    engine     rocksdb.go:2589                         dbPut.func1(*DBEngine(#5294), MVCCKey(#1178), []byte(0x0 len=0 cap=824926292608), 0x1d, 0x0, 0x0)
    engine     rocksdb.go:2589                         dbPut(*DBEngine(#5294), MVCCKey(#1178), []byte(0x6 len=0 cap=0), #3647, 0x1d, 0x1d, 0x1d, ...)
    engine     rocksdb.go:746                          (*RocksDB).Put(*RocksDB(#1076), MVCCKey(#1178), []byte(0x6 len=0 cap=0), #3647, 0x1d, 0x1d, #637, ...)
    engine     mvcc.go:991                             (*putBuffer).putMeta(*putBuffer(#953), Writer(#5317), MVCCKey(#1178), *MVCCMetadata(0x6), 0x0, 0x0, #953, #1179, ...)
    engine     mvcc.go:1312                            mvccPutInternal(Context(#88), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), ...)
    engine     mvcc.go:1102                            mvccPutUsingIter(Context(#88), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), ...)
    engine     mvcc.go:1030                            MVCCPut(Context(#88), ReadWriter(#5316), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), Value(0x0), ...)
    engine     mvcc.go:641                             MVCCPutProto(Context(#88), ReadWriter(#5316), *MVCCStats(0x0), Key(#1178), Timestamp(0x6), *Transaction(0x0), ...)
    storage    store.go:1837                           (*Store).WriteLastUpTimestamp(*Store(#1220), Context(#88), Timestamp(#5374), #1348, #816)
    server     server.go:1587                          (*Server).Start.func18.1(*Server(#1220), Context(0x6eb4f4))
    storage    stores.go:130                           (*Stores).VisitStores.func1(*Stores(0xe), func(#1220))
    syncutil   int_map.go:338                          (*IntMap).Range(*IntMap(#830), func(#3930))
    storage    stores.go:129                           (*Stores).VisitStores(*Stores(#829), func(#3931), #88)
    server     server.go:1586                          (*Server).Start.func18(*Server(#88), Context(#4398))
    storage    node_liveness.go:847                    (*NodeLiveness).updateLivenessAttempt(*NodeLiveness(#1015), Context(#88), livenessUpdate(0xe), *Liveness(#5377), func(0x0), 0x0, #4786, ...)
    storage    node_liveness.go:770                    (*NodeLiveness).updateLiveness(*NodeLiveness(#1015), Context(#88), livenessUpdate(0xe), *Liveness(#5377), func(0x0), 0x0, #4786, ...)
    storage    node_liveness.go:564                    (*NodeLiveness).heartbeatInternal(*NodeLiveness(#1015), Context(#88), *Liveness(#4786), bool(#5429), 0x0)
    storage    node_liveness.go:447                    (*NodeLiveness).StartHeartbeat.func1.1(*NodeLiveness(#1015), Context(#25), *Stopper(#5429), HeartbeatCallback(0x0), #213, #3932, #87, #4873)
    storage    node_liveness.go:458                    (*NodeLiveness).StartHeartbeat.func1(*NodeLiveness(#88), Context(#1264))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#543), Context(#231))
1: syscall [155 minutes] [Created by stop.(*Stopper).RunAsyncTask @ stopper.go:318]
    engine     _cgo_gotypes.go:957                     _Cfunc_DBPut(#5294, #4792, 0xa, 0, 0, #4793, 0x1d, 0, 0)
    engine     rocksdb.go:2589                         dbPut.func1(*DBEngine(#5294), MVCCKey(#4792), []byte(0x0 len=0 cap=825493939392), 0x1d, 0x0, 0x0)
    engine     rocksdb.go:2589                         dbPut(*DBEngine(#5294), MVCCKey(#4792), []byte(0x20 len=0 cap=0), #4793, 0x1d, 0x1d, 0x1d, ...)
    engine     rocksdb.go:746                          (*RocksDB).Put(*RocksDB(#1076), MVCCKey(#4792), []byte(0x20 len=0 cap=0), #4793, 0x1d, 0x1d, #4615, ...)
    engine     mvcc.go:991                             (*putBuffer).putMeta(*putBuffer(#1693), Writer(#5317), MVCCKey(#4792), *MVCCMetadata(0x20), 0x0, 0x0, #1693, #5086, ...)
    engine     mvcc.go:1312                            mvccPutInternal(Context(#87), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), ...)
    engine     mvcc.go:1102                            mvccPutUsingIter(Context(#87), Writer(#5317), Iterator(0x0), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), ...)
    engine     mvcc.go:1030                            MVCCPut(Context(#87), ReadWriter(#5316), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), Value(0x0), ...)
    engine     mvcc.go:641                             MVCCPutProto(Context(#87), ReadWriter(#5316), *MVCCStats(0x0), Key(#4792), Timestamp(0x20), *Transaction(0x0), ...)
    storage    replica.go:812                          (*Replica).setLastReplicaGCTimestamp(*Replica(#4436), Context(#87), Timestamp(#5404), #5247, 0x1)
    storage    replica_gc_queue.go:234                 (*replicaGCQueue).process(*replicaGCQueue(#301), Context(#87), *Replica(#4436), *SystemConfig(0x0), 0x0, 0x0)
    storage    queue.go:751                            (*baseQueue).processReplica(*baseQueue(#833), Context(#87), *Replica(#4436), 0x0, 0x0)
    storage    queue.go:645                            (*baseQueue).processLoop.func1.2(*baseQueue(#88), *Stopper(#4950))
    stop       stopper.go:323                          (*Stopper).RunAsyncTask.func1(*Stopper(#231), Context(#88), string(#4733, len=45), func(0x0), #4886)
1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:276                     _Cfunc_DBApplyBatchRepr(#5294, #4488, 0x32, 0, 0, 0)
    engine     rocksdb.go:2604                         dbApplyBatchRepr.func1(*DBEngine(#5294), MVCCKey(#4488), []byte(0xb24a00 len=0 cap=0))
    engine     rocksdb.go:2604                         dbApplyBatchRepr(*DBEngine(#5294), MVCCKey(#4488), []byte(0x400 len=0 cap=0), 0x0)
    engine     rocksdb.go:1928                         (*rocksDBBatch).commitInternal(*rocksDBBatch(#3761), bool(0x0), #3830)
    engine     rocksdb.go:1859                         (*rocksDBBatch).Commit(*rocksDBBatch(#3761), bool(#139), 0x0)
    storage    replica_raft.go:688                     (*Replica).handleRaftReadyRaftMuLocked(*Replica(#2414), Context(#88), IncomingSnapshot(0x0), 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    storage    store.go:3802                           (*Store).processRequestQueue.func1(*Store(#88), Context(#4045), RangeID(#88))
    storage    store.go:3449                           (*Store).withReplicaForRequest(*Store(#1220), Context(#88), *RaftMessageRequest(#4134), func(#4000))
    storage    store.go:3790                           (*Store).processRequestQueue(*Store(#1220), Context(#88), RangeID(0x2))
    storage    scheduler.go:225                        (*raftScheduler).worker(*raftScheduler(#998), Context(#88))
    storage    scheduler.go:165                        (*raftScheduler).Start.func2(*raftScheduler(#88), Context(#908))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#563), Context(#231))

The goroutine above is the one blocking all other RocksDB batch commits.

1: syscall [155 minutes] [Created by stop.(*Stopper).RunWorker @ stopper.go:191]
    engine     _cgo_gotypes.go:690                     _Cfunc_DBIngestExternalFiles(#5294, #3837, 0x1, 0x1, 0, 0)
    engine     rocksdb.go:2923                         (*RocksDB).IngestExternalFiles.func2(*RocksDB(#3837), Context(0x1), []string(#5294 len=825018663544 cap=1), bool(0x1), 0x0)
    engine     rocksdb.go:2923                         (*RocksDB).IngestExternalFiles(*RocksDB(#1076), Context(#88), []string(#3798 len=1 cap=1), bool(0x0), 0x0)
    storage    replica_proposal.go:410                 addSSTablePreApply(Context(#88), *Settings(#828), Engine(#104), sideloadStorage(#92), 6, 73, ReplicatedEvalResult_AddSSTable(#5274), ...)
    storage    replica_raft.go:1809                    (*Replica).processRaftCommand(*Replica(#3460), Context(#88), CmdIDKey(#5238), 6, 73, RaftCommand(#136), 0x5, ...)
    storage    replica_raft.go:785                     (*Replica).handleRaftReadyRaftMuLocked(*Replica(#3460), Context(#88), IncomingSnapshot(0x0), 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    storage    store.go:3802                           (*Store).processRequestQueue.func1(*Store(#88), Context(#3389), RangeID(#88))
    storage    store.go:3449                           (*Store).withReplicaForRequest(*Store(#1220), Context(#88), *RaftMessageRequest(#4936), func(#4086))
    storage    store.go:3790                           (*Store).processRequestQueue(*Store(#1220), Context(#88), RangeID(0x2735))
    storage    scheduler.go:225                        (*raftScheduler).worker(*raftScheduler(#998), Context(#88))
    storage    scheduler.go:165                        (*raftScheduler).Start.func2(*raftScheduler(#88), Context(#891))
    stop       stopper.go:198                          (*Stopper).RunWorker.func1(*Stopper(#396), Context(#231))

So 4 different goroutines stuck in RocksDB.

@awoods187 How reproducible is this? I think I'd need to inspect a running cluster to see what is going on. Ideally I'd get a core dump which I can spelunk with a debugger.

@petermattis petermattis added the C-investigation Further steps needed to qualify. C-label will change. label Feb 5, 2019
@awoods187
Contributor Author

I've only seen it once, so it's not very reproducible. I can let you know if I see it again, or we can try on that particular SHA and setup a few times.

@tbg
Member

tbg commented Feb 12, 2019

This is certainly similar to #34212 (though in principle they could be disjoint deadlocks), down to the goroutines involved. A core dump is available, but from what I've seen the deadlock is "obvious". I'm going to close this in favor of #34212.

@tbg tbg closed this as completed Feb 12, 2019
@awoods187 awoods187 added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Mar 8, 2019