Raft Applied index is below Snapshot index, causing assert failures #2581

slawo · 2018-09-11T15:16:26Z

Bug report

Main Container Software: Kubernetes v1.11.1
Dgraph TAG Version: v1.0.8

Describe the bug
One of the servers in a cluster keeps crashing with the following error message:

github.com/dgraph-io/dgraph/x.AssertTruef
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:64
github.com/dgraph-io/dgraph/x.(*WaterMark).process.func1
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:148
github.com/dgraph-io/dgraph/x.(*WaterMark).process
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:191
runtime.goexit

To Reproduce
Steps to reproduce the behavior:

There are no clear steps to trigger this. So far it has happened on two different clusters (neither fully monitored), and each time it happens, the affected instance goes into a crash loop.

Additional context

The complete logs are:

+ dgraph server --my=dgraph-server-5.dgraph-server.prm.svc.cluster.local:7080 --lru_mb=4096 --zero=dgraph-zero-0.dgraph-zero.prm.svc.cluster.local:5080

Dgraph version   : v1.0.8
Commit SHA-1     : 1dd8376f
Commit timestamp : 2018-08-31 10:47:07 -0700
Branch           : HEAD

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.


2018/09/11 14:56:42 server.go:118: Setting Badger option: ssd
2018/09/11 14:56:42 server.go:134: Setting Badger table load option: mmap
2018/09/11 14:56:42 server.go:147: Setting Badger value log load option: none
2018/09/11 14:56:42 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
2018/09/11 14:56:43 gRPC server started.  Listening on port 9080
2018/09/11 14:56:43 HTTP server started.  Listening on port 8080
2018/09/11 14:56:43 groups.go:80: Current Raft Id: 6
2018/09/11 14:56:43 worker.go:86: Worker listening at address: [::]:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-0.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 groups.go:107: Connected to group zero. Assigned group: 0
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-1.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-2.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-0.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-4.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-3.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-1.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-2.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 draft.go:76: Node ID: 6 with GroupID: 2
2018/09/11 14:56:43 draft.go:963: Restarting node for group: 2
2018/09/11 14:56:43 raft.go:567: INFO: 6 became follower at term 2
2018/09/11 14:56:43 raft.go:315: INFO: newRaft 6 [peers: [4,5,6], term: 2, commit: 48373, applied: 43480, lastindex: 48373, lastterm: 2]
2018/09/11 14:56:43 Name: Applied watermark doneUntil: 45375. Index: 43481
github.com/dgraph-io/dgraph/x.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:64
github.com/dgraph-io/dgraph/x.(*WaterMark).process.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:148
github.com/dgraph-io/dgraph/x.(*WaterMark).process
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:191
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:2361
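The crash reduces to a single comparison between Raft's applied index and the Applied watermark. The Go sketch below uses a hypothetical `waterMark` type (not Dgraph's actual `x.WaterMark` API) and assumes the asserted invariant is that every new index must be strictly greater than `doneUntil`. It plugs in the numbers from the log above.

```go
package main

import "fmt"

// waterMark is a hypothetical, heavily simplified stand-in for Dgraph's
// x.WaterMark. It keeps only the highest index known to be done and checks
// the invariant that the crash log reports as violated.
type waterMark struct {
	name      string
	doneUntil uint64
}

// begin accepts a new index only if it is strictly above doneUntil (an
// assumed invariant). Where the real code asserts and crashes, this sketch
// returns an error instead.
func (w *waterMark) begin(index uint64) error {
	if index <= w.doneUntil {
		return fmt.Errorf("Name: %s doneUntil: %d. Index: %d", w.name, w.doneUntil, index)
	}
	return nil
}

func main() {
	// Numbers copied from the log: the watermark is already at 45375,
	// while Raft restarted with applied: 43480.
	wm := &waterMark{name: "Applied watermark", doneUntil: 45375}
	raftApplied := uint64(43480)

	// Raft only returns entries above its applied index, so the first
	// entry handed back after restart is applied+1.
	if err := wm.begin(raftApplied + 1); err != nil {
		fmt.Println(err)
	}
}
```

Running it prints `Name: Applied watermark doneUntil: 45375. Index: 43481`, the same values as the failing assertion: Raft's applied index sits 1,895 entries below the watermark, so the very first replayed entry is rejected.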


manishrjain · 2018-09-11T19:12:54Z

Do you have the w and p directories? Can you send them to us? My email ID is manish/dgraph.io.

slawo · 2018-09-12T08:45:44Z

@manishrjain I rebuilt the entire setup and did not keep the data folders.
I also noticed a lot of missing nodes. I will preserve all the folders the next time it happens.

Set the Applied index in Raft directly, so it does not pick up an index older than the snapshot. Ensure that it is in sync with the Applied watermark. This fixes #2581.

manishrjain added the not_reproducible and kind/bug labels, then removed not_reproducible, Sep 14, 2018

manishrjain self-assigned this Sep 15, 2018

manishrjain changed the title ~~server on crashloop~~ Raft Applied is below Snapshot Ts, causing assert failures Sep 15, 2018

manishrjain changed the title ~~Raft Applied is below Snapshot Ts, causing assert failures~~ Raft Applied index is below Snapshot index, causing assert failures Sep 15, 2018

manishrjain mentioned this issue Sep 15, 2018

Set the Applied index in Raft directly #2597

Merged

manishrjain closed this as completed in #2597 Sep 15, 2018

manishrjain added a commit that referenced this issue Sep 15, 2018

Set the Applied index in Raft directly

c955ec1

Set the Applied index in Raft directly, so it does not pick up an index older than the snapshot. Ensure that it is in sync with the Applied watermark. This fixes #2581.
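The commit message above describes the fix only in prose. A minimal Go sketch of the idea, under stated assumptions: on restart, choose one applied index that never falls below the snapshot index, and use that same value for both Raft and the Applied watermark. `restartState`, `appliedOnRestart` and `firstReplayed` are hypothetical names, and treating 45375 as the snapshot index is an assumption drawn from the issue title. In etcd's raft package, which Dgraph builds on, the natural place to pass such a value is `raft.Config.Applied` (raft does not return entries at or below it); whether #2597 does exactly this is not stated in this thread.

```go
package main

import "fmt"

// restartState is a hypothetical bundle of indices a node could read from
// its write-ahead log when it restarts.
type restartState struct {
	snapshotIndex uint64 // last index covered by the latest snapshot
	raftApplied   uint64 // applied index Raft would otherwise start from
}

// appliedOnRestart returns the single index that both Raft and the Applied
// watermark should start from. It never goes below the snapshot index,
// because entries at or below it are already reflected in the snapshot.
func appliedOnRestart(s restartState) uint64 {
	if s.raftApplied < s.snapshotIndex {
		return s.snapshotIndex
	}
	return s.raftApplied
}

// firstReplayed is the first log index Raft would hand back after restart,
// given that it returns only entries above the applied index.
func firstReplayed(applied uint64) uint64 {
	return applied + 1
}

func main() {
	// Assumption: the watermark's doneUntil (45375) in the log equals the
	// snapshot index; Raft's applied (43480) comes from the newRaft line.
	s := restartState{snapshotIndex: 45375, raftApplied: 43480}

	applied := appliedOnRestart(s)
	doneUntil := applied // keep the watermark in sync with Raft

	next := firstReplayed(applied)
	fmt.Println(applied, next, next > doneUntil) // 45375 45376 true
}
```

Because Raft and the watermark are seeded from the same value, the first replayed entry (45376) is always above `doneUntil`, so the assertion from the crash log cannot fire on restart.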
