Raft Applied index is below Snapshot index, causing assert failures #2581

Closed
slawo opened this issue Sep 11, 2018 · 2 comments · Fixed by #2597
Labels
kind/bug Something is broken.

Comments

slawo commented Sep 11, 2018

Bug report

  • Main Container Software: kubernetes v1.11.1
  • Dgraph TAG Version: v1.0.8

Describe the bug
One of the servers in a cluster keeps crashing with the following error message:

github.com/dgraph-io/dgraph/x.AssertTruef
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:64
github.com/dgraph-io/dgraph/x.(*WaterMark).process.func1
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:148
github.com/dgraph-io/dgraph/x.(*WaterMark).process
    /ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:191
runtime.goexit

To Reproduce
Steps to reproduce the behavior:

There are no clear steps to trigger this. It has happened on two different clusters so far (neither fully monitored). Each time it happens, the affected instance goes into a crash loop.

Additional context

The complete logs are:

+ dgraph server --my=dgraph-server-5.dgraph-server.prm.svc.cluster.local:7080 --lru_mb=4096 --zero=dgraph-zero-0.dgraph-zero.prm.svc.cluster.local:5080

Dgraph version   : v1.0.8
Commit SHA-1     : 1dd8376f
Commit timestamp : 2018-08-31 10:47:07 -0700
Branch           : HEAD

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.


2018/09/11 14:56:42 server.go:118: Setting Badger option: ssd
2018/09/11 14:56:42 server.go:134: Setting Badger table load option: mmap
2018/09/11 14:56:42 server.go:147: Setting Badger value log load option: none
2018/09/11 14:56:42 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
2018/09/11 14:56:43 gRPC server started.  Listening on port 9080
2018/09/11 14:56:43 HTTP server started.  Listening on port 8080
2018/09/11 14:56:43 groups.go:80: Current Raft Id: 6
2018/09/11 14:56:43 worker.go:86: Worker listening at address: [::]:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-0.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 groups.go:107: Connected to group zero. Assigned group: 0
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-1.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-2.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-0.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-4.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-server-3.dgraph-server.prm.svc.cluster.local:7080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-1.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 pool.go:108: == CONNECTED ==> Setting dgraph-zero-2.dgraph-zero.prm.svc.cluster.local:5080
2018/09/11 14:56:43 draft.go:76: Node ID: 6 with GroupID: 2
2018/09/11 14:56:43 draft.go:963: Restarting node for group: 2
2018/09/11 14:56:43 raft.go:567: INFO: 6 became follower at term 2
2018/09/11 14:56:43 raft.go:315: INFO: newRaft 6 [peers: [4,5,6], term: 2, commit: 48373, applied: 43480, lastindex: 48373, lastterm: 2]
2018/09/11 14:56:43 Name: Applied watermark doneUntil: 45375. Index: 43481
github.com/dgraph-io/dgraph/x.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:64
github.com/dgraph-io/dgraph/x.(*WaterMark).process.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:148
github.com/dgraph-io/dgraph/x.(*WaterMark).process
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/watermark.go:191
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:2361

manishrjain commented Sep 11, 2018

Do you have the w and p directories? Can you send them to us? My email is manish/dgraph.io.


slawo commented Sep 12, 2018

@manishrjain I rebuilt the entire setup and did not keep the data folders.
I noticed that a lot of nodes were missing. I will freeze all folders the next time it happens.

manishrjain added the not_reproducible and kind/bug labels and removed the not_reproducible label Sep 14, 2018
manishrjain self-assigned this Sep 15, 2018
manishrjain changed the title from "server on crashloop" to "Raft Applied is below Snapshot Ts, causing assert failures" Sep 15, 2018
manishrjain changed the title from "Raft Applied is below Snapshot Ts, causing assert failures" to "Raft Applied index is below Snapshot index, causing assert failures" Sep 15, 2018
manishrjain added a commit that referenced this issue Sep 15, 2018
Set the Applied index in Raft directly, so it does not pick up an index older than the snapshot. Ensure that it is in sync with the Applied watermark.

This fixes #2581.
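
The failure mode and the idea behind the fix can be sketched in a few lines of Go. This is a simplified illustration, not Dgraph's actual code: the `waterMark` type, its `begin` method, and the `startApplied` helper are hypothetical names. It assumes the watermark rejects any index that is not above `doneUntil`, and that `doneUntil` (45375 in the log above) was initialized from the snapshot index, while Raft restarted with `applied: 43480`.

```go
package main

import "fmt"

// waterMark is a hypothetical, simplified stand-in for an applied-index
// watermark. It only remembers the highest index known to be done.
type waterMark struct {
	doneUntil uint64
}

// begin registers a new index. It returns an error where the real code
// would fail an assertion: the index must be strictly above doneUntil.
func (w *waterMark) begin(index uint64) error {
	if index <= w.doneUntil {
		return fmt.Errorf("doneUntil: %d. Index: %d", w.doneUntil, index)
	}
	return nil
}

// startApplied picks the applied index to restart from. It never returns
// an index older than the snapshot, which is the idea described in the
// commit message (hypothetical helper, not the actual patch).
func startApplied(applied, snapshotIndex uint64) uint64 {
	if applied < snapshotIndex {
		return snapshotIndex
	}
	return applied
}

func main() {
	// Values taken from the log in this issue; treating doneUntil as the
	// snapshot index is an assumption.
	const applied, snapshotIndex = 43480, 45375

	w := &waterMark{doneUntil: snapshotIndex}

	// Restarting from the stale applied index trips the check.
	fmt.Println(w.begin(applied + 1))

	// Restarting from the clamped index passes it.
	fmt.Println(w.begin(startApplied(applied, snapshotIndex) + 1))
}
```

With the stale index, the first call reports the same `doneUntil: 45375. Index: 43481` pair seen in the crash log; with the clamped index, the check passes.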
dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 19, 2019
Set the Applied index in Raft directly, so it does not pick up an index older than the snapshot. Ensure that it is in sync with the Applied watermark.

This fixes hypermodeinc#2581.