storage: compactions and consistency checks queued over more than a week #21824
The full log for a node is attached.
Could you check the
Leo, could you send us the logs for the other nodes as well?
Thanks @leomkkwan. The logs look uninteresting (heartbeats timing out after 16:36, but no more information) with the exception of n4.log (see below). What this tells me is that that node exhibited I/O problems but it doesn't explain why. @a-robinson I'm assigning you since I'm not going to be able to look at this too soon and you might have an idea. (I don't see any replicaGC in the logs or I would have suspected the compaction queue).
@nvanbenschoten Didn't you introduce (and later fix) a bug regarding failed heartbeats around this time?
@petermattis are you referring to #22636 (comment)? If so, I don't think this issue is related. Here, the liveness heartbeats are very slow. In that issue, they were all failing with the error
@nvanbenschoten Yeah, #22636 was the issue I was thinking of.
Sorry for letting this slip by! @leomkkwan by any chance have you upgraded to a more recent alpha release since the issue was reported or seen anything like this again? Also, there isn't much network latency between the nodes, is there? It looks like there may be a leader-not-leaseholder situation, judging by the fact that n4 has all the messages indicating it's the leaseholder for r3, but both n1 and n5 had to send n4 raft snapshots for r3 at different points in time. It's hard to explain the crazy amount of raft log activity without any additional info about what was going on in the cluster at the time, though.
Any interest in providing more detail, @leomkkwan?
The cluster was destroyed this morning.
Because a few of the alpha releases were withdrawn and we had upgraded through all of those versions, we decided that the cluster was broken and started from scratch on the beta. We are rebuilding the clusters, and if we see this issue again, I will update this ticket. Thanks, Leo
@leomkkwan sorry about letting this sit. How did these graphs change since your message?
No change. The Consistency Checker Queue and Compaction Queue have been flat for the last week. A little history about this cluster..

I've also included log files for all nodes; the only one that looks interesting to me is n5.log, which is spammed with the line

n3.log

Hope this helps. I am not an enterprise customer yet, and I spent three weeks importing data back into the cluster. If there is any way I can recover from this without needing to rebuild the data, it would be appreciated.
The compaction queue having a large number of pending bytes does not mean anything is wrong. The compaction queue can happily queue up a large amount of potential bytes without acting on them if they don't meet certain conditions around having large contiguous chunks of reclaimable space or the node being low on disk space; a rough sketch of that gating follows below.

The consistency checker queue being stuck is weirder. I can't think of any occasions in which we've seen something like that before. I have a few follow-up questions and things for you to check:
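Purely as an illustration of that gating idea (this is not CockroachDB's actual code; the function, parameters, and thresholds below are invented), suggested bytes can pile up without triggering work until one of the conditions is met:

```go
package main

import "fmt"

// shouldCompact is an invented illustration of the gating idea: suggested
// bytes can accumulate, but a compaction only runs once the contiguous
// reclaimable chunk is large enough or free disk space is low. The
// thresholds are arbitrary examples, not CockroachDB's values.
func shouldCompact(contiguousReclaimableBytes, availableBytes, totalBytes int64) bool {
	const minChunkBytes = 128 << 20 // 128 MiB, example threshold
	if contiguousReclaimableBytes >= minChunkBytes {
		return true
	}
	// Example fallback: compact anyway if less than 10% of the disk is free.
	return availableBytes*10 < totalBytes
}

func main() {
	fmt.Println(shouldCompact(1<<20, 500<<30, 1<<40))   // false: small chunk, plenty of space
	fmt.Println(shouldCompact(256<<20, 500<<30, 1<<40)) // true: large contiguous chunk
	fmt.Println(shouldCompact(1<<20, 50<<30, 1<<40))    // true: disk nearly full
}
```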
Hi @leomkkwan, the logs look... actually really good, which is unfortunate since I was hoping to zero in on a root cause. Could you send me a
When `CheckConsistency` returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. Unfortunately this check was incorrect and would block until the store actually started draining.

A toy example of this problem is below (this will deadlock). The dual return form of chan receive isn't non-blocking -- the second parameter indicates whether the received value corresponds to a closing of the channel. Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches cockroachdb#21824.

Release note (bug fix): Prevent the consistency checker from deadlocking. This would previously manifest itself as a steady number of replicas queued for consistency checking on one or more nodes and would resolve by restarting the affected nodes.
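For illustration, here is a minimal sketch of the non-blocking `select` form the fix switches to. This is not the actual store code; the channel and function names are made up. The point is that a receive with a `default` branch returns immediately instead of blocking until the channel is closed:

```go
package main

import "fmt"

// isDraining is a sketch of the non-blocking check: a select with a default
// branch returns immediately instead of blocking until drainCh is closed.
func isDraining(drainCh <-chan struct{}) bool {
	select {
	case <-drainCh:
		return true // channel closed: draining has started
	default:
		return false // channel still open: not draining, and we didn't block
	}
}

func main() {
	drainCh := make(chan struct{})
	fmt.Println(isDraining(drainCh)) // false, returns immediately
	close(drainCh)
	fmt.Println(isDraining(drainCh)) // true once the channel is closed
}
```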
25297: build: add roachtests to be run on every PR r=benesch,petermattis a=danhhz

Release note: None

25434: opt: Change how Any operator is represented and flattened r=andy-kimball a=andy-kimball

The Any operator currently takes a single input rowset, and expects it to return a single boolean column. The hoister flattens the Any operator by testing and projecting that boolean column. The problem is that this representation cannot easily be decorrelated. Example:

    z = ANY(SELECT x FROM xy)

This is currently represented as:

    (Any (Project xy [ z=x ]))

The z=x projection field cannot easily be hoisted over a left join. This commit uses an alternate representation:

    (Any xy z EqOp)

The new representation keeps the input, scalar, and comparison op components separate, so they can be combined in ways that make it easier to decorrelate.

25456: storage: fix deadlock in consistency queue r=bdarnell,a-robinson a=tschottdorf

When `CheckConsistency` returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. Unfortunately this check was incorrect and would block until the store actually started draining.

A toy example of this problem is below (this will deadlock). The dual return form of chan receive isn't non-blocking -- the second parameter indicates whether the received value corresponds to a closing of the channel. Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches #21824.

Release note (bug fix): Prevent the consistency checker from deadlocking. This would previously manifest itself as a steady number of replicas queued for consistency checking on one or more nodes and would resolve by restarting the affected nodes.

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Andrew Kimball <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
25471: backport-2.0: rocksdb: use max_manifest_file_size option r=bdarnell a=tschottdorf

Backport 1/1 commits from #25341.

/cc @cockroachdb/release

---

Cockroach uses a single long running rocksdb instance for the entire process lifetime, which could be many months. By default, rocksdb tracks filesystem state changes in a log file called the MANIFEST, which grows without bound until the instance is re-opened. We should bound the maximum file size of the rocksdb MANIFEST using the corresponding rocksdb option to prevent unbounded growth. The MANIFEST file grew to several GBs in size in a customer bug report, but that was probably because of some other bad behavior in rocksdb state management. We do want to bound the MANIFEST size in such cases as well.

Release note: None

25474: backport-2.0: storage: fix deadlock in consistency queue r=bdarnell a=tschottdorf

Backport 1/1 commits from #25456.

/cc @cockroachdb/release

---

When `CheckConsistency` returns an error, the queue checks whether the store is draining to decide whether the error is worth logging. Unfortunately this check was incorrect and would block until the store actually started draining.

A toy example of this problem is below (this will deadlock). The dual return form of chan receive isn't non-blocking -- the second parameter indicates whether the received value corresponds to a closing of the channel. Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	_, ok := <-ch
	fmt.Println(ok)
}
```

Touches #21824.

Release note (bug fix): Prevent the consistency checker from deadlocking. This would previously manifest itself as a steady number of replicas queued for consistency checking on one or more nodes and would resolve by restarting the affected nodes.

Co-authored-by: Garvit Juniwal <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
@leomkkwan, are you still seeing the consistency checker problem?
The compactor had an optimization that would stop the compactor loop when no work was to be done. Unfortunately, the condition to probe this was incorrect. Essentially, whenever *any* work had been done, the loop stopped and would not reawaken until the next incoming suggestion. In the absence of such a suggestion, the skipped existing suggestions would never be revisited, and thus never discarded. This created confusion as it kept the "queued bytes" metric up.

Refactor the code so that instead of stopping, the loop waits for the max age before running again (except if there are further suggestions). It's hard to use timers correctly, so this commit should be scrutinized. In particular, whether it's correct to call `t.Reset` in a select that does not read from the timer's channel.

The test change makes `make test PKG=./pkg/storage/compactor` fail every time on my machine before the fix.

I would suggest not backporting this, at least not until there is a follow-up bugfix that needs backporting and doesn't apply cleanly on top of this diff.

Fixes cockroachdb#21824.

Release note (bug fix): Expired compaction suggestions are now dropped not too soon after they expire. Previously, this could be delayed indefinitely.
26039: compactor: never stop the loop r=bdarnell a=tschottdorf

The compactor had an optimization that would stop the compactor loop when no work was to be done. Unfortunately, the condition to probe this was incorrect. Essentially, whenever *any* work had been done, the loop stopped and would not reawaken until the next incoming suggestion. In the absence of such a suggestion, the skipped existing suggestions would never be revisited, and thus never discarded. This created confusion as it kept the "queued bytes" metric up.

Refactor the code so that instead of stopping, the loop waits for the max age before running again (except if there are further suggestions). It's hard to use timers correctly, so this commit should be scrutinized. In particular, whether it's correct to call `t.Reset` in a select that does not read from the timer's channel.

The test change makes `make test PKG=./pkg/storage/compactor` fail every time on my machine before the fix.

I would suggest not backporting this, at least not until there is a follow-up bugfix that needs backporting and doesn't apply cleanly on top of this diff.

Fixes #21824.

Release note (bug fix): Expired compaction suggestions are now dropped not too soon after they expire. Previously, this could be delayed indefinitely.

Co-authored-by: Tobias Schottdorf <[email protected]>
26659: backport-2.0: compactor: never stop the loop r=bdarnell a=tschottdorf

Backport 2/2 commits from #26039.

/cc @cockroachdb/release

---

The compactor had an optimization that would stop the compactor loop when no work was to be done. Unfortunately, the condition to probe this was incorrect. Essentially, whenever *any* work had been done, the loop stopped and would not reawaken until the next incoming suggestion. In the absence of such a suggestion, the skipped existing suggestions would never be revisited, and thus never discarded. This created confusion as it kept the "queued bytes" metric up.

Refactor the code so that instead of stopping, the loop waits for the max age before running again (except if there are further suggestions). It's hard to use timers correctly, so this commit should be scrutinized. In particular, whether it's correct to call `t.Reset` in a select that does not read from the timer's channel.

The test change makes `make test PKG=./pkg/storage/compactor` fail every time on my machine before the fix.

I would suggest not backporting this, at least not until there is a follow-up bugfix that needs backporting and doesn't apply cleanly on top of this diff.

Fixes #21824.

Release note (bug fix): Expired compaction suggestions are now dropped not too soon after they expire. Previously, this could be delayed indefinitely.

Co-authored-by: Tobias Schottdorf <[email protected]>
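For illustration, here is a rough sketch of the loop shape described in that commit. It is not the actual compactor code; the channel, function, and duration names are invented. The loop never stops: it wakes on new suggestions and otherwise re-arms a timer so that old suggestions are revisited (and eventually discarded) after the max age:

```go
package main

import (
	"fmt"
	"time"
)

// compactorLoop is a sketch: it processes incoming suggestions as they arrive
// and otherwise wakes up every maxAge to revisit (and possibly discard) old
// suggestions, instead of ever stopping the loop.
func compactorLoop(suggestions <-chan string, maxAge time.Duration, done <-chan struct{}) {
	timer := time.NewTimer(maxAge)
	defer timer.Stop()
	for {
		select {
		case s := <-suggestions:
			fmt.Println("processing new suggestion:", s)
		case <-timer.C:
			fmt.Println("max age reached; revisiting existing suggestions")
		case <-done:
			return
		}
		// Re-arm the timer for the next pass. Stop and drain the channel
		// first so Reset is safe whether or not the timer case fired above.
		if !timer.Stop() {
			select {
			case <-timer.C:
			default:
			}
		}
		timer.Reset(maxAge)
	}
}

func main() {
	suggestions := make(chan string, 1)
	done := make(chan struct{})
	go compactorLoop(suggestions, 50*time.Millisecond, done)

	suggestions <- "suggestion for r3" // wakes the loop immediately
	time.Sleep(120 * time.Millisecond) // let at least one timer pass fire
	close(done)
	time.Sleep(10 * time.Millisecond) // give the goroutine time to exit
}
```

The stop-and-drain step before `timer.Reset` is the cautious way to re-arm a timer whose channel may or may not have been read in the preceding `select`, which is the subtlety the commit message asks reviewers to scrutinize.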
Reported by forums user leokwan:
https://forum.cockroachlabs.com/t/upgrade-to-v2-0-alpha-20180122/1312
"
I have a 5-node idle cluster running v2.0-alpha.20171218. I shut down all nodes and did an upgrade to v2.0-alpha.20180122. The upgrade seemed to be successful; however, the Consistency Checker Queue keeps growing.
Then I sometimes see nodes become dead on the overview page; they come back to normal after a few minutes, but later become dead again.
When I check the logs on the servers, I see warnings like this on all nodes.
From dstat, all nodes have constant read/write activity, and it looks like one CPU thread stays busy.
The cluster became very unstable. I then downgraded to v2.0-alpha.20171218; the Consistency Checker Queue eventually cleared and the cluster looks to be back to normal.
Leo
"