release-22.1: liveness: improve disk probes during node liveness updates #81476
Conversation
Force-pushed from 0b54478 to 6da8e80
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
LGTM but maybe a week or two of bake on master?
Yep, for sure.
When `NodeLiveness` updates the liveness record (e.g. during heartbeats), it first does a noop sync write to all disks. This ensures that a node with a stalled disk will fail to maintain liveness and lose its leases.

However, this sync write could block indefinitely, and would not respect the caller's context, which could cause the caller to stall rather than time out. This in turn could lead to stalls higher up in the stack, in particular with lease acquisitions that do a synchronous heartbeat.

This patch does the sync write in a separate goroutine in order to respect the caller's context. The write operation itself will not (can not) respect the context, and may thus leak a goroutine. However, concurrent sync writes will coalesce onto an in-flight write.

Additionally, this runs the sync writes in parallel across all disks, since we can now trivially do so. This may be advantageous on nodes with many stores, to avoid spurious heartbeat failures under load.

Release note (bug fix): Disk write probes during node liveness heartbeats will no longer get stuck on stalled disks, instead returning an error once the operation times out. Additionally, disk probes now run in parallel on nodes with multiple stores.
Release note: None
This patch runs the sync disk write during node heartbeats in a stopper task. The write is done in a goroutine so that we can respect the caller's context cancellation (even though the write itself won't). However, this could race with engine shutdown when stopping the node, violating the Pebble contract and triggering the race detector. Running it as a stopper task makes the node wait for the disk write to complete before closing the engine.

Of course, if the disk stalls, node shutdown will now never complete. This is very unfortunate, since stopping the node is often the only mitigation to recover stuck ranges with stalled disks. It is mitigated by Pebble panicking the node on stalled disks, and by Kubernetes and other orchestration tools killing the process after some time.

Release note: None
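The shutdown-ordering guarantee described above can be modeled with a minimal sketch. `miniStopper` is a hypothetical stand-in, not CockroachDB's actual `stop.Stopper` API: `Stop` waits for every registered task, so the engine can only be closed after any in-flight disk write has returned.

```go
package main

import (
	"fmt"
	"sync"
)

// miniStopper is a simplified model of a stopper (not CockroachDB's
// actual stop.Stopper API): Stop waits for every registered task, so
// the engine is only closed after any in-flight disk write returns.
type miniStopper struct{ wg sync.WaitGroup }

// RunTask runs f asynchronously and registers it with the stopper.
func (s *miniStopper) RunTask(f func()) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		f()
	}()
}

// Stop blocks until all registered tasks have finished.
func (s *miniStopper) Stop() { s.wg.Wait() }

func main() {
	s := &miniStopper{}
	s.RunTask(func() {
		// Stand-in for the sync disk write. If this stalled,
		// Stop below would block and node shutdown would hang,
		// which is exactly the trade-off the commit describes.
	})
	s.Stop() // waits for the write, so closing the engine cannot race it
	fmt.Println("all tasks drained; safe to close engine")
}
```

The cost of this ordering is visible in the sketch: a stalled write keeps `Stop` from ever returning, which is why the commit leans on Pebble's stall panic and the orchestrator's process kill as backstops.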
Force-pushed from 3e03cc0 to 97054ae
/cc @cockroachdb/release
When `NodeLiveness` updates the liveness record (e.g. during heartbeats), it first does a noop sync write to all disks. This ensures that a node with a stalled disk will fail to maintain liveness and lose its leases.
However, this sync write could block indefinitely, and would not respect
the caller's context, which could cause the caller to stall rather than
time out. This in turn could lead to stalls higher up in the stack,
in particular with lease acquisitions that do a synchronous heartbeat.
This patch does the sync write in a separate goroutine in order to
respect the caller's context. The write operation itself will not
(can not) respect the context, and may thus leak a goroutine. However,
concurrent sync writes will coalesce onto an in-flight write.
Additionally, this runs the sync writes in parallel across all disks,
since we can now trivially do so. This may be advantageous on nodes with
many stores, to avoid spurious heartbeat failures under load.
Touches #81100.
Release note (bug fix): Disk write probes during node liveness
heartbeats will no longer get stuck on stalled disks, instead returning
an error once the operation times out. Additionally, disk probes now run
in parallel on nodes with multiple stores.
Release justification: cluster availability improvement.