
liveness: run sync disk write in a stopper task #81813

Merged 2 commits on May 25, 2022

Conversation

@erikgrinaker (Contributor) commented May 25, 2022

liveness: move stopper to NodeLivenessOptions

Release note: None

liveness: run sync disk write in a stopper task

This patch runs the sync disk write during node heartbeats in a stopper
task. The write is done in a goroutine, so that we can respect the
caller's context cancellation (even though the write itself won't).
However, this could race with engine shutdown when stopping the node,
violating the Pebble contract and triggering the race detector. Running
it as a stopper task will cause the node to wait for the disk write to
complete before closing the engine.

Of course, if the disk stalls then node shutdown will now never
complete. This is very unfortunate, since stopping the node is often
the only mitigation to recover stuck ranges with stalled disks. This is
mitigated by Pebble panicking the node on stalled disks, and by Kubernetes
and other orchestration tools killing the process after some time.

Touches #81786.
Resolves #81511.
Resolves #81827.

Release note: None
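
As an illustration of the pattern described above, here is a simplified, self-contained Go sketch (not the actual CockroachDB code: fakeEngine, the WaitGroup-based task tracking, and heartbeatDiskWrite are stand-ins for Pebble and stop.Stopper). The write runs in a tracked goroutine so the caller can return on context cancellation, while shutdown still waits for the write before closing the engine.

    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"
    )

    // fakeEngine stands in for Pebble: writing after Close violates its contract.
    type fakeEngine struct {
        mu     sync.Mutex
        closed bool
    }

    func (e *fakeEngine) SyncWrite() error {
        e.mu.Lock()
        defer e.mu.Unlock()
        if e.closed {
            return fmt.Errorf("write after engine close") // the race this patch avoids
        }
        time.Sleep(50 * time.Millisecond) // pretend the fsync takes a while
        return nil
    }

    func (e *fakeEngine) Close() {
        e.mu.Lock()
        defer e.mu.Unlock()
        e.closed = true
    }

    // heartbeatDiskWrite runs the sync write in a tracked goroutine (standing in
    // for a stopper task) and returns early on context cancellation, even though
    // the write itself keeps running to completion.
    func heartbeatDiskWrite(ctx context.Context, tasks *sync.WaitGroup, eng *fakeEngine) error {
        errC := make(chan error, 1)
        tasks.Add(1)
        go func() {
            defer tasks.Done()
            errC <- eng.SyncWrite()
        }()
        select {
        case err := <-errC:
            return err
        case <-ctx.Done():
            return ctx.Err() // caller gives up; the write still finishes
        }
    }

    func main() {
        var tasks sync.WaitGroup
        eng := &fakeEngine{}

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
        defer cancel()
        fmt.Println("heartbeat result:", heartbeatDiskWrite(ctx, &tasks, eng))

        // Node shutdown: wait for in-flight disk writes before closing the engine.
        tasks.Wait()
        eng.Close()
    }

With the 10ms caller timeout, the heartbeat returns early with a context error while the write still completes before the engine is closed.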

@erikgrinaker erikgrinaker requested a review from tbg May 25, 2022 11:31
@erikgrinaker erikgrinaker requested review from a team as code owners May 25, 2022 11:31
@erikgrinaker erikgrinaker self-assigned this May 25, 2022
@cockroach-teamcity (Member): This change is Reviewable

@erikgrinaker erikgrinaker changed the title Liveness disk task liveness: run sync disk write in a stopper task May 25, 2022
@stevendanna (Collaborator):

Looks like this will resolve #81827, should have looked before opening that one. Thanks!

@erikgrinaker (Contributor, Author):

CI failures appear to be unrelated flake.

@stevendanna (Collaborator) left a comment

Thanks for jumping on this! I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

			resultCs[i], _ = nl.engineSyncs.DoChan(strconv.Itoa(i), func() (interface{}, error) {
				var taskErr error
				if err := nl.stopper.RunTask(ctx, "liveness-hb-diskwrite", func(ctx context.Context) {

Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.
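
For comparison, a minimal sketch of the two shapes; mockStopper only mimics the signatures of stop.Stopper's RunTask and RunTaskWithErr, and writeSync is a hypothetical stand-in for the sync disk write closure:

    package main

    import (
        "context"
        "errors"
        "fmt"
    )

    // mockStopper mimics the shape of the two stop.Stopper methods being
    // compared (a toy, not CockroachDB's stop package).
    type mockStopper struct{}

    func (s *mockStopper) RunTask(ctx context.Context, name string, f func(context.Context)) error {
        f(ctx)
        return nil
    }

    func (s *mockStopper) RunTaskWithErr(ctx context.Context, name string, f func(context.Context) error) error {
        return f(ctx)
    }

    // writeSync is a hypothetical stand-in for the sync disk write closure.
    func writeSync(ctx context.Context) error { return errors.New("disk write failed") }

    // withRunTask needs an extra captured variable to get the closure's error out.
    func withRunTask(ctx context.Context, stopper *mockStopper) (interface{}, error) {
        var taskErr error
        if err := stopper.RunTask(ctx, "liveness-hb-diskwrite", func(ctx context.Context) {
            taskErr = writeSync(ctx)
        }); err != nil {
            return nil, err
        }
        return nil, taskErr
    }

    // withRunTaskWithErr returns the closure's error directly.
    func withRunTaskWithErr(ctx context.Context, stopper *mockStopper) (interface{}, error) {
        err := stopper.RunTaskWithErr(ctx, "liveness-hb-diskwrite", func(ctx context.Context) error {
            return writeSync(ctx)
        })
        return nil, err
    }

    func main() {
        ctx := context.Background()
        s := &mockStopper{}
        _, err1 := withRunTask(ctx, s)
        _, err2 := withRunTaskWithErr(ctx, s)
        fmt.Println(err1, "|", err2)
    }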

@stevendanna (Collaborator) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)

@nicktrav (Collaborator) left a comment

:lgtm: Thanks!

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)

@erikgrinaker (Contributor, Author) left a comment

> I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Yeah, but I don't know if that's always going to be safe -- for example, in this case that means that we'll be closing Pebble while we still have in-flight writes, which is what prompted this in the first place. There's no way to cancel these writes afaik.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nicktrav, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

> Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.

Oh nice, that's better -- thanks! Definitely not "fine" as is, more like functional. :)

@knz (Contributor) commented May 25, 2022

> respect the caller's context cancellation (even though the write itself won't)

The write can't? I think Go supports this via deadlines on the os.File handle.

@erikgrinaker (Contributor, Author):

> > respect the caller's context cancellation (even though the write itself won't)
>
> The write can't? I think Go supports this via deadlines on the os.File handle.

Yes, but that applies to all writes, not just this one. We'd also have to propagate that through Pebble. Adding deadlines for all Pebble writes is a riskier change than I want to make here.

@erikgrinaker (Contributor, Author):

TFTRs!

bors r=stevendanna,nicktrav

@craig (bot) commented May 25, 2022

Build succeeded.

Successfully merging this pull request may close these issues:

- kvserver: pebble panic from WriteSyncNoop
- kv/kvserver: TestReplicaDrainLease failed