
liveness: run sync disk write in a stopper task #81813

Merged 2 commits on May 25, 2022

Conversation

@erikgrinaker (Contributor) commented May 25, 2022

liveness: move stopper to NodeLivenessOptions

Release note: None

liveness: run sync disk write in a stopper task

This patch runs the sync disk write during node heartbeats in a stopper
task. The write is done in a goroutine, so that we can respect the
caller's context cancellation (even though the write itself won't).
However, this could race with engine shutdown when stopping the node,
violating the Pebble contract and triggering the race detector. Running
it as a stopper task will cause the node to wait for the disk write to
complete before closing the engine.

Of course, if the disk stalls then node shutdown will now never
complete. This is very unfortunate, since stopping the node is often
the only mitigation to recover stuck ranges with stalled disks. This is
mitigated by Pebble panicking the node on stalled disks, and by Kubernetes
and other orchestration tools killing the process after some time.

Touches #81786.
Resolves #81511.
Resolves #81827.

Release note: None
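
As an illustration of the pattern described above, here is a simplified, self-contained Go sketch (not the actual CockroachDB code: fakeEngine, the WaitGroup-based task tracking, and heartbeatDiskWrite are stand-ins for Pebble and stop.Stopper). The write runs in a tracked goroutine so the caller can return on context cancellation, while shutdown still waits for the write before closing the engine.

    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"
    )

    // fakeEngine stands in for Pebble: writing after Close violates its contract.
    type fakeEngine struct {
        mu     sync.Mutex
        closed bool
    }

    func (e *fakeEngine) SyncWrite() error {
        e.mu.Lock()
        defer e.mu.Unlock()
        if e.closed {
            return fmt.Errorf("write after engine close") // the race this patch avoids
        }
        time.Sleep(50 * time.Millisecond) // pretend the fsync takes a while
        return nil
    }

    func (e *fakeEngine) Close() {
        e.mu.Lock()
        defer e.mu.Unlock()
        e.closed = true
    }

    // heartbeatDiskWrite runs the sync write in a tracked goroutine (standing in
    // for a stopper task) and returns early on context cancellation, even though
    // the write itself keeps running to completion.
    func heartbeatDiskWrite(ctx context.Context, tasks *sync.WaitGroup, eng *fakeEngine) error {
        errC := make(chan error, 1)
        tasks.Add(1)
        go func() {
            defer tasks.Done()
            errC <- eng.SyncWrite()
        }()
        select {
        case err := <-errC:
            return err
        case <-ctx.Done():
            return ctx.Err() // caller gives up; the write still finishes
        }
    }

    func main() {
        var tasks sync.WaitGroup
        eng := &fakeEngine{}

        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
        defer cancel()
        fmt.Println("heartbeat result:", heartbeatDiskWrite(ctx, &tasks, eng))

        // Node shutdown: wait for in-flight disk writes before closing the engine.
        tasks.Wait()
        eng.Close()
    }

With the 10ms caller timeout, the heartbeat returns early with a context error while the write still completes before the engine is closed.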

@erikgrinaker erikgrinaker requested a review from tbg May 25, 2022 11:31
@erikgrinaker erikgrinaker requested review from a team as code owners May 25, 2022 11:31
@erikgrinaker erikgrinaker self-assigned this May 25, 2022
@cockroach-teamcity (Member): This change is Reviewable

@erikgrinaker erikgrinaker changed the title Liveness disk task liveness: run sync disk write in a stopper task May 25, 2022
@stevendanna (Collaborator):

Looks like this will resolve #81827, should have looked before opening that one. Thanks!

@erikgrinaker (Contributor, Author):

CI failures appear to be unrelated flake.

@stevendanna (Collaborator) left a comment

Thanks for jumping on this! I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

			resultCs[i], _ = nl.engineSyncs.DoChan(strconv.Itoa(i), func() (interface{}, error) {
				var taskErr error
				if err := nl.stopper.RunTask(ctx, "liveness-hb-diskwrite", func(ctx context.Context) {

Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.
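
For comparison, a minimal sketch of the two shapes; mockStopper only mimics the signatures of stop.Stopper's RunTask and RunTaskWithErr, and writeSync is a hypothetical stand-in for the sync disk write closure:

    package main

    import (
        "context"
        "errors"
        "fmt"
    )

    // mockStopper mimics the shape of the two stop.Stopper methods being
    // compared (a toy, not CockroachDB's stop package).
    type mockStopper struct{}

    func (s *mockStopper) RunTask(ctx context.Context, name string, f func(context.Context)) error {
        f(ctx)
        return nil
    }

    func (s *mockStopper) RunTaskWithErr(ctx context.Context, name string, f func(context.Context) error) error {
        return f(ctx)
    }

    // writeSync is a hypothetical stand-in for the sync disk write closure.
    func writeSync(ctx context.Context) error { return errors.New("disk write failed") }

    // withRunTask needs an extra captured variable to get the closure's error out.
    func withRunTask(ctx context.Context, stopper *mockStopper) (interface{}, error) {
        var taskErr error
        if err := stopper.RunTask(ctx, "liveness-hb-diskwrite", func(ctx context.Context) {
            taskErr = writeSync(ctx)
        }); err != nil {
            return nil, err
        }
        return nil, taskErr
    }

    // withRunTaskWithErr returns the closure's error directly.
    func withRunTaskWithErr(ctx context.Context, stopper *mockStopper) (interface{}, error) {
        err := stopper.RunTaskWithErr(ctx, "liveness-hb-diskwrite", func(ctx context.Context) error {
            return writeSync(ctx)
        })
        return nil, err
    }

    func main() {
        ctx := context.Background()
        s := &mockStopper{}
        _, err1 := withRunTask(ctx, s)
        _, err2 := withRunTaskWithErr(ctx, s)
        fmt.Println(err1, "|", err2)
    }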

@stevendanna (Collaborator) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker, @stevendanna, and @tbg)

@nicktrav (Collaborator) left a comment

:lgtm: Thanks!

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)

@erikgrinaker (Contributor, Author) left a comment

> I suppose if we were worried about the impact of a disk stall on shutdown, we could do something where we give up on the operation after some amount of time. But, I buy the paragraph you wrote about mitigations.

Yeah, but I don't know if that's always going to be safe -- for example, in this case that means that we'll be closing Pebble while we still have in-flight writes, which is what prompted this in the first place. There's no way to cancel these writes afaik.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @nicktrav, @stevendanna, and @tbg)


pkg/kv/kvserver/liveness/liveness.go line 1277 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

> Looks like there is also RunTaskWithErr that would cut down on some of the boilerplate here, but it's fine as is.

Oh nice, that's better -- thanks! Definitely not "fine" as is, more like functional. :)

@knz (Contributor) commented May 25, 2022

> respect the caller's context cancellation (even though the write itself won't)

The write can't? I think Go supports this via deadlines on the os.File handle.

@erikgrinaker (Contributor, Author):

> > respect the caller's context cancellation (even though the write itself won't)
>
> The write can't? I think Go supports this via deadlines on the os.File handle.

Yes, but that applies to all writes, not just this one. We'd also have to propagate that through Pebble. Adding deadlines for all Pebble writes is a riskier change than I want to make here.

@erikgrinaker (Contributor, Author):

TFTRs!

bors r=stevendanna,nicktrav

@craig (bot) commented May 25, 2022

Build succeeded.

Successfully merging this pull request may close these issues:

- kvserver: pebble panic from WriteSyncNoop
- kv/kvserver: TestReplicaDrainLease failed