kvserver: no sync between splits and follower reads can cause invalid reads #67016

Closed
andreimatei opened this issue Jun 29, 2021 · 8 comments · Fixed by #89886 or #91405


andreimatei commented Jun 29, 2021

On leaseholders, latches provide serialization between different classes of operations. On followers, however, latches don't provide the same protection, because requests that are simply applied on a follower don't take latches. Follower reads do take latches, but I think that's accidental and not useful, given that writes don't take latches when they are applied on followers.
At the very least, the following scenario, involving a follower read and a concurrently applying split, seems possible (a standalone sketch of this interleaving follows the list):

  1. a follower read across the whole range begins and checks the range descriptor
  2. the range split applies
  3. the RHS gets removed (deleting its data)
  4. the follower read misses the RHS keys
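
A standalone Go sketch of this interleaving (toy types, not the actual kvserver code): the reader validates the descriptor bounds first and only reads the data later, so a split plus RHS removal that lands in between silently shrinks the result.

```go
package main

import "fmt"

// replica is a toy stand-in for a follower replica: a descriptor bound plus
// the data currently present in the storage engine.
type replica struct {
	endKey string            // descriptor end key: "z" before the split, "m" after
	data   map[string]string // stand-in for the storage engine
}

func main() {
	r := &replica{
		endKey: "z",
		data:   map[string]string{"a": "1", "c": "2", "p": "3", "t": "4"},
	}

	// 1. A follower read over [a, z) begins and checks the descriptor: in bounds.
	readEnd := "z"
	if readEnd > r.endKey {
		fmt.Println("read rejected: outside range bounds")
		return
	}

	// 2.+3. The split applies and the RHS is removed, deleting the keys in
	// [m, z), before the read has pinned an engine snapshot.
	r.endKey = "m"
	for k := range r.data {
		if k >= "m" {
			delete(r.data, k)
		}
	}

	// 4. The read now scans the engine and silently misses the RHS keys.
	var seen []string
	for k := range r.data {
		if k < readEnd {
			seen = append(seen, k)
		}
	}
	fmt.Println("read returned:", seen) // "p" and "t" are missing
}
```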

In addition to latches, reads also take read locks on readOnlyCmdMu. Some range-level operations also take this lock, so the lock acts as a kind of structural synchronization. For example, I think removing a replica takes this lock (since #64324). Applying a split, however, does not. Maybe it should?

Alternatively, if step 1) in the scenario instead read "follower read across all of range begins, binds Pebble snapshot, checks range descriptor", I think the hazard would be avoided, because the "checking of the range descriptor" would provide sufficient synchronization with the split. Currently, reads don't bind their Pebble snapshot early (#55461), but I believe they could easily do so now that we have #66845. If we could bind the snapshot early, we could order it before the range descriptor check.
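
A sketch of that alternative ordering (again with hypothetical types rather than the real Pebble API): the snapshot is pinned before the descriptor check, so a split that applies after the snapshot cannot remove data from the read's view, and one that applied before it is caught by the bounds check.

```go
package main

import "fmt"

type engine struct{ data map[string]string }

// snapshot returns a point-in-time copy of the data, standing in for an
// eagerly bound Pebble snapshot.
func (e *engine) snapshot() map[string]string {
	snap := make(map[string]string, len(e.data))
	for k, v := range e.data {
		snap[k] = v
	}
	return snap
}

// followerRead binds the snapshot first and only then checks the descriptor.
func followerRead(eng *engine, descEndKey, readEnd string) ([]string, error) {
	snap := eng.snapshot() // 1. bind the engine snapshot

	if readEnd > descEndKey { // 2. check the (current) range descriptor
		return nil, fmt.Errorf("read up to %q is outside range ending at %q", readEnd, descEndKey)
	}

	var keys []string // 3. evaluate against the pinned snapshot
	for k := range snap {
		if k < readEnd {
			keys = append(keys, k)
		}
	}
	return keys, nil
}

func main() {
	eng := &engine{data: map[string]string{"a": "1", "p": "2", "t": "3"}}

	// If the split had already applied when the descriptor is checked, the read
	// fails cleanly instead of returning a partial result; if it applies later,
	// the snapshot still contains the RHS keys.
	fmt.Println(followerRead(eng, "m", "z")) // rejected: bounds already shrunk
	fmt.Println(followerRead(eng, "z", "z")) // sees a, p, t (in some order)
}
```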

Besides this particular split/follower-read scenario, it seems to me that the topic of latches on followers should be explored. If the latches taken by follower reads are not useful, we shouldn't take them. The evaluation of requests accesses both MVCC data and in-memory range state. Access to this state is supposed to be made safe by latches, as asserted by SpanSetReplicaEvalContext. On followers, I think SpanSetReplicaEvalContext does not provide the guarantees it thinks it does, which suggests that follower reads should only have access to a much-reduced EvalContext that doesn't allow access to the range's in-memory state. Hopefully this would be sufficient for the limited set of requests that we evaluate on followers.

Jira issue: CRDB-8330

@andreimatei andreimatei added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-replication Relating to Raft, consensus, and coordination. S-0-visible-logical-error Database stores inconsistent data in some cases, or queries return invalid results silently. T-kv KV Team labels Jun 29, 2021
andreimatei (Contributor, Author) commented:

We've discussed this a bit. Taking readOnlyCmdMu in write mode on the application path, below Raft, seems dangerous as long as the lock is held for the whole duration of read evaluation. It seems that the best way forward, at least in the short term, is to reduce the time evaluation holds the lock by dropping it soon after r.checkExecutionCanProceed(), and to couple that with eagerly binding the Pebble snapshot while the lock is held. We've also discussed auditing checkExecutionCanProceed() to make sure that the replica state it accesses is properly synchronized with Pebble state (i.e. we write to Pebble before mutating in-memory state, or otherwise protect against checkExecutionCanProceed prematurely succeeding).
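
A rough sketch of that lock scoping, with made-up names standing in for the real ones: readOnlyCmdMu is held in read mode only across the pre-evaluation checks and the snapshot binding, and is released before evaluation, so below-Raft work that takes the lock in write mode isn't blocked behind long reads.

```go
package main

import (
	"fmt"
	"sync"
)

type replica struct {
	readOnlyCmdMu sync.RWMutex
	destroyed     bool
	endKey        string
	engine        map[string]string // stand-in for the storage engine
}

// checkExecutionCanProceed stands in for the real pre-evaluation checks
// (destroy status, range bounds, lease, and so on).
func (r *replica) checkExecutionCanProceed(readEnd string) error {
	if r.destroyed {
		return fmt.Errorf("replica destroyed")
	}
	if readEnd > r.endKey {
		return fmt.Errorf("read past range bound %q", r.endKey)
	}
	return nil
}

func (r *replica) executeReadOnlyBatch(readEnd string) ([]string, error) {
	r.readOnlyCmdMu.RLock()
	if err := r.checkExecutionCanProceed(readEnd); err != nil {
		r.readOnlyCmdMu.RUnlock()
		return nil, err
	}
	// Eagerly "bind the snapshot" while the lock is still held.
	snap := make(map[string]string, len(r.engine))
	for k, v := range r.engine {
		snap[k] = v
	}
	r.readOnlyCmdMu.RUnlock() // drop the lock before evaluation, not after

	// Evaluation runs against the pinned snapshot, outside the lock.
	var keys []string
	for k := range snap {
		if k < readEnd {
			keys = append(keys, k)
		}
	}
	return keys, nil
}

func main() {
	r := &replica{endKey: "z", engine: map[string]string{"a": "1", "p": "2"}}
	fmt.Println(r.executeReadOnlyBatch("z"))
}
```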

sumeerbhola (Collaborator) commented:

@andreimatei, is fixing this unblocked by the merge of #66845?

andreimatei (Contributor, Author) commented:

#66845 helps. But beyond that, I wouldn't say it's particularly clear what to do here, because I'm not very confident about the consequences (or lack thereof) of dropping readOnlyCmdMu early.

mwang1026 commented:

Related to #55461, which will help unblock this work.

aayushshah15 (Contributor) commented:

Update for posterity: We now (as of #76312) check the range bounds and eagerly capture a storage engine snapshot while holding onto readOnlyCmdMu. However, it still seems like this bug might be possible because nothing in the split application path acquires readOnlyCmdMu. If we changed that (i.e. if splits only updated the LHS range descriptor while holding on to readOnlyCmdMu), we should no longer have this bug.

@nvanbenschoten (or @tbg) do you mind confirming whether the above seems correct to you?

nvanbenschoten (Member) commented:

Yes, this sounds correct to me. We'll need to make below-raft range boundary changes synchronized with respect to above-raft {range desc checks, destroy checks, engine snapshots}. The most straightforward way to do that seems like it would be to grab the readOnlyCmdMu in exclusive mode when applying a raft entry that changes a range's key bounds.

Another option would be to make this optimistic by re-checking the range descriptor (checkSpanInRangeRLocked) after grabbing an engine snapshot. We already determined that we need to grab the r.mu.RLock after this snapshot for #55293. In fact, it seems possible that we could remove readOnlyCmdMu entirely by also rechecking isDestroyedRLocked after grabbing an engine snapshot.

Also, now that we grab the engine snapshot eagerly, we should release readOnlyCmdMu before evaluation instead of after evaluation due to the current defer r.readOnlyCmdMu.RUnlock() in executeReadOnlyBatch. That will be important to avoid blocking Raft on batch evaluation. We'll need to do the same thing in executeWriteBatch.
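
A minimal sketch of the first option, using a toy replica type and hypothetical method names: the below-Raft application of a bounds-changing entry takes readOnlyCmdMu in exclusive mode, so it cannot interleave with a read holding the lock between its descriptor check and its engine snapshot.

```go
package main

import (
	"fmt"
	"sync"
)

type replica struct {
	readOnlyCmdMu sync.RWMutex
	endKey        string
	engine        map[string]string
}

// applyBoundsChange models applying a Raft entry (e.g. a split) that changes
// the range's key bounds, taking readOnlyCmdMu in exclusive mode as proposed
// above.
func (r *replica) applyBoundsChange(newEndKey string) {
	r.readOnlyCmdMu.Lock()
	defer r.readOnlyCmdMu.Unlock()
	r.endKey = newEndKey
}

// read holds readOnlyCmdMu in shared mode across the bounds check and the
// snapshot, so it can never observe pre-split bounds together with engine
// contents from after an RHS removal.
func (r *replica) read(readEnd string) ([]string, error) {
	r.readOnlyCmdMu.RLock()
	if readEnd > r.endKey {
		r.readOnlyCmdMu.RUnlock()
		return nil, fmt.Errorf("read past range bound %q", r.endKey)
	}
	snap := make(map[string]string, len(r.engine))
	for k, v := range r.engine {
		snap[k] = v
	}
	r.readOnlyCmdMu.RUnlock()

	var keys []string
	for k := range snap {
		if k < readEnd {
			keys = append(keys, k)
		}
	}
	return keys, nil
}

func main() {
	r := &replica{endKey: "z", engine: map[string]string{"a": "1", "p": "2"}}
	fmt.Println(r.read("z"))
	r.applyBoundsChange("m")
	fmt.Println(r.read("z")) // now rejected: bounds have shrunk
}
```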

aayushshah15 (Contributor) commented:

> Another option would be to make this optimistic by re-checking the range descriptor (checkSpanInRangeRLocked) after grabbing an engine snapshot. We already determined that we need to grab the r.mu.RLock after this snapshot for #55293. In fact, it seems possible that we could remove readOnlyCmdMu entirely by also rechecking isDestroyedRLocked after grabbing an engine snapshot.

Is there an issue that discusses how we can get rid of readOnlyCmdMu? If not, I can file one to track it separately. I see some discussion about it in #43048.

> Also, now that we grab the engine snapshot eagerly, we should release readOnlyCmdMu before evaluation instead of after evaluation due to the current defer r.readOnlyCmdMu.RUnlock() in executeReadOnlyBatch. That will be important to avoid blocking Raft on batch evaluation. We'll need to do the same thing in executeWriteBatch.

Thanks for confirming this bit too. I can track it separately once we have an issue that talks about removing readOnlyCmdMu.

nvanbenschoten (Member) commented:

> Is there an issue that discusses how we can get rid of readOnlyCmdMu? If not, I can file one to track it separately.

I don't see one, so we should open up a new issue.

arulajmani added a commit to arulajmani/cockroach that referenced this issue Oct 13, 2022
This patch fixes a bug in how follower reads are synchronized with the
application of concurrent split operations. Reads on the leaseholder
are serialized with concurrent split operations by latching. However,
splits are simply applied on the follower, and as such, don't go through
latching like they do on the leaseholder. Previously, this could lead to
invalid reads in cases where the range split and the RHS was removed
after the range descriptor's bounds were checked but before a storage
snapshot was acquired.

This patch fixes this hazard by checking the range bounds after
acquiring the storage snapshot (in addition to before, like we used to
prior to this change). It also adds a couple of tests -- one exercising
the exact scenario described in the associated issue and another that
runs concurrent split/read operations without tightly controlling the
synchronization between them.

Fixes cockroachdb#67016

Release note (bug fix): fixes a rare bug where concurrent follower
read/split operations could lead to invalid read results.
craig bot pushed a commit that referenced this issue Oct 24, 2022
89886: kvserver: ensure follower reads correctly synchronize with splits r=arulajmani a=arulajmani

This patch fixes a bug in how follower reads are synchronized with the application of concurrent split operations. Reads on the leaseholder are serialized with concurrent split operations by latching. However, splits are simply applied on the follower, and as such, don't go through latching like they do on the leaseholder. Previously, this could lead to invalid reads in cases where the range split and the RHS was removed after the range descriptor's bounds were checked but before a storage snapshot was acquired.

This patch fixes this hazard by checking the range bounds after acquiring the storage snapshot (in addition to before, like we used to prior to this change). It also adds a couple of tests -- one exercising the exact scenario described in the associated issue and another that runs concurrent split/read operations without tightly controlling the synchronization between them.

Fixes #67016

Release note (bug fix): fixes a rare bug where concurrent follower read/split operations could lead to invalid read results.

90456: sqlsmith: do not error when UDTs have no members r=mgartner a=mgartner

Fixes #90433

Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
craig bot closed this as completed in e1f85b6 Oct 24, 2022
arulajmani added a commit to arulajmani/cockroach that referenced this issue Nov 7, 2022
Now that we've fixed cockroachdb#67016, and follower reads correctly synchronize
with concurrent splits, it's safe for us to serve ExportRequests from
followers. This patch permits that.

Closes cockroachdb#88804

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Nov 8, 2022
This patch fixes a bug in how follower reads are synchronized with the
application of concurrent split operations. Reads on the leaseholder
are serialized with concurrent split operations by latching. However,
splits are simply applied on the follower, and as such, don't go through
latching like they do on the leaseholder. Previously, this could lead to
invalid reads in cases where the range split and the RHS was removed
after the range descriptor's bounds were checked but before a storage
snapshot was acquired.

This patch fixes this hazard by checking the range bounds after
acquiring the storage snapshot (in addition to before, like we used to
prior to this change). It also adds a couple of tests -- one exercising
the exact scenario described in the associated issue and another that
runs concurrent split/read operations without tightly controlling the
synchronization between them.

Fixes cockroachdb#67016

Release note (bug fix): fixes a rare bug where concurrent follower
read/split operations could lead to invalid read results.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Dec 7, 2022
Now that we've fixed cockroachdb#67016, and follower reads correctly synchronize
with concurrent splits, it's safe for us to serve ExportRequests from
followers. This patch permits that.

Closes cockroachdb#88804

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue Dec 7, 2022
Now that we've fixed cockroachdb#67016, and follower reads correctly synchronize
with concurrent splits, it's safe for us to serve ExportRequests from
followers. This patch permits that.

Closes cockroachdb#88804

Release note: None
craig bot pushed a commit that referenced this issue Dec 7, 2022
91405: kv: permit ExportRequest evaluation on followers r=nvanbenschoten a=arulajmani

Now that we've fixed #67016, and follower reads correctly synchronize with concurrent splits, it's safe for us to serve ExportRequests from followers. This patch permits that.

Closes #88804

Release note: None

93124: roachtest: check logger file before dereference r=smg260 a=smg260

Another nil dereference fix like #92845.

[Error in TC](https://teamcity.cockroachdb.com/viewLog.html?buildId=7824979&buildTypeId=Cockroach_Nightlies_RoachtestNightlyGceBazel)

Release note: none
Epic: none

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Miral Gadani <[email protected]>