kvserver: no sync between splits and follower reads can cause invalid reads #67016
Comments
We've discussed this a bit. Taking |
@andreimatei, is fixing this unblocked by the merge of #66845?
#66845 helps. But beyond that, I wouldn't say it's particularly clear what to do here because I'm not very confident in the (lack of) consequences for dropping |
Related to #55461, which will help unblock this work.
Update for posterity: We now (as of #76312) check the range bounds and eagerly capture a storage engine snapshot while holding onto @nvanbenschoten (or @tbg) do you mind confirming whether the above seems correct to you? |
Yes, this sounds correct to me. We'll need to make below-raft range boundary changes synchronized with respect to above-raft {range desc checks, destroy checks, engine snapshots}. The most straightforward way to do that seems like it would be to grab the Another option would be to make this optimistic by re-checking the range descriptor ( Also, now that we grab the engine snapshot eagerly, we should release |
Is there an issue that discusses how we can get rid of readOnlyCmdMu? If not, I can file one to track it separately. I see some discussion about it in #43048.
Thanks for confirming this bit too. I can track it separately once we have an issue that talks about removing |
I don't see one, so we should open up a new issue.
This patch fixes a bug in how follower reads are synchronized with the application of concurrent split operations. Reads on the leaseholder are serialized with concurrent split operations by latching. However, splits are simply applied on the follower, and as such, don't go through latching like they do on the leaseholder. Previously, this could lead to invalid reads in cases where the range split and the RHS was removed after the range descriptor's bounds were checked but before a storage snapshot was acquired.

This patch fixes this hazard by checking the range bounds after acquiring the storage snapshot (in addition to before, like we used to prior to this change). It also adds a couple of tests -- one exercising the exact scenario described in the associated issue and another that runs concurrent split/read operations without tightly controlling the synchronization between them.

Fixes cockroachdb#67016

Release note (bug fix): fixes a rare bug where concurrent follower read/split operations could lead to invalid read results.
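The check / snapshot / re-check ordering described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `toyReplica`, `span`, `applySplit`, `read`, and the `betweenCheckAndSnapshot` hook are hypothetical names invented for illustration and are not CockroachDB's actual types or code paths. The toy also folds "split applied" and "RHS removed" into a single step.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// span is a half-open key interval [start, end). Hypothetical helper type.
type span struct{ start, end string }

func (s span) contains(o span) bool { return s.start <= o.start && o.end <= s.end }

var errOutOfBounds = errors.New("request outside range bounds")

// toyReplica is a hypothetical stand-in for a follower replica: in-memory
// range bounds plus a map that plays the role of the storage engine.
type toyReplica struct {
	mu     sync.Mutex
	bounds span
	data   map[string]string
}

func newToyReplica() *toyReplica {
	return &toyReplica{
		bounds: span{start: "a", end: "z"},
		data:   map[string]string{"a": "1", "c": "2"},
	}
}

func (r *toyReplica) checkBounds(req span) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.bounds.contains(req) {
		return errOutOfBounds
	}
	return nil
}

// snapshot copies the engine contents, mimicking a point-in-time snapshot.
func (r *toyReplica) snapshot() map[string]string {
	r.mu.Lock()
	defer r.mu.Unlock()
	snap := make(map[string]string, len(r.data))
	for k, v := range r.data {
		snap[k] = v
	}
	return snap
}

// applySplit narrows the bounds to [start, splitKey) and drops the RHS data.
func (r *toyReplica) applySplit(splitKey string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bounds.end = splitKey
	for k := range r.data {
		if k >= splitKey {
			delete(r.data, k)
		}
	}
}

// read checks bounds, takes a snapshot, and optionally re-checks bounds.
// The hook lets a caller inject a split between the check and the snapshot.
func (r *toyReplica) read(req span, recheck bool, betweenCheckAndSnapshot func()) ([]string, error) {
	if err := r.checkBounds(req); err != nil {
		return nil, err
	}
	if betweenCheckAndSnapshot != nil {
		betweenCheckAndSnapshot()
	}
	snap := r.snapshot()
	if recheck {
		// A split applied before the snapshot is visible here, so the read
		// fails instead of silently returning partial data.
		if err := r.checkBounds(req); err != nil {
			return nil, err
		}
	}
	var keys []string
	for k := range snap {
		if req.start <= k && k < req.end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	req := span{start: "a", end: "z"}
	r1 := newToyReplica()
	fmt.Println(r1.read(req, false, func() { r1.applySplit("b") }))
	r2 := newToyReplica()
	fmt.Println(r2.read(req, true, func() { r2.applySplit("b") }))
}
```

In this toy, the read without the second check silently misses key `c`, while the read with the second check returns an error. A split that lands after the snapshot is harmless, because the snapshot already holds the full pre-split data.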
89886: kvserver: ensure follower reads correctly synchronize with splits r=arulajmani a=arulajmani

This patch fixes a bug in how follower reads are synchronized with the application of concurrent split operations. Reads on the leaseholder are serialized with concurrent split operations by latching. However, splits are simply applied on the follower, and as such, don't go through latching like they do on the leaseholder. Previously, this could lead to invalid reads in cases where the range split and the RHS was removed after the range descriptor's bounds were checked but before a storage snapshot was acquired.

This patch fixes this hazard by checking the range bounds after acquiring the storage snapshot (in addition to before, like we used to prior to this change). It also adds a couple of tests -- one exercising the exact scenario described in the associated issue and another that runs concurrent split/read operations without tightly controlling the synchronization between them.

Fixes #67016

Release note (bug fix): fixes a rare bug where concurrent follower read/split operations could lead to invalid read results.

90456: sqlsmith: do not error when UDTs have no members r=mgartner a=mgartner

Fixes #90433

Release note: None

Co-authored-by: Arul Ajmani
Co-authored-by: Marcus Gartner
Now that we've fixed cockroachdb#67016, and follower reads correctly synchronize with concurrent splits, it's safe for us to serve ExportRequests from followers. This patch permits that.

Closes cockroachdb#88804

Release note: None
91405: kv: permit ExportRequest evaluation on followers r=nvanbenschoten a=arulajmani

Now that we've fixed #67016, and follower reads correctly synchronize with concurrent splits, it's safe for us to serve ExportRequests from followers. This patch permits that.

Closes #88804

Release note: None

93124: roachtest: check logger file before dereference r=smg260 a=smg260

Another nil dereference fix like #92845.

[Error in TC](https://teamcity.cockroachdb.com/viewLog.html?buildId=7824979&buildTypeId=Cockroach_Nightlies_RoachtestNightlyGceBazel)

Release note: none

Epic: none

Co-authored-by: Arul Ajmani
Co-authored-by: Miral Gadani
On leaseholders, latches provide serialization between different classes of operation. On followers, however, latches don't provide the same protection, because requests that are simply applied on a follower don't take latches. Follower reads do take latches, but I think that's by accident and useless given that writes don't take their latches.
At the very least, the following scenario involving a follower read and a split applying concurrently seems possible:
In addition to latches, reads also take read locks on `readOnlyCmdMu`. Some range-level operations also take this lock, so the lock acts as a kind of structural synchronization. For example, I think removing a replica takes this lock (since #64324). Applying a split, however, does not. Maybe it should?

Alternatively, I think if step 1) in the scenario instead read "follower read across all of range begins, binds Pebble snapshot, checks range descriptor", then I think the hazard would be avoided because the "checking of the range descriptor" would provide sufficient synchronization with the split. Currently, reads don't bind their Pebble snapshot early (#55461), but I believe they could easily do it now that we have #66845. If we could bind the snapshot early, we could order it before a range descriptor check.
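To make the first option concrete (split application taking the same lock that reads hold in shared mode), here is a minimal toy sketch. It assumes the lock behaves like a Go `sync.RWMutex`. `lockedReplica`, `read`, `applySplit`, and `countInvalidReads` are hypothetical names for illustration only, not CockroachDB code. Whether applying a split should really take this lock is exactly the open question above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errOutOfBounds = errors.New("request outside range bounds")

// lockedReplica is a hypothetical toy replica. The field name mirrors the
// mutex discussed in the issue, but the semantics here are only a model.
type lockedReplica struct {
	readOnlyCmdMu sync.RWMutex
	end           string            // exclusive end key of the range
	data          map[string]string // stand-in for the storage engine
}

// read holds the lock in shared mode across both the bounds check and the
// scan, so a split cannot be applied in between.
func (r *lockedReplica) read(reqEnd string) (int, error) {
	r.readOnlyCmdMu.RLock()
	defer r.readOnlyCmdMu.RUnlock()
	if reqEnd > r.end {
		return 0, errOutOfBounds
	}
	n := 0
	for k := range r.data {
		if k < reqEnd {
			n++
		}
	}
	return n, nil
}

// applySplit takes the lock exclusively (the hypothetical change), narrows
// the bounds, and removes the RHS data.
func (r *lockedReplica) applySplit(splitKey string) {
	r.readOnlyCmdMu.Lock()
	defer r.readOnlyCmdMu.Unlock()
	r.end = splitKey
	for k := range r.data {
		if k >= splitKey {
			delete(r.data, k)
		}
	}
}

// countInvalidReads races readers against one split and counts reads that
// succeeded but did not see both pre-split keys.
func countInvalidReads(readers int) int {
	r := &lockedReplica{end: "z", data: map[string]string{"a": "1", "c": "2"}}
	invalid := make([]bool, readers)
	var wg sync.WaitGroup
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n, err := r.read("z")
			invalid[i] = err == nil && n != 2
		}(i)
		if i == readers/2 {
			wg.Add(1)
			go func() {
				defer wg.Done()
				r.applySplit("b")
			}()
		}
	}
	wg.Wait()
	count := 0
	for _, bad := range invalid {
		if bad {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("invalid reads:", countInvalidReads(64))
}
```

Every read in this model either sees the full pre-split data or fails the bounds check. The cost is that split application blocks behind in-flight reads, which is one reason the optimistic re-check may be preferable.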
Besides this particular split/follower-read scenario, it seems to me that the topic of latches on followers should be explored. If the latches taken by follower-reads are not useful, we shouldn't take them. The evaluation of requests accesses both MVCC data and also in-memory range state. Access to this state is supposed to be made safe by latches, as asserted by `SpanSetReplicaEvalContext`. On followers, I think `SpanSetReplicaEvalContext` does not provide the guarantees that it thinks it does, which suggests that follower-reads should only have access to a much-reduced `EvalContext` which doesn't allow access to the range's in-memory state. Hopefully this would be sufficient for the limited set of requests that we evaluate on followers.

Jira issue: CRDB-8330
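The "much-reduced `EvalContext`" idea can be sketched with Go interface narrowing. Everything below is an assumption made for illustration: `fullEvalContext`, `followerEvalContext`, `toyState`, and their methods are hypothetical and do not reflect the real `EvalContext` method set.

```go
package main

import "fmt"

// fullEvalContext is a hypothetical interface that mixes MVCC data access
// with accessors for in-memory range state.
type fullEvalContext interface {
	Get(key string) (string, bool)
	RangeBounds() (start, end string) // in-memory state (hypothetical)
	LeaseHolder() string              // in-memory state (hypothetical)
}

// followerEvalContext is the reduced view: only data access, no in-memory
// range state. Code written against it cannot call the other accessors.
type followerEvalContext interface {
	Get(key string) (string, bool)
}

// toyState implements the full interface.
type toyState struct {
	data        map[string]string
	start, end  string
	leaseHolder string
}

func (s *toyState) Get(key string) (string, bool) { v, ok := s.data[key]; return v, ok }
func (s *toyState) RangeBounds() (string, string) { return s.start, s.end }
func (s *toyState) LeaseHolder() string           { return s.leaseHolder }

// Compile-time checks that toyState satisfies both views.
var _ fullEvalContext = (*toyState)(nil)
var _ followerEvalContext = (*toyState)(nil)

// evalFollowerGet only receives the reduced view, so the compiler rejects
// any attempt to read range state from inside it.
func evalFollowerGet(ctx followerEvalContext, key string) string {
	if v, ok := ctx.Get(key); ok {
		return v
	}
	return ""
}

func main() {
	s := &toyState{data: map[string]string{"a": "1"}, start: "a", end: "z", leaseHolder: "n1"}
	fmt.Println(evalFollowerGet(s, "a"))
}
```

The narrowing is enforced at compile time rather than by runtime assertions, which is the kind of guarantee the paragraph above suggests the span-set assertions cannot give on followers.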