storage: provide a way to reset ContainsEstimates via stats recomputation #37120
Comments
cc @knz because we talked about this
Pointers to understand/tackle this: The stats sit on the Replica, here (and in a persisted key that's kept in sync with the in-memory object) cockroach/pkg/storage/replica.go Lines 795 to 802 in 947878a
Whenever a Raft command applies, we add a corresponding stats update: cockroach/pkg/storage/replica_raft.go Lines 2343 to 2348 in 5c6824f
This happens on every Replica (i.e. three times with a default 3x replicated range), but it's deterministic, so all replicas should have identical stats (at any given position). If this isn't the case it's bad, but this isn't something we worry about in this issue. Some Raft commands set the The consistency checker is a queue that periodically computes a checksum across the replicas and makes sure it matches (at the same log position across the replicas). cockroach/pkg/storage/consistency_queue.go Line 125 in fcfd9b8
As a side effect of that (since it has to look at all the data anyway), it can recompute the "true" MVCCStats of the range, and compare it to what the range claims the MVCCStats are. Absent a bug, these stats can only disagree if If a delta is found, the consistency checker initiates a cockroach/pkg/storage/replica_consistency.go Lines 198 to 202 in c42e3eb
This request computes the current stats (it's kind of questionable that it scans the range again, but let's leave that alone) and adds the delta to them which is then applied to the stats through a raft command: cockroach/pkg/storage/batcheval/cmd_recompute_stats.go Lines 86 to 97 in 2558dcc
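The delta-based recomputation described above can be sketched roughly as follows. This is only an illustration, not the code in cmd_recompute_stats.go: the `Stats` type, its two fields, and `recomputeDelta` are hypothetical, heavily simplified stand-ins for MVCCStats and the real request evaluation. The point it tries to show is why a delta is used at all: a write that lands between the scan and the application of the delta is not lost, because addition commutes.

```go
package main

import "fmt"

// Stats is a hypothetical, heavily simplified stand-in for MVCCStats.
type Stats struct {
	LiveBytes int64
	KeyCount  int64
}

// Add applies the delta d to s, field by field.
func (s *Stats) Add(d Stats) {
	s.LiveBytes += d.LiveBytes
	s.KeyCount += d.KeyCount
}

// Subtract removes d from s, field by field.
func (s *Stats) Subtract(d Stats) {
	s.LiveBytes -= d.LiveBytes
	s.KeyCount -= d.KeyCount
}

// recomputeDelta returns the delta that moves the claimed stats to the
// actual (freshly scanned) stats. Both arguments are assumed to describe
// the same snapshot of the range.
func recomputeDelta(actual, claimed Stats) Stats {
	delta := actual
	delta.Subtract(claimed)
	return delta
}

func main() {
	claimed := Stats{LiveBytes: 1000, KeyCount: 10}
	actual := Stats{LiveBytes: 1200, KeyCount: 12}
	delta := recomputeDelta(actual, claimed)

	// A concurrent write applies after the snapshot was taken but before
	// the recomputation delta is applied.
	concurrent := Stats{LiveBytes: 50, KeyCount: 1}
	claimed.Add(concurrent)
	claimed.Add(delta)

	fmt.Println(claimed) // {1250 13}
}
```

Applying `delta` before `concurrent` would give the same result, which is the property the recomputation relies on.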
Now you'd hope that this would also reset The solution is to turn the boolean into something that commutes, i.e. integer addition. Instead of Steps to do this:
Note that this migration doesn't cover all the bases. There's a case in which the result will be a divergence of the stats. As sort of a pop quiz I'll leave open how that can happen. I have a way out as well. (Updates will be required to the list above).
Just a ping that I am working on this although I haven't pinged for several days. On a low level note, I guess the new behavior for AddSSTable should be implemented for time series merges as well (in
This migration allows ContainsEstimates to be reset after a stats recomputation (by returning a -ContainsEstimates delta) without worrying about a race condition. Another command may concurrently add 1 to the counter and the stored value will still be valid.
Resolves cockroachdb#37120
Release note: None
This migration makes ContainsEstimates a counter so that the consistency checker can reset it (by returning a -ContainsEstimates delta) without racing with another command that introduces new estimated stats.
Resolves cockroachdb#37120
Release note: None
37583: storage: Migrate MVCCStats.contains_estimates from bool to int64 r=tbg a=giorgosp

This migration allows ContainsEstimates to be reset after a stats recomputation (by returning a -ContainsEstimates delta) without worrying about a race condition. Another command may concurrently add 1 to the counter and the stored value will still be valid.

Resolves #37120

Release note: None

42129: colexec: fix AND and OR projections in some cases r=yuzefovich a=yuzefovich

Previously, the original batch length was not respected when a selection vector was present. This caused, for example, query 19 of the TPCH benchmark to return an error. This is now fixed. I have trouble coming up with a reduced reproduction, though. I also checked that on the release-19.2 branch the query is executed correctly with vectorized, so it must be the switch to flat bytes that triggers the problem.

Release note: None

42172: colexec: fix sorted distinct with nulls behavior r=yuzefovich a=yuzefovich

Previously, sorted distinct, when nulls might be present, would always get the value at the index without checking whether that value is actually null. This is incorrect because getting the value is undefined in some cases (like when getting from flat bytes). This is now fixed.

Fixes: #42055.

Release note: None

Co-authored-by: George Papadrosou
Co-authored-by: Yahor Yuzefovich
See #36907 for context. Time series merges and AddSSTable requests can "taint" the range stats by introducing the ContainsEstimates flag. The consistency checker is supposed to notice this and trigger a stats recomputation. However, it doesn't have a good way to actually reset the ContainsEstimates flag, even after the stats are correct. This is because the recomputation relies on the commutativity of stats deltas, but resetting a boolean does not commute with setting that boolean. In other words, if the recomputation attempted to reset the boolean, it couldn't rule out that some other command simultaneously introduces an estimate that is then not reflected in the final, supposedly untainted, MVCCStats.

A simple fix is to upgrade the MVCCStats.ContainsEstimates field from a bool to a counter that is incremented with each Raft command containing estimates. The stats recomputation would emit the negative of the count observed in the stats contained in the snapshots it's using to compute a stats delta. In practice, this will almost always lead to ContainsEstimates==0, except when an estimate raced in, in which case it will be counted correctly. ContainsEstimates should never become negative since that could only happen with two concurrent recomputations, and I believe we prevent those already (or they could mangle the stats).

Labeling as E-intermediate because there's a bit of a migration involved.
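To make the commutativity argument concrete, here is a minimal sketch contrasting the boolean with the proposed counter. It does not use the real MVCCStats type; `applyBool` and `applyCounter` are hypothetical helpers that model "commands applied in some order" as a list of assignments or integer deltas, and the numbers are made up for illustration.

```go
package main

import "fmt"

// applyBool models a boolean flag: each operation overwrites the flag,
// so the final value depends on which operation applies last.
func applyBool(initial bool, ops ...bool) bool {
	v := initial
	for _, op := range ops {
		v = op
	}
	return v
}

// applyCounter models the proposed counter: each operation is an integer
// delta, so the final value is independent of the application order.
func applyCounter(initial int64, deltas ...int64) int64 {
	v := initial
	for _, d := range deltas {
		v += d
	}
	return v
}

func main() {
	// Boolean: a recomputation resets the flag (false) while another
	// command introduces an estimate (true). The order decides the
	// outcome, and one of the two orders loses the estimate.
	fmt.Println("bool:", applyBool(true, true, false), applyBool(true, false, true))
	// prints "bool: false true"

	// Counter: the recomputation observed 2 in its snapshot and emits -2,
	// while a raced-in command adds 1. Both orders end at 1.
	observed := int64(2)
	reset := -observed
	raced := int64(1)
	fmt.Println("counter:", applyCounter(observed, raced, reset), applyCounter(observed, reset, raced))
	// prints "counter: 1 1"
}
```

In the counter case the raced-in estimate survives the reset regardless of ordering, which is the "counted correctly" case mentioned above.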