kvserver: remove consistency checker diffs #21128
In discussion today, @sumeerbhola was wondering how we compute the consistency checker diff. The answer is we buffer the entire Range data and send it in a single response. I tracked down this issue from this comment in the code:
Cc @ajwerner, @andy-kimball, and @johnrk as this throws a bit of a wrench into larger range sizes. |
Just in case this isn't clear, the diff is requested only after a first round of the check already failed. It's still bad. |
I suspect the solution here would be to add a streaming RPC interface. |
This isn't going to be a readily back-portable change. This makes me nervous. |
How big are the ranges in 20.1? We can always not send a range data snapshot if it would be too large. It would be unfortunate to see this in practice, but the data checkpoints would be there anyway, so we could reconstruct the diff manually later. |
|
This is a good point. Disabling sending range snapshot data for large ranges might be the only back-portable change. My suggestion was more about a long-term fix. We actually already have a facility for streaming range data that we use for sending snapshots. Doesn't seem like it could be easily adapted here, though. |
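For reference, a streaming approach would send the range data in bounded batches rather than one response. Below is a minimal sketch of the batching logic only; all names here are illustrative and not the actual snapshot/RPC code:

```go
package main

import "fmt"

// kv is a simplified key/value pair; the real code would stream MVCC data.
type kv struct {
	key, value string
}

// chunkKVs splits a range's data into batches no larger than maxBytes,
// so each batch could be sent as one message on a streaming RPC rather
// than buffering the whole range in a single response.
func chunkKVs(data []kv, maxBytes int) [][]kv {
	var chunks [][]kv
	var cur []kv
	size := 0
	for _, e := range data {
		n := len(e.key) + len(e.value)
		if size+n > maxBytes && len(cur) > 0 {
			chunks = append(chunks, cur)
			cur, size = nil, 0
		}
		cur = append(cur, e)
		size += n
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	data := []kv{{"a", "1234"}, {"b", "5678"}, {"c", "90"}}
	fmt.Println(len(chunkKVs(data, 6))) // → 3
}
```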
We're doing some adjacent work over in #77432, and plan to remove the consistency checker diffs while we're at it, since they're pretty unwieldy at 512 MB. This involves removing the following fields and functionality:
It might be possible to do this without requiring a version gate and keeping the old code around for mixed-version clusters, because the current code will gracefully ignore responses that don't have a snapshot: cockroach/pkg/kv/kvserver/replica_consistency.go Lines 160 to 168 in 7f61dfd
However, we'll need some mechanism to obtain such diffs for debugging. We already take Pebble checkpoints when inconsistencies are found, so we'll need to write e.g. a cockroach/pkg/kv/kvserver/replica_consistency.go Lines 785 to 798 in 7f61dfd
This could reuse the existing diff code: cockroach/pkg/kv/kvserver/replica_consistency_diff.go Lines 82 to 91 in ee57d72
|
Are we going to lose diff support? |
Likely yes, in its current form (an API that transfers the entire range). But we want diffs to be available in some form out-of-band, e.g. have tooling for constructing it from the checkpoints saved on replicas. We could also explore some forms of partial/sampled diffing, e.g. detect one sub-range with a mismatch, and diff only that part. |
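The sub-range idea could work by recursively comparing checksums of halves of the keyspace, so only O(log n) checksum exchanges are needed to localize a mismatch instead of shipping the whole range. A toy sketch, assuming both replicas' data can be checksummed per key span (all names invented for illustration):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// checksum hashes a slice of key/value strings; it stands in for the
// real per-span checksum computation.
func checksum(kvs []string) uint64 {
	h := fnv.New64a()
	for _, kv := range kvs {
		h.Write([]byte(kv))
	}
	return h.Sum64()
}

// findMismatch bisects two equal-length listings and returns the index
// of the first mismatching entry, or -1 if they agree. Each step only
// compares checksums of halves, never the data itself.
func findMismatch(a, b []string) int {
	if checksum(a) == checksum(b) {
		return -1
	}
	lo, hi := 0, len(a)
	for hi-lo > 1 {
		mid := (lo + hi) / 2
		if checksum(a[lo:mid]) != checksum(b[lo:mid]) {
			hi = mid
		} else {
			lo = mid
		}
	}
	return lo
}

func main() {
	a := []string{"k1=v1", "k2=v2", "k3=v3", "k4=v4"}
	b := []string{"k1=v1", "k2=v2", "k3=XX", "k4=v4"}
	fmt.Println(findMismatch(a, b)) // → 2
}
```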
One option for a partial diff:
We could purposefully limit this to 1 mismatching key (it's the worst case anyway, if there is one large k/v entry), or to some reasonable byte-size worth of keys, and leave full analysis to the offline tooling. One advantage of retaining some inline diffing (instead of fully removing it) is in having somewhat more informative consistency check failures should they occur in tests like #87106. @tbg How would a human debug those things? I imagine that I would maybe look at the first few mismatching entries in this diff and try to understand what's up with them, but there is probably benefit in seeing larger patterns in it too. |
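A bounded inline diff as described could merge-walk the two replicas' sorted listings and stop after a limit, leaving exhaustive analysis to the offline tooling. A rough sketch, with types and names invented for illustration:

```go
package main

import "fmt"

// entry is a simplified sorted key/value pair.
type entry struct{ key, val string }

// mismatch records one difference between two replicas' listings:
// a key present on only one side, or present on both with differing values.
type mismatch struct {
	key      string
	inA, inB bool
}

// boundedDiff merge-walks two sorted listings and reports at most
// limit mismatches.
func boundedDiff(a, b []entry, limit int) []mismatch {
	var out []mismatch
	i, j := 0, 0
	for (i < len(a) || j < len(b)) && len(out) < limit {
		switch {
		case j >= len(b) || (i < len(a) && a[i].key < b[j].key):
			out = append(out, mismatch{key: a[i].key, inA: true})
			i++
		case i >= len(a) || b[j].key < a[i].key:
			out = append(out, mismatch{key: b[j].key, inB: true})
			j++
		default: // same key on both sides
			if a[i].val != b[j].val {
				out = append(out, mismatch{key: a[i].key, inA: true, inB: true})
			}
			i++
			j++
		}
	}
	return out
}

func main() {
	a := []entry{{"a", "1"}, {"b", "2"}, {"c", "3"}}
	b := []entry{{"a", "1"}, {"b", "9"}, {"d", "4"}}
	fmt.Println(boundedDiff(a, b, 1)) // stops after the first mismatch
}
```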
A human would always investigate the checkpoints. The diff would serve to guide them as to which keys are of interest, but there isn't a huge upshot in having the diff available upfront (and if there were, we could build the retrieval into roachtest). The diff alone isn't that useful; it would be helpful, but not at the expense of lots of engineering. |
SGTM. I’m thinking of the following flow when an inconsistency happens:
Issues:
|
We should clearly call this out in the crash message, so that admins know they'll have to remove this when they no longer need it. I think that should be sufficient for now. We may want to e.g. log a message on startup if there are any checkpoints too. It's worth pinging the SREs real quick about how they'd like this to be surfaced. @joshimhoff When a consistency check fails, we leave behind a checkpoint of the current Pebble state on the relevant nodes (with replicas for that range). These are needed to debug the inconsistency. These are never removed, so over time they'll essentially be an old copy of the node's database. How would SREs prefer to be notified about these checkpoints, which will need to be removed manually when no longer needed for debugging? Currently, the inconsistent nodes go into a panic loop with a log message, so the simplest solution is to just call this out in the crash message -- but we'll also keep checkpoints on the good nodes that will need to be removed. Is that sufficient?
We essentially never see these failures, so let's not optimize for it.
True, but they're cheap to take -- they're only expensive to keep for a long time. Let's leave the GC responsibility to admins for now, and not optimize for it. |
To answer your Q, I need to ask some Qs first: If a consistency check fails, will we definitely get a panic crash loop that doesn't end until some operator takes an action? Or could we get a small number of crashes that go away without some operator taking action? I see you say "panic loop" but want to make sure I get what that means exactly. If an operator will need to take an action, what actions might they take? |
The node will keep crashing forever, by design. The node is borked and must be replaced. However, only the node(s) with the inconsistencies will crash. We'll also leave checkpoints behind on the nodes with correct replicas.
|
@pavelkalinnikov One consideration here is that we may not want to have to copy the entire checkpoint around to run the diff tool. It may be better to have a two-stage process: one command to extract replica data from a checkpoint, then copy that replica data somewhere and run a tool to diff them. |
Ack! Then I'm confident SRE will notice the inconsistency, if the alerting stack functions as designed.
Here are some options, in order of how likely it is that CRL actually cleans up the checkpoints in a cloud context:
If we go with 3, I'm not confident we will clean up checkpoints, but maybe that is okay right now, given that a corruption crash loop has never happened in cloud according to my memory, and given that the checkpoints just increase disk utilization. |
A gauge seems doable and maybe a good way to solve it reliably.

```diff
diff --git a/pkg/kv/kvserver/store.go b/pkg/kv/kvserver/store.go
index fceda5df42..c6cd380766 100644
--- a/pkg/kv/kvserver/store.go
+++ b/pkg/kv/kvserver/store.go
@@ -3303,11 +3303,15 @@ func (s *Store) updateReplicationGauges(ctx context.Context) error {
 	return nil
 }
+
+func (s *Store) checkpointDir() string {
+	return filepath.Join(s.engine.GetAuxiliaryDir(), "checkpoints")
+}
+
 // checkpoint creates a RocksDB checkpoint in the auxiliary directory with the
 // provided tag used in the filepath. The filepath for the checkpoint directory
 // is returned.
 func (s *Store) checkpoint(ctx context.Context, tag string) (string, error) {
-	checkpointBase := filepath.Join(s.engine.GetAuxiliaryDir(), "checkpoints")
+	checkpointBase := s.checkpointDir()
 	_ = s.engine.MkdirAll(checkpointBase)
 	checkpointDir := filepath.Join(checkpointBase, tag)
@@ -3337,6 +3341,19 @@ func (s *Store) ComputeMetrics(ctx context.Context, tick int) error {
 	m := s.engine.GetMetrics()
 	s.metrics.updateEngineMetrics(m)
+	{
+		cpDirs, err := s.engine.List(s.checkpointDir())
+		var n int64
+		if err != nil {
+			// TODO: probably need to tolerate NotFound here.
+			n = math.MaxInt64
+		} else {
+			n = int64(len(cpDirs))
+		}
+
+		someGauge.Update(n)
+	}
+
 	// Get engine Env stats.
 	envStats, err := s.engine.GetEnvStats()
 	if err != nil {
```
|
Yeah, let's do a gauge and a log message (on crash and on node startup). GC is going to be more work than it's worth for something that's very rare. |
Works for me! Send me the PR adding the gauge, and I'll make sure we add an alert on the CC side. |
SGTM, thank you all for the input. A gauge seems strictly more reliable because of the issue that I pointed out above:
A gauge will be increased on all the alive replicas, so this signal will propagate. |
88080: stmtdiagnostics: save the bundle on a statement timeout r=yuzefovich a=yuzefovich

Previously, if the traced statement was canceled due to a statement timeout, the statement bundle would be created but would fail on insertion into the system table. This is suboptimal because we already did all the work to collect the bundle, and it might be desirable to see the partial trace, so this commit makes it so that the bundle is saved correctly. Fixes: #73477. Release note (bug fix): Previously, when a statement bundle was collected for a query that results in an error due to a `statement_timeout`, the bundle would not be saved; this is now fixed.

88307: sql: retry validation of unique constraint r=yuzefovich a=yuzefovich

Previously, when running the validation of a unique constraint we would execute the query only once, and if that failed for any reason, we'd return an error. This was suboptimal because we might be performing the validation after having done a lot of work (e.g. running an import), and the error could be a transient one (e.g. if a node used in the distributed validation query goes down). To improve the situation, this commit adds a retry loop with 5 attempts and an allowlist of errors that are swallowed. Fixes: #85711. Release note: None

88929: sql: add tracing spans for each statement executed within a UDF r=mgartner a=mgartner

This commit creates a separate tracing span for each statement executed during evaluation of a UDF. Release note: None

88933: opt: add tests for FK delete cascade fast-path with UDF r=mgartner a=mgartner

The FK delete cascade fast-path remaps the `DELETE` filter into a filter on the child table, avoiding buffering the parent's deleted rows as in the non-fast-path plan. It is possible to plan this fast-path if the filters contain UDF invocations, as long as they are immutable or stable. (Of course, if the user creates a UDF with mislabeled volatility, data corruption is possible. To mitigate this and other corruption from mislabeled UDF volatility in the future, we're considering performing some analysis during `CREATE FUNCTION` to prevent users from incorrectly assigning volatility.) This commit adds tests for FK delete cascades with UDFs and removes an unnecessary TODO. Release note: None

88978: roachtest: tighten boundaries of tpch_concurrency r=yuzefovich a=yuzefovich

This should remove one iteration, which should keep the test under the default 10 hour timeout. Release note: None

88991: changefeedccl: release overhead allocation r=miretskiy a=miretskiy

To account for the overhead during the encoding stage, we allocate more memory resources than needed (`changefeed.event_memory_multiplier`). Those resources are released when they are emitted by the sink. Some sinks, such as file-based sinks, batch many such events (to generate files of a target size). This batching, when combined with compression, could result in a situation where the maximum of allowed resources, `changefeed.memory.per_changefeed_limit`, are all batched, making it impossible to ingest additional events without forcing a sink flush. The only way to avoid a premature forced flush is to increase the memory limit for the changefeed -- but doing so is not without costs (i.e. more events batched, more pressure on the Go GC, etc). This PR adjusts event resources to match the final size of the data that will be emitted into the sink -- that is, we release the overhead back into the pool once the event processing is done. Release note: None

89035: ui: fix contention time tooltip on insights page r=ericharmeling a=ericharmeling

This commit updates the High Contention Time tooltip on the insights page to accurately reflect the event's contention duration. https://cockroachlabs.slack.com/archives/G01Q9D01NTU/p1664477246961729?thread_ts=1664473625.621409&cid=G01Q9D01NTU https://www.loom.com/share/a8ce63ece586468e870d67118c0d8d5a Release note: None. Release justification: bug fix

89036: eval: partial removal of usages of the stored context r=yuzefovich a=yuzefovich

This PR contains several commits from #85782, which attempts to remove the stored `context.Context` from the `eval.Context` object. I didn't finish that PR since I ran into two very thorny usages, so instead this PR takes some steps towards deprecating the usage of the stored context. Storing a `context.Context` is an anti-pattern that needs justification. In the case of `eval.Context`, I think it was introduced specifically for scalar function evaluations (which should use the same `context.Context`), and that was ok, but over time we introduced many more callsites out of convenience. Fixes: #46164

89066: kvserver: add storage checkpoints metric r=erikgrinaker a=pavelkalinnikov

When an inconsistency between a range's replicas is found, every replica's node leaves behind a storage checkpoint which can be used for finding the cause of the inconsistency. The checkpoints are cheap to make if hard links are used. However, the cost of a checkpoint on a live node increases over time because the state keeps diverging from it. Effectively it becomes a full copy of the old state, and consumes that much disk capacity. It is important to act on the checkpoints and reclaim the space. This commit adds a store metric which exposes the number of checkpoints found, so that admins/SREs can be alerted and act on it. Touches #21128. Added a test, and also tested manually: tricked one replica into thinking that there is an inconsistency; this triggered all replicas to store a checkpoint and one crash; the metric on the other 2 replicas increased to 1, and I saw the corresponding directories in storage. Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
88969: sql: address minor typos in recent overflow fix r=mgartner a=mgartner

This commit fixes minor typos introduced in #88199. Release note: None

89207: kvserver: rm consistency check diff report in tests r=erikgrinaker a=pavelkalinnikov

This change removes the `BadChecksumReportDiff` testing knob. Previously, a node that runs a consistency check would report any diffs through this callback to tests. However, the plan is now to move away from computing the diff, and instead leave storage engine checkpoints on each replica so that later they can be analysed/diffed by tooling. When this plan is implemented, the initiating node will no longer be able to report diffs into the testing knob. This commit proactively refactors a test in such a way that it doesn't need the knob, and verifies the diff by reading from storage directly. This approximates what we will do with tooling too. Touches #21128. Release note: None

Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
For the replica extraction step, it looks like one of cockroach/pkg/storage/sst_writer.go Lines 74 to 76 in fdbeda5 or cockroach/pkg/storage/sst_writer.go Lines 102 to 105 in fdbeda5 could be used. Are there any other options? How are range snapshots encoded?

Between "ingestion" and "backup", I will probably start with the latter. The former is interesting though if we imagine wanting to recover inconsistent replicas after the investigation. We can look at this option later; for now the recovery/upreplication will happen as a result of some nodes crashing. |
I'd go with |
Does the node crashing get triggered by a raft command on a replica? That would make that replica internally consistent in its state machine RaftAppliedIndex+state machine state, but not necessarily for other replicas that may be in the middle of doing a sequence of (atomic ingest of AddSSTable, atomic update of RaftAppliedIndex). Is the offline tool going to try to diff other replicas too? |
Yes.
No. As you point out, only this range's replicas are guaranteed to be consistent. |
88539: sql: add telemetry for statistics forecast usage r=rytaft a=michae2

Add a few fields to the sampled_query telemetry events that will help us measure how useful table statistics forecasting is in practice. Fixes: #86356. Release note (ops change): Add five new fields to the sampled_query telemetry events:
- `ScanCount`: Number of scans in the query plan.
- `ScanWithStatsCount`: Number of scans using statistics (including forecasted statistics) in the query plan.
- `ScanWithStatsForecastCount`: Number of scans using forecasted statistics in the query plan.
- `TotalScanRowsWithoutForecastsEstimate`: Total number of rows read by all scans in the query, as estimated by the optimizer without using forecasts.
- `NanosSinceStatsForecasted`: The greatest quantity of nanoseconds that have passed since the forecast time (or until the forecast time, if it is in the future, in which case it will be negative) for any table with forecasted stats scanned by this query.

89418: sqlstats: always enable tracing first time fingerprint is seen r=j82w a=j82w

Fixes: #89185. The first time a fingerprint is seen, tracing should be enabled. This is currently broken if sql.metrics.statement_details.plan_collection.enabled is set to false, which can cause crdb_internal.transaction_contention_events to be empty because tracing was never enabled, so the contention event was never recorded. To properly fix this, a new value needs to be returned from ShouldSample to tell whether it is the first time a fingerprint is seen. This will remove the dependency on the plan_collection feature switch. Release justification: Bug fixes and low-risk updates to new functionality. Release note (bug fix): Always enable tracing the first time a fingerprint is seen.

89502: kvserver: do not report diff in consistency checks r=erikgrinaker,tbg a=pavelkalinnikov

This commit removes reporting of the diff between replicas in case of consistency check failures. With the increased range sizes, the previous approach has become infeasible. One possible way to inspect an inconsistency after this change is: 1. Run the `cockroach debug range-data` tool to extract the range data from each replica's checkpoint. 2. Use standard OS tools like `diff` to analyse them. In the meantime, we are researching the UX of this alternative approach, and seeing if there can be better tooling support. Part of #21128. Epic: none. Release note (sql change): The `crdb_internal.check_consistency` function no longer includes the diff between inconsistent replicas, should they occur. If an inconsistency occurs, the storage engine checkpoints should be inspected. This change is made because the range size limit has been increased from 64 MiB to O(GiB), so inlining diffs in consistency checks does not scale.

89529: changefeedccl: update parallel consumer metrics r=jayshrivastava a=jayshrivastava

Previously, both changefeed.nprocs_flush_nanos and changefeed.nprocs_consume_event_nanos were counters that monotonically increased. This was not that useful when determining the average time it takes to consume or flush an event. Changing them to a histogram fixes this issue and allows for percentile values like p90, p99. This change also updates changefeed.nprocs_in_flight_count to sample values when incrementing the inFlight variable. Previously, it was showing up as 0 in the UI; this change makes it show the actual value. Fixes #89654. Release note: None. Epic: none

89660: tree: use FastIntSet during typechecking r=jordanlewis a=jordanlewis

Previously, the typecheck phase used several slices of ordinals into lists. This is a perfect use case for FastIntSet, because the ordinals tend to be quite small. This commit switches to use FastIntSet instead.

```
name          old time/op    new time/op    delta
TypeCheck-10  3.68µs ± 3%    3.52µs ± 2%    -4.52%   (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
TypeCheck-10  744B ± 0%      576B ± 0%      -22.58%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
TypeCheck-10  32.0 ± 0%      18.0 ± 0%      -43.75%  (p=0.000 n=10+10)
```

Issue: None. Epic: None. Release note: None

89662: scop, scdep: Rename `IndexValidator` to `Validator` r=Xiang-Gu a=Xiang-Gu

I've recently done work to enable adding/dropping check constraints in the declarative schema changer. It is a big PR with many commits, so I think it's nicer to separate them into multiple PRs. This is the first PR in that effort, which is merely renaming and should be easy to review: commit 1: ValidateCheckConstraint now uses ConstraintID instead of constraint name; commit 2: Rename `IndexValidator` to `Validator`. We previously had a file under scdep called `index_validator.go` where we implement logic for validating an index. Now that we are going to validate check constraints, we renamed them so they will also validate check constraints. Informs #89665. Release note: None

Co-authored-by: Michael Erickson <[email protected]> Co-authored-by: j82w <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Jordan Lewis <[email protected]> Co-authored-by: Xiang Gu <[email protected]>
89747: restore: only read from manifests with startTime < AOST r=stevendanna a=msbutler

Previously, all backup manifests in the chain could be added to a restoreSpanEntry during makeSimpleImportSpans, even if the backup manifest's startTime was greater than the user's AOST. This caused an unnecessary amount of data to be read from external storage and filtered out in the restore processor. This small patch prevents these newer manifests from being considered in makeSimpleImportSpans and thus unnecessarily read from during the distsql flow. Release note: None

89888: tenantcostserver, filetable: replace some `*.executor`s with `InternalExecutorFactory` r=ajwerner a=ZhouXing19

This is part of migrating the existing usages of the internal executor to the new internal executor interfaces. The changes are based on this logic: trace the `txn` in the exec functions of the internal executor, find out where they come from, and ensure that an IE is created along with this txn via `planner.WithInternalExecutor` or `sqlutil.InternalExecutorFactory.TxnWithExecutor`. The new interfaces allow the internal executor to be bound to the outer txn, along with txn-related metadata. Release note: None. Epic CRDB-19135

89899: kvserver: improve consistency check fail message r=erikgrinaker a=pavelkalinnikov

Epic: none. Touches #21128. Release note (ops change): The consistency check failure message is made more informative, and suggests a few actions that operators should/may take in the unlikely event it occurs.

Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Jane Xing <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
89813: kvserver: remove legacy snapshot and diff code r=erikgrinaker a=pavelkalinnikov

This commit removes the proto and code for `RaftDataSnapshot`. This type was used for reporting replica snapshots and computing diffs, but this functionality has now been removed in favor of storage checkpoints and offline tooling. Part of #21128. Epic: none. Release note: None

90006: roachtest: introduce a test to overwhelm nodes r=irfansharif a=andrewbaptist

This test checks the behavior of the system as the SQL load greatly exceeds what the nodes are able to handle. In the future we want to evaluate how this is handled, but today this will cause nodes to OOM. Informs #89142. Release note: None

90063: roachtest: de-flake admission-control/tpcc-olap r=irfansharif a=irfansharif

Fixes #89600. This test makes use of `workload querybench`, which is only available in the `workload` binary. This test started failing after #89482, where we attempted to replace all uses of `./workload` with `./cockroach workload`. Release note: None

Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Andrew Baptist <[email protected]> Co-authored-by: irfan sharif <[email protected]>
We are discussing supporting larger ranges and making first steps in their implementation, see for example #16954.
The consistency checker on failure currently requests a "diff", but this diff is implemented by sending a whole snapshot through memory (in `CollectChecksumResponse`). This won't work any more once ranges are allowed to be larger than a small fraction of memory.

gz#9424
Jira issue: CRDB-5898