Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
92466: settings,spanconfig: introduce a protobuf setting type r=irfansharif a=irfansharif

For this setting type:
- the `protoutil.Message` is held in memory,
- the byte representation is stored in `system.settings`, and
- the json representation is used when accepting input and rendering state (through `SHOW CLUSTER SETTING <setting-name>`, the raw form is visible when looking directly at `system.settings`)

We also use this setting type to support power a `spanconfig.store.fallback_config_override`, which overrides the fallback config used for ranges with no explicit span configs set. Previously we used a hardcoded value -- this makes it a bit more configurable. This is a partial and backportable workaround (read: hack) for #91238 and \#91239. In an internal escalation we observed "orphaned" ranges from dropped tables that were not being being referenced by span configs (by virtue of them originating from now-dropped tables/configs). Typically ranges of this sort are short-lived, they're emptied out through GC and merged into LHS neighbors. But if the neighboring ranges are large enough, or load just high enough, the merge does not occur. For such orphaned ranges we were using a hardcoded "fallback config", with a replication factor of three. This made for confusing semantics where if `RANGE DEFAULT` was configured to have a replication factor of five, our replication reports indicated there were under-replicated ranges. This is partly because replication reports today are powered by zone configs (thus looking at `RANGE DEFAULT`) -- this will change shortly as part of \#89987 where we'll instead consider span config data. In any case, we were warning users of under-replicated ranges but within KV we were not taking any action to upreplicate them -- KV was respecting the hard-coded fallback config. The issues above describe that we should apply each tenant's `RANGE DEFAULT` config to all such orphaned ranges, which is probably the right fix. This was alluded to in an earlier TODO but is still left for future work.

```go
  // TODO(irfansharif): We're using a static[1] fallback span config
  // here, we could instead have this directly track the host tenant's
  // RANGE DEFAULT config, or go a step further and use the tenant's own
  // RANGE DEFAULT instead if the key is within the tenant's keyspace.
  // We'd have to thread that through the KVAccessor interface by
  // reserving special keys for these default configs.
  //
  // [1]: Modulo the private spanconfig.store.fallback_config_override, which
  //      applies globally.
```

So this PR instead takes a shortcut -- it makes the static config configurable through a cluster setting. We can now do the following which alters what fallback config is applied to orphaned ranges, and in our example above, force such ranges to also have a replication factor of five.

```sql
  SET CLUSTER SETTING spanconfig.store.fallback_config_override = '
    {
      "gcPolicy": {"ttlSeconds": 3600},
      "numReplicas": 5,
      "rangeMaxBytes": "536870912",
      "rangeMinBytes": "134217728"
    }';
```

Release note: None


92585: roachtest: reduce frequency of index-overload test r=irfansharif a=irfansharif

Release note: None

92586: jobs: refresh job details before removing protected timestamps r=fqazi a=fqazi

Fixes: #92493

Previously, we added the protected timestamps manager into the jobs frame work, which made it easier to automatically add and remove protected timestamps for jobs. Unfortunately, the Unprotect API when invoked from a clean up function never properly refreshed the job. Which could lead to a race condition trying to remove protected timestamps for schema changes. To address this, the Unprotect function will take advantage of the job update function to confirm that a refreshed copy does have protected timestamps set.

Release note: None

92593: logictest: avoid a lint failure r=andreimatei a=knz

Fixes #92592

The go compiler treats calls to `bazel.BuiltWithBazel()` as a
compile-time constant. Therefore,

```go
if !bazel.BuiltWithBazel() {
   skip.XXX()
}
```

is considered to always terminate execution (because `skip` does its
job by raising a panic), and any code coming after that is treated as
dead/unreachable.

92599: multitenant: merge serverpb.RegionsServer into serverpb.TenantStatusServer r=knz a=ecwall

Fixes #92598

Merge these 2 interfaces in anticipation for future work where more tenant functionality will be added to TenantStatusServer instead of creating an interface per function.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
  • Loading branch information
5 people committed Nov 28, 2022
6 parents 3d28ffb + b0cfdc9 + 815aa8f + 02867ed + 8782424 + aff028a commit 1bc257b
Show file tree
Hide file tree
Showing 22 changed files with 386 additions and 69 deletions.
7 changes: 1 addition & 6 deletions pkg/ccl/kvccl/kvtenantccl/connector.go
Original file line number Diff line number Diff line change
Expand Up @@ -116,11 +116,6 @@ var _ rangecache.RangeDescriptorDB = (*Connector)(nil)
// network.
var _ config.SystemConfigProvider = (*Connector)(nil)

// Connector is capable of find the region of every node in the cluster.
// This is necessary for region validation for zone configurations and
// multi-region primitives.
var _ serverpb.RegionsServer = (*Connector)(nil)

// Connector is capable of finding debug information about the current
// tenant within the cluster. This is necessary for things such as
// debug zip and range reports.
Expand Down Expand Up @@ -415,7 +410,7 @@ func (c *Connector) RangeLookup(
return nil, nil, ctx.Err()
}

// Regions implements the serverpb.RegionsServer interface.
// Regions implements the serverpb.TenantStatusServer interface.
func (c *Connector) Regions(
ctx context.Context, req *serverpb.RegionsRequest,
) (resp *serverpb.RegionsResponse, _ error) {
Expand Down
9 changes: 3 additions & 6 deletions pkg/cmd/roachtest/tests/admission_control_index_overload.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,12 +32,9 @@ import (
// queries, but the intent is to measure the impact of the index creation.
func registerIndexOverload(r registry.Registry) {
r.Add(registry.TestSpec{
Name: "admission-control/index-overload",
Owner: registry.OwnerAdmissionControl,
// TODO(baptist): After two weeks of nightly baking time, reduce
// this to a weekly cadence. This is a long-running test and serves only
// as a coarse-grained benchmark.
// Tags: []string{`weekly`},
Name: "admission-control/index-overload",
Owner: registry.OwnerAdmissionControl,
Tags: []string{`weekly`},
Cluster: r.MakeClusterSpec(4, spec.CPU(8)),
Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
crdbNodes := c.Spec().NodeCount - 1
Expand Down
27 changes: 16 additions & 11 deletions pkg/jobs/jobsprotectedts/jobs_protected_ts_manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,7 @@ func (p *Manager) Protect(
return nil, err
}
return func(ctx context.Context) error {
// Remove the protected timestamp.
return p.Unprotect(ctx, job)
}, nil
}
Expand All @@ -201,19 +202,23 @@ func (p *Manager) Protect(
// record. Note: This should only be used for job cleanup if is not currently,
// executing.
func (p *Manager) Unprotect(ctx context.Context, job *jobs.Job) error {
return p.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
// Fetch the protected timestamp UUID from the job, if one exists.
protectedtsID := getProtectedTSOnJob(job.Details())
// Fetch the protected timestamp UUID from the job, if one exists.
if getProtectedTSOnJob(job.Details()) == nil {
return nil
}
// If we do find one then we need to clean up the protected timestamp,
// and remove it from the job.
return job.Update(ctx, nil, func(txn *kv.Txn, md jobs.JobMetadata, ju *jobs.JobUpdater) error {
// The job will get refreshed, so check one more time the protected
// timestamp still exists. The callback returned from Protect works
// on a previously cached copy.
protectedtsID := getProtectedTSOnJob(md.Payload.UnwrapDetails())
if protectedtsID == nil {
return nil
}
// If we do find one then we need to clean up the protected timestamp,
// and remove it from the job.
return job.Update(ctx, txn, func(txn *kv.Txn, md jobs.JobMetadata, ju *jobs.JobUpdater) error {
updatedDetails := setProtectedTSOnJob(job.Details(), nil)
md.Payload.Details = jobspb.WrapPayloadDetails(updatedDetails)
ju.UpdatePayload(md.Payload)
return p.protectedTSProvider.Release(ctx, txn, *protectedtsID)
})
updatedDetails := setProtectedTSOnJob(job.Details(), nil)
md.Payload.Details = jobspb.WrapPayloadDetails(updatedDetails)
ju.UpdatePayload(md.Payload)
return p.protectedTSProvider.Release(ctx, txn, *protectedtsID)
})
}
7 changes: 1 addition & 6 deletions pkg/kv/kvclient/kvtenant/connector.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,12 +58,7 @@ type Connector interface {
// (e.g. is the Range being requested owned by the requesting tenant?).
rangecache.RangeDescriptorDB

// RegionsServer provides access to a tenant's available regions. This is
// necessary for region validation for zone configurations and multi-region
// primitives.
serverpb.RegionsServer

// TenantStatusServer is the subset of the serverpb.StatusInterface that is
// TenantStatusServer is the subset of the serverpb.StatusServer that is
// used by the SQL system to query for debug information, such as tenant-specific
// range reports.
serverpb.TenantStatusServer
Expand Down
52 changes: 52 additions & 0 deletions pkg/kv/kvserver/client_spanconfigs_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,58 @@ func TestSpanConfigUpdateAppliedToReplica(t *testing.T) {
})
}

// TestFallbackSpanConfigOverride ensures that
// spanconfig.store.fallback_config_override works as expected.
func TestFallbackSpanConfigOverride(t *testing.T) {
defer leaktest.AfterTest(t)()

st := cluster.MakeTestingClusterSettings()
spanConfigStore := spanconfigstore.New(roachpb.TestingDefaultSpanConfig(), st, nil)
mockSubscriber := newMockSpanConfigSubscriber(spanConfigStore)

ctx := context.Background()
args := base.TestServerArgs{
Knobs: base.TestingKnobs{
Store: &kvserver.StoreTestingKnobs{
DisableMergeQueue: true,
DisableSplitQueue: true,
DisableGCQueue: true,
},
SpanConfig: &spanconfig.TestingKnobs{
StoreKVSubscriberOverride: mockSubscriber,
},
},
}
s, _, _ := serverutils.StartServer(t, args)
defer s.Stopper().Stop(context.Background())

_, err := s.InternalExecutor().(sqlutil.InternalExecutor).ExecEx(ctx, "inline-exec", nil,
sessiondata.InternalExecutorOverride{User: username.RootUserName()},
`SET CLUSTER SETTING spanconfig.store.enabled = true`)
require.NoError(t, err)

key, err := s.ScratchRange()
require.NoError(t, err)
store, err := s.GetStores().(*kvserver.Stores).GetStore(s.GetFirstStoreID())
require.NoError(t, err)
repl := store.LookupReplica(keys.MustAddr(key))
span := repl.Desc().RSpan().AsRawSpanWithNoLocals()

conf := roachpb.SpanConfig{NumReplicas: 5, NumVoters: 3}
spanconfigstore.FallbackConfigOverride.Override(ctx, &st.SV, &conf)

require.NotNil(t, mockSubscriber.callback)
mockSubscriber.callback(ctx, span) // invoke the callback
testutils.SucceedsSoon(t, func() error {
repl := store.LookupReplica(keys.MustAddr(key))
gotConfig := repl.SpanConfig()
if !gotConfig.Equal(conf) {
return errors.Newf("expected config=%s, got config=%s", conf.String(), gotConfig.String())
}
return nil
})
}

type mockSpanConfigSubscriber struct {
callback func(ctx context.Context, config roachpb.Span)
spanconfig.Store
Expand Down
5 changes: 5 additions & 0 deletions pkg/kv/kvserver/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -2129,6 +2129,11 @@ func (s *Store) Start(ctx context.Context, stopper *stop.Stopper) error {
}
}
})

// We also want to do it when the fallback config setting is changed.
spanconfigstore.FallbackConfigOverride.SetOnChange(&s.ClusterSettings().SV, func(ctx context.Context) {
s.applyAllFromSpanConfigStore(ctx)
})
}

if !s.cfg.TestingKnobs.DisableAutomaticLeaseRenewal {
Expand Down
1 change: 0 additions & 1 deletion pkg/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -896,7 +896,6 @@ func NewServer(cfg Config, stopper *stop.Stopper) (*Server, error) {
protectedtsProvider: protectedtsProvider,
rangeFeedFactory: rangeFeedFactory,
sqlStatusServer: sStatus,
regionsServer: sStatus,
tenantStatusServer: sStatus,
tenantUsageServer: tenantUsage,
monitorAndMetrics: sqlMonitorAndMetrics,
Expand Down
4 changes: 0 additions & 4 deletions pkg/server/server_sql.go
Original file line number Diff line number Diff line change
Expand Up @@ -337,9 +337,6 @@ type sqlServerArgs struct {
// Used to watch settings and descriptor changes.
rangeFeedFactory *rangefeed.Factory

// Used to query valid regions on the server.
regionsServer serverpb.RegionsServer

// Used to query status information useful for debugging on the server.
tenantStatusServer serverpb.TenantStatusServer

Expand Down Expand Up @@ -844,7 +841,6 @@ func newSQLServer(ctx context.Context, cfg sqlServerArgs) (*SQLServer, error) {
DistSQLSrv: distSQLServer,
NodesStatusServer: cfg.nodesStatusServer,
SQLStatusServer: cfg.sqlStatusServer,
RegionsServer: cfg.regionsServer,
SessionRegistry: cfg.sessionRegistry,
ClosedSessionCache: cfg.closedSessionCache,
ContentionRegistry: contentionRegistry,
Expand Down
12 changes: 3 additions & 9 deletions pkg/server/serverpb/status.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/errorutil"
)

// SQLStatusServer is a smaller version of the serverpb.StatusInterface which
// SQLStatusServer is a smaller version of the serverpb.StatusServer which
// includes only the methods used by the SQL subsystem.
type SQLStatusServer interface {
ListSessions(context.Context, *ListSessionsRequest) (*ListSessionsResponse, error)
Expand Down Expand Up @@ -71,20 +71,14 @@ type NodesStatusServer interface {
ListNodesInternal(context.Context, *NodesRequest) (*NodesResponse, error)
}

// RegionsServer is the subset of the serverpb.StatusInterface that is used
// by the SQL system to query for available regions.
// It is available for tenants.
type RegionsServer interface {
Regions(context.Context, *RegionsRequest) (*RegionsResponse, error)
}

// TenantStatusServer is the subset of the serverpb.StatusInterface that is
// TenantStatusServer is the subset of the serverpb.StatusServer that is
// used by tenants to query for debug information, such as tenant-specific
// range reports.
//
// It is available for all tenants.
type TenantStatusServer interface {
TenantRanges(context.Context, *TenantRangesRequest) (*TenantRangesResponse, error)
Regions(context.Context, *RegionsRequest) (*RegionsResponse, error)
}

// OptionalNodesStatusServer returns the wrapped NodesStatusServer, if it is
Expand Down
2 changes: 1 addition & 1 deletion pkg/server/status.go
Original file line number Diff line number Diff line change
Expand Up @@ -1374,7 +1374,7 @@ func (s *statusServer) Profile(
return profileLocal(ctx, req, s.st)
}

// Regions implements the serverpb.Status interface.
// Regions implements the serverpb.StatusServer interface.
func (s *statusServer) Regions(
ctx context.Context, req *serverpb.RegionsRequest,
) (*serverpb.RegionsResponse, error) {
Expand Down
1 change: 0 additions & 1 deletion pkg/server/tenant.go
Original file line number Diff line number Diff line change
Expand Up @@ -909,7 +909,6 @@ func makeTenantSQLServerArgs(
circularJobRegistry: circularJobRegistry,
protectedtsProvider: protectedTSProvider,
rangeFeedFactory: rangeFeedFactory,
regionsServer: tenantConnect,
tenantStatusServer: tenantConnect,
costController: costController,
monitorAndMetrics: monitorAndMetrics,
Expand Down
4 changes: 4 additions & 0 deletions pkg/settings/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ go_library(
"float.go",
"int.go",
"masked.go",
"protobuf.go",
"registry.go",
"setting.go",
"string.go",
Expand All @@ -29,10 +30,13 @@ go_library(
deps = [
"//pkg/util/buildutil",
"//pkg/util/humanizeutil",
"//pkg/util/protoutil",
"//pkg/util/syncutil",
"@com_github_cockroachdb_errors//:errors",
"@com_github_cockroachdb_redact//:redact",
"@com_github_cockroachdb_redact//interfaces",
"@com_github_gogo_protobuf//jsonpb",
"@com_github_gogo_protobuf//proto",
],
)

Expand Down
Loading

0 comments on commit 1bc257b

Please sign in to comment.