Skip to content

Commit

Permalink
sql: refrain from performing invalid splits during tenant creation
Browse files Browse the repository at this point in the history
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as
`MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID
corresponds to one of the boundary points in `EncodeUvarintAscending`.
Prior to this patch, various places in the code did not appreciate this
subtlety. This proved problematic when we started creating split points
at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal
to one of these boundary points (e.g. 109), decoding the key
`MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant
ID (in our case, 110). Instead, it results in an error.

Worse yet, various places in KV assume doing such tenant decoding at a
range's start key is kosher. We've since disallowed KV from accepting
such splits in cockroachdb#104802.
This patch fixes the root cause of why these splits were being issued
in the first place -- over in
cockroachdb@ac54eba,
we started creating spilt points at the end of a tenant's keyspace
(in addition to the start of it). We did so using `PrefixEnd()`, which
we have since learned is not what we wanted here.

In addition to the initial split point, we also fix the span config
records we seed during tenant creation. We've also added a test to
ensure all split points created during cluster creation are kosher.
The test uses randomized tenant IDs -- I confirmed that it fails with
tenant ID = 109 (without my changes).

Lastly, there's a bit more auditing work that needs to be done here
about these assumptions. That's captured in a followup issue
cockroachdb#104928.

Prevents cockroachdb#104606 from
happening.

Release note (bug fix): Fixes a bug where tenant creation for certain
IDs would always fail because of invalid split points. Additionally,
such tenant creation failure could leave the host cluster's span
config state entirely busted -- we prevent that as well.
  • Loading branch information
arulajmani committed Jun 14, 2023
1 parent 3441110 commit 42fc585
Show file tree
Hide file tree
Showing 8 changed files with 172 additions and 29 deletions.
15 changes: 13 additions & 2 deletions pkg/keys/sql.go
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,11 @@ type SQLCodec struct {
// value of a sqlEncoder will panic.
type sqlEncoder struct {
buf *roachpb.Key
// endKey is the end key of the tenant, in the context of which the sqlEncoder
// exists, that it can access. It is set to /Max for the system tenant. We
// compute it once, and store it here, instead of having to do so every time
// we need access to it.
endKey *roachpb.Key
}

// sqlDecoder implements the decoding logic for SQL keys.
Expand All @@ -144,9 +149,14 @@ type sqlDecoder struct {
// MakeSQLCodec creates a new SQLCodec suitable for manipulating SQL keys.
func MakeSQLCodec(tenID roachpb.TenantID) SQLCodec {
k := MakeTenantPrefix(tenID)
ek := MaxKey
if !tenID.IsSystem() {
// NB: We use the next tenant's (inclusive) start key as the end key.
ek = MakeTenantPrefix(roachpb.MustMakeTenantID(tenID.ToUint64() + 1))
}
k = k[:len(k):len(k)] // bound capacity, avoid aliasing
return SQLCodec{
sqlEncoder: sqlEncoder{&k},
sqlEncoder: sqlEncoder{&k, &ek},
sqlDecoder: sqlDecoder{&k},
}
}
Expand All @@ -167,7 +177,8 @@ func (e sqlEncoder) TenantPrefix() roachpb.Key {
// TenantSpan returns a span representing the tenant's keyspace.
func (e sqlEncoder) TenantSpan() roachpb.Span {
key := *e.buf
return roachpb.Span{Key: key, EndKey: key.PrefixEnd()}
endKey := *e.endKey
return roachpb.Span{Key: key, EndKey: endKey}
}

// TablePrefix returns the key prefix used for the table's data.
Expand Down
2 changes: 2 additions & 0 deletions pkg/rpc/auth_tenant.go
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,8 @@ func contextWithoutClientTenant(ctx context.Context) context.Context {

func tenantPrefix(tenID roachpb.TenantID) roachpb.RSpan {
// TODO(nvanbenschoten): consider caching this span.
// TODO(multitenant): Requests which use codec.TenantSpan() might run into
// this check. See https://github.com/cockroachdb/cockroach/issues/104928.
prefix := roachpb.RKey(keys.MakeTenantPrefix(tenID))
return roachpb.RSpan{
Key: prefix,
Expand Down
15 changes: 14 additions & 1 deletion pkg/sql/catalog/bootstrap/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -36,17 +36,30 @@ go_library(

go_test(
name = "bootstrap_test",
srcs = ["bootstrap_test.go"],
srcs = [
"bootstrap_test.go",
"main_test.go",
"splits_test.go",
],
args = ["-test.timeout=295s"],
data = glob(["testdata/**"]),
embed = [":bootstrap"],
deps = [
"//pkg/base",
"//pkg/clusterversion",
"//pkg/config/zonepb",
"//pkg/keys",
"//pkg/roachpb",
"//pkg/security/securityassets",
"//pkg/security/securitytest",
"//pkg/server",
"//pkg/testutils",
"//pkg/testutils/datapathutils",
"//pkg/testutils/serverutils",
"//pkg/testutils/testcluster",
"//pkg/util/leaktest",
"//pkg/util/log",
"//pkg/util/randutil",
"@com_github_cockroachdb_datadriven//:datadriven",
"@com_github_stretchr_testify//require",
],
Expand Down
9 changes: 9 additions & 0 deletions pkg/sql/catalog/bootstrap/bootstrap_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/testutils/datapathutils"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/datadriven"
"github.com/stretchr/testify/require"
)
Expand All @@ -35,6 +36,9 @@ import (
// These can be obtained from the test output file for the data-driven
// TestInitialValuesToString test in the corresponding release branch.
func TestSupportedReleases(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

expected := make(map[roachpb.Version]struct{})
earliest := clusterversion.ByKey(clusterversion.BinaryMinSupportedVersionKey)
latest := clusterversion.ByKey(clusterversion.BinaryVersionKey)
Expand Down Expand Up @@ -72,6 +76,8 @@ func TestSupportedReleases(t *testing.T) {

func TestInitialValuesToString(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

datadriven.Walk(t, datapathutils.TestDataPath(t), func(t *testing.T, path string) {
datadriven.RunTest(t, path, func(t *testing.T, d *datadriven.TestData) string {
codec := keys.SystemSQLCodec
Expand Down Expand Up @@ -109,6 +115,9 @@ schema has changed. Assuming that this is expected:
}

func TestRoundTripInitialValuesStringRepresentation(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

t.Run("system", func(t *testing.T) {
roundTripInitialValuesStringRepresentation(t, 0 /* tenantID */)
})
Expand Down
34 changes: 34 additions & 0 deletions pkg/sql/catalog/bootstrap/main_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
// Copyright 2023 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package bootstrap_test

import (
"os"
"testing"

"github.com/cockroachdb/cockroach/pkg/security/securityassets"
"github.com/cockroachdb/cockroach/pkg/security/securitytest"
"github.com/cockroachdb/cockroach/pkg/server"
"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/randutil"
)

//go:generate ../../../util/leaktest/add-leaktest.sh *_test.go

func TestMain(m *testing.M) {
securityassets.SetLoader(securitytest.EmbeddedAssets)
randutil.SeedForTests()
serverutils.InitTestServerFactory(server.TestServerFactory)
serverutils.InitTestClusterFactory(testcluster.TestClusterFactory)

os.Exit(m.Run())
}
5 changes: 3 additions & 2 deletions pkg/sql/catalog/bootstrap/metadata.go
Original file line number Diff line number Diff line change
Expand Up @@ -239,8 +239,9 @@ func (ms MetadataSchema) GetInitialValues() ([]roachpb.KeyValue, []roachpb.RKey)
splits = append(splits, roachpb.RKey(ms.codec.TablePrefix(id)))
}
} else {
tenantStartKey := roachpb.RKey(ms.codec.TenantPrefix())
tenantEndKey := tenantStartKey.PrefixEnd()
tenantSpan := ms.codec.TenantSpan()
tenantStartKey := roachpb.RKey(tenantSpan.Key)
tenantEndKey := roachpb.RKey(tenantSpan.EndKey)
splits = []roachpb.RKey{tenantStartKey, tenantEndKey}
}

Expand Down
71 changes: 71 additions & 0 deletions pkg/sql/catalog/bootstrap/splits_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
// Copyright 2023 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package bootstrap_test

import (
"context"
"math/rand"
"testing"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/config/zonepb"
"github.com/cockroachdb/cockroach/pkg/keys"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/sql/catalog/bootstrap"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/randutil"
"github.com/stretchr/testify/require"
)

// TestInitialSplitPoints ensures that any split points that are generated
// during cluster creation are valid split points. The test runs for both the
// system tenant and secondary tenants; the ID of the latter is generated at
// random -- see https://github.com/cockroachdb/cockroach/issues/104928 for why
// this is interesting.
func TestInitialSplitPoints(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

testutils.RunTrueAndFalse(t, "tenant", func(t *testing.T, isSystem bool) {
codec := keys.SystemSQLCodec
if !isSystem {
seed := randutil.NewPseudoSeed()
t.Logf("seed is %d", seed)
rng := rand.New(rand.NewSource(seed))
tenID := rng.Uint64()
t.Logf("tenant ID is %d", tenID)
codec = keys.MakeSQLCodec(roachpb.MustMakeTenantID(tenID))
}
opts := bootstrap.InitialValuesOpts{
DefaultZoneConfig: zonepb.DefaultZoneConfigRef(),
DefaultSystemZoneConfig: zonepb.DefaultZoneConfigRef(),
OverrideKey: 0, // most recent
Codec: codec,
}

ctx := context.Background()
tc := testcluster.StartTestCluster(t, 1, base.TestClusterArgs{
ReplicationMode: base.ReplicationManual,
})
defer tc.Stopper().Stop(ctx)

_, splits, err := opts.GenerateInitialValues()
require.NoError(t, err)

for _, split := range splits {
_, _, err = tc.SplitRange(roachpb.Key(split))
require.NoError(t, err)
}
})
}
50 changes: 26 additions & 24 deletions pkg/sql/tenant_creation.go
Original file line number Diff line number Diff line change
Expand Up @@ -462,33 +462,35 @@ func CreateTenantRecord(
}
toUpsert := []spanconfig.Record{startRecord}

// We want to ensure we have a split at the non-inclusive end of the tenant's
// keyspace, which also happens to be the inclusive start of the next
// tenant's (with ID=ours+1). If we didn't do anything here, and we were the
// tenant with the highest ID thus far, our last range would extend to /Max.
// If a tenant with a higher ID was created, when installing its initial span
// config record, it would carve itself off (and our last range would only
// extend to that next tenant's boundary), but this is a bit awkward for code
// that wants to rely on the invariant that ranges for a tenant only extend
// to the tenant's well-defined end key.
// We want to ensure we have a split at the start of the next tenant's
// (with ID=ours+1) keyspace. This ensures our ranges do not straddle tenant
// boundaries, into the next one.
// We want to ensure our ranges do not straddle tenant boundaries, either into
// the next one or the previous one. Tenant's creation installs a span
// configuration at the start of a tenant's keyspace, above, so that handles
// the latter. The former only needs handling if the tenant we're creating
// here has the highest ID thus far. Such a tenant's range would extend until
// /Max. We've deemed this edge case undesirable, so we need to do something
// here. In particular, we choose to install a split point at the end of the
// tenant's keyspace. We do so by installing a span config record, the start
// key of which will serve as a split point.
//
// So how do we ensure this split at this tenant's non-inclusive end key?
// Hard splits are controlled by the start keys on span config records[^1],
// so we'll try insert one accordingly. But we cannot do this blindly. We
// cannot assume that tenants are created in ID order, so it's possible that
// the tenant with the next ID is already present + running. If so, it may
// already have span config records that start at the key at which we want
// to write a span config record for. Over-writing it blindly would be a
// mistake -- there's no telling what config that next tenant has associated
// for that span. So we'll do something simple -- we'll check transactionally
// whether there's anything written already, and if so, do nothing. We
// already have the split we need.
// Note that we need to be careful about which key to put in the record's
// start key here. The key needs to be such that it is safe to split at;
// otherwise. Notably, this makes tenantPrefix.PrefixEnd() an unsuitable
// candidate -- see https://github.com/cockroachdb/cockroach/issues/104928 for
// more context about why.
//
// [^1]: See ComputeSplitKey in spanconfig.StoreReader.
tenantPrefixEnd := tenantPrefix.PrefixEnd()
// Note that we're creating a span config record that controls the next
// tenant's keyspace here. If such a tenant exists, writing a span config
// record here blindly wouldn't be safe -- we could be clobbering something
// that tenant has already reconciled. We instead need to transactionally
// check if a record exists or not before writing one. If something already
// exists, we do nothing; otherwise, we write the record.
nextTenantPrefix := keys.MakeTenantPrefix(roachpb.MustMakeTenantID(tenID + 1))
endRecordTarget := spanconfig.MakeTargetFromSpan(roachpb.Span{
Key: tenantPrefixEnd,
EndKey: tenantPrefixEnd.Next(),
Key: nextTenantPrefix,
EndKey: nextTenantPrefix.Next(),
})

// Check if a record exists for the next tenant's startKey from when the next
Expand Down

0 comments on commit 42fc585

Please sign in to comment.