Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
102842: sql: do not purge range cache when using follower reads r=yuzefovich a=yuzefovich

Previously, when we used the closest replica oracle and happened to pick a follower for a distributed plan, the TableReaders would still produce "misplanned ranges" metadata. This would then lead to purging the range cache of supposedly stale entries on the gateway even though the entries weren't stale. This is now fixed by setting `ignoreMisplannedRanges` flag on the TableReader spec.

One of the approaches to solve this issue mentioned propagating a set of `ClientRangeInfo`s as part of the TableReader spec, but that seems more complicated and much more involved than this one, so the simpler approach was chosen. In particular, for that idea to work we'd need to include all Infos for all ranges intersecting with the spans to read (which can be non-trivial in size) as well as to add code on the TableReader side to compare the gateway's `ClientRangeInfo` with the range cache of the node where the TableReader was placed.

Fixes: #61313.

Release note: None

104303: roachtest: update CI to enable metamorphic arm64 and fips r=smg260 a=srosenberg

Support for metamorphic arm64 and fips roachtests has been added in [1]. This change refactors the CI wrapper script to add the missing arm64 and fips builds for the required binaries and libs.

In addition, we change roachtest's CLI arg. `--cockroach-short` to `cockroach-ea` which better captures the fact that this binary is compiled with _enabled assertions_, i.e., crdb_test using crdb_test build tag. The fact that it's a "short" variant, i.e., without UI, is immaterial at this time. In future, we might explore roachtests which involve the UI, in which case `--cockroach-ea` might be replaced by the ("long") crdb binary.

[1] #103710

Epic: none

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
  • Loading branch information
3 people committed Jun 6, 2023
3 parents 861bca3 + f93f473 + 1e3101e commit d660d51
Show file tree
Hide file tree
Showing 23 changed files with 340 additions and 169 deletions.
1 change: 0 additions & 1 deletion build/teamcity-roachtest-invoke.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@ set +e
# passed by the caller, the passed one takes precedence.
bin/roachtest run \
--teamcity \
--workload="${PWD}/bin/workload" \
--os-volume-size=32 \
"$@"
code=$?
Expand Down
82 changes: 59 additions & 23 deletions build/teamcity/cockroach/nightlies/roachtest_compile_bits.sh
Original file line number Diff line number Diff line change
@@ -1,27 +1,63 @@
#!/usr/bin/env bash

# Builds all bits needed for roachtests, stages them in bin/ and lib.docker_amd64/.
platform=${1:-crosslinux}

bazel build --config $platform --config ci --config with_ui -c opt --config force_build_cdeps \
//pkg/cmd/cockroach //pkg/cmd/workload //pkg/cmd/roachtest \
//c-deps:libgeos
bazel build --config $platform --config ci -c opt //pkg/cmd/cockroach-short --crdb_test
BAZEL_BIN=$(bazel info bazel-bin --config $platform --config ci --config with_ui -c opt)
# Move this stuff to bin for simplicity.
# Builds all bits needed for roachtests, stages them in bin/ and lib/.
cross_builds=(crosslinux crosslinuxarm crosslinuxfips)

# Prepare the bin/ and lib/ directories.
mkdir -p bin
chmod o+rwx bin
cp $BAZEL_BIN/pkg/cmd/cockroach/cockroach_/cockroach bin
# Copy the binary with enabled assertions while adding "-ea" suffix (which
# stands for "enabled assertions").
cp $BAZEL_BIN/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short bin/cockroach-short-ea
cp $BAZEL_BIN/pkg/cmd/roachtest/roachtest_/roachtest bin
cp $BAZEL_BIN/pkg/cmd/workload/workload_/workload bin
chmod a+w bin/cockroach bin/cockroach-short-ea bin/roachtest bin/workload
# Stage the geos libs in the appropriate spot.
mkdir -p lib.docker_amd64
chmod o+rwx lib.docker_amd64
cp $BAZEL_BIN/c-deps/libgeos_foreign/lib/libgeos.so lib.docker_amd64
cp $BAZEL_BIN/c-deps/libgeos_foreign/lib/libgeos_c.so lib.docker_amd64
chmod a+w lib.docker_amd64/libgeos.so lib.docker_amd64/libgeos_c.so
ln -s lib.docker_amd64 lib
mkdir -p lib
chmod o+rwx lib

for platform in "${cross_builds[@]}"; do
if [[ $platform == crosslinux ]]; then
os=linux
arch=amd64
elif [[ $platform == crosslinuxarm ]]; then
os=linux
arch=arm64
elif [[ $platform == crosslinuxfips ]]; then
os=linux
arch=amd64-fips
else
echo "unknown or unsupported platform $platform"
exit 1
fi

echo "Building $platform, os=$os, arch=$arch..."
# Build cockroach, workload and geos libs.
bazel build --config $platform --config ci --config with_ui -c opt --config force_build_cdeps \
//pkg/cmd/cockroach //pkg/cmd/workload \
//c-deps:libgeos
# N.B. roachtest is currently executed only on a TC agent running linux/amd64, so we build it once.
if [[ $os == "linux" && $arch == "amd64" ]]; then
bazel build --config $platform --config ci -c opt //pkg/cmd/roachtest
fi
# Build cockroach-short with assertions enabled.
bazel build --config $platform --config ci -c opt //pkg/cmd/cockroach-short --crdb_test
# Copy the binaries.
BAZEL_BIN=$(bazel info bazel-bin --config $platform --config ci --config with_ui -c opt)
# N.B. currently, we support only one roachtest binary, so elide the suffix.
if [[ $os == "linux" && $arch == "amd64" ]]; then
cp $BAZEL_BIN/pkg/cmd/roachtest/roachtest_/roachtest bin/roachtest
# Make it writable to simplify cleanup and copying (e.g., scp retry).
chmod a+w bin/roachtest
fi
cp $BAZEL_BIN/pkg/cmd/cockroach/cockroach_/cockroach bin/cockroach.$os-$arch
cp $BAZEL_BIN/pkg/cmd/workload/workload_/workload bin/workload.$os-$arch
# N.B. "-ea" suffix stands for enabled-assertions.
cp $BAZEL_BIN/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short bin/cockroach-ea.$os-$arch
# Make it writable to simplify cleanup and copying (e.g., scp retry).
chmod a+w bin/cockroach.$os-$arch bin/workload.$os-$arch bin/cockroach-ea.$os-$arch

# Copy geos libs.
cp $BAZEL_BIN/c-deps/libgeos_foreign/lib/libgeos.so lib/libgeos.$os-$arch.so
cp $BAZEL_BIN/c-deps/libgeos_foreign/lib/libgeos_c.so lib/libgeos_c.$os-$arch.so
# Make it writable to simplify cleanup and copying (e.g., scp retry).
chmod a+w lib/libgeos.$os-$arch.so lib/libgeos_c.$os-$arch.so

done

ls -l bin
ls -l lib

13 changes: 1 addition & 12 deletions build/teamcity/cockroach/nightlies/roachtest_nightly_impl.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,15 +10,7 @@ if [[ ! -f ~/.ssh/id_rsa.pub ]]; then
ssh-keygen -q -C "roachtest-nightly-bazel $(date)" -N "" -f ~/.ssh/id_rsa
fi

if [[ ${FIPS_ENABLED:-0} == 1 ]]; then
platform=crosslinuxfips
fips_flag="--fips"
else
platform=crosslinux
fips_flag=""
fi

source $root/build/teamcity/cockroach/nightlies/roachtest_compile_bits.sh $platform
source $root/build/teamcity/cockroach/nightlies/roachtest_compile_bits.sh

artifacts=/artifacts
source $root/build/teamcity/util/roachtest_util.sh
Expand All @@ -30,10 +22,7 @@ build/teamcity-roachtest-invoke.sh \
--parallelism="${PARALLELISM}" \
--cpu-quota="${CPUQUOTA}" \
--cluster-id="${TC_BUILD_ID}" \
--cockroach="${PWD}/bin/cockroach" \
--cockroach-short="${PWD}/bin/cockroach-short-ea" \
--artifacts=/artifacts \
--artifacts-literal="${LITERAL_ARTIFACTS_DIR:-}" \
--slack-token="${SLACK_TOKEN}" \
$fips_flag \
"${TESTS}" ${FILTER}
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ build/teamcity-roachtest-invoke.sh \
tag:aws-weekly \
--cloud="${CLOUD}" \
--cluster-id "${TC_BUILD_ID}" \
--cockroach "$PWD/bin/cockroach" \
--artifacts=/artifacts \
--artifacts-literal="${LITERAL_ARTIFACTS_DIR:-}" \
--slack-token="${SLACK_TOKEN}"
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,6 @@ timeout -s INT $((7800*60)) build/teamcity-roachtest-invoke.sh \
tag:weekly \
--cluster-id "${TC_BUILD_ID}" \
--zones "us-central1-b,us-west1-b,europe-west2-b" \
--cockroach "$PWD/bin/cockroach" \
--artifacts=/artifacts \
--artifacts-literal="${LITERAL_ARTIFACTS_DIR:-}" \
--parallelism 5 \
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@ build/teamcity-roachtest-invoke.sh \
--parallelism="${PARALLELISM}" \
--cpu-quota="${CPUQUOTA}" \
--cluster-id="${TC_BUILD_ID}" \
--cockroach="${PWD}/bin/cockroach" \
--cockroach-short="${PWD}/bin/cockroach-short-ea" \
--artifacts=/artifacts \
--artifacts-literal="${LITERAL_ARTIFACTS_DIR:-}" \
"${TESTS}"
2 changes: 1 addition & 1 deletion pkg/ccl/kvccl/kvfollowerreadsccl/followerreads.go
Original file line number Diff line number Diff line change
Expand Up @@ -167,7 +167,7 @@ func (o *followerReadOracle) ChoosePreferredReplica(
leaseholder *roachpb.ReplicaDescriptor,
ctPolicy roachpb.RangeClosedTimestampPolicy,
queryState replicaoracle.QueryState,
) (roachpb.ReplicaDescriptor, error) {
) (_ roachpb.ReplicaDescriptor, ignoreMisplannedRanges bool, _ error) {
var oracle replicaoracle.Oracle
if o.useClosestOracle(txn, ctPolicy) {
oracle = o.closest
Expand Down
47 changes: 46 additions & 1 deletion pkg/ccl/kvccl/kvfollowerreadsccl/followerreads_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import (
"fmt"
"math"
"net/url"
"strings"
"testing"
"time"

Expand Down Expand Up @@ -679,7 +680,7 @@ func TestOracle(t *testing.T) {
Clock: clock,
})

res, err := o.ChoosePreferredReplica(ctx, c.txn, desc, c.lh, c.ctPolicy, replicaoracle.QueryState{})
res, _, err := o.ChoosePreferredReplica(ctx, c.txn, desc, c.lh, c.ctPolicy, replicaoracle.QueryState{})
require.NoError(t, err)
require.Equal(t, c.exp, res)
})
Expand All @@ -692,6 +693,7 @@ func TestOracle(t *testing.T) {
// encountering this situation, and then follower reads work.
func TestFollowerReadsWithStaleDescriptor(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

ctx := context.Background()
// The test uses follower_read_timestamp().
Expand Down Expand Up @@ -826,6 +828,49 @@ func TestFollowerReadsWithStaleDescriptor(t *testing.T) {
})
require.NoError(t, err)
require.Greater(t, followerReadsCountAfter, followerReadsCountBefore)

// Now verify that follower reads aren't mistakenly counted as "misplanned
// ranges" (#61313).

// First, run a query on n3 to populate its cache.
n3 := sqlutils.MakeSQLRunner(tc.Conns[2])
n3.Exec(t, "SELECT * from test WHERE k=1")
n3Cache := tc.Server(2).DistSenderI().(*kvcoord.DistSender).RangeDescriptorCache()
entry = n3Cache.GetCached(ctx, tablePrefix, false /* inverted */)
require.NotNil(t, entry)
require.False(t, entry.Lease().Empty())
require.Equal(t, roachpb.StoreID(1), entry.Lease().Replica.StoreID)
require.Equal(t, []roachpb.ReplicaDescriptor{
{NodeID: 1, StoreID: 1, ReplicaID: 1},
{NodeID: 3, StoreID: 3, ReplicaID: 3, Type: roachpb.NON_VOTER},
}, entry.Desc().Replicas().Descriptors())

// Enable DistSQL so that we have a distributed plan with a single flow on
// n3 (local plans ignore the misplanned ranges).
n4.Exec(t, "SET distsql=on")

// Run a historical query and assert that it's served from the follower (n3).
// n4 should choose n3 to plan the TableReader on because we pretend n3 has
// a lower latency (see testing knob).

// Note that this query is such that the physical planning needs to fetch
// the ReplicaInfo twice for the same range. This allows us to verify that
// the cached - in the spanResolverIterator - information is correctly
// preserved.
historicalQuery = `SELECT * FROM [SELECT * FROM test WHERE k=2 UNION ALL SELECT * FROM test WHERE k=3] AS OF SYSTEM TIME follower_read_timestamp()`
n4.Exec(t, historicalQuery)
rec = <-recCh

// Sanity check that the plan was distributed.
require.True(t, strings.Contains(rec.String(), "creating DistSQL plan with isLocal=false"))
// Look at the trace and check that we've served a follower read.
require.True(t, kv.OnlyFollowerReads(rec), "query was not served through follower reads: %s", rec)
// Verify that we didn't produce the "misplanned ranges" metadata that would
// purge the non-stale entries from the range cache on n4.
require.False(t, strings.Contains(rec.String(), "clearing entries overlapping"))
// Also confirm that we didn't even check for the "misplanned ranges"
// metadata on n3.
require.False(t, strings.Contains(rec.String(), "checking range cache to see if range info updates should be communicated to the gateway"))
}

// TestSecondaryTenantFollowerReadsRouting ensures that secondary tenants route
Expand Down
52 changes: 31 additions & 21 deletions pkg/cmd/roachtest/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,10 +63,10 @@ var (
cockroachPath string
// maps cpuArch to the corresponding crdb binary's absolute path
cockroach = make(map[vm.CPUArch]string)
// user-specified path to short crdb binary
cockroachShortPath string
// maps cpuArch to the corresponding short crdb (i.e., without UI) binary's absolute path
cockroachShort = make(map[vm.CPUArch]string)
// user-specified path to crdb binary with runtime assertions enabled (EA)
cockroachEAPath string
// maps cpuArch to the corresponding crdb binary with runtime assertions enabled (EA)
cockroachEA = make(map[vm.CPUArch]string)
// user-specified path to workload binary
workloadPath string
// maps cpuArch to the corresponding workload binary's absolute path
Expand Down Expand Up @@ -320,15 +320,15 @@ func initBinariesAndLibraries() {
}
fmt.Printf("Locating and verifying binaries for os=%q, arch=%q\n", defaultOSName, defaultArch)

// Finds and validates a binary. If the binary 'isRequired', but not found, exit and print the error.
resolveBinary := func(binName string, userSpecified string, arch vm.CPUArch, isRequired bool, checkEA bool) (string, error) {
// Finds and validates a binary.
resolveBinary := func(binName string, userSpecified string, arch vm.CPUArch, exitOnErr bool, checkEA bool) (string, error) {
path := binName
if userSpecified != "" {
path = userSpecified
}
abspath, err := findBinary(path, defaultOSName, arch, checkEA)
if err != nil {
if isRequired {
if exitOnErr {
fmt.Fprintf(os.Stderr, "ERROR: unable to find required binary %q for %q: %v\n", binName, arch, err)
os.Exit(1)
}
Expand All @@ -340,12 +340,18 @@ func initBinariesAndLibraries() {
}
// Bail out if a path other than the user-specified was found.
userPath, err := filepath.Abs(userSpecified)
if err != nil {
if exitOnErr {
fmt.Fprintf(os.Stderr, "ERROR: unable to find required binary %q for %q: %v\n", binName, arch, err)
os.Exit(1)
}
return "", err
}
if userPath != abspath {
err = errors.Errorf("found %q at: %q instead of the user-specified path: %q\n", binName, abspath, userSpecified)

if err != nil || userPath != abspath {
err = errors.Wrapf(err, "ERROR: found %q at: %s instead of the user-specified path: %q\n", binName, abspath, userSpecified)

if isRequired {
fmt.Fprintf(os.Stderr, "%v", err)
if exitOnErr {
fmt.Fprintf(os.Stderr, "ERROR: unable to find required binary %q for %q: %v\n", binName, arch, err)
os.Exit(1)
}
return "", err
Expand All @@ -357,29 +363,29 @@ func initBinariesAndLibraries() {

cockroach[defaultArch], _ = resolveBinary("cockroach", cockroachPath, defaultArch, true, false)
workload[defaultArch], _ = resolveBinary("workload", workloadPath, defaultArch, true, false)
cockroachShort[defaultArch], err = resolveBinary("cockroach-short", cockroachShortPath, defaultArch, false, true)
cockroachEA[defaultArch], err = resolveBinary("cockroach-ea", cockroachEAPath, defaultArch, false, true)
if err != nil {
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-short", defaultArch, err)
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-ea", defaultArch, err)
}

if arm64Probability > 0 && defaultArch != vm.ArchARM64 {
fmt.Printf("Locating and verifying binaries for os=%q, arch=%q\n", defaultOSName, vm.ArchARM64)
// We need to verify we have all the required binaries for arm64.
cockroach[vm.ArchARM64], _ = resolveBinary("cockroach", cockroachPath, vm.ArchARM64, true, false)
workload[vm.ArchARM64], _ = resolveBinary("workload", workloadPath, vm.ArchARM64, true, false)
cockroachShort[vm.ArchARM64], err = resolveBinary("cockroach-short", cockroachShortPath, vm.ArchARM64, false, true)
cockroachEA[vm.ArchARM64], err = resolveBinary("cockroach-ea", cockroachEAPath, vm.ArchARM64, false, true)
if err != nil {
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-short", vm.ArchARM64, err)
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-ea", vm.ArchARM64, err)
}
}
if fipsProbability > 0 && defaultArch != vm.ArchFIPS {
fmt.Printf("Locating and verifying binaries for os=%q, arch=%q\n", defaultOSName, vm.ArchFIPS)
// We need to verify we have all the required binaries for fips.
cockroach[vm.ArchFIPS], _ = resolveBinary("cockroach", cockroachPath, vm.ArchFIPS, true, false)
workload[vm.ArchFIPS], _ = resolveBinary("workload", workloadPath, vm.ArchFIPS, true, false)
cockroachShort[vm.ArchFIPS], err = resolveBinary("cockroach-short", cockroachShortPath, vm.ArchFIPS, false, true)
cockroachEA[vm.ArchFIPS], err = resolveBinary("cockroach-ea", cockroachEAPath, vm.ArchFIPS, false, true)
if err != nil {
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-short", vm.ArchFIPS, err)
fmt.Fprintf(os.Stderr, "WARN: unable to find %q for %q: %s\n", "cockroach-ea", vm.ArchFIPS, err)
}
}

Expand Down Expand Up @@ -417,9 +423,9 @@ func initBinariesAndLibraries() {
fmt.Printf("\tworkload %q at: %s\n", arch, path)
}
}
for arch, path := range cockroachShort {
for arch, path := range cockroachEA {
if path != "" {
fmt.Printf("\tcockroach-short %q at: %s\n", arch, path)
fmt.Printf("\tcockroach-ea %q at: %s\n", arch, path)
}
}
for arch, paths := range libraryFilePaths {
Expand Down Expand Up @@ -1950,7 +1956,11 @@ func (c *clusterImpl) PutLibraries(
if !contains(libraries, nil, libName) {
continue
}
putPath := filepath.Join(libraryDir, libName)
// Get the last extension (e.g., .so) to create a destination file.
// N.B. The optional arch-specific extension is elided since the destination doesn't need it, nor does it know
// how to resolve it. (E.g., see findLibraryDirectories in geos.go)
ext := filepath.Ext(filepath.Base(libraryFilePath))
putPath := filepath.Join(libraryDir, libName+ext)
if err := c.PutE(
ctx,
c.l,
Expand Down
6 changes: 3 additions & 3 deletions pkg/cmd/roachtest/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -183,11 +183,11 @@ func main() {
"Username to use as a cluster name prefix. "+
"If blank, the current OS user is detected and specified.")
rootCmd.PersistentFlags().StringVar(
&cockroachPath, "cockroach", "", "path to cockroach binary to use")
&cockroachPath, "cockroach", "", "absolute path to cockroach binary to use")
rootCmd.PersistentFlags().StringVar(
&cockroachShortPath, "cockroach-short", "", "path to cockroach-short binary (compiled with crdb_test build tag) to use")
&cockroachEAPath, "cockroach-ea", "", "absolute path to cockroach binary with enabled (runtime) assertions (i.e, compiled with crdb_test)")
rootCmd.PersistentFlags().StringVar(
&workloadPath, "workload", "", "path to workload binary to use")
&workloadPath, "workload", "", "absolute path to workload binary to use")
rootCmd.PersistentFlags().Float64Var(
&encryptionProbability, "metamorphic-encryption-probability", defaultEncryptionProbability,
"probability that clusters will be created with encryption-at-rest enabled "+
Expand Down
2 changes: 1 addition & 1 deletion pkg/cmd/roachtest/test_runner.go
Original file line number Diff line number Diff line change
Expand Up @@ -699,7 +699,7 @@ func (r *testRunner) runWorker(
t := &testImpl{
spec: &testToRun.spec,
cockroach: cockroach[arch],
cockroachShort: cockroachShort[arch],
cockroachShort: cockroachEA[arch],
deprecatedWorkload: workload[arch],
buildVersion: binaryVersion,
artifactsDir: artifactsDir,
Expand Down
Loading

0 comments on commit d660d51

Please sign in to comment.