84324: sqlsmith: make order-dependent aggregation functions deterministic r=msirek,mgartner a=michae2

**cmd: add smith command**

Add a new top-level command `smith` which dumps randomly-generated sqlsmith queries. This is useful for testing modifications to sqlsmith.

Assists: #83024

Release note: None

**sqlsmith: make order-dependent aggregation functions deterministic**

Some aggregation functions (e.g. string_agg) have results that depend on the order of input rows. To make sqlsmith more deterministic, add an ORDER BY clause to these aggregation functions whenever their argument is a column reference. (When their argument is a constant, ordering doesn't matter.)

Fixes: #83024

Release note: None

84398: colflow: prevent deadlocks when many queries spill to disk at same time r=yuzefovich a=yuzefovich

**colflow: prevent deadlocks when many queries spill to disk at same time**

This commit fixes a long-standing issue which could cause memory-intensive queries to deadlock on acquiring the file descriptors quota when vectorized execution spills to disk. This bug has been present since the introduction of disk-spilling (over two and a half years ago, introduced in #45318 and partially mitigated in #45892), but we haven't seen it in any user reports, only in `tpch_concurrency` roachtest runs, so the severity seems pretty minor.

Consider the following query plan:
```
     Node 1                  Node 2

  TableReader             TableReader
       |                      |
   HashRouter             HashRouter
       |     \  ___________  /  |
       |      \/__________  \   |
       |      /             \   |
 HashAggregator          HashAggregator
```
and let's imagine that each hash aggregator has to spill to disk. This would require acquiring the file descriptors quota. Now imagine that, because of the hash aggregators' spilling, each of the hash routers has slow outputs, causing them to spill too.

As a result, this query plan can require `A + 2 * R` FDs on a single node to succeed, where `A` is the quota for a single hash aggregator (16 with the default value of the `COCKROACH_VEC_MAX_OPEN_FDS` environment variable, which is 256) and `R` is the quota for a single router output (2). In other words, roughly 20 FDs are needed from each node for the query to finish execution, with the 16 aggregator FDs being acquired first.

Now imagine that this query is run with a concurrency of 16. We can end up in a situation where all hash aggregators have spilled, fully exhausting the global limit on each node, so whenever the hash router outputs need to spill, they block forever: no FDs will ever be released until a query is canceled or a node is shut down. In other words, we have a deadlock.

This commit fixes this situation by introducing a retry mechanism that backs off exponentially when trying to acquire the FD quota, up to a timeout (see the sketch below). The randomization provided by the `retry` package should be sufficient so that some of the queries succeed while others result in an error.

Unfortunately, I don't see a way to prevent this deadlock from occurring in the first place without a possible latency increase in some cases. The difficulty is that we currently acquire FDs only once we need them, meaning once a particular component spills to disk. We could acquire the maximum number of FDs that a query might need up front, before query execution starts, but that could lead to starvation of queries that ultimately won't spill to disk. That seems like a much worse impact than receiving timeout errors on some analytical queries run with high concurrency. We're not an OLAP database, so this behavior seems OK.

Fixes: #80290.
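To make the mechanism concrete, here is a minimal, self-contained Go sketch of the retry-with-backoff idea: a waiter polls a shared FD quota, backs off exponentially with jitter between attempts, and errors out at a deadline instead of blocking forever. The names (`fdQuota`, `acquireWithRetry`) and the backoff constants are hypothetical illustrations, not CockroachDB's actual `retry` package API.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// fdQuota is a toy counting quota guarding file descriptors.
type fdQuota struct {
	mu        sync.Mutex
	available int
}

// tryAcquire attempts a non-blocking acquisition of n descriptors.
func (q *fdQuota) tryAcquire(n int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.available >= n {
		q.available -= n
		return true
	}
	return false
}

// release returns n descriptors to the quota.
func (q *fdQuota) release(n int) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.available += n
}

// acquireWithRetry polls the quota with exponential backoff and jitter.
// Returning an error at the deadline, rather than blocking indefinitely,
// is what breaks the deadlock cycle: a timed-out query fails and releases
// whatever it already holds, letting other spilled queries finish.
func acquireWithRetry(q *fdQuota, n int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	backoff := time.Millisecond
	const maxBackoff = time.Second
	for {
		if q.tryAcquire(n) {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out acquiring file descriptor quota")
		}
		// Sleep for a randomized fraction of the backoff so concurrent
		// waiters are unlikely to retry in lockstep.
		time.Sleep(backoff/2 + time.Duration(rand.Int63n(int64(backoff/2)+1)))
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

func main() {
	q := &fdQuota{available: 16}
	if err := acquireWithRetry(q, 16, 5*time.Second); err != nil {
		fmt.Println(err)
		return
	}
	defer q.release(16)
	fmt.Println("acquired 16 FDs")
}
```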
Release note (bug fix): Previously, CockroachDB could deadlock when evaluating analytical queries if multiple queries had to spill to disk at the same time. This is now fixed by making some of the queries error out instead.

**roachtest: remove some debugging printouts in tpch_concurrency**

These were added to track down the deadlock fixed in the previous commit, so we no longer need them.

Release note: None

84430: sql/schemachanger/scplan: allow plan to move to NonRevertible earlier r=ajwerner a=ajwerner

This is critical for DROP COLUMN. In `DROP COLUMN` we cannot perform the primary index swap until the dropping column reaches `DELETE_ONLY`, which is not revertible. The primary index swap itself is revertible. Given this fact, we need a mechanism to "promote" revertible operations (operations which don't destroy information or cause irrevocable visible side effects) so that they are grouped with or after non-revertible operations. This commit makes that happen naturally.

Release note: None

84433: rowexec: increase the batch size for join reader ordering strategy r=yuzefovich a=yuzefovich

This commit increases the default value of the `sql.distsql.join_reader_ordering_strategy.batch_size` cluster setting (which determines the input row batch size for lookup joins when ordering needs to be maintained) from 10KiB to 100KiB, since the previous number was likely too conservative. We have seen support cases (https://github.com/cockroachlabs/support/issues/1627) where bumping up this setting was needed to get reasonable performance, and we also set it to 100KiB in our TPC-E setup (https://github.com/cockroachlabs/tpc-e/blob/d47d3ea5ce71ecb1be5e0e466a8aa7c94af63d17/tier-a/src/schema.rs#L404).

I did some historical digging, and I believe the number 10KiB was chosen somewhat arbitrarily, with no real justification, in #48058. That PR changed how we measure input row batches from the number of rows to the memory footprint of the input rows. Prior to that change we had a 100-row limit, so my guess is that 10KiB was somewhat dependent on that number.

The reason we don't want this batch size to be too large is that we buffer all looked-up rows in a disk-backed container, so if too many responses come back (because we included too many input rows in the batch), that container has to spill to disk. To make sure we don't regress in such scenarios, this commit teaches the join reader to lower the batch size bytes limit if the container does spill to disk, down to a minimum of 10KiB (see the sketch at the end of this message).

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
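To illustrate the adaptive limit described for 84433, here is a minimal Go sketch: the batch size limit starts at the new 100KiB default and is lowered whenever the disk-backed container spills, bottoming out at the 10KiB minimum. Halving on each spill is an assumed policy for illustration, and all names (`batchSizer`, `onSpill`) are hypothetical, not the actual rowexec implementation.

```go
package main

import "fmt"

const (
	defaultBatchSizeBytes = 100 << 10 // 100KiB, the new default
	minBatchSizeBytes     = 10 << 10  // 10KiB floor from the commit message
)

// batchSizer tracks the current input-row batch limit for a join reader.
type batchSizer struct {
	limitBytes int
}

func newBatchSizer() *batchSizer {
	return &batchSizer{limitBytes: defaultBatchSizeBytes}
}

// onSpill is called when the container of looked-up rows spills to disk;
// it halves the limit (an assumed policy) so that subsequent batches
// generate fewer lookup responses, never going below the minimum.
func (s *batchSizer) onSpill() {
	s.limitBytes /= 2
	if s.limitBytes < minBatchSizeBytes {
		s.limitBytes = minBatchSizeBytes
	}
}

func main() {
	s := newBatchSizer()
	for i := 0; i < 5; i++ {
		fmt.Printf("batch limit: %d bytes\n", s.limitBytes)
		s.onSpill() // pretend every batch spilled
	}
}
```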