ccl/workloadccl/allccl: TestDeterministicInitialData failed #93958

cockroach-teamcity · 2022-12-20T08:14:37Z

ccl/workloadccl/allccl.TestDeterministicInitialData failed with artifacts on master @ 61f7c816574848f804f1adf7f42f7315bfcaaeba:

=== RUN   TestDeterministicInitialData
--- FAIL: TestDeterministicInitialData (8.73s)
=== RUN   TestDeterministicInitialData/tpch
    all_test.go:305: 
        	Error Trace:	/home/roach/.cache/bazel/_bazel_roach/c5a4e7d36696d9cd970af2045211a7df/sandbox/processwrapper-sandbox/3236/execroot/com_github_cockroachdb_cockroach/bazel-out/k8-fastbuild/bin/pkg/ccl/workloadccl/allccl/allccl_test_/allccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/workloadccl/allccl/all_test.go:305
        	Error:      	Not equal: 
        	            	expected: 0xe013881749bb67e8
        	            	actual  : 0x7373fe217bd5bdb
        	Test:       	TestDeterministicInitialData/tpch
    --- FAIL: TestDeterministicInitialData/tpch (5.95s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/test-eng @cockroachdb/sql-sessions _{This test on roachdash | Improve this report!

Jira issue: CRDB-22615}

The text was updated successfully, but these errors were encountered:

nicktrav · 2022-12-21T22:26:09Z

Was feeling festive and charitable, so I did a bisect, which lead me to 7c9dd1b. Looks like the flake rate increased after this.

I'm using the following to reproduce:

$ ./dev test ./pkg/ccl/workloadccl/allccl --filter TestDeterministicInitialData/tpch --stress

cc: @rytaft

I'm skipping this test until I have time to debug the reason for the flake. Informs cockroachdb#93958 Release note: None

93089: sql, clusterversion: add new index usage column on system.statement_statistics r=maryliag a=maryliag This commits adds a new index `indexes_usage` virtual column on value `(statistics-> 'statistics'-> 'indexes')` with respective `indexes_usage_idx` index on `system.statement_statistics` table. Part Of #93087 Performance Tests: YCSB Workload | branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms)| | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | main | 1860.0s | 0 | 37623701 | 20227.8 | 1.0 | 0.8 | 2.5 | 4.5 | 134.2 | | idx | 1860.0s | 0 | 44831436 | 24102.9 | 0.8 | 0.7 | 2.1 | 3.7 | 352.3 | TPC C Workload | branch | elapsed | tpmC | efc | avg(ms) | p50(ms) | p90(ms) | p95(ms) | p99(ms) | pMax(ms)| | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | main | 1860.0s | 12.4 | 96.3% | 29.8 | 29.4 | 35.7 | 39.8 | 44.0 | 46.1 | | idx | 1860.0s | 12.3 | 95.3% | 28.0 | 28.3 | 37.7 | 39.8 | 46.1 | 52.4 | Release note (sql change): Add column `indexes_usage` and index `indexes_usage_idx` on value on table `system.statement_statistics`. 93990: colexecjoin: clarify handling of INTERSECT ALL joins r=yuzefovich a=yuzefovich This commit clarifies how the INTERSECT ALL joins are handled by the hash joiner. Previously, this type of join would be handled via the same collection method as other non-outer joins with non-distinct keys. However, there is a crucial difference - `Same` slice would always remain unpopulated for INTERSECT ALL, so the collection method would be reduced to a much simpler version. This is the case since for this type of join we only need to include the left tuple once if it has a match and there is no need to traverse the hash chain trying to find more matches. In fact, the collection method for INTERSECT ALL is very similar to that of EXCEPT ALL with only a condition being negated, so this commit takes advantage of this observation and refactors `collectLeftAnti` to also power the INTERSECT ALL collection. This makes the code easier to follow and also allows us to not instantiate `Same` array for the INTERSECT ALL joins. Additionally, it allows us to remove some duplicated code in the `Check` function which was previously conditioned on `Same` being non-nil even though both branches had exactly the same code. The dead code in `distinctCollect` is also now removed. Epic: None Release note: None 94108: go.mod: bump Raft to 65a0bf3 r=nvanbenschoten a=nvanbenschoten This picks up etcd-io/raft#8. However, it doesn't enabled the `AsyncStorageWrites` configuration, so we do not expect any impact from the change. Let's hope there are no surprises. Release note: None Epic: CRDB-6037 94109: workloadccl: skip TestDeterministicInitialData to avoid flakes r=rytaft a=rytaft I'm skipping this test until I have time to debug the reason for the flake. Informs #93958 Release note: None Co-authored-by: maryliag <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Rebecca Taft <[email protected]>

I'm skipping this test until I have time to debug the reason for the flake. Informs #93958 Release note: None

I'm skipping this test until I have time to debug the reason for the flake. Informs cockroachdb#93958 Release note: None

srosenberg · 2022-12-22T07:12:41Z

Non-determinism arises from namePerm which may get reset [1] when the corresponding sync.Pool item is empty [2].
The probability of the pool item getting GCed during a test run is much lower than p=0.5, hence the observed flakiness.
Consequently, P_NAME column [3] may end up with different byte values for any pair of runs with the identical PRNG seed.

It's not necessary to pass namePerm by reference across invocations of randPartName. The following uses a modified Fischer-Yates shuffle to generate an unbiased, random nPartNames-permutation. The proposed solution removes namePerm from sync.Pool and the underlying source of non-determinism.

func randPartName(rng *rand.Rand, a *bufalloc.ByteAllocator) []byte {
	namePerm := make([]int, len(randPartNames))
	for i := range namePerm {
		namePerm[i] = i
	}
	for i := 0; i < nPartNames; i++ {
                // N.B. Correctness requires that i <= j < len(namePerm)
		j := rng.Intn(len(namePerm) - i) + i
		namePerm[i], namePerm[j] = namePerm[j], namePerm[i]
	}
       // rest of the code to copy the permuted words into buf
        ...
}

[1]

cockroach/pkg/workload/tpch/tpch.go

Line 206 in 15bc0c4

namePerm[i] = i

[2]

cockroach/pkg/workload/tpch/generate.go

Line 148 in 15bc0c4

l := w.localsPool.Get().(*generateLocals)

[3]

cockroach/pkg/workload/tpch/generate.go

Line 159 in 15bc0c4

cb.ColVec(1).Bytes().Set(0, randPartName(rng, l.namePerm, a))

86891: bazci: enable --flaky_test_attempts=3 for Unit tests on release branches r=rickystewart,healthy-pod,renatolabs a=srosenberg When a test fails, --flaky_test_attempts=3 will cause up to two retry attempts by executing the corresponding test binary (shard). If a test passes on either attempt, it is reported 'FLAKY' in the test result summary. In addition, each attempt yields an (XML) log under 'test_attempts' subdirectory. The attempts are now copied into the artifactsDir. Note, while the test result summary may say the test is 'FLAKY', it's reported as 'PASS' as per main test.xml. Thus, a 'FLAKY' test will not fail the CI; i.e., a test succeeding after up to two retries is assumed to have _passed_ whereas three consecutive failures result in a failed test. Consequently, CI now has a higher probability of passing than before this change. However, the two additional attempts per test binary (shard) could slow down the build. Hence, initially the plan is to enable only on release branches and not staging (i.e., bors). This change supports [1]. In a follow-up PR, a test reporter could be augmented to parse attempt logs thereby obtaining a set of FLAKY tests which should be skipped on subsequent builds. [1] #81293 Release justification: CI improvement Release note: None Epic: None 92955: opt: inline UDFs as subqueries r=mgartner a=mgartner UDFs are now inlined as subqueries by a normalization rule when possible, speeding up their evaluation. Epic: CRDB-20370 Release note (performance improvement): Some types of user-defined functions are now inlined in query plans as relation expressions, which speeds up their evaluation. UDFs must be non-volatile and have a single statement in the function body to be inlined. 94660: colserde: perform memory accounting for scratch slices in the converter r=yuzefovich a=yuzefovich This commit introduces the memory accounting for values and offsets scratch slices that are being reused across `BatchToArrow` calls since recently. This required a lot of plumbing to propagate the memory account from all places. The converter is careful to only grow and shrink the account by its own usage, so the same memory account can be reused across multiple converters (as long as there is no concurrency going on, and we only can have concurrency for hash router outputs, so there we give each output a separate account). An additional minor change is that now in `diskQueue.Enqueue` we properly `Close` the `FileSerializer` before `nil`-ing it out. This isn't a problem per se since it is the caller's responsibility to close the account used by the serializer, but it's nice to properly release the accounted for bytes. Epic: None Release note: None 94702: workload: fix non-determinism in TPC-H data generation r=rytaft a=rytaft This commit fixes the non-determinism in the TPC-H data generation by using a local slice for the random permutation of indexes into `randPartNames`. It also fixes the permutation algorithm to use a modified Fisher–Yates shuffle. Fixes #93958 Release note: None 94713: sql: add user-facing error for invalid job type in job_payload_type builtin r=jayshrivastava a=jayshrivastava ### sql: add user-facing error for invalid job type in job_payload_type builtin Previously, the `job_payload_type` builtin would return an internal error if the payload could be unmarshalled, but the type could not be determined. This changes ensures the builtin returns an appropriate user-facing error in this case. Epic: none Fixes: #94680 Release note: None 94723: kvserver: check lease rebalance target exists in cached storelist r=erikgrinaker a=kvoli From #91593 it was possible for a new lease target to have its store descriptor dereferenced despite not existing in a list of candidate stores earlier. This patch resolves the dereference. fixes: #94688 Release note: None Co-authored-by: Stan Rosenberg <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Austen McClernon <[email protected]>

This commit fixes the non-determinism in the TPC-H data generation by using a local slice for the random permutation of indexes into randPartNames. It also fixes the permutation algorithm to use a modified Fisher–Yates shuffle. Fixes #93958 Release note: None

cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 20, 2022

cockroach-teamcity added this to the 23.1 milestone Dec 20, 2022

blathers-crl bot added the T-testeng TestEng Team label Dec 20, 2022

This was referenced Dec 20, 2022

ccl/workloadccl/allccl: TestDeterministicInitialData failed #93961

Closed

ccl/workloadccl/allccl: TestDeterministicInitialData failed #94035

Closed

nicktrav mentioned this issue Dec 21, 2022

go.mod: bump Pebble to caecab07d786 #94096

Merged

rytaft added a commit to rytaft/cockroach that referenced this issue Dec 21, 2022

workloadccl: skip TestDeterministicInitialData to avoid flakes

bbba0ce

I'm skipping this test until I have time to debug the reason for the flake. Informs cockroachdb#93958 Release note: None

rytaft mentioned this issue Dec 21, 2022

workloadccl: skip TestDeterministicInitialData to avoid flakes #94109

Merged

blathers-crl bot pushed a commit that referenced this issue Dec 22, 2022

workloadccl: skip TestDeterministicInitialData to avoid flakes

7e7c5a4

I'm skipping this test until I have time to debug the reason for the flake. Informs #93958 Release note: None

This was referenced Dec 22, 2022

release-22.1: workloadccl: skip TestDeterministicInitialData to avoid flakes #94117

Merged

release-22.2: workloadccl: skip TestDeterministicInitialData to avoid flakes #94118

Merged

blathers-crl bot pushed a commit that referenced this issue Dec 22, 2022

workloadccl: skip TestDeterministicInitialData to avoid flakes

d383d84

I'm skipping this test until I have time to debug the reason for the flake. Informs #93958 Release note: None

adityamaru pushed a commit to adityamaru/cockroach that referenced this issue Dec 22, 2022

workloadccl: skip TestDeterministicInitialData to avoid flakes

b8086b6

I'm skipping this test until I have time to debug the reason for the flake. Informs cockroachdb#93958 Release note: None

rafiss added the skipped-test label Dec 22, 2022

blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 22, 2022

DrewKimball assigned rytaft Jan 3, 2023

rytaft mentioned this issue Jan 4, 2023

workload: fix non-determinism in TPC-H data generation #94702

Merged

craig bot closed this as completed in 06bc01c Jan 4, 2023

This was referenced Jan 4, 2023

release-22.1: workload: fix non-determinism in TPC-H data generation #94732

Merged

release-22.2: workload: fix non-determinism in TPC-H data generation #94733

Merged

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Jan 4, 2023

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ccl/workloadccl/allccl: TestDeterministicInitialData failed #93958

ccl/workloadccl/allccl: TestDeterministicInitialData failed #93958

cockroach-teamcity commented Dec 20, 2022 •

edited by cockroach-jira-scripts

Loading

nicktrav commented Dec 21, 2022

srosenberg commented Dec 22, 2022 •

edited

Loading

ccl/workloadccl/allccl: TestDeterministicInitialData failed #93958

ccl/workloadccl/allccl: TestDeterministicInitialData failed #93958

Comments

cockroach-teamcity commented Dec 20, 2022 • edited by cockroach-jira-scripts Loading

nicktrav commented Dec 21, 2022

srosenberg commented Dec 22, 2022 • edited Loading

cockroach-teamcity commented Dec 20, 2022 •

edited by cockroach-jira-scripts

Loading

srosenberg commented Dec 22, 2022 •

edited

Loading