Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/sql/schemachanger/schemachanger_test: TestConcurrentDeclarativeSchemaChanges failed #106732

Closed
cockroach-teamcity opened this issue Jul 12, 2023 · 8 comments · Fixed by #107636
Assignees
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. skipped-test T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 12, 2023

pkg/sql/schemachanger/schemachanger_test.TestConcurrentDeclarativeSchemaChanges failed with artifacts on master @ b19302fd436f7e8667b289677723e337d333bb50:

=== RUN   TestConcurrentDeclarativeSchemaChanges
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/3e4c90800e6eae84b5bf83507e7ef6e8/logTestConcurrentDeclarativeSchemaChanges1906842654
    test_log_scope.go:81: use -show-logs to present logs inline
    schemachanger_test.go:169: 
        	Error Trace:	pkg/sql/schemachanger/schemachanger_test_test/pkg/sql/schemachanger/schemachanger_test.go:169
        	Error:      	Should not be zero, but was 0
        	Test:       	TestConcurrentDeclarativeSchemaChanges
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/3e4c90800e6eae84b5bf83507e7ef6e8/logTestConcurrentDeclarativeSchemaChanges1906842654
--- FAIL: TestConcurrentDeclarativeSchemaChanges (3.74s)

The failing assertion is

	// The ALTER PRIMARY KEY schema change must have been blocked.
	require.NotZero(t, alterPrimaryKeyBlockedCounter.Load())
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-29694

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Jul 12, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Jul 12, 2023
@rafiss
Copy link
Collaborator

rafiss commented Jul 14, 2023

This was just recently added in #106175

@rafiss
Copy link
Collaborator

rafiss commented Jul 14, 2023

@postamar ccing you for any ideas since you added the test recently. Since you're out today I'll give this a look until you're back Monday.

@postamar postamar added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 labels Jul 17, 2023
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 17, 2023
The newly-rewritten TestConcurrentDeclarativeSchemaChanges test had
a concurrency bug in it, which caused it to fail in cases where the
two schema changes actually take place serially. It's possible for the
later one to start after the earlier one has already finished.

Fixes: cockroachdb#106732.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 17, 2023
The newly-rewritten TestConcurrentDeclarativeSchemaChanges test had
a concurrency bug in it, which caused it to fail in cases where the
two schema changes actually take place serially. It's possible for the
later one to start after the earlier one has already finished.

Fixes: cockroachdb#106732.

Release note: None
@DrewKimball
Copy link
Collaborator

Hi @postamar, is the fix for this expected to merge soon? If not I can put up a skip in the meantime.

@cockroach-teamcity
Copy link
Member Author

pkg/sql/schemachanger/schemachanger_test.TestConcurrentDeclarativeSchemaChanges failed with artifacts on master @ 30acaf93015008a672667c0d3fdc00ac95f446a6:

=== RUN   TestConcurrentDeclarativeSchemaChanges
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/3e4c90800e6eae84b5bf83507e7ef6e8/logTestConcurrentDeclarativeSchemaChanges145910907
    test_log_scope.go:81: use -show-logs to present logs inline
    schemachanger_test.go:169: 
        	Error Trace:	pkg/sql/schemachanger/schemachanger_test_test/pkg/sql/schemachanger/schemachanger_test.go:169
        	Error:      	Should not be zero, but was 0
        	Test:       	TestConcurrentDeclarativeSchemaChanges
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/3e4c90800e6eae84b5bf83507e7ef6e8/logTestConcurrentDeclarativeSchemaChanges145910907
--- FAIL: TestConcurrentDeclarativeSchemaChanges (4.77s)
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Jul 19, 2023
Skips `TestConcurrentDeclarativeSchemaChanges`.

Informs cockroachdb#106732

Release note: None
craig bot pushed a commit that referenced this issue Jul 19, 2023
106860: ssmemstorage: improve lock contention on RecordStatement r=j82w a=j82w

1. The stats object lock was being held even during the insights event
creation and insert which has it's own lock. The insights logic is now
moved above the stats lock.
2. Moved latency calculation outside the stats lock.
3. Moved error code calculation outside the stats lock.
4. ss_mem_storage converted to use a RWMutex. Each object in the map
has it's own lock. Read lock allows mulitple threads to read from the
map at the same time which is the common case. Writes are only done the
first time a statement is seen.

Fixes: #106789, Fixes: #55285

```
./dev bench pkg/sql/sqlstats/sslocal -f BenchmarkRecordStatement  --ignore-cache --verbose  --test-args='-test.mutexprofile=mutex_pr.out -test.cpuprofile=cpu_pr.out -test.benchtime=30s' --timeout "10m"
```

```
// Master no changes
BenchmarkRecordStatement/RecordStatement:_Parallelism_10-10         	64775088	       558.8 ns/op
    sql_stats_test.go:655: -- test log scope end --
PASS
================================================================================
INFO: Elapsed time: 37.529s, Critical Path: 37.26s
INFO: 2 processes: 1 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 2 total actions
//pkg/sql/sqlstats/sslocal:sslocal_test                                  PASSED in 37.2s
```
```
// Changes after the PR
BenchmarkRecordStatement/RecordStatement:_Parallelism_10-10         	67960656	       523.8 ns/op
    sql_stats_test.go:655: -- test log scope end --
PASS
================================================================================
INFO: Elapsed time: 37.031s, Critical Path: 36.69s
INFO: 2 processes: 1 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 2 total actions
//pkg/sql/sqlstats/sslocal:sslocal_test                                  PASSED in 36.7s
```

Release note (performance improvement): Reduced lock contention on `ssmemstorage.RecordStatement`. This is useful for workloads that execute the same statement concurrently on the same SQL instance.

106912: telemetryccl: Split `TestTelemetry` into smaller sub-tests r=Santamaura a=Santamaura

This change breaks up `TestTelemetry` and the `multiregion` test into four tests due to the test being quite large. The tests are: `multiregion`, `multiregion_db`, `multiregion_table`, and `multiregion_row`.

Informs: #103004

Release note: None

107099: sql: fix `CREATE TABLE AS` sourcing `SHOW <show_subcmd> <table>` job failures r=chengxiong-ruan a=ecwall

Fixes #106260

Previously `CREATE TABLE AS`/`CREATE MATERIALIZED VIEW AS` sourcing from `SHOW <show_subcmd> <table>` generated a failing schema change job with a `relation "tbl" does not exist error` because the SHOW source table was not fully qualified.

This PR fixes this by respecting the `Builder.qualifyDataSourceNamesInAST` flag in `delegate.TryDelegate()` which implements the failing SHOW commands.

Release note (bug fix): Fix failing schema change job when CREATE TABLE AS or CREATE MATERAILIZED VIEW AS sources from a SHOW command:
1. CREATE TABLE t AS SELECT * FROM [SHOW CREATE TABLE tbl];
2. CREATE TABLE t AS SELECT * FROM [SHOW INDEXES FROM tbl];
3. CREATE TABLE t AS SELECT * FROM [SHOW COLUMNS FROM tbl];
4. CREATE TABLE t AS SELECT * FROM [SHOW CONSTRAINTS FROM tbl];
5. CREATE TABLE t AS SELECT * FROM [SHOW PARTITIONS FROM TABLE tbl];
6. CREATE TABLE t AS SELECT * FROM [SHOW PARTITIONS FROM INDEX tbl@tbl_pkey];

107170: ui: fix creation index syntax on console r=maryliag a=maryliag

With an update to make the table name fully qualified, the index name was also using the fully qualified name, which contains ".", and that causes an invalid syntax since index name can't have periods.
Having the qualified name is not important for the index name itself, so this commit fixes by ignoring that part and using just the table name, how it was doing previously. It also fixes an invalid syntax when there were spaces on the name generate.

It also add a little more observability into indexes being created with STORING clause, adding that to the suffix of the index creation.

Fixes #107168

Before
<img width="583" alt="Screenshot 2023-07-19 at 10 23 21 AM" src="https://github.com/cockroachdb/cockroach/assets/1017486/e5792419-31af-4e5f-994e-13ddb716009c">


After
https://www.loom.com/share/33da8b46fc9e4f72944c1d0ab943dea0

Release note (bug fix): Index recommendation on the UI no longer uses the full qualified name of a table to create an index name, allowing the creating of indexes directly from the Console to work.

107189: tests: fix stories files r=maryliag a=maryliag

Stories files used for testing were missing some fixtures. This commit adds all missing parameters.

Epic: none

Release note: None

107197: schemachanger: skip flakey test r=rafiss a=DrewKimball

Skips `TestConcurrentDeclarativeSchemaChanges`.

Informs #106732

Release note: None

107207: bazel: make `fastbuild` the default `-c` again; never strip for `dev` builds r=rail a=rickystewart

This is a partial revert of `60e8bcec61269c6648dc40c8094fbf303b8a02aa`.

In that commit I made a change to make un-stripped binaries the default by making `dbg` the default `--compilation_mode`. This had unexpected consequences as this actually disables inlining and optimization, thereby breaking some tests, and overall it is surprising that a Go binary built by Bazel would be un-optimized by default since this is the opposite of `go build`'s default behavior.

Instead, we make `fastbuild` the default (again), and to solve the problem of binaries being unstripped we set `--strip never` for `dev` builds. (`dev doctor` makes you clarify that your build is a `dev` build on setup so no one can miss this.)

Now we have the following behavior for each `compilation_mode`:

* `dbg`: disable optimizations and inlining.
* `fastbuild` and `opt`: will produce the same binary, except `fastbuild` defaults to stripping in non-`dev` configurations.

If you want an un-optimized binary for some reason, it still works to build with `-c dbg`.

Epic: CRDB-17171
Release note: None

Co-authored-by: j82w <[email protected]>
Co-authored-by: Alex Santamaura <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@postamar
Copy link
Contributor

Sorry for not being realistic and skipping this test sooner. I'll un-skip it and fix it early next week.

@DrewKimball
Copy link
Collaborator

Sorry for not being realistic and skipping this test sooner. I'll un-skip it and fix it early next week.

No worries :)

@Xiang-Gu
Copy link
Contributor

The failure comes from an uncareful setup of the test. Namely, the test did something like

go another_func(){
...
}()

unblock func()

with the intention that another_func() is going to run, and hit some knob, and then func() is going to proceed to completion.

The rare race occurs when another_func() takes a while to be scheduled and by the time it's run, func() has already finished its execution, and thus another_func() won't hit the desired knob, causing the test to fail.

(yeah, func is one schema change that's executed and paused, and another_func is going to issue another schema change to test concurrent schema change behavior)

Overall this test can use some refactoring and I've rewritten it here that is tighter on interactions between goroutins and easier to read. Any feedback is welcome.

@postamar
Copy link
Contributor

I overlooked this latest comment, sorry! You're right, it's a concurrency bug. I like your rewrite except for the fact that it doesn't test that the schema changer actually blocks. The existing code doesn't either, for that matter, it tests that the schema changer has blocked. As you noticed this doesn't always happen because due to timing issues the schema changes might get executed in sequence.

craig bot pushed a commit that referenced this issue Jul 26, 2023
105857: sql: add implicit SELECT FOR SHARE locking to FK checks r=DrewKimball,nvanbenschoten,rytaft,mgartner a=michae2

**explain: add transaction information to EXPLAIN ANALYZE**

Add transaction isolation, priority, and quality-of-service to the
output of `EXPLAIN ANALYZE`.

Release note (sql change): `EXPLAIN ANALYZE` output now includes:
- the isolation level of the statement's transaction
- the priority of the statement's transaction
- the quality of service level of the statement's transaction

---

**opt: do not use LockDurabilityGuaranteed under serializable isolation**

This is a follow-up from #103734.

We do not want to use guaranteed-durable (a.k.a. replicated) locking
under serializable isolation, because it prevents pipelining and other
optimizations, and is unnecessary for correctness. This commit amends
8cbc6d1 to only set durability for
`SELECT FOR UPDATE` locking under weaker isolation levels.

This means that query plans will be slightly different under different
isolation levels, and so we must add isolation level to the optimizer
memo staleness calculation.

Furthermore, this commit changes the error message added by
e633d5e to be about guaranteed-durable
locking rather than `SELECT FOR UPDATE`, because in a later commit this
specific error will also be triggered by foreign key checks under weaker
isolation levels.

Informs: #100144, #100156, #100193, #100194

Release note: None

---

**opt: show locking durability in EXPLAIN (OPT) output**

Because the "guaranteed-durable locking not yet implemented" error
condition is checked in execbuilder, it prevents not only execution but
also `EXPLAIN` of queries using guaranteed-durable locking. Thankfully
`EXPLAIN (OPT)` bypasses execbuilder, and hence still works, so use this
for now to verify that we are enabling durable locking for `SELECT FOR
UPDATE` under read committed isolation.

(Note that we have not yet fixed the `SELECT FOR UPDATE` plans to use
more precise locking, that will come in a later PR.)

Informs: #100194

Release note: None

---

**sql: add implicit SELECT FOR SHARE locking to FK parent checks**

Add SELECT FOR SHARE locking to FK parent checks. Under serializable
isolation, this locking is only used when
`enable_implicit_fk_locking_for_serializable` is set. Under weaker
isolation levels (snapshot and read committed) this locking is always
used.

We only need to lock during the insertion-side FK checks, which verify
the existence of a parent row. Deletion-side FK checks verify the
non-existence of a child row, and these do not need to lock. Instead, to
prevent concurrent inserts or updates to the child that would violate
the FK constraint, we rely on the intent(s) created by the deletion
conflicting with the FK locking of those concurrent inserts or updates.

Fixes: #80683
Informs: #100156

Epic: CRDB-25322

Release note (sql change): Add a new session variable,
`enable_implicit_fk_locking_for_serializable`, which controls locking
during foreign key checks under serializable isolation. With this set to
true, foreign key checks of the referenced (parent) table, such as those
performed during an INSERT or UPDATE of the referencing (child) table,
will lock the referenced row using SELECT FOR SHARE locking. (This is
somewhat analogous to the existing `enable_implicit_select_for_update`
variable but applies to the foreign key checks of a mutation statement
instead of the initial row fetch.)

Under weaker isolation levels such as read committed, SELECT FOR SHARE
locking will always be used to ensure the database maintains the foreign
key constraint, regardless of the current setting of
`enable_implicit_fk_locking_for_serializable`.

107212: ui-e2e-tests: steps to enable cypress tests r=maryliag a=rickystewart

This doesn't get the job fully working yet, but it's an improvement.

Epic: none
Part of #106584
Release note: None

107517: roachtest: add read committed variants of ycsb r=michae2 a=nvanbenschoten

Closes #107112.

This PR adds the following six roachtest variants:
```
ycsb/A/nodes=3/cpu=32/isolation-level=read-committed
ycsb/B/nodes=3/cpu=32/isolation-level=read-committed
ycsb/C/nodes=3/cpu=32/isolation-level=read-committed
ycsb/D/nodes=3/cpu=32/isolation-level=read-committed
ycsb/E/nodes=3/cpu=32/isolation-level=read-committed
ycsb/F/nodes=3/cpu=32/isolation-level=read-committed
```

It does so after adding an `--isolation-level` flag to ycsb, which controls the isolation level to run the workload transactions under. If unset, the workload will run with the default isolation level of the database.

Release note: None

107636: schemachanger: deflake TestConcurrentDeclarativeSchemaChanges r=postamar a=postamar

This commit deflakes this test by checking that the second schema change actually does block because of the first one, rather than checking that it has blocked. The bug was that the latter wasn't always guaranteed to happen because we didn't force the schema changes to run in parallel.

Fixes #106732.

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
@craig craig bot closed this as completed in abc0966 Jul 26, 2023
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2023
This commit deflakes this test by checking that the second schema change
actually does block because of the first one, rather than checking that
it has blocked. The bug was that the latter wasn't always guaranteed to
happen because we didn't force the schema changes to run in parallel.

Fixes cockroachdb#106732.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2023
This commit deflakes this test by checking that the second schema change
actually does block because of the first one, rather than checking that
it has blocked. The bug was that the latter wasn't always guaranteed to
happen because we didn't force the schema changes to run in parallel.

Fixes cockroachdb#106732.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. skipped-test T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants