-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/sql/schemachanger/schemachanger_test: TestConcurrentDeclarativeSchemaChanges failed #106732
Comments
This was just recently added in #106175 |
@postamar ccing you for any ideas since you added the test recently. Since you're out today I'll give this a look until you're back Monday. |
The newly-rewritten TestConcurrentDeclarativeSchemaChanges test had a concurrency bug in it, which caused it to fail in cases where the two schema changes actually take place serially. It's possible for the later one to start after the earlier one has already finished. Fixes: cockroachdb#106732. Release note: None
The newly-rewritten TestConcurrentDeclarativeSchemaChanges test had a concurrency bug in it, which caused it to fail in cases where the two schema changes actually take place serially. It's possible for the later one to start after the earlier one has already finished. Fixes: cockroachdb#106732. Release note: None
Hi @postamar, is the fix for this expected to merge soon? If not I can put up a skip in the meantime. |
pkg/sql/schemachanger/schemachanger_test.TestConcurrentDeclarativeSchemaChanges failed with artifacts on master @ 30acaf93015008a672667c0d3fdc00ac95f446a6:
Same failure on other branches
|
Skips `TestConcurrentDeclarativeSchemaChanges`. Informs cockroachdb#106732 Release note: None
106860: ssmemstorage: improve lock contention on RecordStatement r=j82w a=j82w 1. The stats object lock was being held even during the insights event creation and insert which has it's own lock. The insights logic is now moved above the stats lock. 2. Moved latency calculation outside the stats lock. 3. Moved error code calculation outside the stats lock. 4. ss_mem_storage converted to use a RWMutex. Each object in the map has it's own lock. Read lock allows mulitple threads to read from the map at the same time which is the common case. Writes are only done the first time a statement is seen. Fixes: #106789, Fixes: #55285 ``` ./dev bench pkg/sql/sqlstats/sslocal -f BenchmarkRecordStatement --ignore-cache --verbose --test-args='-test.mutexprofile=mutex_pr.out -test.cpuprofile=cpu_pr.out -test.benchtime=30s' --timeout "10m" ``` ``` // Master no changes BenchmarkRecordStatement/RecordStatement:_Parallelism_10-10 64775088 558.8 ns/op sql_stats_test.go:655: -- test log scope end -- PASS ================================================================================ INFO: Elapsed time: 37.529s, Critical Path: 37.26s INFO: 2 processes: 1 internal, 1 darwin-sandbox. INFO: Build completed successfully, 2 total actions //pkg/sql/sqlstats/sslocal:sslocal_test PASSED in 37.2s ``` ``` // Changes after the PR BenchmarkRecordStatement/RecordStatement:_Parallelism_10-10 67960656 523.8 ns/op sql_stats_test.go:655: -- test log scope end -- PASS ================================================================================ INFO: Elapsed time: 37.031s, Critical Path: 36.69s INFO: 2 processes: 1 internal, 1 darwin-sandbox. INFO: Build completed successfully, 2 total actions //pkg/sql/sqlstats/sslocal:sslocal_test PASSED in 36.7s ``` Release note (performance improvement): Reduced lock contention on `ssmemstorage.RecordStatement`. This is useful for workloads that execute the same statement concurrently on the same SQL instance. 106912: telemetryccl: Split `TestTelemetry` into smaller sub-tests r=Santamaura a=Santamaura This change breaks up `TestTelemetry` and the `multiregion` test into four tests due to the test being quite large. The tests are: `multiregion`, `multiregion_db`, `multiregion_table`, and `multiregion_row`. Informs: #103004 Release note: None 107099: sql: fix `CREATE TABLE AS` sourcing `SHOW <show_subcmd> <table>` job failures r=chengxiong-ruan a=ecwall Fixes #106260 Previously `CREATE TABLE AS`/`CREATE MATERIALIZED VIEW AS` sourcing from `SHOW <show_subcmd> <table>` generated a failing schema change job with a `relation "tbl" does not exist error` because the SHOW source table was not fully qualified. This PR fixes this by respecting the `Builder.qualifyDataSourceNamesInAST` flag in `delegate.TryDelegate()` which implements the failing SHOW commands. Release note (bug fix): Fix failing schema change job when CREATE TABLE AS or CREATE MATERAILIZED VIEW AS sources from a SHOW command: 1. CREATE TABLE t AS SELECT * FROM [SHOW CREATE TABLE tbl]; 2. CREATE TABLE t AS SELECT * FROM [SHOW INDEXES FROM tbl]; 3. CREATE TABLE t AS SELECT * FROM [SHOW COLUMNS FROM tbl]; 4. CREATE TABLE t AS SELECT * FROM [SHOW CONSTRAINTS FROM tbl]; 5. CREATE TABLE t AS SELECT * FROM [SHOW PARTITIONS FROM TABLE tbl]; 6. CREATE TABLE t AS SELECT * FROM [SHOW PARTITIONS FROM INDEX tbl@tbl_pkey]; 107170: ui: fix creation index syntax on console r=maryliag a=maryliag With an update to make the table name fully qualified, the index name was also using the fully qualified name, which contains ".", and that causes an invalid syntax since index name can't have periods. Having the qualified name is not important for the index name itself, so this commit fixes by ignoring that part and using just the table name, how it was doing previously. It also fixes an invalid syntax when there were spaces on the name generate. It also add a little more observability into indexes being created with STORING clause, adding that to the suffix of the index creation. Fixes #107168 Before <img width="583" alt="Screenshot 2023-07-19 at 10 23 21 AM" src="https://github.com/cockroachdb/cockroach/assets/1017486/e5792419-31af-4e5f-994e-13ddb716009c"> After https://www.loom.com/share/33da8b46fc9e4f72944c1d0ab943dea0 Release note (bug fix): Index recommendation on the UI no longer uses the full qualified name of a table to create an index name, allowing the creating of indexes directly from the Console to work. 107189: tests: fix stories files r=maryliag a=maryliag Stories files used for testing were missing some fixtures. This commit adds all missing parameters. Epic: none Release note: None 107197: schemachanger: skip flakey test r=rafiss a=DrewKimball Skips `TestConcurrentDeclarativeSchemaChanges`. Informs #106732 Release note: None 107207: bazel: make `fastbuild` the default `-c` again; never strip for `dev` builds r=rail a=rickystewart This is a partial revert of `60e8bcec61269c6648dc40c8094fbf303b8a02aa`. In that commit I made a change to make un-stripped binaries the default by making `dbg` the default `--compilation_mode`. This had unexpected consequences as this actually disables inlining and optimization, thereby breaking some tests, and overall it is surprising that a Go binary built by Bazel would be un-optimized by default since this is the opposite of `go build`'s default behavior. Instead, we make `fastbuild` the default (again), and to solve the problem of binaries being unstripped we set `--strip never` for `dev` builds. (`dev doctor` makes you clarify that your build is a `dev` build on setup so no one can miss this.) Now we have the following behavior for each `compilation_mode`: * `dbg`: disable optimizations and inlining. * `fastbuild` and `opt`: will produce the same binary, except `fastbuild` defaults to stripping in non-`dev` configurations. If you want an un-optimized binary for some reason, it still works to build with `-c dbg`. Epic: CRDB-17171 Release note: None Co-authored-by: j82w <[email protected]> Co-authored-by: Alex Santamaura <[email protected]> Co-authored-by: Evan Wall <[email protected]> Co-authored-by: maryliag <[email protected]> Co-authored-by: Drew Kimball <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
Sorry for not being realistic and skipping this test sooner. I'll un-skip it and fix it early next week. |
No worries :) |
The failure comes from an uncareful setup of the test. Namely, the test did something like
with the intention that The rare race occurs when (yeah, Overall this test can use some refactoring and I've rewritten it here that is tighter on interactions between goroutins and easier to read. Any feedback is welcome. |
I overlooked this latest comment, sorry! You're right, it's a concurrency bug. I like your rewrite except for the fact that it doesn't test that the schema changer actually blocks. The existing code doesn't either, for that matter, it tests that the schema changer has blocked. As you noticed this doesn't always happen because due to timing issues the schema changes might get executed in sequence. |
105857: sql: add implicit SELECT FOR SHARE locking to FK checks r=DrewKimball,nvanbenschoten,rytaft,mgartner a=michae2 **explain: add transaction information to EXPLAIN ANALYZE** Add transaction isolation, priority, and quality-of-service to the output of `EXPLAIN ANALYZE`. Release note (sql change): `EXPLAIN ANALYZE` output now includes: - the isolation level of the statement's transaction - the priority of the statement's transaction - the quality of service level of the statement's transaction --- **opt: do not use LockDurabilityGuaranteed under serializable isolation** This is a follow-up from #103734. We do not want to use guaranteed-durable (a.k.a. replicated) locking under serializable isolation, because it prevents pipelining and other optimizations, and is unnecessary for correctness. This commit amends 8cbc6d1 to only set durability for `SELECT FOR UPDATE` locking under weaker isolation levels. This means that query plans will be slightly different under different isolation levels, and so we must add isolation level to the optimizer memo staleness calculation. Furthermore, this commit changes the error message added by e633d5e to be about guaranteed-durable locking rather than `SELECT FOR UPDATE`, because in a later commit this specific error will also be triggered by foreign key checks under weaker isolation levels. Informs: #100144, #100156, #100193, #100194 Release note: None --- **opt: show locking durability in EXPLAIN (OPT) output** Because the "guaranteed-durable locking not yet implemented" error condition is checked in execbuilder, it prevents not only execution but also `EXPLAIN` of queries using guaranteed-durable locking. Thankfully `EXPLAIN (OPT)` bypasses execbuilder, and hence still works, so use this for now to verify that we are enabling durable locking for `SELECT FOR UPDATE` under read committed isolation. (Note that we have not yet fixed the `SELECT FOR UPDATE` plans to use more precise locking, that will come in a later PR.) Informs: #100194 Release note: None --- **sql: add implicit SELECT FOR SHARE locking to FK parent checks** Add SELECT FOR SHARE locking to FK parent checks. Under serializable isolation, this locking is only used when `enable_implicit_fk_locking_for_serializable` is set. Under weaker isolation levels (snapshot and read committed) this locking is always used. We only need to lock during the insertion-side FK checks, which verify the existence of a parent row. Deletion-side FK checks verify the non-existence of a child row, and these do not need to lock. Instead, to prevent concurrent inserts or updates to the child that would violate the FK constraint, we rely on the intent(s) created by the deletion conflicting with the FK locking of those concurrent inserts or updates. Fixes: #80683 Informs: #100156 Epic: CRDB-25322 Release note (sql change): Add a new session variable, `enable_implicit_fk_locking_for_serializable`, which controls locking during foreign key checks under serializable isolation. With this set to true, foreign key checks of the referenced (parent) table, such as those performed during an INSERT or UPDATE of the referencing (child) table, will lock the referenced row using SELECT FOR SHARE locking. (This is somewhat analogous to the existing `enable_implicit_select_for_update` variable but applies to the foreign key checks of a mutation statement instead of the initial row fetch.) Under weaker isolation levels such as read committed, SELECT FOR SHARE locking will always be used to ensure the database maintains the foreign key constraint, regardless of the current setting of `enable_implicit_fk_locking_for_serializable`. 107212: ui-e2e-tests: steps to enable cypress tests r=maryliag a=rickystewart This doesn't get the job fully working yet, but it's an improvement. Epic: none Part of #106584 Release note: None 107517: roachtest: add read committed variants of ycsb r=michae2 a=nvanbenschoten Closes #107112. This PR adds the following six roachtest variants: ``` ycsb/A/nodes=3/cpu=32/isolation-level=read-committed ycsb/B/nodes=3/cpu=32/isolation-level=read-committed ycsb/C/nodes=3/cpu=32/isolation-level=read-committed ycsb/D/nodes=3/cpu=32/isolation-level=read-committed ycsb/E/nodes=3/cpu=32/isolation-level=read-committed ycsb/F/nodes=3/cpu=32/isolation-level=read-committed ``` It does so after adding an `--isolation-level` flag to ycsb, which controls the isolation level to run the workload transactions under. If unset, the workload will run with the default isolation level of the database. Release note: None 107636: schemachanger: deflake TestConcurrentDeclarativeSchemaChanges r=postamar a=postamar This commit deflakes this test by checking that the second schema change actually does block because of the first one, rather than checking that it has blocked. The bug was that the latter wasn't always guaranteed to happen because we didn't force the schema changes to run in parallel. Fixes #106732. Release note: None Co-authored-by: Michael Erickson <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Marius Posta <[email protected]>
This commit deflakes this test by checking that the second schema change actually does block because of the first one, rather than checking that it has blocked. The bug was that the latter wasn't always guaranteed to happen because we didn't force the schema changes to run in parallel. Fixes cockroachdb#106732. Release note: None
This commit deflakes this test by checking that the second schema change actually does block because of the first one, rather than checking that it has blocked. The bug was that the latter wasn't always guaranteed to happen because we didn't force the schema changes to run in parallel. Fixes cockroachdb#106732. Release note: None
pkg/sql/schemachanger/schemachanger_test.TestConcurrentDeclarativeSchemaChanges failed with artifacts on master @ b19302fd436f7e8667b289677723e337d333bb50:
The failing assertion is
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-29694
The text was updated successfully, but these errors were encountered: