roachtest: tpcc/mixed-headroom/multiple-upgrades/n5cpu16 failed #92230

Closed
cockroach-teamcity opened this issue Nov 20, 2022 · 6 comments · Fixed by #92597
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-testeng TestEng Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Nov 20, 2022

roachtest.tpcc/mixed-headroom/multiple-upgrades/n5cpu16 failed with artifacts on master @ cfb5ae9a96e1770daa4aef1615a46e212b561a84:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/multiple-upgrades/n5cpu16/run_1
(test_impl.go:291).Fatal: 2: expected version 1000022.2-8, got 1000022.1-16
(test_impl.go:291).Fatal: monitor failure: monitor task failed: output in run_170102.906212042_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=1h40m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-21665

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Nov 20, 2022
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Nov 20, 2022
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Nov 20, 2022
@renatolabs renatolabs removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Nov 21, 2022
@renatolabs
Contributor

renatolabs commented Nov 21, 2022

17:05:57 versionupgrade.go:503: 1000022.2-8: waiting for cluster to auto-upgrade
17:10:57 test_impl.go:347: test failure #1: (test_impl.go:291).Fatal: 2: expected version 1000022.2-8, got 1000022.1-16

Related to #92153. I believe the upgrade can occasionally take more than 5 minutes here given the amount of data. I'll make a patch soon to increase the timeout.

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 1a6e9f885baa124d5ff2996adb966ea15a1a9b2b:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(test_impl.go:291).Fatal: 1: expected version 1000022.2-8, got 1000022.1-16
(test_impl.go:291).Fatal: monitor failure: monitor task failed: output in run_171818.828838702_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=1h40m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/multiple-upgrades/n5cpu16 failed with artifacts on master @ 1a6e9f885baa124d5ff2996adb966ea15a1a9b2b:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/multiple-upgrades/n5cpu16/run_1
(test_impl.go:314).Errorf: 
	Error Trace:	/go/src/github.com/cockroachdb/cockroach/tpcc.go:174
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:202
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:254
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:384
	            				/go/src/github.com/cockroachdb/cockroach/mixed_version_jobs.go:62
	            				/go/src/github.com/cockroachdb/cockroach/monitor.go:105
	            				/go/src/github.com/cockroachdb/cockroach/errgroup.go:75
	            				/go/src/github.com/cockroachdb/cockroach/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	EOF
	Test:       	tpcc/mixed-headroom/multiple-upgrades/n5cpu16
(test_impl.go:303).FailNow: FailNow called
(test_impl.go:291).Fatal: EOF
(test_impl.go:314).Errorf: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/multiple-upgrades/n5cpu16 failed with artifacts on master @ 1a6e9f885baa124d5ff2996adb966ea15a1a9b2b:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/multiple-upgrades/n5cpu16/run_1
(test_impl.go:291).Fatal: 1: expected version 22.2, got 22.1-16
(test_impl.go:291).Fatal: monitor failure: monitor task failed: output in run_160020.959491117_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=10m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 1a6e9f885baa124d5ff2996adb966ea15a1a9b2b:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/n5cpu16/run_1
(test_impl.go:291).Fatal: EOF
(test_impl.go:314).Errorf: 
	Error Trace:	/go/src/github.com/cockroachdb/cockroach/tpcc.go:174
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:202
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:254
	            				/go/src/github.com/cockroachdb/cockroach/tpcc.go:384
	            				/go/src/github.com/cockroachdb/cockroach/mixed_version_jobs.go:62
	            				/go/src/github.com/cockroachdb/cockroach/monitor.go:105
	            				/go/src/github.com/cockroachdb/cockroach/errgroup.go:75
	            				/go/src/github.com/cockroachdb/cockroach/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	EOF
	Test:       	tpcc/mixed-headroom/n5cpu16
(test_impl.go:303).FailNow: FailNow called
(test_impl.go:314).Errorf: test timed out (0s)

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/multiple-upgrades/n5cpu16 failed with artifacts on master @ 1a6e9f885baa124d5ff2996adb966ea15a1a9b2b:

test artifacts and logs in: /artifacts/tpcc/mixed-headroom/multiple-upgrades/n5cpu16/run_1
(test_impl.go:291).Fatal: 2: expected version 22.2, got 22.1-12
(test_impl.go:291).Fatal: monitor failure: monitor task failed: output in run_170000.769733852_n5_cockroach_workload_run_tpcc: ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=10m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4} returned: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=true , ROACHTEST_fs=zfs , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Nov 28, 2022
92289: catalog,tabledesc: add & adopt Constraint subtypes r=Xiang-Gu a=postamar

These new interfaces help encapsulate the descriptor protobufs for table
constraints:
  - CheckConstraint,
  - ForeignKeyConstraint,
  - UniqueWithoutIndexConstraint,
  - UniqueWithIndexConstraint.

These are the leaves of the Constraint type tree, and Constraint is the
interface at the root of it. The changes in this commit mimic what was
done for Column & Index, which respectively wrap descpb.ColumnDescriptor
and descpb.IndexDescriptor.

These new constraint interfaces are adopted liberally throughout the
code base, but not completely: many references to descpb-protobufs still
need to be replaced.

This commit also removes the somewhat confusing concept of "active"
versus "inactive" constraint. Not only was it confusing to me, evidently
it was also confusing to other contributors considering how several mild
bugs and inconsistencies made their way into the code. This commit
replaces this concept with that of an "enforced" constraint:
  - A constraint is enforced if it applies to data written to the table,
    regardless of whether it has been validated on the data already in
    the table prior to the constraint coming into existence.
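
As a rough illustration of that definition, the sketch below models "enforced" as independent of "validated". The `constraint` type and its field and method names are assumptions made for this example; they do not mirror the catalog interfaces introduced by the commit.

```go
package main

import "fmt"

// constraint is a hypothetical, simplified model; the field and method names
// are assumptions and do not mirror the catalog interfaces.
type constraint struct {
	name            string
	appliesToWrites bool // new writes to the table are checked against it
	validated       bool // rows that predate the constraint have been checked
}

// IsEnforced follows the definition in the commit message: only the effect
// on newly written data matters, not validation of pre-existing rows.
func (c constraint) IsEnforced() bool { return c.appliesToWrites }

func main() {
	cs := []constraint{
		{name: "check_a", appliesToWrites: true, validated: false},
		{name: "check_b", appliesToWrites: true, validated: true},
	}
	for _, c := range cs {
		fmt.Printf("%s enforced=%t validated=%t\n", c.name, c.IsEnforced(), c.validated)
	}
}
```

Both constraints report as enforced here, although only the second has been validated against existing rows.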

The implementation is somewhat awkward for constraints in that the same
constraint descriptor can be featured twice inside
a descpb.TableDescriptor, in its active constraint slice (such as
Checks, etc.) and also in the Mutations slice. In these cases the
interface wraps the non-mutation constraint descriptor protobuf, with no
observed changes to the database's behavior.

This commit required alterations to the table descriptor's
post-deserialization changes. The change which assigns constraint IDs
to table descriptors which don't have them had some subtle bugs which went
unnoticed until this commit forced a heavier reliance on these fields.
This commit also adds a new change which ensures that a check
constraint's ColumnIDs slice is populated based on the columns
referenced in its expression, replacing ancient code which predates the
concept of post-deserialization changes and which would compute this
lazily.

As a more general observation: it appears that the handling of adding
and dropping constraints following other schema changes was mildly
inconsistent and this commit tries to improve this. For instance, it was
possible to drop a column which had been set as NOT NULL in the same
transaction, but not do the same for any other kind of constraint. Also
it was possible to reuse the name of a dropping constraint, despite this
constraint still being enforced. The new behavior is more strict, yet
this should be barely noticeable by the user in most practical cases:
it's rare that one wants to add a constraint referencing a column and
also drop that column in the same transaction, for instance.

Fixes #91918.

Release note: None

92597: upgrades: remove upgrade granting CREATELOGIN role opt r=andreimatei a=andreimatei

This patch removes a permanent upgrade that was granting the CREATELOGIN role option to users that had the CREATEROLE option. This upgrade was introduced in 20.2, meant to grant the then-new CREATELOGIN option to users created in 20.1 and prior that had the CREATEROLE option (CREATELOGIN was being split from CREATEROLE).

This code has stayed around as a startupmigration since it was introduced, even though it didn't make sense to keep it after 20.2. Technically, I think this startup migration should have had the flag `IncludedInBootstrap=v20.2`, since we don't want it to run for clusters created at or after 20.2; this migration is not idempotent in the general sense, since it potentially picks up new, unintended users when it runs. Since 22.2, this migration would fail to run on anything but an empty system.role_options table because it would attempt to put NULLs into a non-nullable column. This was all benign since the startupmigrations had protection in place, preventing them from running a second time after a completed attempt. So, for upgraded clusters, the migration only ran when the first 20.2 node came up, and for new clusters it would be a no-op since the system.role_options table starts empty.

This migration became problematic in #91627 when I turned the startupmigrations into upgrades. These upgrades run once when upgrading to 23.1; I'm relying on their idempotence.
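
To make the idempotence concern concrete, here is a minimal in-memory sketch. It assumes a map standing in for system.role_options; `roleOptions` and `grantCreateLogin` are hypothetical names, and the real upgrade operated on the system table, not on a Go map.

```go
package main

import (
	"fmt"
	"sort"
)

// roleOptions maps a username to the set of role options it holds.
// This in-memory model is a stand-in for system.role_options.
type roleOptions map[string]map[string]bool

// grantCreateLogin gives CREATELOGIN to every user that has CREATEROLE and
// returns the users it changed, sorted by name.
func grantCreateLogin(opts roleOptions) []string {
	var changed []string
	for user, set := range opts {
		if set["CREATEROLE"] && !set["CREATELOGIN"] {
			set["CREATELOGIN"] = true
			changed = append(changed, user)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	opts := roleOptions{
		"alice": {"CREATEROLE": true}, // created before the option split
	}
	fmt.Println(grantCreateLogin(opts)) // first run: [alice]

	// Later, an admin deliberately creates a user with CREATEROLE only.
	opts["bob"] = map[string]bool{"CREATEROLE": true}

	// A second run is not a no-op: it also picks up bob.
	fmt.Println(grantCreateLogin(opts)) // second run: [bob]
}
```

Rerunning against unchanged state changes nothing, but rerunning after new users exist grants an option nobody asked for. That is the sense in which the migration is not idempotent in general.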

Fixes #92230
Fixes #92371
Fixes #92569

Release note: None
Epic: None

Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
@craig craig bot closed this as completed in 56d404e Nov 28, 2022
2 participants