roachtest: multitenant-upgrade failed #133282

Closed
cockroach-teamcity opened this issue Oct 23, 2024 · 10 comments · Fixed by #136319
Assignees
Labels
  • branch-release-24.1: Used to mark GA and release blockers, technical advisories, and bugs for 24.1
  • branch-release-24.2: Used to mark GA and release blockers, technical advisories, and bugs for 24.2
  • branch-release-24.3: Used to mark GA and release blockers, technical advisories, and bugs for 24.3
  • C-test-failure: Broken test (automatically or manually discovered).
  • O-roachtest
  • O-robot: Originated from a bot.
  • P-2: Issues/test failures with a fix SLA of 3 months
  • T-db-server

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 23, 2024

roachtest.multitenant-upgrade failed with artifacts on release-24.3 @ f4bf28879d49979d438967379b44c2fe3464f968:

(mixedversion.go:732).Run: mixed-version test failure while running step 47 (run "run workload on tenants"): full command output in run_184450.815848678_n6_v23213cockroach-work.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/multitenant-upgrade/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-43533

@cockroach-teamcity cockroach-teamcity added branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery labels Oct 23, 2024
@msbutler msbutler removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Oct 24, 2024
@msbutler
Collaborator

test infra bug:

Wraps: (6) Node 6. Command with error:
  | ```
  | v23.2.13/cockroach workload init tpcc --warehouses 10 {pgurl:6:tenant-b}
  | ```
  | stdout: <empty>
  | stderr:I241023 18:44:51.466567 1 workload/cli/run.go:639  [-] 1  random seed: 10051024827245800161
  | Error: pq: certificate authentication failed for user "roachprod"

@cockroach-teamcity
Member Author

roachtest.multitenant-upgrade failed with artifacts on release-24.3 @ 047a8e99eb7a35e3f13ecf6b113f1965f206e3b9:

(mixedversion.go:732).Run: mixed-version test failure while running step 35 (run "run workload on tenants"): full command output in run_181822.089606910_n7_v23213cockroach-work.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/multitenant-upgrade/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=false
  • ROACHTEST_ssd=0

@rimadeodhar
Collaborator

Same error as before:

run_181822.089606910_n7_v23213cockroach-work: 2024/10/25 18:18:22 cluster.go:2473: > v23.2.13/cockroach workload init tpcc --warehouses 10 {pgurl:7,6:tenant-c}
I241025 18:18:22.753921 1 workload/cli/run.go:639  [-] 1  random seed: 5691668929565251295
Error: pq: certificate authentication failed for user "roachprod"
run_181822.089606910_n7_v23213cockroach-work: 2024/10/25 18:18:22 cluster.go:2486: > result: COMMAND_PROBLEM: exit status 1

@rimadeodhar rimadeodhar added the P-2 Issues/test failures with a fix SLA of 3 months label Oct 28, 2024
@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.multitenant-upgrade failed with artifacts on release-24.3 @ bbce415047c9896ee3b33b1eb4c06e3d2cab5bd6:

(mixedversion.go:732).Run: mixed-version test failure while running step 19 (run "run workload on tenants"): full command output in run_195619.143967933_n7_v23214cockroach-work.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/multitenant-upgrade/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_runtimeAssertionsBuild=true
  • ROACHTEST_ssd=0

@rimadeodhar
Collaborator

The latest error is due to:

COMMAND_PROBLEM: exit status 1
(1) test failed:
  | test random seed: -1842611663578615860 (use COCKROACH_RANDOM_SEED to reproduce)
  |
  |                       n1          n2          n3          n4
  | released versions     v24.1.6     v24.1.6     v24.1.6     v24.1.6
  | binary versions       24.1        24.1        24.1        24.1
  | cluster versions      24.1        24.1        24.1        24.1
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).stepError
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:346
  | [...repeated from below...]
Wraps: (3) mixed-version test failure while running step 19 (run "run workload on tenants")
Wraps: (4) attached stack trace
  -- stack trace:
  | main.(*clusterImpl).RunE
  | 	pkg/cmd/roachtest/cluster.go:2495
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runMultitenantUpgrade.func1
  | 	pkg/cmd/roachtest/tests/multitenant_upgrade.go:128
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runMultitenantUpgrade.func6
  | 	pkg/cmd/roachtest/tests/multitenant_upgrade.go:225
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.runHookStep.Run
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/steps.go:407
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).runSingleStep.func2
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:299
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.panicAsError
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:952
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).runSingleStep
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:298
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).runStep
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:273
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).runStep
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:238
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).runStep
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:238
  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/mixedversion.(*testRunner).run.func3
  | 	pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go:190
  | runtime.goexit
  | 	src/runtime/asm_amd64.s:1695
Wraps: (5) full command output in run_195619.143967933_n7_v23214cockroach-work.log
Wraps: (6) Node 7. Command with error:
  | ```
  | v23.2.14/cockroach workload init tpcc --warehouses 10 {pgurl:7,5-6:tenant-c}
  | ```
  | stdout: <empty>
  | stderr:I241103 19:56:20.633337 1 workload/cli/run.go:639  [-] 1  random seed: 7247546636526671819
  | Error: dial tcp 10.142.0.103:29004: connect: connection refused

@rimadeodhar
Collaborator

rimadeodhar commented Nov 4, 2024

It looks like the connection was refused because the tenant's upgrade to 24.1 failed: the node it connected to was still running 23.2.

cockroach start: Sun Nov  3 19:55:16 UTC 2024, logging to logs-tenant-c-0
*
* WARNING: Running a server without --sql-addr, with a combined RPC/SQL listener, is deprecated.
* This feature will be removed in a later version of CockroachDB.
*
ERROR: server startup failed: cockroach server exited with error: initializing cluster version: cannot upgrade to 24.1: node running 23.2
Failed running "mt start-sql"
cockroach exited with code 1: Sun Nov  3 19:55:16 UTC 2024
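
As a rough illustration of the kind of check behind this startup error, here is a minimal sketch. It is not CockroachDB's actual code: the `version` type and `checkTenantUpgrade` function are hypothetical names, and the only thing taken from the log above is the shape of the error message.

```go
package main

import "fmt"

// version is a hypothetical major.minor pair, e.g. 24.1.
type version struct{ major, minor int }

func (v version) String() string { return fmt.Sprintf("%d.%d", v.major, v.minor) }

// less reports whether v is an older version than o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// checkTenantUpgrade is a hypothetical gate: a tenant may only move to
// target if the node it talks to already runs at least that version.
func checkTenantUpgrade(target, nodeBinary version) error {
	if nodeBinary.less(target) {
		return fmt.Errorf("cannot upgrade to %s: node running %s", target, nodeBinary)
	}
	return nil
}

func main() {
	// Mirrors the situation in the log: tenant wants 24.1, node is on 23.2.
	if err := checkTenantUpgrade(version{24, 1}, version{23, 2}); err != nil {
		fmt.Println("ERROR:", err)
	}
}
```

Under this reading, the tenant process exits at startup, so nothing listens on the tenant's SQL port and the later `workload init` sees `connection refused`.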

@cockroach-teamcity
Copy link
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.multitenant-upgrade failed with artifacts on release-24.3 @ 8b93c1b0640d51d6fe64d67355d35c3da2980638:

(mixedversion.go:737).Run: mixed-version test failure while running step 33 (run "run workload on tenants"): full command output in run_135335.908389232_n5_v23215cockroach-work.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/multitenant-upgrade/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v22.2.19 → v23.1.28 → v23.2.15 → v24.1.6 → release-24.3
  • runtimeAssertionsBuild=true
  • ssd=0

@cockroach-teamcity
Member Author

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.multitenant-upgrade failed with artifacts on release-24.3 @ a8238bebce529c9fbd1c795a246fbcc7f96135e4:

(mixedversion.go:737).Run: mixed-version test failure while running step 24 (run "run workload on tenants"): full command output in run_131758.074878717_n7_v23216cockroach-work.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/multitenant-upgrade/run_1

Parameters:

  • arch=amd64
  • cloud=gce
  • coverageBuild=false
  • cpu=4
  • encrypted=false
  • fs=ext4
  • localSSD=true
  • mvtDeploymentMode=system-only
  • mvtVersions=v22.2.19 → v23.1.29 → v23.2.16 → v24.1.7 → release-24.3
  • runtimeAssertionsBuild=true
  • ssd=0

@rimadeodhar rimadeodhar self-assigned this Nov 25, 2024
@rimadeodhar
Collaborator

Latest error is the same certificate authentication one:

Wraps: (6) Node 5. Command with error:
  | ```
  | v23.2.15/cockroach workload init tpcc --warehouses 10 {pgurl:5:tenant-c}
  | ```
  | stdout: <empty>
  | stderr:I241116 13:53:36.737967 1 workload/cli/run.go:639  [-] 1  random seed: 2451120498625645882
  | Error: pq: certificate authentication failed for user "roachprod"
Wraps: (7) COMMAND_PROBLEM
Wraps: (8) exit status 1
Error types: (1) *hintdetail.withDetail (2) *withstack.withStack (3) *errutil.withPrefix (4) *withstack.withStack (5) *errutil.withPrefix (6) *hintdetail.withDetail (7) errors.Cmd (8) *exec.ExitError

I'm looking into the test but I don't see anything obvious. I'm going to check with test-eng for pointers.

DarrylWong added a commit to DarrylWong/fork that referenced this issue Nov 27, 2024
In v22.2, tenant ids must be specified when creating client certs.
Previously, only a select number of tenant ids were specified. Those ids
were chosen to match the hardcoded ids used by the old multitenant
roachprod framework.

Now that the new mt framework assigns ids sequentially, we see that
creating tenants with ids not specified causes auth issues on clusters
bootstrapped on 22.2. Since there should be no drawback to assigning
more valid tenant ids than needed, we now add tenants 1 to 100. This
should be more than enough for roachprod/roachtest.

Fixes: cockroachdb#133282
Epic: none
Release note: none
DarrylWong added a commit to DarrylWong/fork that referenced this issue Nov 27, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Nov 27, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Dec 2, 2024
craig bot pushed a commit that referenced this issue Dec 2, 2024
136304: roachtest: remove duplicated if branch in connection latency r=srosenberg,herkolategan a=DarrylWong

This branch previously set up user certs or a user password depending on the mode of auth. However as of fd6d12c, the test uses the default user/certs created by the roachprod framework and the logic is no longer needed.

Fixes: #133301
Epic: none
Release note: none

136319: roachprod: add sufficient tenant ids when creating v22.2 client certs r=srosenberg,herkolategan a=DarrylWong

In v22.2, tenant ids must be specified when creating client certs. Previously, only a select number of tenant ids were specified. Those ids were chosen to match the hardcoded ids used by the old multitenant roachprod framework.

Now that the new mt framework assigns ids sequentially, we see that creating tenants with ids not specified causes auth issues on clusters bootstrapped on 22.2. Since there should be no drawback to assigning more valid tenant ids than needed, we now add tenants 1 to 100. This should be more than enough for roachprod/roachtest.

Fixes: #133282
Epic: none
Release note: none

136437: logictest: remove unnecessary flaky assertion from synthetic_privileges test r=rafiss a=rafiss

There's no need to read from the system table directly; the test checks what it needs by using the has_table_privilege function.

fixes #133912
fixes #136183
Release note: None

Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in 75758a6 Dec 2, 2024

blathers-crl bot commented Dec 2, 2024

Based on the specified backports for linked PR #136319, I applied the following new label(s) to this issue: branch-release-24.1, branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 labels Dec 2, 2024
blathers-crl bot pushed a commit that referenced this issue Dec 2, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Dec 3, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Dec 3, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Dec 3, 2024
DarrylWong added a commit to DarrylWong/fork that referenced this issue Dec 3, 2024
herkolategan pushed a commit to herkolategan/cockroach that referenced this issue Dec 4, 2024
herkolategan pushed a commit to herkolategan/cockroach that referenced this issue Dec 4, 2024