roachtest: oom due to WorkloadKVConverter #68965

Closed
cockroach-teamcity opened this issue Aug 15, 2021 · 19 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-disaster-recovery

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Aug 15, 2021

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ ee3efd6b1e24a3e1676778f5028fa0a35266f683:

		  |      0          170.4          180.4     19.9     35.7     50.3     56.6 payment
		  |    94.0s        0           22.0           18.2     32.5     54.5     65.0     65.0 stockLevel
		  |    95.0s        0           15.0           18.3     67.1    109.1    113.2    113.2 delivery
		  |    95.0s        0          201.7          155.3     33.6     56.6     79.7     83.9 newOrder
		  |    95.0s        0           21.0           19.1      6.8      7.6      8.1      8.1 orderStatus
		  |    95.0s        0          207.7          180.6     18.9     31.5     44.0     50.3 payment
		  |    95.0s        0           17.0           18.1     31.5     50.3     60.8     60.8 stockLevel
		  |    96.0s        0           16.0           18.3     52.4     79.7    125.8    125.8 delivery
		  |    96.0s        0          198.2          155.8     29.4     44.0     79.7    104.9 newOrder
		  |    96.0s        0           23.0           19.2      7.3     10.0     10.5     10.5 orderStatus
		  |    96.0s        0          188.2          180.7     16.3     24.1     71.3     75.5 payment
		  |    96.0s        0           23.0           18.2     28.3     46.1     65.0     65.0 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |    97.0s        0           16.0           18.2     50.3     75.5     79.7     79.7 delivery
		  |    97.0s        0          214.1          156.4     30.4     44.0     54.5     58.7 newOrder
		  |    97.0s        0           18.0           19.2      7.1      9.4      9.4      9.4 orderStatus
		  |    97.0s        0          194.1          180.9     16.8     26.2     33.6     41.9 payment
		  |    97.0s        0           22.0           18.2     31.5     44.0     48.2     48.2 stockLevel
		  |    98.0s        0           18.0           18.2     56.6     67.1     67.1     67.1 delivery
		  |    98.0s        0          222.9          157.1     28.3     39.8     44.0     46.1 newOrder
		  |    98.0s        0           22.0           19.2      7.1     11.0     12.1     12.1 orderStatus
		  |    98.0s        0          188.9          180.9     16.3     22.0     24.1     26.2 payment
		  |    98.0s        0           15.0           18.2     27.3     46.1     50.3     50.3 stockLevel
		  |    99.0s        0           14.0           18.2     52.4     71.3     96.5     96.5 delivery
		  |    99.0s        0          201.0          157.5     29.4     41.9     60.8     67.1 newOrder
		  |    99.0s        0           14.0           19.1      6.0      8.4     12.1     12.1 orderStatus
		  |    99.0s        0          174.0          180.9     15.7     24.1     30.4     44.0 payment
		  |    99.0s        0           23.0           18.3     26.2     54.5     60.8     60.8 stockLevel
		  |   100.0s        0           18.0           18.2     56.6     75.5     88.1     88.1 delivery
		  |   100.0s        0          181.9          157.8     28.3     37.7     44.0     56.6 newOrder
		  |   100.0s        0           13.0           19.1      6.8      7.6      9.4      9.4 orderStatus
		  |   100.0s        0          193.9          181.0     15.2     23.1     28.3     44.0 payment
		  |   100.0s        0           14.0           18.2     24.1     35.7     37.7     37.7 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   101.0s        0           16.0           18.2     50.3     60.8     62.9     62.9 delivery
		  |   101.0s        0          183.1          158.0     28.3     39.8     52.4     60.8 newOrder
		  |   101.0s        0           23.0           19.1      7.9     12.1     16.8     16.8 orderStatus
		  |   101.0s        0          184.1          181.0     15.7     27.3     33.6     37.7 payment
		  |   101.0s        0           26.0           18.3     26.2     39.8     41.9     41.9 stockLevel
		  |   102.0s        0           26.0           18.2     52.4     96.5    113.2    113.2 delivery
		  |   102.0s        0          180.0          158.2     28.3     54.5     71.3     92.3 newOrder
		  |   102.0s        0           19.0           19.1      6.0      9.4     21.0     21.0 orderStatus
		  |   102.0s        0          202.0          181.2     16.3     31.5     46.1     52.4 payment
		  |   102.0s        0           12.0           18.2     26.2     39.8     50.3     50.3 stockLevel
		Wraps: (8) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString
Reproduce

See: roachtest README

See: CI job to stress roachtests

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^tpcc/mixed-headroom/n5cpu16$`
* Parameters / `env.COUNT`: <number of runs>

Same failure on other branches

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-9386

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 15, 2021
@tbg
Member

tbg commented Aug 17, 2021

14:24:17 test_impl.go:323: test failure: versionupgrade.go:497,versionupgrade.go:213,tpcc.go:444,test_runner.go:777: pq: running migration for 21.1-130: running iterate callback: rpc error: code = Unknown desc = r293 was not found on s3

@irfansharif is it expected that this error would bubble up? It is expected that replicas could move around.

@irfansharif
Contributor

running migration for 21.1-130

Looks like it's occurring for PostSeparatedIntentsMigration, introduced in #66445 (+cc @itsbilal).

Unwinding the error trace:

cockroach/pkg/kv/txn.go

Lines 598 to 600 in 2556ac6

if err := f(rows); err != nil {
	return errors.Wrap(err, "running iterate callback")
}

Where f is:

return txn.Iterate(ctx, keys.MetaMin, keys.MetaMax, blockSize,
func(rows []kv.KeyValue) error {

Where the closure invoked is:

// Invoke fn with the current chunk (of size ~blockSize) of
// range descriptors.
return fn(descriptors...)

Coming from (where the RangeNotFound error is generated):

if err := deps.DB.Migrate(ctx, start, end, cv.Version); err != nil {
	return err
}


I'm not actually sure at what level this error should be handled. The DB.Migrate command is addressing a keyrange, yet it's encountering a RangeNotFound error (which I was actually surprised by -- should I be?). Should we handle this error in the migration itself or is this something the migration infrastructure should know to expect and automatically re-run the closure?

@tbg tbg removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 20, 2021
@nvanbenschoten
Member

yet it's encountering a RangeNotFound error (which I was actually surprised by -- should I be?)

I'm also surprised by this. A RangeNotFound error should be retried in the DistSender, here:

case *roachpb.RangeNotFoundError:
	// The store we routed to doesn't have this replica. This can happen when
	// our descriptor is outright outdated, but it can also be caused by a
	// replica that has just been added but needs a snapshot to be caught up.
	//
	// We'll try other replicas which typically gives us the leaseholder, either
	// via the NotLeaseHolderError or nil error paths, both of which update the
	// leaseholder in the range cache.

We're not addressing the Migrate request to a specific range or anything, and we should be routing the Migrate request through a DistSender, so how is it bubbling all the way back up to here?

@tbg
Member

tbg commented Nov 5, 2021

Recent failures were fixed by #72432

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 40f11fead0a0453969634f8ddb0502c1f78b2806:

	monitor.go:128,versionupgrade.go:692,versionupgrade.go:213,tpcc.go:413,test_runner.go:779: monitor failure: monitor task failed: output in run_040850.718263590_n1_v2120cockroach_workload_fixtures_import_bank: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3804493-1637959164-89-n5cpu16:1 -- v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned: exit status 20
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:692
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:213
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:413
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.(*clusterImpl).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2054
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:689
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) output in run_040850.718263590_n1_v2120cockroach_workload_fixtures_import_bank
		Wraps: (7) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3804493-1637959164-89-n5cpu16:1 -- v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned
		  | stderr:
		  | I211127 04:08:52.479710 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 1 tables
		  | Error: importing fixture: importing table bank: dial tcp 127.0.0.1:26257: connect: connection refused
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank
		  |   | ``````
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (8) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *exec.ExitError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ b450fea83a7db1e06403b2563c13f38c9284b932:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcc/mixed-headroom/n5cpu16/run_1
	monitor.go:128,versionupgrade.go:692,versionupgrade.go:213,tpcc.go:413,test_runner.go:779: monitor failure: unexpected node event: 3: dead (exit status 7)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:692
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:213
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:413
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead (exit status 7)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcc/mixed-headroom/n5cpu16/run_1
	monitor.go:128,versionupgrade.go:692,versionupgrade.go:213,tpcc.go:413,test_runner.go:779: monitor failure: unexpected node event: 4: dead (exit status 1)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:692
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:213
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:413
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 4: dead (exit status 1)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:

	monitor.go:128,versionupgrade.go:692,versionupgrade.go:213,tpcc.go:413,test_runner.go:779: monitor failure: monitor task failed: output in run_152809.379964125_n1_v2120cockroach_workload_fixtures_import_bank: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3810278-1638169717-89-n5cpu16:1 -- v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned: exit status 20
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:692
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:213
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:413
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.(*clusterImpl).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2054
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1.1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:689
		  | main.(*monitorImpl).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:106
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) output in run_152809.379964125_n1_v2120cockroach_workload_fixtures_import_bank
		Wraps: (7) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3810278-1638169717-89-n5cpu16:1 -- v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank returned
		  | stderr:
		  | I211129 15:28:11.006963 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 1 tables
		  | Error: importing fixture: importing table bank: dial tcp 127.0.0.1:26257: connect: connection refused
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | v21.2.0/cockroach workload fixtures import bank --payload-bytes=10240 --rows=32552083 --seed=4 --db=bigbank
		  |   | ``````
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (8) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) *exec.ExitError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@AlexTalks
Contributor

Recent failures seem due to the GCE disk space issue mentioned in #73204, #73205, and #73222.

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e89328d92398a3e2d6487179845a51e7f1caa435:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcc/mixed-headroom/n5cpu16/run_1
	monitor.go:128,versionupgrade.go:686,versionupgrade.go:207,tpcc.go:413,test_runner.go:779: monitor failure: unexpected node event: 2: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | main.(*monitorImpl).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:686
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:207
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:413
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 2: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@tbg
Member

tbg commented Dec 9, 2021

Hmm, this isn't good: n2 got OOM-killed.

[ 2181.064172] Out of memory: Killed process 14460 (cockroach) total-vm:19115156kB, anon-rss:10974264kB, file-rss:38736kB, shmem-rss:0kB, UID:1000 pgtables:35412kB oom_score_adj:0
[ 2181.414749] oom_reaper: reaped process 14460 (cockroach), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

@AlexTalks could you check if the heap profiles tell us anything?

Note that this cluster was running 21.2 (not upgraded to master yet), so there wasn't a ./cockroach binary, which is why some of the usual artifacts are missing. But we have the log directories, which contain the heap profiles.
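For triage, the kernel line quoted above can be picked apart mechanically. The sketch below is a hypothetical helper (`parseOOMKill`, `signalFromExitStatus`, and the `oomKill` type are illustrative names, not roachtest code). It also decodes the earlier "exit status 137", which matches the common shell convention of 128 plus the signal number, here 9 (SIGKILL).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// oomKillRE matches the kernel's "Out of memory: Killed process" line and
// captures the PID, the command name, and anon-rss in kB.
var oomKillRE = regexp.MustCompile(
	`Out of memory: Killed process (\d+) \(([^)]+)\).*anon-rss:(\d+)kB`)

// oomKill holds the fields extracted from one OOM-killer log line.
type oomKill struct {
	PID       int
	Command   string
	AnonRSSKB int64
}

// parseOOMKill returns the parsed fields and true if line is an OOM kill
// record in the format shown in this thread, and false otherwise.
func parseOOMKill(line string) (oomKill, bool) {
	m := oomKillRE.FindStringSubmatch(line)
	if m == nil {
		return oomKill{}, false
	}
	pid, err := strconv.Atoi(m[1])
	if err != nil {
		return oomKill{}, false
	}
	rss, err := strconv.ParseInt(m[3], 10, 64)
	if err != nil {
		return oomKill{}, false
	}
	return oomKill{PID: pid, Command: m[2], AnonRSSKB: rss}, true
}

// signalFromExitStatus applies the 128+N convention: an exit status in
// (128, 256) is read as death by signal N.
func signalFromExitStatus(status int) (int, bool) {
	if status > 128 && status < 256 {
		return status - 128, true
	}
	return 0, false
}

func main() {
	line := "[ 2181.064172] Out of memory: Killed process 14460 (cockroach) " +
		"total-vm:19115156kB, anon-rss:10974264kB, file-rss:38736kB, shmem-rss:0kB"
	if k, ok := parseOOMKill(line); ok {
		gib := float64(k.AnonRSSKB) / (1024 * 1024)
		fmt.Printf("pid=%d cmd=%s anon-rss=%.1f GiB\n", k.PID, k.Command, gib)
	}
	if sig, ok := signalFromExitStatus(137); ok {
		fmt.Println("exit status 137 -> signal", sig)
	}
}
```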

@tbg
Member

tbg commented Dec 15, 2021

https://share.polarsignals.com/a059987/

inuse_space:

[image: heap profile, inuse_space view]

We see that the hog here is *importccl.WorkloadKVConverter
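For readers unfamiliar with how such a profile is produced, here is a generic Go sketch of capturing a heap profile with the standard `runtime/pprof` package. It is not how CockroachDB writes its heap profiles; `captureHeapProfile` and the `hog` slice are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile returns the current heap profile as written by
// pprof.WriteHeapProfile.
func captureHeapProfile() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date allocation data.
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// hog stands in for whatever is retaining memory in a real process.
	hog := make([][]byte, 0, 64)
	for i := 0; i < 64; i++ {
		hog = append(hog, make([]byte, 1<<20))
	}

	prof, err := captureHeapProfile()
	if err != nil {
		fmt.Println("capturing heap profile failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile with %d buffers live\n", len(prof), len(hog))
}
```

If the bytes are saved to a file, `go tool pprof -sample_index=inuse_space <file>` opens the same in-use view referred to above.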

@tbg tbg changed the title roachtest: tpcc/mixed-headroom/n5cpu16 failed roachtest: oom due to WorkloadKVConverter Dec 15, 2021
@blathers-crl

blathers-crl bot commented Dec 15, 2021

cc @cockroachdb/bulk-io

@tbg
Member

tbg commented Dec 15, 2021

Changed the title so that future test failures aren't directed at Bulk I/O. This is the first time I've seen this OOM, and I don't think it will be a failure mode exclusive to this test.

@dt
Member

dt commented Oct 26, 2022

Haven't heard anyone complain about this lately, so going to close this old DR issue.

Any new workload improvement work would likely go to... sql-exp? test-eng?


6 participants