Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvnemesis: TestKVNemesisSingleNode_ReproposalChaos failed #107860

Closed
cockroach-teamcity opened this issue Jul 30, 2023 · 10 comments · Fixed by #109610
Closed

kv/kvnemesis: TestKVNemesisSingleNode_ReproposalChaos failed #107860

cockroach-teamcity opened this issue Jul 30, 2023 · 10 comments · Fixed by #109610
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jul 30, 2023

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ f295bd861a3a427652b19c2254d2401ebb4a3c8e:

=== RUN   TestKVNemesisSingleNode_ReproposalChaos
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisSingleNode_ReproposalChaos4009697673
    test_log_scope.go:81: use -show-logs to present logs inline
    kvnemesis_test.go:283: seed: 9086399838791617203
    test_server_shim.go:98: 
        Test server was configured to route SQL queries to a secondary tenant (virtual cluster).
        If you are only seeing a test failure when this message appears, there may be a problem
        specific to cluster virtualization or multi-tenancy.
        
        To investigate, consider using "COCKROACH_TEST_TENANT=true" to force-enable just
        the secondary tenant in all runs (or, alternatively, "false" to force-disable), or use
        "COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=false" to disable all random test variables altogether.
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
    kvnemesis_test.go:197: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3343639818
    kvnemesis.go:168: error applying x.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
          txn.SetIsoLevel(isolation.Snapshot)
          txn.Scan(ctx, tk(2337864312806892353), tk(18385962923748999403), 0) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690710151.114247405,1->1690710151.113480788,0
          {
            b := &kv.Batch{}
            b.Del(tk(5292134198012247393) /* @s29 */) // omitted
            txn.Run(ctx, b) // omitted
          }
          {
            b := &kv.Batch{}
            b.Del(tk(13931089840599312697) /* @s30 */) // omitted
            b.Put(tk(15374219157100306618), sv(31)) // omitted
            txn.Run(ctx, b) // omitted
          }
          b := &kv.Batch{}
          txn.CommitInBatch(ctx, b) // omitted
          return nil
        }) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690710151.114247405,1->1690710151.113480788,0: failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690710151.114247405,1->1690710151.113480788,0
    kvnemesis.go:168: error applying x.Scan(ctx, tk(2337864312806892353), tk(18385962923748999403), 0) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690710151.114247405,1->1690710151.113480788,0: failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690710151.114247405,1->1690710151.113480788,0
    kvnemesis.go:203: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3343639818/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3343639818/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3343639818/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis3343639818/kvs-scan.txt
    kvnemesis_test.go:310: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:310
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:247
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisSingleNode_ReproposalChaos
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisSingleNode_ReproposalChaos4009697673
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (70.38s)

Parameters: TAGS=bazel,gss , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-30226

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team labels Jul 30, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Jul 30, 2023
@nvanbenschoten
Copy link
Member

@AlexTalks assigning you to this, because I think this can be explained by #103817.

We could stress this test before and after your proposed fix to get a feel for whether the problem persists.

@pav-kv pav-kv added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv Anything in KV that doesn't belong in a more specific category. A-kv-transactions Relating to MVCC and the transactional model. and removed A-kv Anything in KV that doesn't belong in a more specific category. labels Aug 1, 2023
@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ 8d2d2bf47cb3da1c27baa140a5afb5d28a5ed8f6:

=== RUN   TestKVNemesisSingleNode_ReproposalChaos
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisSingleNode_ReproposalChaos1978623890
    test_log_scope.go:81: use -show-logs to present logs inline
    kvnemesis_test.go:283: seed: 5371465003841823008
    test_server_shim.go:98: 
        Test server was configured to route SQL queries to a secondary tenant (virtual cluster).
        If you are only seeing a test failure when this message appears, there may be a problem
        specific to cluster virtualization or multi-tenancy.
        
        To investigate, consider using "COCKROACH_TEST_TENANT=true" to force-enable just
        the secondary tenant in all runs (or, alternatively, "false" to force-disable), or use
        "COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=false" to disable all random test variables altogether.
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
    kvnemesis_test.go:197: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1609272606
    kvnemesis.go:168: error applying x.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
          txn.SetIsoLevel(isolation.ReadCommitted)
          {
            b := &kv.Batch{}
            txn.Run(ctx, b) // <nil>
          }
          txn.ReverseScanForUpdate(ctx, tk(77778482246669777), tk(5588773285581782348), 0) // @1690970557.592814464,1 <nil>
          txn.Scan(ctx, tk(6041570819213447631), tk(15977815651010275910), 0) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690970557.592814464,2->1690970557.565970637,2
          b := &kv.Batch{}
          txn.CommitInBatch(ctx, b) // omitted
          return nil
        }) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690970557.592814464,2->1690970557.565970637,2: failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690970557.592814464,2->1690970557.565970637,2
    kvnemesis.go:168: error applying x.Scan(ctx, tk(6041570819213447631), tk(15977815651010275910), 0) // failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690970557.592814464,2->1690970557.565970637,2: failed indeterminate commit recovery: programming error: timestamp change by implicitly committed transaction: 1690970557.592814464,2->1690970557.565970637,2
    kvnemesis.go:203: failures(verbose): /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1609272606/failures
        repro steps: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1609272606/repro.go
        rangefeed KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1609272606/kvs-rangefeed.txt
        scan KVs: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis1609272606/kvs-scan.txt
    kvnemesis_test.go:310: 
        	Error Trace:	github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:310
        	            				github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:247
        	Error:      	Should be zero, but was 2
        	Test:       	TestKVNemesisSingleNode_ReproposalChaos
        	Messages:   	kvnemesis detected failures
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisSingleNode_ReproposalChaos1978623890
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (67.88s)

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@AlexTalks
Copy link
Contributor

AlexTalks commented Aug 9, 2023

I haven't been able to reproduce this yet despite ~150k runs in stress. Even tried stressing with the deadlock detector enabled since I saw one of the failures had TAGS=deadlock, but no luck - I may need to try machines with lower vCPUs.

Also, I'm not sure why there are no artifacts here -- @cockroachdb/dev-inf? It seems too early for them to have been cleaned up, which is especially unfortunate because the repro steps introduced by @tbg are not available. 😢

@AlexTalks
Copy link
Contributor

I noticed in the TeamCity config that the test was run with --test_env COCKROACH_KVNEMESIS_STEPS=1000 --test_sharding_strategy=disabled --jobs 4... the difference in steps from the default of 50 may make a large difference. I'll retry stressing this on 4 vCPU nodes with 1000 steps each.

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ e0de1f12d4f496fc7e4050a33e96f8c635c7a27c:

Fatal error:

panic: use of Span after Finish. Span: /cockroach.roachpb.Internal/Batch. Finish previously called at: <stack not captured. Set debugUseAfterFinish> [recovered]
	panic: use of Span after Finish. Span: /cockroach.roachpb.Internal/Batch. Finish previously called at: <stack not captured. Set debugUseAfterFinish>

Stack:

goroutine 3021308 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0xc002478000?, {0x69b39e8, 0xc01169da70})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:230 +0x6a
panic({0x4b59b00, 0xc011f013b0})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish(0x69b39e8?)
	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:182 +0x12e
github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).Tracer(...)
	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:225
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaDecoder).createTracingSpans(0xc00a739658, {0x69b39e8, 0xc00e5c2ed0})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_decoder.go:154 +0x4f0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicaDecoder).DecodeAndBind(0xc00a739200?, {0x69b39e8, 0xc00e5c2ed0}, {0xc0119648d0?, 0x5003755?, 0x10?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_decoder.go:63 +0x66
github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply.(*Task).Decode(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/apply/task.go:142
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:878 +0x73a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady(_, {_, _}, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go:728 +0x1b0
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady(0xc0186ff500, 0xc00cb78f00?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:689 +0x14b
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftSchedulerShard).worker(0xc005003500, {0x69b39e8, 0xc01169da70}, {0x6999690, 0xc0186ff500}, 0xc0186fea80)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:418 +0x195
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2({0x69b39e8?, 0xc01169da70?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go:321 +0x45
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x146
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x43b
Log preceding fatal error

=== RUN   TestKVNemesisSingleNode_ReproposalChaos
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/logTestKVNemesisSingleNode_ReproposalChaos900707488
    test_log_scope.go:81: use -show-logs to present logs inline
    kvnemesis_test.go:283: seed: 5382072640056474552
    kvnemesis_test.go:197: kvnemesis logging to /artifacts/tmp/_tmp/1f42cf5be2fc021646bf9b2daf5eaef3/kvnemesis2911087260

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@erikgrinaker
Copy link
Contributor

Last failure is another instance of #107521, PR pending in #108775.

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ a900aa218054812b782c0b3d130b25296c0d14e3:

        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:787
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).isKeyLockedByConflictingTxn
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1812
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).add
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1237
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).getOne
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:828
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).get
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:585
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.mvccGetWithValueHeader
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1218
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.MVCCGetWithValueHeader
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1143
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.MVCCGet
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:1110
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.Get
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_get.go:36
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:484
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:303
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatchWithServersideRefreshes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:458
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:125
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:510
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:185
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:193
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:202
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1336
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1467
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.makeInternalClientAdapter.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:701
        	            	  | github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ServerInterceptor.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:97
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.bindUnaryServerInterceptorToHandler.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:812
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func3
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:169
        	            	Wraps: (2) panic applying step 
        	            	  | db0.GetForUpdateSkipLocked(ctx, tk(16811042229377009131)): only transactional requests can acquire exclusive locks
        	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
    panic.go:522: -- test log scope end --
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (3.92s)

Parameters: TAGS=bazel,gss , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ a900aa218054812b782c0b3d130b25296c0d14e3:

        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:787
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).isKeyLockedByConflictingTxn
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1812
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).add
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1237
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).getOne
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:828
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).scan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:659
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.mvccScanToBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3874
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.mvccScanToKvs
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3927
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.MVCCScan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:4150
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.ReverseScan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_reverse_scan.go:82
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:484
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:303
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatchWithServersideRefreshes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:458
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:125
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:510
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:185
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:193
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:202
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1336
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1467
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.makeInternalClientAdapter.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:701
        	            	  | github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ServerInterceptor.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:97
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.bindUnaryServerInterceptorToHandler.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:812
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func3
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:169
        	            	Wraps: (2) panic applying step 
        	            	  | db0.ReverseScanForUpdateSkipLocked(ctx, tk(11866519064140098950), tk(14356790331432790134), 0): only transactional requests can acquire exclusive locks
        	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
    panic.go:522: -- test log scope end --
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (16.74s)

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

kv/kvnemesis.TestKVNemesisSingleNode_ReproposalChaos failed with artifacts on master @ ee2b0cc7c8c24a4c9d4a32514e1e917b5f139838:

        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).isKeyLockedByConflictingTxn
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1812
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).add
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:1237
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).getOne
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:828
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).scan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:659
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.mvccScanToBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3874
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.mvccScanToKvs
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:3927
        	            	  | github.com/cockroachdb/cockroach/pkg/storage.MVCCScan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:4150
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.Scan
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_scan.go:82
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:484
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_evaluate.go:303
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatchWithServersideRefreshes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:458
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_read.go:125
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:510
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_send.go:185
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_send.go:193
        	            	  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).SendWithWriteBytes
        	            	  | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:202
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1336
        	            	  | github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch
        	            	  | 	github.com/cockroachdb/cockroach/pkg/server/node.go:1467
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.makeInternalClientAdapter.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:701
        	            	  | github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.ServerInterceptor.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:97
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.bindUnaryServerInterceptorToHandler.func1
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:812
        	            	  | github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func3
        	            	  | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:169
        	            	Wraps: (2) panic applying step 
        	            	  | db0.ScanForUpdateSkipLocked(ctx, tk(46074504227073648), tk(17589576382027448249), 0): only transactional requests can acquire exclusive locks
        	            	Error types: (1) *withstack.withStack (2) *errutil.leafError
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/4aeb9c412c16a003ed8ddc43788d2bba/logTestKVNemesisSingleNode_ReproposalChaos1204737346
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (3.24s)
Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Aug 28, 2023
109186: pkg/util/log: flush buffered network sinks on panic r=knz a=abarganier

Previously, our crash reporter system would flush file log sinks
as part of the process to handle a panic.

This was an incomplete process, since buffered network sinks were
not included in part of this flush process. This means that many
times, panic logs would not make it to the network target, leading
to a loss in observability.

This patch introduces `log.FlushAllSync()`, which flushes both file
and buffered network log sinks. It then updates the crash reporter
to call into this, instead of just flushing file log sinks.

`FlushAllSync()` contains timeout logic to prevent the process from
completing if one of the underlying child sinks that a bufferedSink
wraps becomes unavailable/hangs on its `output()` call.

We originally attempted to fix this in #101562, but a bug in the 
bufferedSink code led us to roll back those changes. The bug in the 
bufferedSink code has since been fixed (#108928), so we can safely 
introduce this logic again.

Release note: none

Fixes: #106345

109578: rpc: increase gRPC server timeout from 1x to 2x NetworkTimeout r=andrewbaptist a=erikgrinaker

This is intended as a conservative backport that changes as little as possible. For 23.2, we should restructure these settings a bit, possibly by removing NetworkTimeout and using independent timeouts for each component/parameter, since they have unique considerations (e.g. whether they are enforced above the Go runtime or by the OS, to what extent they are subject to RPC head-of-line blocking, etc).

---

This patch increases the gRPC server timeout from 1x to 2x NetworkTimeout. This timeout determines how long the server will wait for a TCP send to receive a TCP ack before automatically closing the connection. gRPC enforces this via the OS TCP stack by setting TCP_USER_TIMEOUT on the network socket.

While NetworkTimeout should be sufficient here, we have seen instances where this is affected by node load or other factors, so we set it to 2x NetworkTimeout to avoid spurious closed connections. An aggressive timeout is not particularly beneficial here, because the client-side timeout (in our case the CRDB RPC heartbeat) is what matters for recovery time following network or node outages -- the server side doesn't really care if the connection remains open for a bit longer.

Touches #109317.

Epic: none
Release note (ops change): The default gRPC server-side send timeout has been increased from 2 seconds to 4 seconds (1x to 2x of COCKROACH_NETWORK_TIMEOUT), to avoid spurious connection failures in certain scenarios. This can be controlled via the new environment variable COCKROACH_RPC_SERVER_TIMEOUT.

109610: kv: remove assertions around non-txn'al locking reqs r=nvanbenschoten a=nvanbenschoten

Closes #107860.
Closes #109222.
Closes #109581.
Closes #109582.

We might want to re-introduce these assertions in the future and reject these requests higher up the stack. For now, just remove them to deflake tests.

Release note: None

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in e980fd1 Aug 28, 2023
@shralex shralex assigned nvanbenschoten and unassigned AlexTalks Nov 30, 2023
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants