kv/kvserver: TestSystemZoneConfigs failed #98200

cockroach-teamcity · 2023-03-08T06:22:21Z

kv/kvserver.TestSystemZoneConfigs failed with artifacts on master @ 1b162d1b274eec7b307fbbfca7294460bfdef025:

=== RUN   TestSystemZoneConfigs
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/9ded2e1b6e3cd0145900033ce06c11b0/logTestSystemZoneConfigs2842108216
    test_log_scope.go:79: use -show-logs to present logs inline
    client_replica_test.go:2768: condition failed to evaluate within 45s: got 288 voters, want 290; details: map[1:r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n4,s4):2, (n6,s6):3, (n2,s2):4, (n7,s7):5, next=6, gen=8] 2:r2:/System/NodeLiveness{-Max} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n6,s6):4, (n7,s7):5, next=6, gen=8] 3:r3:/System/{NodeLivenessMax-tsd} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n3,s3):4, (n5,s5):5, next=6, gen=8] 4:r4:/System{/tsd-tse} [(n3,s3):4, (n6,s6):2, (n2,s2):3, next=5, gen=8] 5:r5:/{Systemtse-Table/0} [(n1,s1):1, (n2,s2):2, (n5,s5):3, (n3,s3):4, (n6,s6):5, next=6, gen=8] 6:r6:/Table/{0-3} [(n1,s1):1, (n7,s7):2, (n5,s5):3, (n6,s6):4, (n2,s2):5, next=6, gen=8] 7:r7:/Table/{3-4} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n7,s7):4, (n4,s4):5, next=6, gen=8] 8:r8:/Table/{4-5} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n4,s4):4, (n2,s2):5, next=6, gen=8] 9:r9:/Table/{5-6} [(n2,s2):6, (n3,s3):2, (n5,s5):3, (n4,s4):4, (n6,s6):5, next=7, gen=12] 10:r10:/Table/{6-7} [(n1,s1):1, (n5,s5):2, (n7,s7):3, (n4,s4):4, (n2,s2):5, next=6, gen=8] 11:r11:/Table/{7-8} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n3,s3):4, (n2,s2):5, next=6, gen=8] 12:r12:/Table/{8-11} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n3,s3):4, (n4,s4):5, next=6, gen=8] 13:r13:/Table/1{1-2} [(n1,s1):1, (n6,s6):2, (n3,s3):3, (n4,s4):4, (n7,s7):5, next=6, gen=8] 14:r14:/Table/1{2-3} [(n1,s1):1, (n7,s7):2, (n4,s4):3, (n2,s2):4, (n3,s3):5, next=6, gen=8] 15:r15:/Table/1{3-4} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n3,s3):4, (n7,s7):5, next=6, gen=8] 16:r16:/Table/1{4-5} [(n5,s5):6, (n3,s3):2, (n4,s4):3, (n7,s7):4, (n2,s2):5, next=7, gen=12] 17:r17:/Table/1{5-6} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n6,s6):4, (n7,s7):5, next=6, gen=8] 18:r18:/Table/1{6-7} [(n3,s3):6, (n4,s4):2, (n5,s5):3, (n6,s6):4, (n7,s7):5, next=7, gen=12] 19:r19:/Table/1{7-8} [(n1,s1):1, (n4,s4):2, (n6,s6):3, (n2,s2):4, (n5,s5):5, next=6, gen=8] 20:r20:/Table/1{8-9} [(n1,s1):1, (n7,s7):2, (n2,s2):3, (n5,s5):4, (n6,s6):5, next=6, gen=8] 21:r21:/Table/{19-20} [(n1,s1):1, (n7,s7):2, (n2,s2):3, (n6,s6):4, (n5,s5):5, next=6, gen=8] 22:r22:/Table/2{0-1} [(n1,s1):1, (n7,s7):2, (n6,s6):3, (n4,s4):4, (n5,s5):5, next=6, gen=8] 23:r23:/Table/2{1-2} [(n1,s1):1, (n7,s7):2, (n6,s6):3, (n3,s3):4, (n4,s4):5, next=6, gen=8] 24:r24:/Table/2{2-3} [(n1,s1):1, (n7,s7):2, (n6,s6):3, (n5,s5):4, (n3,s3):5, next=6, gen=8] 25:r25:/Table/2{3-4} [(n4,s4):6, (n2,s2):2, (n3,s3):3, (n7,s7):4, (n6,s6):5, next=7, gen=12] 26:r26:/Table/2{4-5} [(n5,s5):6, (n2,s2):2, (n4,s4):3, (n3,s3):4, (n7,s7):5, next=7, gen=12] 27:r27:/Table/2{5-6} [(n2,s2):6, (n4,s4):2, (n7,s7):3, (n5,s5):4, (n3,s3):5, next=7, gen=12] 28:r28:/Table/2{6-7} [(n1,s1):1, (n4,s4):2, (n2,s2):3, (n5,s5):4, (n3,s3):5, next=6, gen=8] 29:r29:/Table/2{7-8} [(n4,s4):6, (n3,s3):2, (n2,s2):3, (n7,s7):4, (n6,s6):5, next=7, gen=12] 30:r30:/Table/2{8-9} [(n1,s1):1, (n4,s4):2, (n5,s5):3, (n2,s2):4, (n7,s7):5, next=6, gen=8] 31:r31:/{Table/29-NamespaceTable/30} [(n1,s1):1, (n7,s7):2, (n4,s4):3, (n3,s3):4, (n6,s6):5, next=6, gen=8] 32:r32:/NamespaceTable/{30-Max} [(n1,s1):1, (n5,s5):2, (n4,s4):3, (n2,s2):4, (n6,s6):5, next=6, gen=8] 33:r33:/{NamespaceTable/Max-Table/32} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n7,s7):4, (n4,s4):5, next=6, gen=8] 34:r34:/Table/3{2-3} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n3,s3):4, (n2,s2):5, next=6, gen=8] 35:r35:/Table/3{3-4} [(n6,s6):6, (n4,s4):2, (n3,s3):3, (n5,s5):4, (n7,s7):5, next=7, gen=12] 36:r36:/Table/3{4-5} [(n1,s1):1, (n3,s3):2, (n5,s5):3, (n4,s4):4, (n2,s2):5, next=6, gen=8] 37:r37:/Table/3{5-6} [(n1,s1):1, (n3,s3):2, (n7,s7):3, (n2,s2):4, (n4,s4):5, next=6, gen=8] 38:r38:/Table/3{6-7} [(n1,s1):1, (n6,s6):2, (n5,s5):3, (n2,s2):4, (n3,s3):5, next=6, gen=8] 39:r39:/Table/3{7-8} [(n1,s1):1, (n3,s3):2, (n7,s7):3, (n5,s5):4, (n6,s6):5, next=6, gen=8] 40:r40:/Table/3{8-9} [(n6,s6):6, (n4,s4):2, (n2,s2):3, (n7,s7):4, (n5,s5):5, next=7, gen=12] 41:r41:/Table/{39-40} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n5,s5):4, (n4,s4):5, next=6, gen=8] 42:r42:/Table/4{0-1} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n5,s5):4, (n3,s3):5, next=6, gen=8] 43:r43:/Table/4{1-2} [(n3,s3):6, (n2,s2):2, (n7,s7):3, (n4,s4):4, (n6,s6):5, next=7, gen=12] 44:r44:/Table/4{2-3} [(n1,s1):1, (n6,s6):2, (n3,s3):3, (n2,s2):4, (n5,s5):5, next=6, gen=8] 45:r45:/Table/4{3-4} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n6,s6):4, (n4,s4):5, next=6, gen=8] 46:r46:/Table/4{4-5} [(n2,s2):6, (n3,s3):2, (n6,s6):3, (n4,s4):4, (n5,s5):5, next=7, gen=12] 47:r47:/Table/4{5-6} [(n1,s1):1, (n2,s2):2, (n6,s6):3, (n7,s7):4, (n5,s5):5, next=6, gen=8] 48:r48:/Table/4{6-7} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n6,s6):4, (n4,s4):5, next=6, gen=8] 49:r49:/Table/4{7-8} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n4,s4):4, (n3,s3):5, next=6, gen=8] 50:r50:/Table/{48-50} [(n1,s1):1, (n7,s7):2, (n6,s6):3, (n5,s5):4, (n2,s2):5, next=6, gen=8] 51:r51:/Table/5{0-1} [(n4,s4):6, (n2,s2):2, (n6,s6):3, (n3,s3):4, (n7,s7):5, next=7, gen=12] 52:r52:/Table/5{1-2} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n6,s6):4, (n7,s7):5, next=6, gen=8] 53:r53:/Table/5{2-3} [(n1,s1):1, (n6,s6):2, (n5,s5):3, (n7,s7):4, (n3,s3):5, next=6, gen=8] 54:r54:/Table/5{3-4} [(n6,s6):6, (n2,s2):2, (n7,s7):3, (n5,s5):4, (n4,s4):5, next=7, gen=12] 55:r55:/Table/5{4-5} [(n7,s7):6, (n4,s4):2, (n5,s5):3, (n2,s2):4, (n3,s3):5, next=7, gen=12] 56:r56:/Table/5{5-6} [(n1,s1):1, (n3,s3):2, (n4,s4):3, (n5,s5):4, (n7,s7):5, next=6, gen=8] 57:r57:/Table/5{6-7} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n4,s4):4, (n6,s6):5, next=6, gen=8] 58:r58:/{Table/57-Max} [(n1,s1):1, (n3,s3):2, (n6,s6):3, (n2,s2):4, (n5,s5):5, next=6, gen=8]]
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/9ded2e1b6e3cd0145900033ce06c11b0/logTestSystemZoneConfigs2842108216
--- FAIL: TestSystemZoneConfigs (56.08s)

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv _{This test on roachdash | Improve this report!

Jira issue: CRDB-25127}

The text was updated successfully, but these errors were encountered:

irfansharif · 2023-03-17T17:01:10Z

cockroach/pkg/kv/kvserver/client_replica_test.go

Lines 2703 to 2705 in 40cb075

    
           // This test was written for the gossip-backed SystemConfigSpan 
        
           // infrastructure. 
        
           DisableSpanConfigs: true,

This test doesn't use span configs, and can probably just get nuked altogether.

cockroach-teamcity · 2023-03-21T13:12:24Z

kv/kvserver.TestSystemZoneConfigs failed with artifacts on master @ 1e9899da8cb250a4e560b280beb2b0805ee75a78:

=== RUN   TestSystemZoneConfigs
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/15bcc1d4cfe2c01885bd956ef4f14c71/logTestSystemZoneConfigs589320229
    test_log_scope.go:79: use -show-logs to present logs inline
    client_replica_test.go:2777: condition failed to evaluate within 45s: got 300 voters, want 298; details: map[1:r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n5,s5):4, (n7,s7):5, (n4,s4):6, (n6,s6):7, next=8, gen=12] 2:r2:/System/NodeLiveness{-Max} [(n1,s1):1, (n6,s6):2, (n3,s3):3, (n2,s2):4, (n4,s4):5, next=6, gen=8] 3:r3:/System/{NodeLivenessMax-tsd} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n3,s3):4, (n5,s5):5, next=6, gen=8] 4:r4:/System{/tsd-tse} [(n4,s4):4, (n3,s3):2, (n7,s7):3, next=5, gen=8] 5:r5:/{Systemtse-Table/0} [(n6,s6):6, (n3,s3):2, (n7,s7):3, (n5,s5):4, (n2,s2):5, next=7, gen=12] 6:r6:/Table/{0-3} [(n3,s3):6, (n4,s4):2, (n6,s6):3, (n7,s7):4, (n2,s2):5, next=7, gen=12] 7:r7:/Table/{3-4} [(n1,s1):1, (n2,s2):2, (n6,s6):3, (n4,s4):4, (n7,s7):5, next=6, gen=8] 8:r8:/Table/{4-5} [(n1,s1):1, (n4,s4):2, (n7,s7):3, (n5,s5):4, (n3,s3):5, next=6, gen=8] 9:r9:/Table/{5-6} [(n6,s6):6, (n3,s3):2, (n5,s5):3, (n2,s2):4, (n4,s4):5, next=7, gen=12] 10:r10:/Table/{6-7} [(n1,s1):1, (n6,s6):2, (n5,s5):3, (n2,s2):4, (n4,s4):5, next=6, gen=8] 11:r11:/Table/{7-8} [(n1,s1):1, (n7,s7):2, (n6,s6):3, (n4,s4):4, (n5,s5):5, next=6, gen=8] 12:r12:/Table/{8-11} [(n1,s1):1, (n5,s5):2, (n7,s7):3, (n4,s4):4, (n6,s6):5, next=6, gen=8] 13:r13:/Table/1{1-2} [(n5,s5):6, (n2,s2):2, (n6,s6):3, (n7,s7):4, (n3,s3):5, next=7, gen=12] 14:r14:/Table/1{2-3} [(n6,s6):6, (n3,s3):2, (n7,s7):3, (n4,s4):4, (n5,s5):5, next=7, gen=12] 15:r15:/Table/1{3-4} [(n1,s1):1, (n7,s7):2, (n4,s4):3, (n5,s5):4, (n6,s6):5, next=6, gen=8] 16:r16:/Table/1{4-5} [(n1,s1):1, (n4,s4):2, (n6,s6):3, (n7,s7):4, (n2,s2):5, next=6, gen=8] 17:r17:/Table/1{5-6} [(n7,s7):6, (n4,s4):2, (n2,s2):3, (n6,s6):4, (n5,s5):5, next=7, gen=12] 18:r18:/Table/1{6-7} [(n1,s1):1, (n4,s4):2, (n5,s5):3, (n6,s6):4, (n3,s3):5, next=6, gen=8] 19:r19:/Table/1{7-8} [(n1,s1):1, (n7,s7):2, (n4,s4):3, (n5,s5):4, (n2,s2):5, next=6, gen=8] 20:r20:/Table/1{8-9} [(n1,s1):1, (n4,s4):2, (n2,s2):3, (n7,s7):4, (n6,s6):5, next=6, gen=8] 21:r21:/Table/{19-20} [(n6,s6):6, (n2,s2):2, (n4,s4):3, (n3,s3):4, (n7,s7):5, next=7, gen=12] 22:r22:/Table/2{0-1} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n4,s4):4, (n6,s6):5, next=6, gen=8] 23:r23:/Table/2{1-2} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n7,s7):4, (n4,s4):5, next=6, gen=8] 24:r24:/Table/2{2-3} [(n1,s1):1, (n6,s6):2, (n5,s5):3, (n7,s7):4, (n2,s2):5, next=6, gen=8] 25:r25:/Table/2{3-4} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n5,s5):4, (n4,s4):5, next=6, gen=8] 26:r26:/Table/2{4-5} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n6,s6):4, (n4,s4):5, next=6, gen=8] 27:r27:/Table/2{5-6} [(n4,s4):6, (n2,s2):2, (n5,s5):3, (n3,s3):4, (n7,s7):5, next=7, gen=12] 28:r28:/Table/2{6-7} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n4,s4):4, (n3,s3):5, next=6, gen=8] 29:r29:/Table/2{7-8} [(n1,s1):1, (n4,s4):2, (n2,s2):3, (n6,s6):4, (n3,s3):5, next=6, gen=8] 30:r30:/Table/2{8-9} [(n1,s1):1, (n5,s5):2, (n4,s4):3, (n2,s2):4, (n6,s6):5, next=6, gen=8] 31:r31:/{Table/29-NamespaceTable/30} [(n1,s1):1, (n5,s5):2, (n3,s3):3, (n4,s4):4, (n2,s2):5, next=6, gen=8] 32:r32:/NamespaceTable/{30-Max} [(n5,s5):6, (n3,s3):2, (n7,s7):3, (n2,s2):4, (n4,s4):5, next=7, gen=12] 33:r33:/{NamespaceTable/Max-Table/32} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n6,s6):4, (n3,s3):5, next=6, gen=8] 34:r34:/Table/3{2-3} [(n1,s1):1, (n7,s7):2, (n2,s2):3, (n3,s3):4, (n5,s5):5, next=6, gen=8] 35:r35:/Table/3{3-4} [(n1,s1):1, (n2,s2):2, (n3,s3):3, (n6,s6):4, (n5,s5):5, next=6, gen=8] 36:r36:/Table/3{4-5} [(n7,s7):6, (n4,s4):2, (n2,s2):3, (n5,s5):4, (n3,s3):5, next=7, gen=12] 37:r37:/Table/3{5-6} [(n1,s1):1, (n5,s5):2, (n4,s4):3, (n7,s7):4, (n3,s3):5, next=6, gen=8] 38:r38:/Table/3{6-7} [(n1,s1):1, (n6,s6):2, (n2,s2):3, (n4,s4):4, (n5,s5):5, next=6, gen=8] 39:r39:/Table/3{7-8} [(n1,s1):1, (n6,s6):2, (n5,s5):3, (n7,s7):4, (n3,s3):5, next=6, gen=8] 40:r40:/Table/3{8-9} [(n4,s4):6, (n5,s5):2, (n3,s3):3, (n2,s2):4, (n6,s6):5, next=7, gen=12] 41:r41:/Table/{39-40} [(n5,s5):6, (n2,s2):2, (n7,s7):3, (n6,s6):4, (n3,s3):5, next=7, gen=12] 42:r42:/Table/4{0-1} [(n1,s1):1, (n3,s3):2, (n6,s6):3, (n7,s7):4, (n2,s2):5, next=6, gen=8] 43:r43:/Table/4{1-2} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n2,s2):4, (n3,s3):5, next=6, gen=8] 44:r44:/Table/4{2-3} [(n3,s3):6, (n7,s7):2, (n5,s5):3, (n6,s6):4, (n4,s4):5, next=7, gen=12] 45:r45:/Table/4{3-4} [(n1,s1):1, (n2,s2):2, (n5,s5):3, (n3,s3):4, (n4,s4):5, next=6, gen=8] 46:r46:/Table/4{4-5} [(n1,s1):1, (n5,s5):2, (n6,s6):3, (n2,s2):4, (n7,s7):5, next=6, gen=8] 47:r47:/Table/4{5-6} [(n7,s7):6, (n4,s4):2, (n3,s3):3, (n5,s5):4, (n2,s2):5, next=7, gen=12] 48:r48:/Table/4{6-7} [(n1,s1):1, (n7,s7):2, (n3,s3):3, (n4,s4):4, (n2,s2):5, next=6, gen=8] 49:r49:/Table/4{7-8} [(n1,s1):1, (n5,s5):2, (n6,s6):3, (n3,s3):4, (n7,s7):5, next=6, gen=8] 50:r50:/Table/{48-50} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n5,s5):4, (n7,s7):5, next=6, gen=8] 51:r51:/Table/5{0-1} [(n1,s1):1, (n4,s4):2, (n2,s2):3, (n5,s5):4, (n6,s6):5, next=6, gen=8] 52:r52:/Table/5{1-2} [(n1,s1):1, (n7,s7):2, (n3,s3):3, (n6,s6):4, (n2,s2):5, next=6, gen=8] 53:r53:/Table/5{2-3} [(n1,s1):1, (n5,s5):2, (n6,s6):3, (n3,s3):4, (n7,s7):5, next=6, gen=8] 54:r54:/Table/5{3-4} [(n2,s2):6, (n4,s4):2, (n7,s7):3, (n6,s6):4, (n5,s5):5, next=7, gen=12] 55:r55:/Table/5{4-5} [(n1,s1):1, (n3,s3):2, (n6,s6):3, (n7,s7):4, (n2,s2):6, next=7, gen=12] 56:r56:/Table/5{5-6} [(n1,s1):1, (n6,s6):2, (n7,s7):3, (n2,s2):4, (n3,s3):5, next=6, gen=8] 57:r57:/Table/5{6-7} [(n1,s1):1, (n5,s5):2, (n2,s2):3, (n4,s4):4, (n7,s7):5, next=6, gen=8] 58:r58:/Table/5{7-8} [(n1,s1):1, (n2,s2):2, (n4,s4):3, (n6,s6):4, (n5,s5):5, next=6, gen=8] 59:r59:/Table/5{8-9} [(n1,s1):1, (n7,s7):2, (n4,s4):3, (n3,s3):4, (n6,s6):5, next=6, gen=8] 60:r60:/{Table/59-Max} [(n1,s1):1, (n3,s3):2, (n5,s5):3, (n2,s2):4, (n7,s7):5, next=6, gen=8]]
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/15bcc1d4cfe2c01885bd956ef4f14c71/logTestSystemZoneConfigs589320229
--- FAIL: TestSystemZoneConfigs (62.23s)

Help

See also: How To Investigate a Go Test Failure (internal)

_{This test on roachdash | Improve this report!}

nvanbenschoten · 2023-03-22T14:35:43Z

@irfansharif, what do we want to do about this? Delete the test? Do we understand why it's suddenly flaked twice in the past month?

irfansharif · 2023-03-22T14:49:07Z

I have a branch that's trying to update this test to use span configs. I don't know why it's flaked -- this test is notoriously slow to run (it's disabled under stress/race), and I've not got a repro of the original failure. I'll try to push something out soon.

irfansharif · 2023-03-29T14:57:23Z

I think we can get rid of the GA-blocker label since the test isn’t using span configs. The flake I think was because of initial split raceyness, span configs even without range coalescing had a few different split points compared to the system config span. And that’s what we were seeing, which this test wasn’t taught to expect. I'll try again today to get this test fixed.

Fixes cockroachdb#98200. This test was written pre-spanconfig days, and when enabling spanconfigs by default, opted out of using it. This commit makes it use spanconfigs after giving up on reproducing/diagnosing the original flake (this test is notoriously slow -- taking 30+s given it waits for actual upreplication and replica movement, so not --stress friendly). Release note: None

Fixes cockroachdb#98200. This test was written pre-spanconfig days, and when enabling spanconfigs by default, opted out of using it. This commit makes it use spanconfigs after giving up on reproducing/diagnosing the original flake (this test is notoriously slow -- taking 30+s given it waits for actual upreplication and replica movement, so not --stress friendly). Using spanconfigs here surfaced a rare, latent bug, one this author incorrectly thought was fixed back in cockroachdb#75939. In very rare cases, right during cluster bootstrap before the span config reconciler has ever had a chance to run (i.e. system.span_configurations is empty), it's possible that the subscriber has subscribed to an empty span config state (we've only seen this happen in unit tests with 50ms scan intervals). So it's not been meaningfully "updated" in any sense of the word, but we still previously set a non-empty last-updated timestamp, something various components in KV rely on as proof that we have span configs as of some timestamp. As a result, we saw KV incorrectly merge away the liveness range into adjacent ranges, and then later split it off. We don't think we've ever seen this happen outside of tests as it instantly triggers the following fatal in the raftScheduler, which wants to prioritize the liveness range above all else: panic: priority range ID already set: old=2, new=61, first set at: Release note: None

irfansharif · 2023-04-19T14:35:04Z

See discussion over at #100210, this test is being rewritten and did surface a latent bug, but one that existed since 22.1.

github-actions · 2023-05-22T10:06:15Z

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

knz · 2023-05-23T13:12:02Z

still relevant

Fixes cockroachdb#98200. This test was written pre-spanconfig days, and when enabling spanconfigs by default over a year ago, opted out of using it. It's a real chore to bring this old test back up to spec (cockroachdb#100210 is an earlier attempt). It has been skipped for a while after flaking (for test-only reasons that are understood, see cockroachdb#100210) and is notoriously slow taking 30+s given it waits for actual upreplication and replica movement, making it not --stress friendly. In our earlier attempt to upgrade this to use spanconfigs, we learnt two new things: - There was a latent bug, previously thought to have been fixed in cockroachdb#75939. In very rare cases, right during cluster bootstrap before the span config reconciler has ever had a chance to run (i.e. system.span_configurations is empty), it was possible that the subscriber had subscribed to an empty span config state (we've only seen this happen in unit tests with 50ms scan intervals). So it was not been meaningfully "updated" in any sense of the word, but we still previously set a non-empty last-updated timestamp, something various components in KV rely on as proof that we have span configs as of some timestamp. As a result, we saw KV incorrectly merge away the liveness range into adjacent ranges, and then later split it off. We don't think we've ever seen this happen outside of tests as it instantly triggers the following fatal in the raftScheduler, which wants to prioritize the liveness range above all else: panic: priority range ID already set: old=2, new=61, first set at: This bug continues to exist. We've filed cockroachdb#104195 to track fixing it. - Fixing the bug above (by erroring out until a span config snapshot is available) made it so that tests now needed to actively wait for a span config snapshot before relocating ranges manually or using certain kv queues. Adding that synchronization made lots of tests a whole lot slower (by 3+s each) despite reducing the closed timestamp interval, etc. These tests weren't really being harmed by the bug (== empty span config snapshot). So it's not clear that the bug fix is worth fixing. But that can be litigated in cockroachdb#104195. We don't really need this test in this current form (end-to-end spanconfig tests exist elsewhere and are more comprehensive without suffering the issues above). Release note: None

github-actions · 2023-06-26T10:06:20Z

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

104198: kvserver: kill TestSystemZoneConfigs r=irfansharif a=irfansharif Fixes #98200. This test was written pre-spanconfig days, and when enabling spanconfigs by default over a year ago, opted out of using it. It's a real chore to bring this old test back up to spec (#100210 is an earlier attempt). It has been skipped for a while after flaking (for test-only reasons that are understood, see #100210) and is notoriously slow taking 30+s given it waits for actual upreplication and replica movement, making it not --stress friendly. In our earlier attempt to upgrade this to use spanconfigs, we learnt two new things: - There was a latent bug, previously thought to have been fixed in #75939. In very rare cases, right during cluster bootstrap before the span config reconciler has ever had a chance to run (i.e. system.span_configurations is empty), it was possible that the subscriber had subscribed to an empty span config state (we've only seen this happen in unit tests with 50ms scan intervals). So it was not been meaningfully "updated" in any sense of the word, but we still previously set a non-empty last-updated timestamp, something various components in KV rely on as proof that we have span configs as of some timestamp. As a result, we saw KV incorrectly merge away the liveness range into adjacent ranges, and then later split it off. We don't think we've ever seen this happen outside of tests as it instantly triggers the following fatal in the raftScheduler, which wants to prioritize the liveness range above all else: panic: priority range ID already set: old=2, new=61, first set at: This bug continues to exist. We've filed #104195 to track fixing it. - Fixing the bug above (by erroring out until a span config snapshot is available) made it so that tests now needed to actively wait for a span config snapshot before relocating ranges manually or using certain kv queues. Adding that synchronization made lots of tests a whole lot slower (by 3+s each) despite reducing the closed timestamp interval, etc. These tests weren't really being harmed by the bug (== empty span config snapshot). So it's not clear that the bug fix is worth fixing. But that can be litigated in #104195. We don't really need this test in this current form (end-to-end spanconfig tests exist elsewhere and are more comprehensive without suffering the issues above). Release note: None Co-authored-by: irfan sharif <[email protected]>

cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Mar 8, 2023

cockroach-teamcity added this to the 23.1 milestone Mar 8, 2023

blathers-crl bot added the T-kv KV Team label Mar 8, 2023

kvoli added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Mar 8, 2023

nvanbenschoten added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 8, 2023

nvanbenschoten assigned irfansharif Mar 17, 2023

irfansharif removed the GA-blocker label Mar 29, 2023

irfansharif mentioned this issue Mar 30, 2023

kvserver: update TestSystemZoneConfigs #100210

Closed

nvanbenschoten added the GA-blocker label Apr 3, 2023

irfansharif removed the GA-blocker label Apr 19, 2023

github-actions bot added the no-test-failure-activity label May 22, 2023

knz removed the no-test-failure-activity label May 23, 2023

irfansharif mentioned this issue Jun 1, 2023

kvserver: kill TestSystemZoneConfigs #104198

Merged

github-actions bot added the no-test-failure-activity label Jun 26, 2023

craig bot closed this as completed in 8395a21 Jun 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv/kvserver: TestSystemZoneConfigs failed #98200

kv/kvserver: TestSystemZoneConfigs failed #98200

cockroach-teamcity commented Mar 8, 2023 •

edited by cockroach-jira-scripts

Loading

irfansharif commented Mar 17, 2023

cockroach-teamcity commented Mar 21, 2023

nvanbenschoten commented Mar 22, 2023

irfansharif commented Mar 22, 2023

irfansharif commented Mar 29, 2023

irfansharif commented Apr 19, 2023

github-actions bot commented May 22, 2023

knz commented May 23, 2023

github-actions bot commented Jun 26, 2023

kv/kvserver: TestSystemZoneConfigs failed #98200

kv/kvserver: TestSystemZoneConfigs failed #98200

Comments

cockroach-teamcity commented Mar 8, 2023 • edited by cockroach-jira-scripts Loading

irfansharif commented Mar 17, 2023

cockroach-teamcity commented Mar 21, 2023

nvanbenschoten commented Mar 22, 2023

irfansharif commented Mar 22, 2023

irfansharif commented Mar 29, 2023

irfansharif commented Apr 19, 2023

github-actions bot commented May 22, 2023

knz commented May 23, 2023

github-actions bot commented Jun 26, 2023

cockroach-teamcity commented Mar 8, 2023 •

edited by cockroach-jira-scripts

Loading