Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

part: Fix bug with short and long key with shared prefix #22

Merged
merged 6 commits into from
Apr 29, 2024

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Apr 25, 2024

If we insert "foobar" and then "foo" we ended up in a case where "foo" was the common prefix and the rest of the key become empty leading to out-of-bounds access to the key slice.

Handle the case where the key becomes empty when splitting a node by just inserting it as the leaf of the split node.

Added a regression test and validated it catched the bug:

--- FAIL: Test_prefix_regression (0.00s)
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 32 [running]:
testing.tRunner.func1.2({0x5f3380, 0xc00001a288})
        /home/jussi/sdk/go1.22.0/src/testing/testing.go:1631 +0x24a
testing.tRunner.func1()
        /home/jussi/sdk/go1.22.0/src/testing/testing.go:1634 +0x377
panic({0x5f3380?, 0xc00001a288?})
        /home/jussi/sdk/go1.22.0/src/runtime/panic.go:770 +0x132
github.com/cilium/statedb/part.(*Txn[...]).insert(0x6725e0, ...
        /home/jussi/go/src/github.com/cilium/statedb/part/txn.go:228 +0xdaf

If we insert "foobar" and then "foo" we ended up in a case where
"foo" was the common prefix and the rest of the key become empty
leading to out-of-bounds access to the key slice.

Handle the case where the key becomes empty when splitting a node
by just inserting it as the leaf of the split node.

Added a regression test and validated it catched the bug:

--- FAIL: Test_prefix_regression (0.00s)
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 32 [running]:
testing.tRunner.func1.2({0x5f3380, 0xc00001a288})
        /home/jussi/sdk/go1.22.0/src/testing/testing.go:1631 +0x24a
testing.tRunner.func1()
        /home/jussi/sdk/go1.22.0/src/testing/testing.go:1634 +0x377
panic({0x5f3380?, 0xc00001a288?})
        /home/jussi/sdk/go1.22.0/src/runtime/panic.go:770 +0x132
github.com/cilium/statedb/part.(*Txn[...]).insert(0x6725e0, ...
        /home/jussi/go/src/github.com/cilium/statedb/part/txn.go:228 +0xdaf

Signed-off-by: Jussi Maki <[email protected]>
@joamaki joamaki requested a review from bimmlerd April 25, 2024 16:20
@joamaki
Copy link
Contributor Author

joamaki commented Apr 25, 2024

I think we need to switch the fuzz_test.go to use variable-length keys to exercise these sort of code paths.

Copy link

github-actions bot commented Apr 25, 2024

$ make test
go: downloading github.com/mitchellh/mapstructure v1.5.0
go: downloading golang.org/x/term v0.16.0
go: downloading github.com/spf13/cast v1.6.0
go: downloading github.com/fsnotify/fsnotify v1.7.0
go: downloading github.com/sagikazarmark/slog-shim v0.1.0
go: downloading github.com/spf13/afero v1.11.0
go: downloading golang.org/x/sys v0.17.0
go: downloading github.com/subosito/gotenv v1.6.0
go: downloading github.com/hashicorp/hcl v1.0.0
go: downloading gopkg.in/ini.v1 v1.67.0
go: downloading github.com/magiconair/properties v1.8.7
go: downloading github.com/pelletier/go-toml/v2 v2.1.0
go: downloading golang.org/x/text v0.14.0
	github.com/cilium/statedb/reconciler/example		coverage: 0.0% of statements
	github.com/cilium/statedb/reconciler/benchmark		coverage: 0.0% of statements
ok  	github.com/cilium/statedb	5.197s	coverage: 87.6% of statements
ok  	github.com/cilium/statedb/index	0.004s	coverage: 31.1% of statements
ok  	github.com/cilium/statedb/internal	0.011s	coverage: 93.3% of statements
ok  	github.com/cilium/statedb/part	1.742s	coverage: 88.2% of statements
ok  	github.com/cilium/statedb/reconciler	0.618s	coverage: 89.9% of statements
-----
$ make bench
go test ./... -bench . -benchmem -test.run xxx
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkDB_WriteTxn_1-4                    	  519145	      2342 ns/op	    427004 objects/sec	    2600 B/op	      31 allocs/op
BenchmarkDB_WriteTxn_10-4                   	 1331232	       889.6 ns/op	   1124097 objects/sec	     681 B/op	      10 allocs/op
BenchmarkDB_WriteTxn_100-4                  	 1669779	       696.5 ns/op	   1435685 objects/sec	     578 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_1000-4                 	 1605256	       745.2 ns/op	   1341997 objects/sec	     543 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_10000-4                	 1291904	      1017 ns/op	    983677 objects/sec	     595 B/op	       7 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-4   	  876000	      1365 ns/op	    732687 objects/sec	     970 B/op	      15 allocs/op
BenchmarkDB_RandomInsert-4                  	    2444	    479322 ns/op	   2086280 objects/sec	  422897 B/op	    7151 allocs/op
BenchmarkDB_RandomReplace-4                 	     710	   1699024 ns/op	    588573 objects/sec	 1154292 B/op	   16503 allocs/op
BenchmarkDB_SequentialInsert-4              	    1666	    740593 ns/op	   1350270 objects/sec	  542605 B/op	    7221 allocs/op
BenchmarkDB_Changes_Baseline-4              	    1298	    920636 ns/op	   1086206 objects/sec	  582988 B/op	   10301 allocs/op
BenchmarkDB_Changes-4                       	     700	   1681646 ns/op	    594655 objects/sec	 1041324 B/op	   14646 allocs/op
BenchmarkDB_RandomLookup-4                  	   23740	     49720 ns/op	  20112468 objects/sec	     144 B/op	       1 allocs/op
BenchmarkDB_SequentialLookup-4              	   30526	     39609 ns/op	  25247106 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_FullIteration_All-4             	     806	   1444882 ns/op	  69209947 objects/sec	     360 B/op	       7 allocs/op
BenchmarkDB_FullIteration_Get-4             	     238	   5006377 ns/op	  19974560 objects/sec	       0 B/op	       0 allocs/op
BenchmarkDB_PropagationDelay-4              	  491049	      2095 ns/op	        18.00 50th_µs	        22.00 90th_µs	        93.00 99th_µs	    1440 B/op	      23 allocs/op
PASS
ok  	github.com/cilium/statedb	25.573s
PASS
ok  	github.com/cilium/statedb/index	0.004s
PASS
ok  	github.com/cilium/statedb/internal	0.003s
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: AMD EPYC 7763 64-Core Processor                
Benchmark_Insert_RootOnlyWatch-4    	    9156	    123484 ns/op	   8098233 objects/sec	  119103 B/op	    2067 allocs/op
Benchmark_Insert-4                  	    6426	    172880 ns/op	   5784366 objects/sec	  219615 B/op	    3105 allocs/op
Benchmark_Replace-4                 	47342966	        25.17 ns/op	  39737723 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Replace_RootOnlyWatch-4   	47701921	        25.13 ns/op	  39793074 objects/sec	       0 B/op	       0 allocs/op
Benchmark_txn_1-4                   	 3740042	       321.2 ns/op	   3113571 objects/sec	     336 B/op	       6 allocs/op
Benchmark_txn_10-4                  	 8477569	       138.3 ns/op	   7230317 objects/sec	     142 B/op	       2 allocs/op
Benchmark_txn_100-4                 	 8603142	       147.8 ns/op	   6765803 objects/sec	     211 B/op	       2 allocs/op
Benchmark_txn_1000-4                	 8129310	       146.5 ns/op	   6827519 objects/sec	     211 B/op	       2 allocs/op
Benchmark_txn_10000-4               	 5951853	       198.5 ns/op	   5036942 objects/sec	     224 B/op	       2 allocs/op
Benchmark_txn_100000-4              	 4426680	       272.4 ns/op	   3671411 objects/sec	     337 B/op	       2 allocs/op
Benchmark_txn_delete_1-4            	 3673633	       326.8 ns/op	   3059939 objects/sec	     840 B/op	       6 allocs/op
Benchmark_txn_delete_10-4           	10123044	       115.4 ns/op	   8665848 objects/sec	     129 B/op	       1 allocs/op
Benchmark_txn_delete_100-4          	12450726	        93.03 ns/op	  10749153 objects/sec	      61 B/op	       1 allocs/op
Benchmark_txn_delete_1000-4         	14170862	        84.86 ns/op	  11783548 objects/sec	      26 B/op	       1 allocs/op
Benchmark_txn_delete_10000-4        	11852568	       101.5 ns/op	   9850479 objects/sec	      25 B/op	       1 allocs/op
Benchmark_txn_delete_100000-4       	10887048	       112.0 ns/op	   8926622 objects/sec	      25 B/op	       1 allocs/op
Benchmark_Get-4                     	   50638	     23623 ns/op	  42331061 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Iterate-4                 	  162938	      7356 ns/op	 135937864 objects/sec	      80 B/op	       3 allocs/op
Benchmark_Hashmap_Insert-4          	   15440	     77760 ns/op	  12860080 objects/sec	   86554 B/op	      64 allocs/op
Benchmark_Hashmap_Get_Uint64-4      	  147078	      8085 ns/op	 123688150 objects/sec	       0 B/op	       0 allocs/op
Benchmark_Hashmap_Get_Bytes-4       	  147986	      8125 ns/op	 123084366 objects/sec	       0 B/op	       0 allocs/op
PASS
ok  	github.com/cilium/statedb/part	29.586s
PASS
ok  	github.com/cilium/statedb/reconciler	0.005s
?   	github.com/cilium/statedb/reconciler/benchmark	[no test files]
?   	github.com/cilium/statedb/reconciler/example	[no test files]
go run ./reconciler/benchmark -quiet
100000 objects reconciled in 0.26 seconds (batch size 1000)
Throughput 380804.99 objects per second
Allocated 600528 objects, 40858kB bytes, 49872kB bytes still in use

joamaki added 2 commits April 25, 2024 18:25
This exercises the underlying library in more ways than just
using a fixed-length big endian uint64.

Signed-off-by: Jussi Maki <[email protected]>
When leaf was moved from header to the node structs we
missed setting the leaf when promoting/demoting.

Signed-off-by: Jussi Maki <[email protected]>
@bimmlerd
Copy link
Member

Hm, race found another bug - investigating.

joamaki and others added 3 commits April 29, 2024 15:35
Tests pass if the "mutated" cache is disabled. There's likely an
issue related to leaf channels?

Signed-off-by: Jussi Maki <[email protected]>
When deleting, we need to ensure that the target node watch channel is
closed. However, if we mutated the node in the transaction already,
we're looking at the cloned one, hence don't close the new channel too.

The regression was found in the bigger hex iteration test, but reduced to:

node4[]:(0xc00012a300)
 leaf[30]: 30 -> 0(0xc00014a410)
 node4[31]: 31 -> 1(0xc00012a360)
  leaf[30]: 3130 -> 2(0xc00014a550)
  leaf[31]: 3131 -> 3(0xc00014a5f0)

After deleting 3 and 1

node4[]:(0xc00012a420)
 leaf[30]: 30 -> 0(0xc00014a410)
 node4[31]:(0xc00012a3c0)
  leaf[30]: 3130 -> 2(0xc00014a550

The channel of node4[31] was closed already, which is wrong. It was
closed because we deleted 3, its child, and then deleted its leaf -
which didn't clone the node again, but closed the new watch channel.

Fixes: c7b7b15 (part: Fix missing closing of leaf watch channel)

Reported-by: Jussi Maki <[email protected]>
Signed-off-by: David Bimmler <[email protected]>
@joamaki joamaki force-pushed the pr/joamaki/fix-shared-prefix-bug branch from 7b2d0f6 to 8e8fc3b Compare April 29, 2024 13:35
@joamaki
Copy link
Contributor Author

joamaki commented Apr 29, 2024

The second bug was that for non-leaf nodes we didn't return the old value.

@joamaki joamaki merged commit bab8a70 into main Apr 29, 2024
1 check passed
@joamaki joamaki deleted the pr/joamaki/fix-shared-prefix-bug branch April 29, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants