-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: fix transaction handling in state store #9438
Conversation
CSI plugin counts can drift from the correct values if allocations for a CSI plugin stop concurrently, as with a job stop or a controller and node plugin on the same client. This test demonstrates the behavior.
CSI volumes need to be "denormalized" with their plugins and (optionally) allocations. Read-only RPC endpoints should take a snapshot so that we can make multiple state store method calls with a consistent view.
fb72efe
to
6ab4288
Compare
6ab4288
to
346f933
Compare
When making updates to CSI plugins, the state store methods that have open write transactions were querying the state store using the same methods used by the CSI RPC endpoint, but these method creates their own top-level read transactions. During concurrent plugin updates (as happens when a plugin job is stopped), this can cause write skew in the plugin counts. Refactor the CSIPlugin query methods to have an implementation method that accepts a transaction, which can be called with either a read txn or a write txn.
Add missing nil checks and missing watch `Add` calls on some state store methods for CSI volumes
When making updates to CSI volumes, the state store methods that have open write transactions were querying the state store using the same methods used by the CSI RPC endpoint, but these method creates their own top-level read transactions. These have yet not been implicated in any bugs. Refactor the CSIVolume query methods to have an implementation method that accepts a transaction, which can be called with either a read txn or a write txn.
346f933
to
8b88e59
Compare
if vol.Namespace != ns { | ||
continue | ||
} | ||
|
||
vol, err := snap.CSIVolumeDenormalizePlugins(ws, vol.Copy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the significance of copying the volume here, but not during Get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CSIVolumeByID
query in Get
does a CSIVolumeDenormalizePlugins
in the state store, so we already copy in the state store function. (I'm not wild about how that works currently because it's too easy to miss... I didn't want to get too wild with the refactoring in this PR though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a feww non blocking questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The commit breakdown with the commit messages are super awesome - made it easy to review this code. Thanks!
nomad/state/state_store.go
Outdated
if ws != nil { | ||
ws.Add(watchCh) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this is explicit, I always get nervous when handling watchset. Want to note that this is not required - many other cases don't do a nil check, and apparently it's handled in Add: https://github.com/hashicorp/go-memdb/blob/v1.3.0/watch.go#L16-L20
Interestingly, this commit adds a new ws.Add
without doing a nil check ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and it's a documented part of the ws.Add
method. Maybe we should just remove these extra checks then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 88a7e71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that's interesting... removing those caused NPE errors at those lines in the test runs for jobVersionByID
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a928e1]
goroutine 54 [running]:
github.com/hashicorp/nomad/nomad/state.(*StateStore).jobVersionByID(0xc00062b470, 0xc00057e180, 0x0, 0xc00054b630, 0x7, 0xc000551380, 0x28, 0x0, 0x0, 0x0, ...)
github.com/hashicorp/nomad/nomad/state/state_store.go:1842 +0x161
github.com/hashicorp/nomad/nomad/state.(*StateStore).upsertJobVersion(0xc00062b470, 0x7, 0xc00064e900, 0xc00057e180, 0x0, 0x0)
github.com/hashicorp/nomad/nomad/state/state_store.go:1753 +0x249
github.com/hashicorp/nomad/nomad/state.(*StateStore).upsertJobImpl(0xc00062b470, 0x7, 0xc00064e6c0, 0x35de500, 0xc00057e180, 0x0, 0x1)
github.com/hashicorp/nomad/nomad/state/state_store.go:1523 +0x42c
github.com/hashicorp/nomad/nomad/state.(*StateStore).UpsertJob(0xc00062b470, 0x4, 0x7, 0xc00064e6c0, 0x0, 0x0)
github.com/hashicorp/nomad/nomad/state/state_store.go:1449 +0xa8
github.com/hashicorp/nomad/nomad.(*nomadFSM).applyUpsertJob(0xc0005f5030, 0x4, 0xc000391801, 0x6b4, 0xbb3, 0x7, 0x0, 0x0)
github.com/hashicorp/nomad/nomad/fsm.go:507 +0x205
github.com/hashicorp/nomad/nomad.(*nomadFSM).Apply(0xc0005f5030, 0xc000241a90, 0x51dd880, 0xbfe7b879c156277c)
github.com/hashicorp/nomad/nomad/fsm.go:219 +0x34e
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0006a43e0)
github.com/hashicorp/[email protected]/fsm.go:90 +0x2c2
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000692e00, 0x1, 0x40)
github.com/hashicorp/[email protected]/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc0005a8000)
github.com/hashicorp/[email protected]/fsm.go:219 +0x3c4
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc0005a8000, 0xc0004c4100)
github.com/hashicorp/[email protected]/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
github.com/hashicorp/[email protected]/state.go:144 +0x66
jobs_test.go:2176: err: Put "http://127.0.0.1:31070/v1/jobs": EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I must be missing something here, because testing with the actual go-memdb library shows otherwise...
func TestNilWatch(t *testing.T) {
test := func(ws WatchSet) {
watchCh := make(chan struct{})
ws.Add(watchCh)
}
test(nil)
test(NewWatchSet())
}
$ go test . -v -run TestNilWatch -count=1
=== RUN TestNilWatch
--- PASS: TestNilWatch (0.00s)
PASS
ok github.com/hashicorp/go-memdb 0.025s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's because jobVersionByID
's signature is:
(s *StateStore) jobVersionByID(txn *txn, ws *memdb.WatchSet, namespace, id string) ([]*structs.Job, error)
I don't think we're supposed to be taking that pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting jobVersionByID
takes a *memdb.WatchSet
unlike the other methods in this file. It was introduced as such since the beginning in https://github.com/hashicorp/nomad/pull/2566/files#diff-53c7d524897a192ece819816a9dcf006dee46747fbb853e7f638601ea556a72cR561-R572 - but without explanation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's the only place in the code base where we do that, including Consul and Vault, I suspect it's just a long-lived typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - I missed the last comment when I posted mine - I agree, it shouldn't be a pointer here.
b3a9402
to
20d66a5
Compare
20d66a5
to
6796a33
Compare
6796a33
to
3b3bc87
Compare
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #9371 and possibly #9248 and #8034
When making updates to CSI plugins and volumes, the state store methods that have open write transactions were querying the state store using the same methods used by the CSI RPC endpoint, but these methods create their own top-level read transactions. During concurrent plugin updates (as happens when a plugin job is stopped), this can cause write skew in the plugin counts. This bug hasn't yet been identified as the root cause of any volume query bug, but it seems likely.
Best reviewed commit-by-commit.