-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor blocking/migrating allocations #3007
Conversation
interface has 3 implementations: 1. local for blocking and moving data locally 2. remote for blocking and moving data from another node 3. noop for allocs that don't need to block
Fixup some TODOs and formatting left from new prevAllocWatcher code.
Add test for that case, add comments, remove debug logging
2e8feac
to
4f7c5e6
Compare
} | ||
|
||
// Migrate from previous local alloc dir to destination alloc dir. | ||
func (p *localPrevAlloc) Migrate(ctx context.Context, dest *allocdir.AllocDir) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function returns an error, but below only nil is returned. Should errors be returned as well as logged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. I think I'm going to change this to return the error from allocdir.Move but always call allocdir.Destroy since it's a cleanup method we always want to call.
Since the caller logs errors from this method and continues nothing will change functionally but it will make this local Migrate method work like the remote Migrate method.
|
||
tg := alloc.Job.LookupTaskGroup(alloc.TaskGroup) | ||
|
||
if prevAR != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to have two different constructors with names indicating the difference in what kind of AllocWatcher they create (or add comments explaining this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this conditional is ugly, it is intentionally put inside this unified constructor so that the same logic doesn't have to be duplicated in client.go
. There's only 2 places that call into this new func, but I'd rather just let them ignore these conditions and leave it up to this file to handle it all.
Definitely up for debate...
client/alloc_watcher.go
Outdated
// terminated (and therefore won't send updates to the listener) | ||
prevStatus terminated | ||
|
||
// prevWaitCh is closed when the previous alloc is GC'd which is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GC'd -> garbage collected
client/alloc_runner.go
Outdated
@@ -80,7 +80,14 @@ type AllocRunner struct { | |||
vaultClient vaultclient.VaultClient | |||
consulClient ConsulServiceAPI | |||
|
|||
otherAllocDir *allocdir.AllocDir | |||
prevAlloc prevAllocWatcher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment
client/allocdir/alloc_dir.go
Outdated
@@ -188,6 +191,11 @@ func (d *AllocDir) Snapshot(w io.Writer) error { | |||
|
|||
// Move other alloc directory's shared path and local dir to this alloc dir. | |||
func (d *AllocDir) Move(other *AllocDir, tasks []*structs.Task) error { | |||
if !d.built { | |||
// Enfornce the invariant that Build is called before Move |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling
@@ -297,16 +297,14 @@ func (a *AllocGarbageCollector) MakeRoomFor(allocations []*structs.Allocation) e | |||
} | |||
|
|||
// MarkForCollection starts tracking an allocation for Garbage Collection | |||
func (a *AllocGarbageCollector) MarkForCollection(ar *AllocRunner) error { | |||
if ar == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any insight on why you removed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is called in a very limited number of places, all of which handle nil AllocRunners.
|
||
return resp.Node, nil | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
client/alloc_runner.go
Outdated
} | ||
|
||
r.waitingLock.Lock() | ||
r.blocked = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in watcher. Would like to not have this
type prevAllocWatcher interface { | ||
// Wait for previous alloc to terminate | ||
Wait(context.Context) error | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add IsWaiting
and IsMigrating
methods and the alloc runner can just call into the watcher so we can skip storing that state there and remove the locks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved. Still pretty obnoxiously ugly code that would be nice to roll into more holistic (sub-)state handling someday, but at least it's tucked away out of sight in the hopefully rarely viewed alloc_watcher.go file.
client/alloc_watcher.go
Outdated
} | ||
} | ||
|
||
if done() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return ctx.Err()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call
client/alloc_watcher.go
Outdated
// migrate a remote alloc dir to local node | ||
func (p *remotePrevAlloc) migrateAllocDir(ctx context.Context, nodeAddr string) (*allocdir.AllocDir, error) { | ||
// Create the previous alloc dir | ||
prevAllocDir := allocdir.NewAllocDir(p.logger, filepath.Join(p.config.AllocDir, p.prevAllocID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What cleans this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does on errors otherwise the caller does. Clarified in the docstring.
Since alloc runner just logs these errors and continues there's no reason not to return it.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Two major components:
AllocRunner now handles the blocking/migrating logic by calling into the new alloc_watcher code. This allows the Client's add/remove alloc code to be much simpler (and less locking!).
(Speaking of locking, the new locking in alloc runner around the blocked and migrating bools is really verbose and annoying, but I think we can remove it in the future when we offer a better mechanism to expose alloc states.)