Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true #16583
Looking good, thanks so much for taking this on. Most of the suggestions are nitpicks around using must, plus a couple of stylistic suggestions.
The biggest item is that we need to revert the changes made within nomad/periodic.go due to the problem described in my comment.
There also seem to be failures and panics within the CI tests that we should look into and fix before merging.
The PR will also need a changelog entry before we can merge.
nomad/leader.go
// isNewEvalNeeded checks if the job allows for overlap and if there are already
// instances of the job running in order to determine if a new evaluation needs to
// be created upon periodic dispatcher restore
func (s *Server) isNewEvalNeeded(job *structs.Job) (bool, error) {
Not to bikeshed too much, but this function is specific to periodic jobs, therefore it might be nice to have a more descriptive name. Otherwise, this seems like a generic job helper function at first glance.
nomad/leader_test.go
@@ -465,7 +487,15 @@ func TestLeader_PeriodicDispatcher_Restore_Evals(t *testing.T) {
 	}

 	// Create an eval for the past launch.
-	s1.periodicDispatcher.createEval(job, past)
+	eval, _ := s1.periodicDispatcher.createEval(job, past)
Let's check this error return using must.NoError to be sure.
nomad/periodic.go
@@ -278,10 +278,10 @@ func (p *PeriodicDispatch) removeLocked(jobID structs.NamespacedID) error {
// subsequent eval.
func (p *PeriodicDispatch) ForceRun(namespace, jobID string) (*structs.Evaluation, error) {
	p.l.Lock()
	defer p.l.Unlock()
I think we need to go back to the original style here, as the comment on the createEval function reads "This should not be called with the lock held."
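To illustrate the locking concern, here is a minimal sketch of the "unlock before calling" pattern. The dispatcher type, its fields, and both methods are hypothetical stand-ins, not Nomad's code. The thread does not say why createEval carries this requirement; the sketch assumes a callee that acquires the same mutex, which is one way such a constraint arises, since Go's sync.Mutex is not reentrant.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// dispatcher is a hypothetical, simplified stand-in for a periodic dispatcher.
type dispatcher struct {
	l       sync.Mutex
	tracked map[string]bool
	evals   int
}

// createEval must not be called with d.l held. In this sketch (an assumption)
// it takes the lock itself, so a caller that still holds it would deadlock.
func (d *dispatcher) createEval(jobID string) (string, error) {
	d.l.Lock()
	defer d.l.Unlock()
	d.evals++
	return fmt.Sprintf("eval-%s-%d", jobID, d.evals), nil
}

// forceRun checks the tracked set under the lock, then releases the lock
// explicitly on every path before handing off to createEval. A deferred
// Unlock would keep the lock held for the duration of the createEval call.
func (d *dispatcher) forceRun(jobID string) (string, error) {
	d.l.Lock()
	if !d.tracked[jobID] {
		d.l.Unlock()
		return "", errors.New("job not tracked: " + jobID)
	}
	d.l.Unlock()

	return d.createEval(jobID)
}

func main() {
	d := &dispatcher{tracked: map[string]bool{"backup": true}}
	id, err := d.forceRun("backup")
	fmt.Println(id, err)
}
```

The cost of this style is that every early return needs its own Unlock, which is easy to miss when the function grows.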
nomad/periodic.go
@@ -278,10 +278,10 @@ func (p *PeriodicDispatch) removeLocked(jobID structs.NamespacedID) error {
// subsequent eval.
func (p *PeriodicDispatch) ForceRun(namespace, jobID string) (*structs.Evaluation, error) {
I realize this is an existing method name, but I wonder if calling this ForceRun rather than ForceEval is misleading.
Co-authored-by: James Rasell <[email protected]>
createEval cant be called with the lock on
…hibit_overlap is true (#16583)
* Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true
  Fixes #11052
  When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre.
* Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true
  Fixes #11052
  When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children.
* style: refactor force run function
* fix: remove defer and inline unlock for speed optimization
* Update nomad/leader.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* style: refactor tests to use must
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* Update nomad/leader_test.go
  Co-authored-by: James Rasell <[email protected]>
* fix: move back from defer to calling unlock before returning. createEval cant be called with the lock on
* style: refactor test to use must
* added new entry to changelog and update comments
---------
Co-authored-by: James Rasell <[email protected]>
Co-authored-by: James Rasell <[email protected]>
…hibit_overlap is true (#16583) (#16683) Co-authored-by: Juana De La Cuesta <[email protected]>
…hibit_overlap is true (#16583) (#16682) Co-authored-by: Juana De La Cuesta <[email protected]>
…hibit_overlap is true (#16583) (#16681) Co-authored-by: James Rasell <[email protected]> Co-authored-by: James Rasell <[email protected]>
This PR addresses the bug reported in #11052.
When a leader change happens, the periodic dispatcher on the new leader starts by force-running all periodic jobs, without checking whether an instance of the job is already running.
A new check is introduced that skips the job if prohibit_overlap is set and there is already an instance running.
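The decision described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the periodicJob and childJob types and the shouldForceLaunch helper are hypothetical, and treating any child whose status is not "dead" as still running is an assumption made for the example.

```go
package main

import "fmt"

// childJob is a hypothetical, simplified stand-in for one launched instance
// of a periodic job.
type childJob struct {
	ID     string
	Status string // assumed values: "pending", "running", "dead"
}

// periodicJob is a hypothetical, simplified periodic job definition.
type periodicJob struct {
	ID              string
	ProhibitOverlap bool
}

// shouldForceLaunch reports whether the dispatcher restore should force a new
// launch of the job. Jobs that allow overlap are always launched; jobs that
// prohibit overlap are skipped while any child has not finished.
func shouldForceLaunch(job periodicJob, children []childJob) bool {
	if !job.ProhibitOverlap {
		return true
	}
	for _, c := range children {
		if c.Status != "dead" {
			return false
		}
	}
	return true
}

func main() {
	job := periodicJob{ID: "backup", ProhibitOverlap: true}
	running := []childJob{{ID: "backup/periodic-1", Status: "running"}}
	finished := []childJob{{ID: "backup/periodic-1", Status: "dead"}}

	fmt.Println(shouldForceLaunch(job, running))  // skipped: an instance is still running
	fmt.Println(shouldForceLaunch(job, finished)) // launched: the previous instance finished
}
```

In the real server the list of children would come from the state store, and the lookup can fail, which is why the helper reviewed earlier in this thread returns (bool, error) rather than a bare bool.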