-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: time out of range #409
Conversation
Signed-off-by: fudongying <[email protected]>
Thanks for raising this PR @fudongyingluck. Checks are failing due to stale artifacts, will wait to re-run checks until the next 2.9.0 build is successful
|
Signed-off-by: fudongying <[email protected]>
@joshpalis we found that dirty data stale in the memory after exception. If the index migrates back to this node after we migrate it to another, then the job loses again. So we add the second commit to deal with this condition. |
Codecov Report
@@ Coverage Diff @@
## main #409 +/- ##
============================================
+ Coverage 28.77% 29.19% +0.42%
- Complexity 97 98 +1
============================================
Files 22 22
Lines 1178 1185 +7
Branches 109 109
============================================
+ Hits 339 346 +7
Misses 818 818
Partials 21 21
|
@joshpalis Checks seem successful now. Is it convenient for you to review those codes ? Really thanks for your time ~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix!
In the future, it would be helpful to create an issue reporting the details of the bug and then reference that issue in the PR. In case there's a need to discuss appropriate fixes for the bug that the PR might not handle it keeps the discussions separate.
Fix LGTM with a few nits.
src/main/java/org/opensearch/jobscheduler/scheduler/JobScheduler.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/jobscheduler/scheduler/JobScheduler.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/jobscheduler/scheduler/JobScheduler.java
Outdated
Show resolved
Hide resolved
Signed-off-by: fudongying <[email protected]>
@dbwiddis Really thanks for your time and advice. I'll create a bug issue next time. And The code is changed as your comments in the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fudongyingluck This PR looks good to me, but I'd be curious to know how to reproduce the scenario. Would you be able to provide steps for how to recreate the scenario where duration can be negative? I can see in the code its possible if the current instant is after the nextExecutionTime, but does that mean that a job had previously failed to run and the nextExecutionTime was not updated?
How can the situation arise where nextExecutionTime is in the past? Thank you.
@cwperks Good question. We also feel curious when the job disappears until we found logs. The thing is the cloud service which the ES k8s instance runs on, is unavailable for some time. Then the ES instance seems not to run at those times, I don't know how this happened. After about 30m, the ES instance reruns again, and the exception occurs. |
@fudongyingluck Thank you for the context! |
* fix: time out of range Signed-off-by: fudongying <[email protected]> * fix: deschedule failed after schedule exception Signed-off-by: fudongying <[email protected]> * chore: dbwiddis's comments Signed-off-by: fudongying <[email protected]> --------- Signed-off-by: fudongying <[email protected]> (cherry picked from commit 9f4ec67)
* fix: time out of range Signed-off-by: fudongying <[email protected]> * fix: deschedule failed after schedule exception Signed-off-by: fudongying <[email protected]> * chore: dbwiddis's comments Signed-off-by: fudongying <[email protected]> --------- Signed-off-by: fudongying <[email protected]> (cherry picked from commit 9f4ec67) Signed-off-by: Joshua Palis <[email protected]>
* fix: time out of range * fix: deschedule failed after schedule exception * chore: dbwiddis's comments --------- (cherry picked from commit 9f4ec67) Signed-off-by: fudongying <[email protected]> Signed-off-by: Joshua Palis <[email protected]> Co-authored-by: fudongying <[email protected]>
We found that when current time greater than
nextExecutionTime
, theTimeValue
inthreadPool.schedule
will throw anIllegalArgumentException
as followingThen the job will not be scheduled anymore.
This change fixes this, by setting the
nextExecutionTime
to current time.Thanks my colleague @kkewwei solve this out.
Signed-off-by: fudongying <[email protected]>
Signed-off-by: kewei.11 <[email protected]>