Improper periodic job launch #3703
Comments
@tantra35 Do you have any idea what happened that caused this allocation to get marked as stopped? It is possible that the node it was running on was drained/lost, and then there wasn't enough capacity to run the last allocation until later.
@dadgar We didn't find anything suspicious in the server logs at that moment, and there were no hardware or network failures at all. As shown in the screenshot above, the Nomad job fully completed at about 6:40 MSK; then at 11:00 MSK a completely wrong launch of the job began (it should not have happened at that time). To clarify the situation: the job is launched every day at 03:00 MSK and runs for about 3 hours 30 minutes. This is the first rectangle on the screenshot, which is normal behavior, and we can see that it finished completely; the second rectangle should not appear, because scheduling should not be triggered for that job at that time.
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.5.6
Operating system and Environment details
Ubuntu 16.04 on an AWS instance
Issue
After some time Nomad begins launching periodic jobs improperly (in our case this has already happened twice, so there is no reason to think it will not happen again).
We have the following job definition, which is launched periodically (once a day at 00:00 UTC, i.e. 03:00 MSK) on 9 instances. From time to time Nomad launches an additional run of this job on only one instance, and this strange extra job looks like this:
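The reporter's actual job file is not included above; purely as an illustration, a minimal periodic batch job of this shape (hypothetical job, group, and task names, assumed driver and command) might look like:

```hcl
# Illustrative sketch only: names, driver, and command are assumptions,
# not the reporter's actual job file.
job "nightly-batch" {
  datacenters = ["dc1"]
  type        = "batch"

  # Run once a day at 00:00 UTC (03:00 MSK).
  periodic {
    cron             = "0 0 * * *"
    prohibit_overlap = true
  }

  group "workers" {
    # The reporter runs this job on 9 instances.
    count = 9

    task "work" {
      driver = "exec"

      config {
        command = "/usr/local/bin/nightly-work"
      }
    }
  }
}
```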
To illustrate the problem more clearly, here is a screenshot from our monitoring:
As you can see, only one instance is launched. This job never stops, because it waits for all 9 instances to complete, which can never happen when only one is launched. There are no messages in the server logs that could explain this behavior.
We are thinking about upgrading our production to 0.7.1 (it seems that GH-3201 solves this issue, but I'm not sure), but #3604 stops us from taking this step, because we have many autoscaled jobs.