k8s - Add force parameter to Teraslice Job stop API that force deletes things #2670
We had a discussion about this issue today; here are some of my thoughts from that discussion and after the fact:
I've spent some time playing with the
There are multiple scenarios that we are trying to fix:
I missed something earlier:
According to this issue, the grace period behaves differently in Kubernetes v1.18 through v1.27 than in earlier or later versions: "In fact, processes gets killed if we set --grace-period=0 in ver1.18." The behavior was changed back in v1.28.
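For reference, the force-delete behavior being discussed corresponds to a `kubectl` invocation like the one below. The pod name is illustrative, and whether the container processes are killed immediately depends on the Kubernetes version range noted above:

```shell
# Delete a pod immediately, without waiting for graceful termination.
# The pod name here is a made-up example.
kubectl delete pod teraslice-worker-example-abc12 \
  --grace-period=0 --force
```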
When using
I think I managed to accidentally recreate the scenario this issue was for. I think the scenario was:
Here's the state of the k8s resources:
Here are snippets of the master logs:
Snippet of the ts-exc log:
Snippet of the worker log:
I think this is the key clue:
And the key to being able to reproduce this easily ourselves is this log line:
Specifically
I will definitely give this a try.
I was given a new way to try to reproduce one of these situations: with an elasticsearch reader, give it an invalid query. Here's a sample
I think an operation like this would work:
It's not clear whether the job would ultimately have cleaned its workers up on its own.
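The original sample was not captured in this thread; as a purely hypothetical illustration, a job submission with a syntactically invalid Lucene query (an unbalanced parenthesis) might look like the following. The master URL, job name, index name, and query are all assumptions:

```shell
# Hypothetical: submit a job whose elasticsearch reader has a malformed
# Lucene query ("field:(unterminated" has no closing parenthesis), so the
# search should be rejected and the workers should fail to read.
curl -X POST "$TERASLICE_MASTER/v1/jobs" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "invalid-query-test",
    "operations": [
      { "_op": "elasticsearch_reader", "index": "example-index", "query": "field:(unterminated" },
      { "_op": "noop" }
    ]
  }'
```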
This is completed and shown to work.
I think I have identified a repeatable failure scenario where force stop does not work.
I think the problem is that the second time the job is started it has a new
I distinctly remember telling @busma13 to change the way he was doing things to target the
This is pretty obvious if you pay close attention to the master logs and see that the
We should probably fix this force stop to delete by
While k8s resources matching a different
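A minimal sketch of what deleting by job rather than by execution could look like, assuming Teraslice stamps its k8s resources with a job-id label (the label key used below is an assumption, not confirmed in this thread):

```shell
# Hypothetical: build a label selector that matches every resource for a
# job, regardless of which execution created it. The label key is assumed.
JOB_ID="example-job-id"
SELECTOR="teraslice.terascope.io/jobId=${JOB_ID}"

# Then the force stop could delete everything matching that selector:
#   kubectl delete jobs,deployments,pods -l "$SELECTOR" --grace-period=0 --force
echo "$SELECTOR"
```

This sidesteps the stale-execution-id problem because the job id stays stable across restarts.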
When addressing the issue above, please make the following additional changes:
This is done. We eventually want to pull force out into its own endpoint. See #3519
It would be nice if k8s pods could be forcibly deleted as an option using the Teraslice job stop API.
Currently when stopping jobs it's like
perhaps an option like this:
I'd have to take a look at the k8s library I am using, see how I am using it, and see if force parameters can be added to these API calls, but the `kubectl` equivalent would be:
I'm not sure offhand how that works when deleting k8s jobs and deployments.
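A sketch of what the proposed option might look like, alongside a rough `kubectl` equivalent. The `force` query parameter is the proposal itself (not an existing API at the time of this comment), and the label key in the selector is an assumption:

```shell
# Proposed: stop a job and force-delete its k8s resources in one call.
curl -X POST "$TERASLICE_MASTER/v1/jobs/$JOB_ID/_stop?force=true"

# Rough kubectl equivalent for the pods backing that job
# (label key is assumed, not confirmed here):
kubectl delete pods -l "teraslice.terascope.io/jobId=$JOB_ID" \
  --grace-period=0 --force
```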