Add wait for num pods #2074
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2074      +/-   ##
==========================================
- Coverage   76.00%   75.80%    -0.20%
==========================================
  Files         397      397
  Lines       15311    15345       +34
  Branches     2553     2553
==========================================
- Hits        11637    11633        -4
- Misses       2970     3008       +38
  Partials      704      704
I also removed the test for the k8s implementation of deleteExecution, since mocking that would be pretty messy.
I have some new changes I intend to make after talking things through with Zach.
I've added pRetry everywhere but POST, which should be safe. I now drive the worker pod polling with the default shutdown timeout.
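As a rough illustration (not the PR's actual code), a read-only k8s API call can be wrapped with the p-retry library like this; the `getWithRetry` name and the retry settings are assumptions for the sketch:

```js
'use strict';

const pRetry = require('p-retry');

// Retry a read-only k8s API call on transient failures. POST is left
// out on purpose: retrying a create could duplicate resources, while
// a GET or DELETE is safe to repeat.
function getWithRetry(k8sApiCall) {
    return pRetry(k8sApiCall, {
        retries: 3, // illustrative; the real retry count may differ
        onFailedAttempt: (err) => {
            console.warn(`k8s API attempt ${err.attemptNumber} failed: ${err.message}`);
        },
    });
}
```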
I also link the worker deployment to the execution controller job. This allows k8s to garbage collect the worker deployment (and then its pods) when the execution controller is deleted. This is good in general, but it is also critical to the error handling in deleteExecution. Closes #1612.
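For illustration, this kind of link is expressed with an `ownerReferences` entry on the Deployment's metadata; the helper and variable names below are hypothetical, while the ownerReference fields themselves are the standard k8s API:

```js
// Hypothetical helper: make the execution controller Job the owner of
// the worker Deployment, so deleting the Job cascades to the Deployment
// (and then its pods) via k8s garbage collection.
function linkToExecutionController(workerDeployment, exJob) {
    workerDeployment.metadata.ownerReferences = [{
        apiVersion: 'batch/v1',
        kind: 'Job',
        name: exJob.metadata.name, // the live Job's name
        uid: exJob.metadata.uid,   // must match the uid of the live Job object
    }];
    return workerDeployment;
}
```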
This is ready for review again.
LGTM. Error handling could use some improvements, but I don't think it will cause a significant issue, so we can defer it.
This changes the teraslice job shutdown process to be more structured. Previously we just "simultaneously" deleted the Worker deployment and Execution Controller Job, which could leave us with the scenario described here:
#942 (comment)
To mitigate this, we now tear down the k8s resources in a defined order (see the sketch below).
Note that this happens in parallel with the other messaging-based shutdown activities; those processes would have already been messaged to shut down. This change is a matter of managing the k8s resources (and, by virtue of that, the Teraslice processes).
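As a minimal sketch of the idea, under stated assumptions (hypothetical `listPods`/`deleteJob` helpers and an illustrative label selector, not the PR's exact implementation), the teardown could delete the execution controller Job first, let garbage collection remove the owned worker Deployment, and then poll until the worker pods are gone, bounded by the shutdown timeout:

```js
// Poll until the number of pods matching a selector reaches the target
// count, giving up after timeoutMs.
async function waitForNumPods(listPods, selector, target, timeoutMs, intervalMs = 1000) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() <= deadline) {
        const pods = await listPods(selector);
        if (pods.items.length === target) return pods;
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`Timed out waiting for ${target} pods matching ${selector}`);
}

// Deleting the Job lets k8s garbage-collect the owned worker Deployment;
// we then wait for the worker pods to actually disappear.
async function deleteExecutionResources(k8s, exId, shutdownTimeoutMs) {
    const selector = `teraslice.terascope.io/exId=${exId}`; // illustrative label
    await k8s.deleteJob(selector);
    await waitForNumPods(k8s.listPods, selector, 0, shutdownTimeoutMs);
}
```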