-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taint nodes on spot and scheduled events #162
Taint nodes on spot and scheduled events #162
Conversation
Fixes aws#160. Signed-off-by: Ilya Shaisultanov <[email protected]>
@bwagner5 I see the build failed but not all of the tests – unsure if related to my changes here or not. Do you know how I could debug those? |
@diversario I believe it's because the cluster role does not have taint permissions. https://github.com/aws/aws-node-termination-handler/blob/master/config/helm/aws-node-termination-handler/templates/clusterrole.yaml You can run the e2e tests locally:
or all tests:
Also, looks like some small formatting issues:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add permissions to cluster role and run goimports -w ./ && gofmt -s -w ./
If all the tests except helm-sync passes, I can take the action to fix that. It's just a tedious task of syncing the helm chart to the aws/eks-charts repo.
Is it possible for me to see the build logs? I don't think it's a permissions issue because NTH can already add labels to nodes. I checked
which looks pretty much like the NTH's one:
I'm currently trying to get some view into what's failing from the |
For scheduled event cancellation test, add tolerations to the proxy to allow it to schedule to emit the cancellation event.
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
===========================================
- Coverage 94.64% 84.22% -10.43%
===========================================
Files 8 8
Lines 635 748 +113
===========================================
+ Hits 601 630 +29
- Misses 22 99 +77
- Partials 12 19 +7
Continue to review full report at Codecov.
|
@@ -363,3 +434,112 @@ func jsonPatchEscape(value string) string { | |||
value = strings.Replace(value, "~", "~0", -1) | |||
return strings.Replace(value, "/", "~1", -1) | |||
} | |||
|
|||
func addTaint(node *corev1.Node, nth Node, taintKey string, taintValue string, effect corev1.TaintEffect) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, addTaintToSpec
and cleanTaint
are lifted from cluster-autoscaler
with minor modifications.
@@ -80,7 +94,9 @@ helm upgrade --install $CLUSTER_NAME-emtp $SCRIPTPATH/../../config/helm/ec2-meta | |||
--set ec2MetadataTestProxy.enableScheduledMaintenanceEvents="true" \ | |||
--set ec2MetadataTestProxy.enableSpotITN="false" \ | |||
--set ec2MetadataTestProxy.scheduledEventStatus="canceled" \ | |||
--set ec2MetadataTestProxy.port="$IMDS_PORT" | |||
--set ec2MetadataTestProxy.port="$IMDS_PORT" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the worker node is now tainted, the proxy won't schedule there unless it tolerates the taint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work on this! I just had a few comments in-line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @diversario ! 🚀
Issue #, if available: #160
Description of changes:
Upon receiving an event, taint the affect node with the appropriate taint. Remove the taint if event is cancelled or completed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.