Releases: aws/aws-node-termination-handler
v1.18.1
v1.18.0
Improved logging in Queue Processor mode
`v1.18.0` introduces the `logFormatVersion` Helm chart option, which lets you opt in to more detailed logs.
The default value is `1`, which keeps logging the same as in prior releases (<= `v1.17.3`).
Setting the value to `2` gives you more detail about which AWS event triggered the cordon/drain. Previously, all these events were bucketed under `SQS_TERMINATE`, and it was difficult to tell what was happening.
This option is also available as a command line flag, `--log-format-version`.
What does the new logging look like?
`logFormatVersion=2` modifies several Debug, Info, and Warn logs, as well as Kubernetes events emitted by NTH. These changes give you better visibility into what NTH is doing when responding to events via SQS. If your monitoring system looks for any of the strings in the tables below and you adopt the new log format version, you may need to update its configuration to use the new strings.
Changes to logs when starting up
- Remove `event_type` field from the Info log when starting a monitor; replace with `monitor_type` field, with new values. See Table 1.
- Remove `event_type` field from the Warn log when a monitor fails to start; replace with `monitor_type` field, with new values. See Table 1.
Changes to logs when processing an event
- New `monitor` field in the Info log. See Table 1.
- Potentially change value of `kind` field in the Info log, if running Queue Processor mode. See Table 2.
- Potentially change the "reason" field in the k8s event if running Queue Processor mode. See Table 3.
Changes to logs when receiving an SQS message
- Include the specific event type instead of `SQS_TERMINATE` in the Debug log if running Queue Processor mode. See Table 2.
Tables of changed values
Table 1: Monitor types
Previous | New |
---|---|
`REBALANCE_RECOMMENDATION` | `REBALANCE_RECOMMENDATION_MONITOR` |
`SCHEDULED_EVENT` | `SCHEDULED_EVENT_MONITOR` |
`SPOT_ITN` | `SPOT_ITN_MONITOR` |
`SQS_TERMINATE` | `SQS_MONITOR` |
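If a monitoring rule matches on the startup logs, the rename in Table 1 can be applied mechanically. The following is a minimal sketch, not part of NTH: the `{field: value}` filter shape and the `migrate_startup_filter` helper are hypothetical, and only the string values come from Table 1.

```python
# Table 1 from these release notes: monitor type strings under
# logFormatVersion=1 (keys) and logFormatVersion=2 (values).
MONITOR_TYPE_V1_TO_V2 = {
    "REBALANCE_RECOMMENDATION": "REBALANCE_RECOMMENDATION_MONITOR",
    "SCHEDULED_EVENT": "SCHEDULED_EVENT_MONITOR",
    "SPOT_ITN": "SPOT_ITN_MONITOR",
    "SQS_TERMINATE": "SQS_MONITOR",
}


def migrate_startup_filter(field_filter: dict) -> dict:
    """Rewrite a hypothetical {field: value} log filter for logFormatVersion=2.

    Startup logs drop `event_type` in favor of `monitor_type`, so the key is
    renamed and its value is translated through Table 1. Values that are not
    in the table are passed through unchanged.
    """
    migrated = dict(field_filter)  # copy so the caller's filter is not mutated
    if "event_type" in migrated:
        old_value = migrated.pop("event_type")
        migrated["monitor_type"] = MONITOR_TYPE_V1_TO_V2.get(old_value, old_value)
    return migrated


if __name__ == "__main__":
    print(migrate_startup_filter({"event_type": "SQS_TERMINATE", "level": "info"}))
```

Filters without an `event_type` key are returned as an unchanged copy, so the helper can be run over a whole rule set.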
Table 2: Event types
Previous | New |
---|---|
`REBALANCE_RECOMMENDATION` | `REBALANCE_RECOMMENDATION` |
`SCHEDULED_EVENT` | `SCHEDULED_EVENT` |
`SPOT_ITN` | `SPOT_ITN` |
`SQS_TERMINATE` | `REBALANCE_RECOMMENDATION`, `SCHEDULED_EVENT`, `SPOT_ITN`, `STATE_CHANGE`, or `ASG_LIFECYCLE` |
Table 3: Event reasons
Previous reason | New reason |
---|---|
`RebalanceRecommendation` | `RebalanceRecommendation` |
`ScheduledEvent` | `ScheduledEvent` |
`SpotInterruption` | `SpotInterruption` |
`SQSTermination` | `RebalanceRecommendation`, `ScheduledEvent`, `SpotInterruption`, `StateChange`, or `ASGLifecycle` |
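In Queue Processor mode, a rule that used to match the single catch-all string now has to match any of several specific values (Tables 2 and 3). A minimal sketch of that expansion follows; the `expand_for_v2` helper is hypothetical and only the string values are taken from the tables.

```python
from typing import List

# Table 2: event kinds that replace SQS_TERMINATE under logFormatVersion=2.
EVENT_KINDS_V2 = {
    "SQS_TERMINATE": [
        "REBALANCE_RECOMMENDATION",
        "SCHEDULED_EVENT",
        "SPOT_ITN",
        "STATE_CHANGE",
        "ASG_LIFECYCLE",
    ],
}

# Table 3: Kubernetes event reasons that replace SQSTermination.
EVENT_REASONS_V2 = {
    "SQSTermination": [
        "RebalanceRecommendation",
        "ScheduledEvent",
        "SpotInterruption",
        "StateChange",
        "ASGLifecycle",
    ],
}


def expand_for_v2(old_value: str) -> List[str]:
    """Return the strings a rule should match after switching to version 2.

    Catch-all values expand to every specific value listed in the tables;
    any other value is unchanged, so it comes back as a one-element list.
    """
    for table in (EVENT_KINDS_V2, EVENT_REASONS_V2):
        if old_value in table:
            return list(table[old_value])  # copy to protect the table
    return [old_value]


if __name__ == "__main__":
    for value in ("SQS_TERMINATE", "SQSTermination", "SPOT_ITN"):
        print(value, "->", expand_for_v2(value))
```

Whether you match all of the expanded values or only a subset depends on which AWS events you actually want to alert on.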
Commits with these changes
- feat: emit pod events on drain by @trutx in #703
- chore: add annotations to events in SQS mode by @trutx in #715
- fix: show actual event kinds in Queue mode by @trutx and @cjerad in #725
Other changes
- README: Clarify distinctions between IMDS and QP modes by @snay2 in #695
- Clarify wording about using ASG tags. Fix broken docs link. by @snay2 in #721
- Remove bespoke Prometheus helm chart and use the latest public release instead by @snay2 in #723
- upgrade to Go 1.19 by @cjerad and @snay2 in #726
Full Changelog: v1.17.3...v1.18.0
v2.0.0-alpha
What's Changed
- add v2 scaffolding by @cjerad in #577
- replace kustomize with helm by @cjerad in #587
- Knative webhook by @cjerad in #596
- NTHv2 core functionality by @cjerad in #612
- Dynamically download toolchain and build symlinks by @cjerad in #622
- add workflow to run test suite by @cjerad in #623
- add node label selector to Terminator by @cjerad in #625
- add event action config to Terminator by @cjerad in #628
- add webhook config to Terminator by @cjerad in #641
- add make target explore-test-coverage by @cjerad in #646
- add dev setup guide by @cjerad in #656
- update dev guide and resources by @cjerad in #660
- add 'getting started' section to README.md by @cjerad in #671
- add prepare-for-release.sh script by @cjerad in #691
- add downloaded binaries to PATH by @cjerad in #692
- add build-and-push-images.sh script by @cjerad in #693
- add upload-resources-to-github.aaakk.us.kg script by @cjerad in #696
- add sync-to-aws-eks-charts.sh script by @cjerad in #697
- add sync-readme-to-ecr-public.sh script by @cjerad in #698
- add release workflow by @cjerad in #699
- update third-party licenses list by @cjerad in #700
- miscellaneous clean up by @cjerad in #702
- 🥑🤖 v2.0.0-alpha release prep by @cjerad in #707
- handle undefined variable in release scripts by @cjerad in #708
- multiple fixes by @cjerad in #709
- update helm crd by @cjerad in #711
- update gh path within downloaded archive by @cjerad in #712
- check that config file exists before move by @cjerad in #713
- set git identity in sync-to-aws-eks-charts.sh by @cjerad in #714
Full Changelog: 0ab461d...v2.0.0-alpha
v1.17.3
v1.17.2
v1.17.1
What's Changed
- divide log output between stdout and stderr by @cjerad in #676
- helm: Apply extraEnv to daemonsets by @hamishforbes in #674
Full Changelog: v1.17.0...v1.17.1
v1.17.0
⚠️ Callouts ⚠️
These may be breaking changes, depending on your setup:
- Remove calls to ASG APIs when determining whether NTH should manage an instance.
- If you use ASGs but do not propagate tags to your EC2 instances, NTH may stop managing those instances. This is because NTH will now only check tags on the instance itself to determine whether NTH should manage that instance.
- Deprecate two config values. Release `v1.17.0` supports both configs, but you'll see a warning if you use the deprecated name. We may remove the deprecated configs altogether in a future release.
  - Deprecate `CheckASGTagBeforeDraining` and replace it with `CheckTagBeforeDraining`
  - Deprecate `ManagedAsgTag` and replace it with `ManagedTag`
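To illustrate the "support both names, warn on the old one" behavior described above, here is a minimal sketch. It is not NTH's implementation: the dict-based config and the `normalize_config` helper are hypothetical, and letting the new name win when both are set is an assumption that these notes do not confirm.

```python
import warnings

# Deprecated config names from this release mapped to their replacements.
DEPRECATED_CONFIG_NAMES = {
    "CheckASGTagBeforeDraining": "CheckTagBeforeDraining",
    "ManagedAsgTag": "ManagedTag",
}


def normalize_config(config: dict) -> dict:
    """Return a copy of `config` that uses only the new config names.

    A deprecated name triggers a DeprecationWarning and is stored under its
    replacement. Assumption: if both names are present, the value given
    under the new name takes precedence.
    """
    normalized = {}
    for name, value in config.items():
        new_name = DEPRECATED_CONFIG_NAMES.get(name)
        if new_name is None:
            # Current name: always stored, overriding any deprecated value.
            normalized[name] = value
            continue
        warnings.warn(f"{name} is deprecated; use {new_name}", DeprecationWarning)
        # Deprecated name: only fills in when the new name is not set yet.
        normalized.setdefault(new_name, value)
    return normalized


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(normalize_config({"CheckASGTagBeforeDraining": True, "ManagedAsgTag": "example/managed"}))
```

The tag value `example/managed` is a placeholder chosen for the example.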
What's Changed
- Filter managed non-ASG nodes by tag by @AustinSiu in #669
- feat(observability): add eventID to exposed metrics by @cmotta2016 in #652
- Update infra setup steps for multi-cluster by @AustinSiu in #653
- Handle scheduled events immediately in IMDS mode, the same as queue processor mode by @snay2 in #661
- chore(README): add hint about EKS managed node groups by @m00lecule in #664
- Remove runAsUser in helm template for windows node by @pmcenery-bl in #663
New Contributors
- @cmotta2016 made their first contribution in #652
- @m00lecule made their first contribution in #664
- @pmcenery-bl made their first contribution in #663
Full Changelog: v1.16.5...v1.17.0
v1.16.5
v1.16.4
v1.16.3
What's Changed
- Remove `AssumeAsgTagPropagation` by @brycahta in #632
- Fix AWS Health Event Bridge Rule by @LikithaVemulapalli in #633
- Remove community meeting from ReadMe by @brycahta in #634
New Contributors
- @LikithaVemulapalli made their first contribution in #633
Full Changelog: v1.16.2...v1.16.3