Releases: aws/aws-node-termination-handler

v1.18.1

28 Nov 17:16
82f3a25

What's Changed

Full Changelog: v1.18.0...v1.18.1

v1.18.0

21 Nov 23:22
62a9142

Improved logging in Queue Processor mode

v1.18.0 introduces the logFormatVersion Helm chart option, which lets you opt in to more detailed logs.

The default value is 1, which keeps logging the same way it did in prior releases (<= v1.17.3).

Setting the value to 2 gives you more detail about which AWS event triggered the cordon/drain. Previously, all of these events were bucketed under SQS_TERMINATE, which made it difficult to tell which event had actually occurred.

This option is also available as a command line flag, --log-format-version.

What does the new logging look like?

logFormatVersion=2 modifies several Debug, Info, and Warn logs, as well as the Kubernetes events emitted by NTH. These changes improve observability into what NTH is doing when it responds to events via SQS. If your monitoring system looks for any of the specific strings in the tables below, you may need to update its configuration to use the new strings when you switch to the new log format version.

Changes to logs when starting up

  1. Remove event_type field from the Info log when starting a monitor; replace with monitor_type field, with new values. See Table 1.
  2. Remove event_type field from the Warn log when a monitor fails to start; replace with monitor_type field, with new values. See Table 1.

Changes to logs when processing an event

  1. New monitor field in the Info log. See Table 1.
  2. The value of the kind field in the Info log may change when running in Queue Processor mode. See Table 2.
  3. The "reason" field in the k8s event may change when running in Queue Processor mode. See Table 3.

Changes to logs when receiving an SQS message

  1. Include the specific event type instead of SQS_TERMINATE in the Debug log if running Queue Processor mode. See Table 2.

Tables of changed values

Table 1: Monitor types

Previous                   New
REBALANCE_RECOMMENDATION   REBALANCE_RECOMMENDATION_MONITOR
SCHEDULED_EVENT            SCHEDULED_EVENT_MONITOR
SPOT_ITN                   SPOT_ITN_MONITOR
SQS_TERMINATE              SQS_MONITOR

Table 2: Event types

Previous                   New
REBALANCE_RECOMMENDATION   REBALANCE_RECOMMENDATION
SCHEDULED_EVENT            SCHEDULED_EVENT
SPOT_ITN                   SPOT_ITN
SQS_TERMINATE              REBALANCE_RECOMMENDATION, SCHEDULED_EVENT, SPOT_ITN, STATE_CHANGE, or ASG_LIFECYCLE

Table 3: Event reasons

Previous reason            New reason
RebalanceRecommendation    RebalanceRecommendation
ScheduledEvent             ScheduledEvent
SpotInterruption           SpotInterruption
SQSTermination             RebalanceRecommendation, ScheduledEvent, SpotInterruption, StateChange, or ASGLifecycle
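
If your monitoring rules match on the old strings, one way to plan the update is to transcribe Tables 1 to 3 into lookup tables and ask which new strings each old string can turn into. The sketch below does only that; the dictionary names and the strings_to_match helper are hypothetical and not part of NTH.

```python
# Old-to-new string mappings transcribed from Tables 1-3.
# All names in this block are illustrative; NTH does not ship this code.
MONITOR_TYPES = {
    "REBALANCE_RECOMMENDATION": ["REBALANCE_RECOMMENDATION_MONITOR"],
    "SCHEDULED_EVENT": ["SCHEDULED_EVENT_MONITOR"],
    "SPOT_ITN": ["SPOT_ITN_MONITOR"],
    "SQS_TERMINATE": ["SQS_MONITOR"],
}

EVENT_TYPES = {
    "REBALANCE_RECOMMENDATION": ["REBALANCE_RECOMMENDATION"],
    "SCHEDULED_EVENT": ["SCHEDULED_EVENT"],
    "SPOT_ITN": ["SPOT_ITN"],
    # In Queue Processor mode the single old value fans out to the specific event.
    "SQS_TERMINATE": [
        "REBALANCE_RECOMMENDATION",
        "SCHEDULED_EVENT",
        "SPOT_ITN",
        "STATE_CHANGE",
        "ASG_LIFECYCLE",
    ],
}

EVENT_REASONS = {
    "RebalanceRecommendation": ["RebalanceRecommendation"],
    "ScheduledEvent": ["ScheduledEvent"],
    "SpotInterruption": ["SpotInterruption"],
    "SQSTermination": [
        "RebalanceRecommendation",
        "ScheduledEvent",
        "SpotInterruption",
        "StateChange",
        "ASGLifecycle",
    ],
}


def strings_to_match(table, old_value):
    """Return the strings a rule should match under logFormatVersion=2
    in place of old_value, according to the given table."""
    return list(table[old_value])


if __name__ == "__main__":
    for old in ("SQS_TERMINATE", "SPOT_ITN"):
        print(old, "->", strings_to_match(EVENT_TYPES, old))
```

A rule that previously matched SQS_TERMINATE as an event type would need to match any of the five values listed for it, while a rule keyed on the monitor type needs only the single renamed value.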

Commits with these changes

  • feat: emit pod events on drain by @trutx in #703
  • chore: add annotations to events in SQS mode by @trutx in #715
  • fix: show actual event kinds in Queue mode by @trutx and @cjerad in #725

Other changes

  • README: Clarify distinctions between IMDS and QP modes by @snay2 in #695
  • Clarify wording about using ASG tags. Fix broken docs link. by @snay2 in #721
  • Remove bespoke Prometheus helm chart and use the latest public release instead by @snay2 in #723
  • upgrade to Go 1.19 by @cjerad and @snay2 in #726

Full Changelog: v1.17.3...v1.18.0

v2.0.0-alpha

20 Oct 19:06
9504965
Pre-release

What's Changed

Full Changelog: 0ab461d...v2.0.0-alpha

v1.17.3

14 Sep 15:47
6efd62a

What's Changed

  • fix(sqs): log itn event messageid correctly by @acobaugh in #684
  • Add back deprecated flags for checking whether an instance is managed by NTH by @snay2 in #686

New Contributors

Full Changelog: v1.17.2...v1.17.3

v1.17.2

31 Aug 21:28
10edb16

What's Changed

  • Add optional pause prior to completing lifecycle action by @cjerad in #680

Full Changelog: v1.17.1...v1.17.2

v1.17.1

24 Aug 19:48
d4d447c

What's Changed

Full Changelog: v1.17.0...v1.17.1

v1.17.0

18 Aug 15:14
9f08b60

⚠️ Callouts ⚠️

These may be breaking changes, depending on your setup:

  • Remove calls to ASG APIs when determining whether NTH should manage an instance.
    • If you use ASGs but do not propagate tags to your EC2 instances, NTH may stop managing those instances. This is because NTH will now only check tags on the instance itself to determine whether NTH should manage that instance.
  • Deprecate two config values. Release v1.17.0 supports both configs, but you'll see a warning if you use the deprecated name. We may remove the deprecated configs altogether in a future release.
    • Deprecate CheckASGTagBeforeDraining and replace it with CheckTagBeforeDraining
    • Deprecate ManagedAsgTag and replace it with ManagedTag
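
As a rough illustration of the rename, the sketch below rewrites the two deprecated names in a configuration held as a plain dictionary. It assumes the names exactly as spelled in the callouts above; the dictionary representation and the migrate_config helper are hypothetical, and the real Helm values or command line flags may use different casing.

```python
# Deprecated name -> replacement, as listed in the v1.17.0 callouts.
RENAMED = {
    "CheckASGTagBeforeDraining": "CheckTagBeforeDraining",
    "ManagedAsgTag": "ManagedTag",
}


def migrate_config(config):
    """Return (migrated, notes): a copy of config with deprecated keys
    renamed, plus one human-readable note per deprecated key found.

    This is an illustrative helper, not part of NTH.
    """
    migrated = {}
    notes = []
    for key, value in config.items():
        new_key = RENAMED.get(key, key)
        if new_key != key:
            notes.append(f"{key} is deprecated; use {new_key}")
            # If the new name is also set explicitly, keep that value.
            if new_key in config:
                continue
        migrated[new_key] = value
    return migrated, notes


if __name__ == "__main__":
    old = {"ManagedAsgTag": "example-tag", "CheckASGTagBeforeDraining": True}
    new, notes = migrate_config(old)
    print(new)
    for note in notes:
        print("warning:", note)
```

When both the deprecated and the new name are present, this sketch keeps the value given under the new name; that precedence is a choice made here, not documented NTH behavior.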

What's Changed

  • Filter managed non-ASG nodes by tag by @AustinSiu in #669
  • feat(observability): add eventID to exposed metrics by @cmotta2016 in #652
  • Update infra setup steps for multi-cluster by @AustinSiu in #653
  • Handle scheduled events immediately in IMDS mode, the same as queue processor mode by @snay2 in #661
  • chore(README): add hint about EKS managed node groups by @m00lecule in #664
  • Remove runAsUser in helm template for windows node by @pmcenery-bl in #663

New Contributors

Full Changelog: v1.16.5...v1.17.0

v1.16.5

31 May 16:25
f10d1f3

What's Changed

  • delete test notification by @cjerad in #644
  • Log IMDS response full text when status code not in 200 range by @snay2 in #645

Full Changelog: v1.16.4...v1.16.5

v1.16.4

18 May 19:21
c20dc36

What's Changed

Full Changelog: v1.16.3...v1.16.4

v1.16.3

11 May 20:38
25e0ed9

What's Changed

New Contributors

Full Changelog: v1.16.2...v1.16.3