AWSMachine metadata can exceed safe sizes due to cluster-api-provider-aws-last-applied-tags values #4530

cnmcavoy · 2023-09-28T16:58:27Z

/kind bug

What steps did you take and what happened:
AWSMachine annotation values for sigs.k8s.io/cluster-api-provider-aws-last-applied-tags appear to accumulate and are never cleared or reset when volumes are removed from a machine. This causes problems for longer-lived AWSMachines, especially in environments with frequent volume churn (such as GitLab CI), because the annotation value can grow excessively large. We have several dozen AWSMachines where this value exceeds 20,000 characters:

kubectl --cluster awscmhinfra2 -n capi-awscmhinfra2 get -o json awsmachine | jq -r '.items[] | select(.metadata.annotations | to_entries | any(.value | length > 20000)) | .metadata.name' | wc -l
28

Today we noticed three cases where the value exceeded 250,000 characters, which prevented the conversion webhook from converting the resources to older versions:

kubectl get awsmachine.v1beta1.infrastructure.cluster.x-k8s.io worker-private-m6-mixed-az1-0a772868-9b8lf -oyaml -n capi-awscmhinfra2

Error from server: conversion webhook for infrastructure.cluster.x-k8s.io/v1beta2, Kind=AWSMachine returned invalid metadata: metadata.annotation: Too long: must have at most 262144 bytes

What did you expect to happen:
The number of elements stored in the sigs.k8s.io/cluster-api-provider-aws-last-applied-tags annotation should match the number of volumes attached to the corresponding node. On one of the problematic AWSMachine resources, the annotation value held 1907 distinct volume IDs, but only 2 volumes were present on the node.
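The expected behavior can be illustrated with a short Go sketch that keeps only the entries for volumes still attached. It assumes the annotation value is a JSON object keyed by volume ID with a tag map per volume; that layout, the type `lastAppliedVolumeTags`, and the function `pruneToAttached` are all hypothetical and are not taken from the provider's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lastAppliedVolumeTags maps a volume ID to the tags last applied to it.
// Assumption: the real annotation value may be structured differently.
type lastAppliedVolumeTags map[string]map[string]string

// pruneToAttached builds a fresh map holding only the entries whose volume ID
// is in the attached list, so entries for removed volumes are dropped.
func pruneToAttached(previous lastAppliedVolumeTags, attached []string) lastAppliedVolumeTags {
	pruned := make(lastAppliedVolumeTags, len(attached))
	for _, id := range attached {
		if tags, ok := previous[id]; ok {
			pruned[id] = tags
		}
	}
	return pruned
}

func main() {
	// Illustrative data only: vol-ccc is no longer attached to the node.
	previous := lastAppliedVolumeTags{
		"vol-aaa": {"Name": "root"},
		"vol-bbb": {"Name": "data"},
		"vol-ccc": {"Name": "stale"},
	}
	pruned := pruneToAttached(previous, []string{"vol-aaa", "vol-bbb"})

	encoded, err := json.Marshal(pruned)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded))
}
```

Building the map from the attached volumes on every pass, rather than merging into the previous value, bounds the annotation size by the number of volumes currently on the node.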

Environment:

Cluster-api-provider-aws version: 2.2.1
Kubernetes version: 1.24.14
OS: Ubuntu 20.04.6 LTS

Ankitasw · 2023-09-29T07:04:21Z

/triage accepted
/priority important-soon

k8s-ci-robot added the kind/bug, needs-priority, and needs-triage labels Sep 28, 2023

cnmcavoy mentioned this issue Sep 28, 2023

Fix cluster-api-provider-last-applied-tags-on-volumes to recalculate the value on each reconcile #4531 (Merged)

k8s-ci-robot closed this as completed in #4531 Sep 29, 2023
