AWSMachine metadata can exceed safe sizes due to cluster-api-provider-aws-last-applied-tags values #4530

Closed
cnmcavoy opened this issue Sep 28, 2023 · 1 comment · Fixed by #4531
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@cnmcavoy
Contributor

cnmcavoy commented Sep 28, 2023

/kind bug

What steps did you take and what happened:
AWSMachine annotation values for sigs.k8s.io/cluster-api-provider-aws-last-applied-tags appear to accumulate and are never cleared or reset when volumes are removed from a machine. This causes problems for longer-lived AWSMachines, especially in contexts with frequent volume churn (GitLab CI), because the annotation value can become excessively large. We have several dozen AWSMachines where this value exceeds 20,000 characters:

kubectl --cluster awscmhinfra2 -n capi-awscmhinfra2 get -o json awsmachine | jq -r '.items[] | select(.metadata.annotations | to_entries | any(.value | length > 20000)) | .metadata.name' | wc -l
28
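For anyone who wants to see which annotation is large and by how much, rather than only a count of machines, the same check can be sketched in Python. This is a minimal sketch, not part of the original report: the function name `oversized_annotations` and the sample data are hypothetical, and the input is assumed to be the parsed output of `kubectl get -o json awsmachine`.

```python
def oversized_annotations(awsmachine_list, threshold=20000):
    """Return (machine name, annotation key, value length) for every
    annotation value longer than `threshold` characters."""
    hits = []
    for item in awsmachine_list.get("items", []):
        metadata = item.get("metadata", {})
        # `annotations` may be missing or null on some objects.
        annotations = metadata.get("annotations") or {}
        for key, value in annotations.items():
            if len(value) > threshold:
                hits.append((metadata.get("name"), key, len(value)))
    return hits


# Made-up sample shaped like `kubectl get -o json awsmachine` output.
TAGS_KEY = "sigs.k8s.io/cluster-api-provider-aws-last-applied-tags"
sample = {
    "items": [
        {"metadata": {"name": "machine-a", "annotations": {TAGS_KEY: "x" * 20001}}},
        {"metadata": {"name": "machine-b", "annotations": {TAGS_KEY: "{}"}}},
        {"metadata": {"name": "machine-c"}},
    ]
}

for name, key, length in oversized_annotations(sample):
    print(name, key, length)
```

With real data, the dictionary would come from `json.load(sys.stdin)` with the kubectl output piped in.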

Today we noticed three cases where the value exceeded 250k length, which prevented the resources from being converted to older versions by the conversion webhook:

kubectl get awsmachine.v1beta1.infrastructure.cluster.x-k8s.io worker-private-m6-mixed-az1-0a772868-9b8lf -oyaml -n capi-awscmhinfra2

Error from server: conversion webhook for infrastructure.cluster.x-k8s.io/v1beta2, Kind=AWSMachine returned invalid metadata: metadata.annotation: Too long: must have at most 262144 bytes
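The 262144-byte figure in the error is 256 KiB. As far as I understand the API server's validation, the limit applies to the combined byte length of all annotation keys and values on an object, not to a single value. A minimal sketch of that check, with hypothetical function names and that summing rule treated as an assumption:

```python
# Limit taken from the error message above (256 KiB).
TOTAL_ANNOTATION_SIZE_LIMIT = 262144


def total_annotation_bytes(annotations):
    """Sum the UTF-8 byte length of every annotation key and value.

    Assumption: the limit is applied to this combined total.
    """
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in annotations.items()
    )


def exceeds_annotation_limit(annotations, limit=TOTAL_ANNOTATION_SIZE_LIMIT):
    """Return True when the combined annotation size is over the limit."""
    return total_annotation_bytes(annotations) > limit


# Made-up example: one very large value pushes the object over the limit.
annotations = {
    "sigs.k8s.io/cluster-api-provider-aws-last-applied-tags": "x" * 262144,
}
print(total_annotation_bytes(annotations), exceeds_annotation_limit(annotations))
```

Because the total counts every annotation, a machine can hit the limit slightly before the tags value alone reaches 262144 bytes.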

What did you expect to happen:
The number of elements stored in the sigs.k8s.io/cluster-api-provider-aws-last-applied-tags annotation should match the number of volumes attached to the corresponding node resource. On one of the problematic AWSMachine resources, there were 1907 distinct volume IDs saved in the annotation value, but only 2 volumes present on the node.
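The expected behavior can be illustrated with a small pruning sketch. It assumes the annotation value is a JSON object keyed by volume ID, which is inferred from the description above rather than confirmed against the controller code. The function name `prune_last_applied` is hypothetical, and this is not the fix that was merged for this issue.

```python
import json


def prune_last_applied(annotation_value, attached_volume_ids):
    """Drop entries whose key is not a currently attached volume ID.

    Assumption: `annotation_value` is a JSON object keyed by volume ID.
    """
    entries = json.loads(annotation_value) if annotation_value else {}
    kept = {
        volume_id: tags
        for volume_id, tags in entries.items()
        if volume_id in attached_volume_ids
    }
    return json.dumps(kept, sort_keys=True)


# Made-up example: three recorded volumes, only two still attached.
recorded = json.dumps({
    "vol-1": {"Name": "root"},
    "vol-2": {"Name": "ci-cache"},
    "vol-3": {"Name": "data"},
})
print(prune_last_applied(recorded, {"vol-1", "vol-3"}))
```

After pruning, the number of entries matches the number of attached volumes, so the value stays bounded regardless of how many volumes have come and gone.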

Environment:

  • Cluster-api-provider-aws version: 2.2.1
  • Kubernetes version (from kubectl version): 1.24.14
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.6 LTS
@k8s-ci-robot added the kind/bug, needs-priority, and needs-triage labels Sep 28, 2023
@Ankitasw
Member

/triage accepted
/priority important-soon

@k8s-ci-robot added the triage/accepted and priority/important-soon labels and removed the needs-triage and needs-priority labels Sep 29, 2023