Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: CA on fargate causing log flood #5887

Conversation

vadasambar
Copy link
Member

@vadasambar vadasambar commented Jun 23, 2023

  • happens when CA tries to check if the unmanaged fargate node is a part of ASG (it isn't)
  • and keeps on logging error
    Signed-off-by: vadasambar [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes #5842 (and adds tests)

  • Scenario: EKS cluster running on fargate but uses CA to scale some non-fargate nodegroups
  • Problem: CA reads all K8s nodes including fargate ones and checks if the instance is present in AWS (by calling HasInstance). This leads to error and causes log flooding (fargate is not a part of any ASG).

Which issue(s) this PR fixes:

Fixes #5842

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed: CA tries to check if unmanaged fargate node exists in AWS and keeps on logging error message

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 23, 2023
@k8s-ci-robot k8s-ci-robot requested a review from drmorr0 June 23, 2023 17:55
@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Jun 23, 2023
@k8s-ci-robot k8s-ci-robot requested a review from jaypipes June 23, 2023 17:55
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 23, 2023
@@ -55,16 +56,21 @@ func newTestAwsManagerWithMockServices(mockAutoScaling autoScalingI, mockEC2 ec2
autoscalingOptions: make(map[AwsRef]map[string]string),
},
}

if instanceStatus != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To pass custom instanceStatus so that it is easy to test HasInstance()

Name: "node-XXX",
},
Spec: apiv1.NodeSpec{
ProviderID: "aws:///us-east-1a/test-instance-id-2",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This node has a provider ID which is not present in the asg cache.

- happens when CA tries to check if the unmanaged fargate node is a part of ASG (it isn't)
- and keeps on logging error
Signed-off-by: vadasambar <[email protected]>
Signed-off-by: vadasambar <[email protected]>
@vadasambar vadasambar force-pushed the fix/5842/fargate-nodes-causing-logs-flood branch from 3f9749b to ec01059 Compare June 26, 2023 18:06
@gjtempleton
Copy link
Member

/assign @gjtempleton

@gjtempleton
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gjtempleton, vadasambar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit 1ce7553 into kubernetes:master Jul 10, 2023
@tri-bao
Copy link

tri-bao commented Aug 20, 2023

May I know which version contains this fix? I've just installed version 1.27.2 on an EKS cluster containing Fargate workload and getting a lot of such warning, the more fargate nodes the more warning messages printed.

The version 1.27.3 was out but the latest helm chart at the time of this writing (9.29.1) is still using version 1.27.2

@vadasambar
Copy link
Member Author

vadasambar commented Aug 25, 2023

@tri-bao sorry for the delay.

May I know which version contains this fix?

The fix is only in master branch right now. It will be there in the new release of cluster-autoscaler.

If you want, I can create cherry-pick PR for 1.27.

grosser pushed a commit to zendesk/autoscaler that referenced this pull request Sep 7, 2023
…odes-causing-logs-flood

fix: CA on fargate causing log flood
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS provider warning Failed to check cloud provider has instance for fargate is flooding the log
4 participants