Add metrics for managed resources count #4031

Open
wants to merge 2 commits into
base: main

Conversation

oliviassss
Collaborator

@oliviassss oliviassss commented Jan 22, 2025

Issue

Description

This PR adds metrics for the count of managed resources: ingresses, services of type LoadBalancer (NLB), and TargetGroupBindings. It also gets the count of AWS resources such as ALBs and NLBs via the Resource Groups Tagging API, and adds the tag:GetResources IAM permission to the policy template.
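One way to turn a tagging API response into per-type counts is to classify each returned ARN. The following is a minimal sketch, not the code in this PR: it assumes the ARNs have already been fetched, and it relies on the ELBv2 ARN convention where `loadbalancer/app/` marks an ALB and `loadbalancer/net/` marks an NLB. `lbCounts`, `countLoadBalancers`, and the sample ARNs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// lbCounts holds the number of load balancers seen per type (hypothetical type).
type lbCounts struct {
	ALB int
	NLB int
}

// countLoadBalancers classifies ELBv2 ARNs by the segment that follows
// "loadbalancer/": "app" for ALBs and "net" for NLBs. Any other ARN
// (for example a target group) is ignored.
func countLoadBalancers(arns []string) lbCounts {
	var c lbCounts
	for _, arn := range arns {
		switch {
		case strings.Contains(arn, ":loadbalancer/app/"):
			c.ALB++
		case strings.Contains(arn, ":loadbalancer/net/"):
			c.NLB++
		}
	}
	return c
}

func main() {
	// Made-up ARNs standing in for the result of a tag:GetResources call.
	arns := []string{
		"arn:aws:elasticloadbalancing:us-west-2:111122223333:loadbalancer/app/example-alb/0123456789abcdef",
		"arn:aws:elasticloadbalancing:us-west-2:111122223333:loadbalancer/net/example-nlb/fedcba9876543210",
		"arn:aws:elasticloadbalancing:us-west-2:111122223333:targetgroup/example-tg/0011223344556677",
	}
	c := countLoadBalancers(arns)
	fmt.Printf("albs=%d nlbs=%d\n", c.ALB, c.NLB)
}
```

Classifying on the client side keeps the sketch independent of how the API request is filtered; the actual change may filter differently.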

Test

Created 4 ingresses (2 of them in the same ingress group) and 2 services of type LoadBalancer (NLB). Verified that the Prometheus metrics are as expected.

dev-dsk-sonyingy-2c-61cf589a % k get ingress -A
NAMESPACE   NAME              CLASS   HOSTS   ADDRESS                                                                     PORTS   AGE
default     nginx-ingress-1   alb     *       internal-k8s-myinggroup-0af676b930-1495877518.us-west-2.elb.amazonaws.com   80      12m
default     nginx-ingress-2   alb     *       internal-k8s-myinggroup-0af676b930-1495877518.us-west-2.elb.amazonaws.com   80      12m
game-2048   ingress-2048      alb     *       k8s-game2048-ingress2-4d86c6a92e-1406978370.us-west-2.elb.amazonaws.com     80      13m
game-2048   ingress-2048-2    alb     *       k8s-game2048-ingress2-8d165b56df-1236487848.us-west-2.elb.amazonaws.com     80      16m

(25-01-22 1:35:31) <0> [~/EKS/LBC]
dev-dsk-sonyingy-2c-61cf589a % k get svc -A | grep LoadBalancer
default       ip-nlb-svc-01                       LoadBalancer   10.100.30.169    k8s-default-ipnlbsvc-c34fd4b56d-e30509379b365c24.elb.us-west-2.amazonaws.com   80:32320/TCP             9m4s
default       ip-nlb-svc-02                       LoadBalancer   10.100.210.187   k8s-default-ipnlbsvc-299bc353da-c2da5e41643e1f18.elb.us-west-2.amazonaws.com   80:32425/TCP             8m54s

(25-01-22 1:35:42) <0> [~/EKS/LBC]
dev-dsk-sonyingy-2c-61cf589a % k get targetgroupbindings.elbv2.k8s.aws -A
NAMESPACE   NAME                               SERVICE-NAME    SERVICE-PORT   TARGET-TYPE   AGE
default     k8s-default-ipnlbsvc-725fc8e5fe    ip-nlb-svc-02   80             ip            8m56s
default     k8s-default-ipnlbsvc-9b749841f7    ip-nlb-svc-01   80             ip            9m6s
default     k8s-default-nginxsvc-06b13125f0    nginx-svc03     80             ip            12m
default     k8s-default-nginxsvc-563deb1176    nginx-svc03     80             ip            12m
game-2048   k8s-game2048-service2-446c26be3b   service-2048    80             ip            16m
game-2048   k8s-game2048-service2-6abcfb7ee1   service-2048    80             ip            13m
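As a rough cross-check of the listings above, the expected load balancer counts can be derived by counting distinct ADDRESS / external hostname values, since the two ingresses in the same group share one address. This is a minimal sketch under that assumption; `distinctCount` is a hypothetical helper and not part of the controller.

```go
package main

import "fmt"

// distinctCount returns how many different values appear in the slice.
func distinctCount(values []string) int {
	seen := make(map[string]struct{}, len(values))
	for _, v := range values {
		seen[v] = struct{}{}
	}
	return len(seen)
}

func main() {
	// ADDRESS column of `k get ingress -A` from the test run.
	ingressAddrs := []string{
		"internal-k8s-myinggroup-0af676b930-1495877518.us-west-2.elb.amazonaws.com",
		"internal-k8s-myinggroup-0af676b930-1495877518.us-west-2.elb.amazonaws.com",
		"k8s-game2048-ingress2-4d86c6a92e-1406978370.us-west-2.elb.amazonaws.com",
		"k8s-game2048-ingress2-8d165b56df-1236487848.us-west-2.elb.amazonaws.com",
	}
	// External hostnames of the two LoadBalancer services from the test run.
	serviceAddrs := []string{
		"k8s-default-ipnlbsvc-c34fd4b56d-e30509379b365c24.elb.us-west-2.amazonaws.com",
		"k8s-default-ipnlbsvc-299bc353da-c2da5e41643e1f18.elb.us-west-2.amazonaws.com",
	}
	fmt.Printf("ingresses=%d albs=%d services=%d nlbs=%d\n",
		len(ingressAddrs), distinctCount(ingressAddrs),
		len(serviceAddrs), distinctCount(serviceAddrs))
	// prints: ingresses=4 albs=3 services=2 nlbs=2
}
```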

In the metrics I have:

dev-dsk-sonyingy-2c-61cf589a % curl http://localhost:8080/metrics | grep lb_controller_managed
# HELP lb_controller_managed_albs_total Current number of ALBs managed by the controller
# TYPE lb_controller_managed_albs_total gauge
lb_controller_managed_albs_total 3
# HELP lb_controller_managed_ingress_count Number of ingresses managed by the AWS Load Balancer Controller.
# TYPE lb_controller_managed_ingress_count gauge
lb_controller_managed_ingress_count 4
# HELP lb_controller_managed_nlbs_total Current number of NLBs managed by the controller
# TYPE lb_controller_managed_nlbs_total gauge
lb_controller_managed_nlbs_total 2
# HELP lb_controller_managed_service_count Number of service type Load Balancers (NLBs) managed by the AWS Load Balancer Controller.
# TYPE lb_controller_managed_service_count gauge
lb_controller_managed_service_count 2
# HELP lb_controller_managed_targetgroupbinding_count Number of targetgroupbindings managed by the AWS Load Balancer Controller.
# TYPE lb_controller_managed_targetgroupbinding_count gauge
lb_controller_managed_targetgroupbinding_count 6
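A check like the one above could also be scripted rather than read by eye. The sketch below parses Prometheus text exposition output and extracts samples by name prefix. It is an illustration only: `parseGauges` is a hypothetical helper that handles unlabeled samples (name, whitespace, value) and skips `#` comment lines, which is enough for the metrics shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseGauges returns the unlabeled samples in text whose name starts with prefix.
// Lines beginning with '#' (HELP/TYPE) and blank lines are skipped.
func parseGauges(text, prefix string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || !strings.HasPrefix(fields[0], prefix) {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		out[fields[0]] = v
	}
	return out, sc.Err()
}

func main() {
	// Sample lines copied from the test output in this PR.
	sample := `# TYPE lb_controller_managed_albs_total gauge
lb_controller_managed_albs_total 3
# TYPE lb_controller_managed_ingress_count gauge
lb_controller_managed_ingress_count 4
lb_controller_managed_nlbs_total 2
lb_controller_managed_service_count 2
lb_controller_managed_targetgroupbinding_count 6
`
	m, err := parseGauges(sample, "lb_controller_managed_")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("albs:", m["lb_controller_managed_albs_total"])
	fmt.Println("targetgroupbindings:", m["lb_controller_managed_targetgroupbinding_count"])
}
```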

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliviassss

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 22, 2025
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 22, 2025
docs/install/iam_policy.json Outdated (resolved)
main.go Outdated (resolved)
main.go Outdated
select {
case <-ticker.C:
    // Update managed resource metrics
    err := lbcMetricsCollector.UpdateManagedK8sResourceMetrics(context.Background())
Collaborator

Collecting all of these resources on the same tick might lead to sparse metrics. I would suggest one ticker per resource to improve performance and metric reliability.

Collaborator Author

Makes sense. I will use 3 tickers: 1 for k8s resources, 1 for ALBs, and 1 for NLBs, in case an API call has latency, though that should be rare.

Collaborator Author

@oliviassss oliviassss Jan 30, 2025

@zac-nixon Hi, I thought twice but decided to keep them on the same ticker, because I'd like all the metrics to be updated in one loop. I did increase the ticker interval to 2 min to reduce unnecessary calls, since we don't expect these metrics to be especially timely. I also added a TODO to update the metrics per reconciliation.

@oliviassss oliviassss added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2025
@oliviassss oliviassss removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
3 participants