
Reordering of metrics in controller HPA resource is causing sync problems #10038

Closed
peters5 opened this issue Jun 5, 2023 · 14 comments · Fixed by #10043
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
needs-priority
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments

@peters5

peters5 commented Jun 5, 2023

What happened:

When upgrading the ingress-nginx Helm chart to version 4.7.0 we got the following OutOfSync problem in ArgoCD when using the controller HPA resource:

[screenshot: Argo CD diff showing the controller HPA resource as OutOfSync]

This problem arises because of commit 06612e6 (PR: #9521) where the order of cpu and memory in the controller HPA resource was switched.

There are also issues on this topic in the Argo CD project (for example argoproj/argo-cd#7846).
The root cause is the HPA controller in Kubernetes, which can reorder the metrics list (there is also an upstream Kubernetes issue about this: kubernetes/kubernetes#74099).

I think the best fix would be to revert that part of the commit and keep memory before cpu in the list.
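For illustration, the proposed ordering (memory before cpu) would render roughly like this in the HPA spec. This is a sketch, not the exact chart output; the utilization values are just the ones from the values snippet below.

```yaml
# Sketch of the metrics order under discussion: memory listed before
# cpu in the rendered HorizontalPodAutoscaler spec. Field values are
# illustrative, matching the Helm values in this report.
spec:
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```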

What you expected to happen:

ArgoCD should not show OutOfSync after the Helm chart upgrade

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.8.0
  Build:         35f5082ee7f211555aaff431d7c4423c17f8ce9e
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.10", GitCommit:"5c1d2d4295f9b4eb12bfbf6429fdf989f2ca8a02", GitTreeState:"clean", BuildDate:"2023-01-27T22:54:20Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Azure

  • OS (e.g. from /etc/os-release): AKSUbuntu-1804

  • How was the ingress-nginx-controller installed:
    With Helm chart in ArgoCD

    Values:

    ingress-nginx:
      controller:
        autoscaling:
          enabled: true
          maxReplicas: 3
          targetCPUUtilizationPercentage: 80
          targetMemoryUtilizationPercentage: 80

How to reproduce this issue:

Deploy the latest ingress-nginx Helm chart v4.7.0 with controller autoscaling being enabled via ArgoCD and check the ArgoCD UI. It will always show you an OutOfSync on the HPA resource.

Anything else we need to know:

No

@peters5 peters5 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 5, 2023
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Jun 5, 2023
@longwuyuan
Contributor

Since it improves the Argo CD integration, and since it's a cosmetic change as far as the controller's HPA template goes, I think we can change this ASAP.

I will make the change and update here. Watch out for the PR.
/assign
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 5, 2023
@longwuyuan
Contributor

Just to be clear, the commit you asked to be reverted is a critical requirement, so it's not the commit itself that will be reverted. I will just swap the current order of the cpu & memory specs in the template, i.e. put memory first and cpu next.

@peters5
Author

peters5 commented Jun 5, 2023

Yes sure, you are right. The other changes in this commit are totally fine of course :-)

@M0dj0

M0dj0 commented Jun 5, 2023

Same here :)
I'm looking forward to a fix and will downgrade to ingress-nginx Helm chart version 4.6.1 for now.

@longwuyuan
Contributor

@peters5 you can see the 4-line swap in the PR linked above. I did this without even digging into the HPA's ability to reorder the metrics list.

Can you install the chart from that branch of my fork and test it? Hopefully there is no hitherto unknown side effect from this 4-line swap.

@ricardojdsilva87

ricardojdsilva87 commented Jun 16, 2023

Hello @longwuyuan,
I am having the exact same problem on both the controller and the backend HPAs.
[screenshot: Argo CD OutOfSync diff on the controller and backend HPA resources]

I think what happens is that Kubernetes applies the manifest as it is, but Argo CD for some reason reorders the components in the metrics list alphabetically, making cpu the first in the list and memory second.
@peters5 I think the same was happening in your screenshot: the left side is the state of the cluster and the right side is what Argo CD compares against. Argo CD applies the order from the Helm chart, in this case memory first, but compares against cpu first.
https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/templates/controller-hpa.yaml#L27
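The order sensitivity can be shown with a minimal sketch. This is not Argo CD's actual diff code, just an illustration of why list order matters when comparing manifests:

```python
# Minimal illustration (not Argo CD's real diff logic): YAML lists
# are ordered, so the same two metrics in a different order compare
# as unequal, which is what surfaces as OutOfSync.
cluster_metrics = [
    {"type": "Resource", "resource": {"name": "cpu"}},
    {"type": "Resource", "resource": {"name": "memory"}},
]
chart_metrics = [
    {"type": "Resource", "resource": {"name": "memory"}},
    {"type": "Resource", "resource": {"name": "cpu"}},
]

# A plain equality check on lists is order-sensitive.
print(cluster_metrics == chart_metrics)  # False

# An order-insensitive comparison keyed on the metric name treats
# them as the same set of metrics.
def by_name(metrics):
    return sorted(metrics, key=lambda m: m["resource"]["name"])

print(by_name(cluster_metrics) == by_name(chart_metrics))  # True
```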

Maybe this was correct and needs to be changed back again?
https://github.com/kubernetes/ingress-nginx/pull/10043/files

For the moment the only workaround we have is to ignore these differences in Argo CD.
I also think something should be fixed on the Argo CD side so that it matches the template as it is, without reordering it.
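As a sketch of that workaround (assuming an Argo CD Application manifest; `ignoreDifferences` and `jqPathExpressions` are documented Argo CD diff-customization fields, but the application name and the exact path here are illustrative):

```yaml
# Sketch: tell Argo CD to ignore ordering drift in the HPA metrics
# list at the Application level. Names and paths are examples only.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
spec:
  # source, destination, syncPolicy omitted for brevity
  ignoreDifferences:
    - group: autoscaling
      kind: HorizontalPodAutoscaler
      jqPathExpressions:
        - .spec.metrics
```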

Thanks!

@longwuyuan
Contributor

@peters5 and @ricardojdsilva87, my request is that you please test this change in your local environments.
I would need to deploy and investigate whether what @ricardojdsilva87 describes is actually happening, but that is a lot of effort and complication, so it is best to get feedback from you first and only later dive into the rabbit hole.

/re-open
/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jun 17, 2023
@longwuyuan
Contributor

/reopen

@k8s-ci-robot
Contributor

@longwuyuan: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jun 17, 2023
@MrCorncob

I'm facing the same problem with Argo CD.

@ricardojdsilva87

Hello again,
I've left a comment on the argocd project regarding this:
argoproj/argo-cd#7846 (comment)
I think this issue should be fixed on the Argo CD side; otherwise every external project would need to order its HPA metrics alphabetically to avoid Argo CD detecting drift.
A quick fix for ingress-nginx in this case would be ordering the settings in the HPA.
I have tested this on a custom HPA of ours: if it is not ordered alphabetically, Argo CD detects drift.

Thanks!

@github-actions

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on the Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 25, 2023
dihnatenkopinup added a commit to dihnatenkopinup/k8app that referenced this issue Feb 20, 2024
oprokhorenko pushed a commit to pin-up-global/k8app that referenced this issue Feb 21, 2024
* upd: hpa reverse items ordering

kubernetes/ingress-nginx#10038

* upd: bump chart version
@Gacko
Member

Gacko commented Apr 5, 2024

This already got fixed. Closing for now.

/close

@k8s-ci-robot
Contributor

@Gacko: Closing this issue.

In response to this:

This already got fixed. Closing for now.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
