Datadog scaler doesn't scale correctly to 0 #2759

Closed
Dudesons opened this issue Mar 14, 2022 · 7 comments
Labels
bug Something isn't working

Dudesons commented Mar 14, 2022

Report

We are configuring the Datadog scaler on queue workers to scale from 0 to N.
Scaling up to N works perfectly, and scaling down to 1 also works, but when we expect it to scale to 0 we see strange behaviour.
If the query returns 0, the scaler keeps the minimum at 1; if we "break" the metric, the deployment goes to 0.
By "break" I mean that no data is sent to Datadog, so there is no plot on the graph.

Expected Behavior

I expect that if the Datadog query returns a value under queryValue, the scaler scales down to the specified minimum, in my case 0.

Actual Behavior

The scaler does not scale down to 0 when minReplicaCount is set to 0 and the query returns a value under queryValue.
The scaler scales to 0 only if there is no plot in the response from Datadog.

Steps to Reproduce the Problem

  1. You need a Datadog account.
  2. Deploy a pod with the manifest declared below.
  3. Then execute the following steps in the pod:
    • run apt-get update && apt-get install nano -y
    • add default_type text/plain; in /etc/nginx/conf.d/default.conf
    • add a file called metrics in /usr/share/nginx/html/
    • run service nginx reload

Deployment manifest (you can change the namespace):

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test-metrics
  name: test-metrics
  namespace: prometheus
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: test-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        ad.datadoghq.com/prometheus-sidekiq-exporter.check_names: '["prometheus"]'
        ad.datadoghq.com/prometheus-sidekiq-exporter.init_configs: '[{}]'
        ad.datadoghq.com/prometheus-sidekiq-exporter.instances: |2-
                [
                  {
                    "prometheus_url": "http://%%host%%/metrics",
                    "namespace": "sidekiq",
                    "metrics": ["sidekiq*"]
                  }
                ]
      labels:
        app: test-metrics
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: prometheus-sidekiq-exporter
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Content of the metrics file:

# HELP sidekiq_queue_enqueued_jobs The number of enqueued jobs in the queue.
# TYPE sidekiq_queue_enqueued_jobs gauge
sidekiq_queue_enqueued_jobs{name="mailers"} 0

You can change the value in the metrics file without reloading nginx, since it is a plain text file served by nginx.

We use this scaling object:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  finalizers:
  - finalizer.keda.sh
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 60
            type: Pods
            value: 1
          selectPolicy: Min
          stabilizationWindowSeconds: 300
        scaleUp:
          policies:
          - periodSeconds: 10
            type: Pods
            value: 5
          selectPolicy: Max
          stabilizationWindowSeconds: 0
  cooldownPeriod: 300
  maxReplicaCount: 1
  minReplicaCount: 0
  pollingInterval: 30
  scaleTargetRef:
    name: async-mailer
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: datadog-cluster-trigger-authentication
    metadata:
      age: "90"
      query: avg:sidekiq.sidekiq_queue_enqueued_jobs{name:mailers,env:integration}
      queryValue: "1"
      type: global
    type: datadog

Logs from KEDA operator

example

KEDA Version

2.6.1

Kubernetes Version

1.21

Platform

Google Cloud

Scaler Details

datadog

Anything else?

No response

@Dudesons Dudesons added the bug Something isn't working label Mar 14, 2022
@tomkerkhove tomkerkhove moved this to Proposed in Roadmap - KEDA Core Mar 14, 2022
@JorTurFer
Member

Hi @Dudesons
Thanks for reporting this. I'd say this issue is already solved in this PR.
The next version will include the fix :)
I'll close this issue for that reason, but if you have anything else to add, feel free to reopen it 😄

Repository owner moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Mar 16, 2022
@Dudesons
Author

Hi @JorTurFer
OK, thank you for the information; I was not sure this PR solved the issue.
Do you know when the release will be available, or whether an RC is available?

@Dudesons
Author

@JorTurFer I just tested the version with the main tag and my issue is still there.
From reading the code, my issue is not related to the case where Datadog returns no data.
My issue is that the Datadog scaler does not downscale to 0.
It works like a charm from 1 to N and from N to 1.
The only way I found to downscale to 0 is to "break the metrics" in Datadog; at that point the scaler goes to 0.

Can you reopen the issue?

@JorTurFer JorTurFer reopened this Mar 16, 2022
Repository owner moved this from Ready To Ship to Proposed in Roadmap - KEDA Core Mar 16, 2022
@zroubalik
Member

This might be the problem:

return num > float64(s.metadata.queryValue), nil

It should be num > 0; I misled @arapulido on this one. I will fix this.

#2798
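
For illustration, the two comparisons mentioned above disagree only for metric values in the range (0, queryValue]. The sketch below is not KEDA's code: the function names are hypothetical, and queryValue is assumed to be an integer because the quoted line converts it with float64(...). Only the two comparison expressions come from the comment.

```go
package main

import "fmt"

// isActiveQuoted mirrors the comparison quoted in the comment above:
// the result is true only when the metric exceeds queryValue.
// (Hypothetical helper; queryValue is assumed to be an int.)
func isActiveQuoted(num float64, queryValue int) bool {
	return num > float64(queryValue)
}

// isActiveProposed mirrors the suggested replacement:
// the result is true for any metric value above zero.
// (Hypothetical helper.)
func isActiveProposed(num float64) bool {
	return num > 0
}

func main() {
	// Matches queryValue: "1" in the ScaledObject from the report.
	const queryValue = 1

	// Print both results side by side for a few sample metric values.
	for _, num := range []float64{0, 0.5, 1, 2} {
		fmt.Printf("metric=%.1f quoted=%t proposed=%t\n",
			num, isActiveQuoted(num, queryValue), isActiveProposed(num))
	}
}
```

With queryValue set to 1, both checks return false for a metric of 0 and true for a metric of 2; they differ for 0.5 and 1, where only the proposed check returns true.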

@Dudesons
Author

OK, thank you @zroubalik. I will test it next Monday and share the result.

@Dudesons
Author

I finally had time to test it, and it works as expected 😃
Thank you for the fix.

Repository owner moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Mar 24, 2022
@tomkerkhove tomkerkhove moved this from Ready To Ship to Done in Roadmap - KEDA Core Aug 3, 2022