
Feature Request: Metric-Based “waitUntil-Like” Behavior in KEDA HTTP Add-on #1234

Open
kahirokunn opened this issue Jan 12, 2025 · 0 comments

kahirokunn commented Jan 12, 2025

Proposal

Describe the Feature

I would like to request functionality in the KEDA HTTP Add-on that supports asynchronous “post-response” tasks before scaling down a pod. This is similar to the “waitUntil()” concept present in some application frameworks (e.g., Vercel Functions), but implemented at the infrastructure/auto-scaling layer through custom metrics.

Context

Currently, in some serverless platforms and application frameworks, an API can return a response immediately while scheduling asynchronous tasks (e.g., logging, analytics, cache updates) to run in the background. In order to prevent these tasks from being terminated prematurely (e.g., when a pod is about to spin down), some form of coordination is needed so that the auto-scaler knows there are still in-flight tasks.

In the KEDA HTTP Add-on world, we can’t literally provide a “waitUntil()” function, because that would involve application-level code. Instead, the Add-on could expose or respect a Prometheus (or similar) metric that indicates outstanding background tasks. Only when this metric reaches zero would the pod be considered safe to scale down.

Proposed Approach

  1. The application itself tracks how many “post-response” tasks are currently in flight.
  2. It serves a Prometheus metric (e.g., via an endpoint like /metrics) indicating that count.
  3. The KEDA HTTP Add-on is configured to allow scale-in (scaling to zero or removing pods) only while this metric is zero.

Prometheus Metric Example

Below is a simple example of how you might expose this metric in a Go application (the same idea can be used in any language). You could name it something like “myapp_background_tasks_in_flight”:

package main

import (
    "fmt"
    "net/http"
    "sync/atomic"
)

var tasksInFlight int64

func main() {
    http.HandleFunc("/do-something", func(w http.ResponseWriter, r *http.Request) {
        // Count the task before responding, so the scaler never observes
        // zero in the window between the response and the goroutine start.
        atomic.AddInt64(&tasksInFlight, 1)

        // Do your normal request handling here
        w.Write([]byte("OK\n"))

        // Run the background task after the response is sent
        go func() {
            defer atomic.AddInt64(&tasksInFlight, -1)
            // ... do some logging, analytics, etc.
        }()
    })

    // Expose the gauge in the Prometheus text exposition format
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain; version=0.0.4")
        fmt.Fprint(w, "# HELP myapp_background_tasks_in_flight Count of background tasks.\n")
        fmt.Fprint(w, "# TYPE myapp_background_tasks_in_flight gauge\n")
        fmt.Fprintf(w, "myapp_background_tasks_in_flight %d\n", atomic.LoadInt64(&tasksInFlight))
    })

    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Println(err)
    }
}

When there are three background tasks running, the /metrics endpoint might show:

# HELP myapp_background_tasks_in_flight Count of background tasks.
# TYPE myapp_background_tasks_in_flight gauge
myapp_background_tasks_in_flight 3
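
For Prometheus (and therefore KEDA) to see this gauge, the pod’s /metrics endpoint must actually be scraped. Below is a minimal sketch, assuming a Prometheus setup that honors the common annotation-based discovery convention (these prometheus.io/* annotation keys are a convention, not a Kubernetes standard):

# Deployment pod-template excerpt (assumes annotation-based Prometheus discovery)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # opt this pod in to scraping
        prometheus.io/port: "8080"     # port serving the metrics endpoint
        prometheus.io/path: "/metrics" # path of the exposition endpoint

If your cluster uses the Prometheus Operator instead, the equivalent would be a ServiceMonitor or PodMonitor pointing at the same port and path.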

KEDA HTTP Add-on Configuration Sketch

If the KEDA HTTP Add-on supported a configuration block for this (e.g., a hypothetical scaleDownMetric field), it might look like:

apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: myapp-http-scaler
spec:
  host: "example.com"
  rules:
    - name: myapp
      scaleDownMetric:
        metricName: "myapp_background_tasks_in_flight"
        mustBeZero: true
      # ... other standard config, placeholders, etc.
  # ...

• “metricName: myapp_background_tasks_in_flight” references the Prometheus metric.
• “mustBeZero: true” means: do not scale down if its current value is > 0.

This ensures no pods will be terminated as long as background tasks are in progress.
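
For comparison, core KEDA can already express this kind of condition through its existing prometheus scaler: a gauge query plus a threshold keeps a workload from scaling in while the gauge is non-zero. A rough sketch of those semantics for a plain (non-HTTP-scaled) Deployment, with placeholder names and a placeholder Prometheus address:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-background-tasks
spec:
  scaleTargetRef:
    name: myapp                      # placeholder Deployment name
  minReplicaCount: 0
  cooldownPeriod: 60                 # wait before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
        query: sum(myapp_background_tasks_in_flight)
        threshold: "1"               # any in-flight task keeps >= 1 replica

A workload can only be managed by one ScaledObject at a time (and the HTTP Add-on creates its own), which is why this request is for the Add-on to honor such a condition natively rather than via a second scaler.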

Use-Case

  1. Logging: Avoid truncated logs by ensuring the logging process in the background finishes.
  2. Analytics: Send analytics data asynchronously and reliably, even in bursty traffic environments.
  3. Cache Updates: Update and invalidate caches asynchronously without risking partial updates if the pod shuts down too soon.

Benefits

  • Improved Performance: Responses are sent immediately, while heavier tasks happen post-response.
  • Efficient Resource Usage: Pods only remain alive if there are still tasks in flight; no guesswork.
  • Better Developer Experience: Infrastructure “knows” not to kill pods while there are unfinished tasks.

Conclusion

By adding a way to respect a “tasks in flight” metric, the KEDA HTTP Add-on would let developers perform post-response work without risking termination while it is still running. It avoids an application-specific “waitUntil()” mechanism and cleanly leverages existing Prometheus monitoring. This feature would fill a role similar to waitUntil() in other environments, ensuring that asynchronous chores complete before a pod is scaled down.

Thank you for considering this request!

Is this a feature you are interested in implementing yourself?

No

Anything else?
