
Feature Request: Metric-Based “waitUntil-Like” Behavior in KEDA HTTP Add-on #1234

Open
kahirokunn opened this issue Jan 12, 2025 · 0 comments

kahirokunn commented Jan 12, 2025

Proposal

Describe the Feature

I would like to request functionality in the KEDA HTTP Add-on that supports asynchronous “post-response” tasks before scaling down a pod. This is similar to the “waitUntil()” concept present in some application frameworks (e.g., Vercel Functions), but implemented at the infrastructure/auto-scaling layer through custom metrics.

Context

Currently, in some serverless platforms and application frameworks, an API can return a response immediately while scheduling asynchronous tasks (e.g., logging, analytics, cache updates) to run in the background. In order to prevent these tasks from being terminated prematurely (e.g., when a pod is about to spin down), some form of coordination is needed so that the auto-scaler knows there are still in-flight tasks.

In the KEDA HTTP Add-on world, we can’t literally provide a “waitUntil()” function, because that would involve application-level code. Instead, the Add-on could expose or respect a Prometheus (or similar) metric that indicates outstanding background tasks. Only when this metric reaches zero would the pod be considered safe to scale down.

Proposed Approach

  1. The application itself tracks how many “post-response” tasks are currently in flight.
  2. It serves a Prometheus metric (e.g., via an endpoint like /metrics) indicating that count.
  3. The KEDA HTTP Add-on is configured to allow scale-in (scaling to zero or removing pods) only while this metric is zero.

Prometheus Metric Example

Below is a simple example of how you might expose this metric in a Go application (the same idea can be used in any language). You could name it something like “myapp_background_tasks_in_flight”:

package main

import (
    "fmt"
    "net/http"
    "sync/atomic"
)

var tasksInFlight int64

func main() {
    http.HandleFunc("/do-something", func(w http.ResponseWriter, r *http.Request) {
        // Count the task before responding, so the scaler never observes
        // zero in the window between the response and the goroutine start.
        atomic.AddInt64(&tasksInFlight, 1)

        // Do your normal request handling here
        w.Write([]byte("OK\n"))

        // Run the background task after the response is sent
        go func() {
            defer atomic.AddInt64(&tasksInFlight, -1)
            // ... do some logging, analytics, etc.
        }()
    })

    // Expose the gauge in the Prometheus text exposition format
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain; version=0.0.4")
        fmt.Fprint(w, "# HELP myapp_background_tasks_in_flight Count of background tasks.\n")
        fmt.Fprint(w, "# TYPE myapp_background_tasks_in_flight gauge\n")
        fmt.Fprintf(w, "myapp_background_tasks_in_flight %d\n", atomic.LoadInt64(&tasksInFlight))
    })

    if err := http.ListenAndServe(":8080", nil); err != nil {
        fmt.Println(err)
    }
}

When there are three background tasks running, the /metrics endpoint might show:

# HELP myapp_background_tasks_in_flight Count of background tasks.
# TYPE myapp_background_tasks_in_flight gauge
myapp_background_tasks_in_flight 3
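
For Prometheus (and therefore KEDA) to see this gauge, the pod’s /metrics endpoint must actually be scraped. Below is a minimal sketch, assuming a Prometheus setup that honors the common annotation-based discovery convention (these prometheus.io/* annotation keys are a convention, not a Kubernetes standard):

# Deployment pod-template excerpt (assumes annotation-based Prometheus discovery)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # opt this pod in to scraping
        prometheus.io/port: "8080"     # port serving the metrics endpoint
        prometheus.io/path: "/metrics" # path of the exposition endpoint

If your cluster uses the Prometheus Operator instead, the equivalent would be a ServiceMonitor or PodMonitor pointing at the same port and path.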

KEDA HTTP Add-on Configuration Sketch

If the KEDA HTTP Add-on supported a configuration block for this (e.g., a hypothetical scaleDownMetric field), it might look like:

apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: myapp-http-scaler
spec:
  host: "example.com"
  rules:
    - name: myapp
      scaleDownMetric:
        metricName: "myapp_background_tasks_in_flight"
        mustBeZero: true
      # ... other standard config, placeholders, etc.
  # ...

• “metricName: myapp_background_tasks_in_flight” references the Prometheus metric.
• “mustBeZero: true” means: do not scale down if its current value is > 0.

This ensures no pods will be terminated as long as background tasks are in progress.
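
For comparison, core KEDA can already express this kind of condition through its existing prometheus scaler: a gauge query plus a threshold keeps a workload from scaling in while the gauge is non-zero. A rough sketch of those semantics for a plain (non-HTTP-scaled) Deployment, with placeholder names and a placeholder Prometheus address:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-background-tasks
spec:
  scaleTargetRef:
    name: myapp                      # placeholder Deployment name
  minReplicaCount: 0
  cooldownPeriod: 60                 # wait before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090  # placeholder
        query: sum(myapp_background_tasks_in_flight)
        threshold: "1"               # any in-flight task keeps >= 1 replica

A workload can only be managed by one ScaledObject at a time (and the HTTP Add-on creates its own), which is why this request is for the Add-on to honor such a condition natively rather than via a second scaler.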

Use-Case

  1. Logging: Avoid truncated logs by ensuring the logging process in the background finishes.
  2. Analytics: Send analytics data asynchronously and reliably, even in bursty traffic environments.
  3. Cache Updates: Update and invalidate caches asynchronously without risking partial updates if the pod shuts down too soon.

Benefits

  • Improved Performance: Responses are sent immediately, while heavier tasks happen post-response.
  • Efficient Resource Usage: Pods only remain alive if there are still tasks in flight; no guesswork.
  • Better Developer Experience: Infrastructure “knows” not to kill pods while there are unfinished tasks.

Conclusion

By adding a way to respect a “tasks in flight” metric, the KEDA HTTP Add-on would let developers perform post-response work without risking termination while it is still running. It avoids an application-specific “waitUntil()” mechanism and cleanly leverages existing Prometheus monitoring. This feature would fill a role similar to waitUntil() in other environments, ensuring that asynchronous chores complete before a pod is scaled down.

Thank you for considering this request!

Is this a feature you are interested in implementing yourself?

No

Anything else?
