Add additional metrics for worker saturation analysis and scaling #11
base: master
Conversation
Hi! Thank you for contributing to Yabeda!
My main concern here is that every worker process reports aggregated data about all workers (and there may be hundreds of them).
Isn't it better to report data about only this worker and let the monitoring system aggregate it? (By the way, what are you using?)
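For illustration, the per-process alternative could look roughly like this sketch. The process hashes mimic the entries yielded by Sidekiq::ProcessSet ("identity", "concurrency", and "busy" keys); the `own_metrics` helper and the identity strings are hypothetical, not part of this PR:

```ruby
# Sketch: each process reports gauges for itself only, leaving
# aggregation (e.g. Prometheus sum()) to the monitoring system.
# `processes` mimics the hashes yielded by Sidekiq::ProcessSet;
# `own_metrics` and the identity values are hypothetical.
def own_metrics(processes, identity)
  me = processes.find { |p| p["identity"] == identity }
  return nil unless me

  {
    concurrency: me["concurrency"],
    available_workers: me["concurrency"] - me["busy"]
  }
end

processes = [
  { "identity" => "host1:100:aaaa", "concurrency" => 10, "busy" => 4 },
  { "identity" => "host2:101:bbbb", "concurrency" => 10, "busy" => 9 }
]

own_metrics(processes, "host1:100:aaaa")
# => { concurrency: 10, available_workers: 6 }
```

With one time series per process, the monitoring system can still recover the fleet totals with a simple sum over the label.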
@@ -34,6 +34,10 @@ module Sidekiq
  gauge :active_processes, tags: [], comment: "The number of active Sidekiq worker processes."
  gauge :queue_latency, tags: %i[queue], comment: "The queue latency, the difference in seconds since the oldest job in the queue was enqueued"

+ gauge :concurrency, tags: [], comment: "The total number of jobs that can be run at a time across all processes."
+ gauge :available_workers, tags: [], comment: "The number of workers available for new jobs across all processes."
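As a rough sketch of how such aggregate gauges can be derived, assuming process hashes shaped like Sidekiq::ProcessSet entries with "concurrency" and "busy" keys (`saturation_metrics` is a hypothetical helper, not the PR's actual collector):

```ruby
# Hypothetical aggregation over all worker processes.
# Each hash mimics a Sidekiq::ProcessSet entry: "concurrency" is the
# thread pool size, "busy" is the number of threads running jobs.
def saturation_metrics(processes)
  concurrency = processes.sum { |p| p["concurrency"] }
  busy        = processes.sum { |p| p["busy"] }

  {
    concurrency: concurrency,
    available_workers: concurrency - busy,
    saturation: concurrency.zero? ? 0.0 : busy.to_f / concurrency
  }
end

processes = [
  { "concurrency" => 10, "busy" => 7 },
  { "concurrency" => 10, "busy" => 3 }
]

saturation_metrics(processes)
# => { concurrency: 20, available_workers: 10, saturation: 0.5 }
```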
From the metric name I can't understand whether it is about processes or threads. I need to read the collector's code below to get it. Let's clarify:
- gauge :available_workers, tags: [], comment: "The number of workers available for new jobs across all processes."
+ gauge :available_threads, tags: [], comment: "The number of threads available for job processing across all processes."
This name was chosen to be consistent with how it’s reporting active workers — it looks like Sidekiq uses the term “worker” to mean “thread”.
I originally tried to do it per-process and let Prometheus aggregate it, but Sidekiq isn’t reporting it that way, so I had to do it like this. Scaling is done by having the Prometheus Exporter process make these stats available as custom metrics for the Kubernetes Horizontal Pod Autoscaler. To avoid overloading Prometheus with queries, it’s set up as a service-level metric instead of a per-pod one.
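For context, a service-level metric like this can be consumed by the HPA as an External metric, roughly like the manifest below. This is a hypothetical sketch: the Deployment name, metric name, and target value are assumptions, and the exact shape depends on the Prometheus adapter in use.

```yaml
# Hypothetical HPA consuming a service-level Sidekiq saturation metric
# exposed to Kubernetes via a Prometheus adapter (names are assumed).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sidekiq-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sidekiq-workers
  minReplicas: 1
  maxReplicas: 30
  metrics:
    - type: External
      external:
        metric:
          name: sidekiq_scaling_target # hypothetical adapter-exposed name
        target:
          type: Value
          value: "100" # scale until the saturation expression drops to 100
```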
Hi! I saw the feature, and especially the total number of threads is interesting (we already have the busy
From what I saw, when this PR was opened there was no
@Envek are you open to considering this feature again? Maybe a bit smaller (just a
This adds three new metrics that can be used to implement scaling rules:

- `sidekiq_concurrency`: the total number of jobs that can be run at a time across all processes
- `sidekiq_available_workers`: the number of workers available for new jobs across all processes
- `sidekiq_saturation`: the fraction of the total concurrency currently in use
We then scale in Kubernetes using the following metric:
(max(sidekiq_jobs_waiting_count) / max(sidekiq_concurrency) + max(sidekiq_saturation)) * 100
This lets us scale pods based on the current saturation plus the additional saturation required for jobs waiting in the queue.
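Plugging assumed values into that expression shows how it behaves: with 30 waiting jobs, a total concurrency of 20, and 50% saturation, the target comes out at 200, i.e. the autoscaler would aim to double capacity.

```ruby
# Worked example of the scaling expression above (all values assumed):
# (jobs_waiting / concurrency + saturation) * 100
jobs_waiting = 30
concurrency  = 20
saturation   = 0.5 # half of all threads are busy

target = (jobs_waiting.to_f / concurrency + saturation) * 100
# => 200.0
```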