Investigation for auto-scaling Pravega deployment #131
Labels
kind/feature: New feature
Pravega 0.9: Fix needed by Pravega 0.9
priority/P2: Slight inconvenience or annoyance to applications, system continues to function
status/needs-investigation: Further investigation is required
This investigation is for future reference.
A Pravega deployment may need to scale dynamically to match the changing data ingestion rate. We want to automate this process in Kubernetes.
Auto-scale Pravega
Pravega can already auto-scale Stream Segments according to the data ingestion rate, see here. But the Segment Store cannot be auto-scaled right now.
Where to implement the control logic?
There is a component in Kubernetes called the Horizontal Pod Autoscaler (HPA) that we can leverage. The object an HPA controls is either a Deployment or a StatefulSet. The logic behind HPA is simple: it tries to keep a user-defined metric at a target value by growing or shrinking the Deployment or StatefulSet. The metric can be a custom metric, such as a Pravega metric, or a built-in metric, such as CPU or memory usage of a Pod or Node.
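For reference, the HPA control loop computes the desired replica count roughly as desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum number of replicas.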
In the case of Pravega, we need three HPAs: one each for the Controller, the Segment Store, and BookKeeper (see the sketch below). The deployment of the HPAs could be handled by the Pravega operator; an example of an HPA operator might be useful as a reference.
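As a rough sketch of what such an HPA could look like if the operator built it with the Go client types, here is a CPU-based autoscaler for the Segment Store StatefulSet. The object name, namespace, replica bounds, and CPU target below are assumptions for illustration, not actual Pravega operator code.

```go
package main

import (
	"encoding/json"
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2beta2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// segmentStoreHPA builds an HPA that keeps the Segment Store StatefulSet at
// roughly 70% average CPU utilization, between 3 and 10 replicas.
func segmentStoreHPA() *autoscalingv2.HorizontalPodAutoscaler {
	minReplicas := int32(3)
	targetCPU := int32(70)
	return &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "pravega-segmentstore", // assumed name
			Namespace: "default",
		},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "StatefulSet",
				Name:       "pravega-segmentstore", // assumed StatefulSet name
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 10,
			Metrics: []autoscalingv2.MetricSpec{{
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: &targetCPU,
					},
				},
			}},
		},
	}
}

func main() {
	// Print the object as JSON so the shape of the spec is easy to inspect.
	out, _ := json.MarshalIndent(segmentStoreHPA(), "", "  ")
	fmt.Println(string(out))
}
```

The same shape would apply to the Controller and BookKeeper HPAs, with ScaleTargetRef pointed at their respective workloads.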
How to trigger the scaling?
CPU and memory usage for Nodes and Pods can be accessed by sending requests to the metrics.k8s.io API, like this, so HPA can use those metrics directly without any extra steps. But for custom metrics, such as Pravega metrics, we need to deploy two more components.
Prometheus
As a monitoring tool, Prometheus can collect all the Pravega metrics and store them in its TSDB. It is very popular and its community is very active; I hope we will use it in the future.
Adapter
The adapter is the bridge between the monitoring tool and Kubernetes. There are only a few adapters in the wild right now, see the list. Take the Prometheus Adapter as an example: it implements the custom.metrics.k8s.io API and registers that API in the aggregation layer, which extends the API server. When HPA sends a request to custom.metrics.k8s.io, the Kubernetes API server routes the request to the Prometheus Adapter. The adapter translates the request into a query, sends it to Prometheus to fetch the metrics, and returns the result to HPA. HPA will then scale up or down based on the latest metric values.
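To make the flow concrete, here is a minimal sketch of the metric spec such an HPA would carry, assuming the Prometheus Adapter is configured to expose some per-pod Pravega metric. The metric name and target value below are hypothetical placeholders, not actual Pravega metric names.

```go
package main

import (
	"encoding/json"
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2beta2"
	"k8s.io/apimachinery/pkg/api/resource"
)

// pravegaCustomMetric describes a Pods-type metric that HPA would fetch
// through custom.metrics.k8s.io, i.e. from the Prometheus Adapter.
func pravegaCustomMetric() autoscalingv2.MetricSpec {
	return autoscalingv2.MetricSpec{
		Type: autoscalingv2.PodsMetricSourceType,
		Pods: &autoscalingv2.PodsMetricSource{
			Metric: autoscalingv2.MetricIdentifier{
				// Hypothetical metric name; the real name depends on how
				// Pravega metrics are exported and mapped by the adapter.
				Name: "segmentstore_write_latency_ms",
			},
			Target: autoscalingv2.MetricTarget{
				Type: autoscalingv2.AverageValueMetricType,
				// Hypothetical target: scale out when the average per-pod
				// value rises above 100.
				AverageValue: resource.NewQuantity(100, resource.DecimalSI),
			},
		},
	}
}

func main() {
	out, _ := json.MarshalIndent(pravegaCustomMetric(), "", "  ")
	fmt.Println(string(out))
}
```

This slots into the Metrics list of the HPA shown earlier, replacing or complementing the CPU-based resource metric.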
This is the first round of investigation. Please feel free to leave any comments or thoughts.