
[Serverless] Support elastic scaling interface based on group #2740

Closed
ZhiHanZ opened this issue Nov 11, 2021 · 0 comments
ZhiHanZ commented Nov 11, 2021

Summary
Goals:

  1. Scale up, or scale down to 0, based on the metric values we configure
  2. Integrate with KEDA for elastic serverless scaling in cloud native environments
  3. High performance: push activity information to KEDA instead of polling, which may otherwise incur scaling issues

Non Goals:

  1. Investigating other scaling triggers.
  2. Prometheus + KEDA integration (it does not seem to make sense here).

Concepts:
Query Namespace (Group): a logical group for compute resource isolation (this is the scaling target).

Tenant: storage isolation (not relevant to the scaler for now).

HPA: the Horizontal Pod Autoscaler, a Kubernetes component for autoscaling. In simple terms, its mechanism is to poll objects, collect their metrics, and compare them with the target metrics to calculate the desired cluster size.

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
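
For example, with currentReplicas = 2, currentMetricValue = 200 and desiredMetricValue = 100, this gives desiredReplicas = ceil[2 * (200 / 100)] = 4.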

Cons of HPA: defining metrics is hard, there is no uniform API for defining metrics, and it cannot scale to 0 (some feature gates need to be enabled for that).

External scaler: a trigger type in KEDA that talks to a Kubernetes service over gRPC, either polling it or receiving pushes from it; the service tells KEDA to scale a deployment up or down based on metrics, following the HPA calculation.

Background:
KEDA is a Kubernetes add-on designed to provide elastic auto scaling for cloud native deployments, statefulsets, jobs and CRDs. On a serverless platform we need to scale our query service up or down based on some kind of metric, for example: scale to 0 if a tenant has had no query in a given time interval, or scale a cluster up based on query time. For now we could implement the first one.

Detailed Design:
To support KEDA autoscaling we need to implement the following external scaler interface:

service ExternalScaler {
    rpc IsActive(ScaledObjectRef) returns (IsActiveResponse) {}
    rpc StreamIsActive(ScaledObjectRef) returns (stream IsActiveResponse) {}
    rpc GetMetricSpec(ScaledObjectRef) returns (GetMetricSpecResponse) {}
    rpc GetMetrics(GetMetricsRequest) returns (GetMetricsResponse) {}
}

IsActive: if it returns true, the object will be scaled up to at least 1 replica; otherwise it will be scaled to 0.
GetMetricSpec: returns the target value of the metrics used for autoscaling (desiredMetricValue).
GetMetrics: returns the current metric value (currentMetricValue).
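
For completeness, the request and response messages for this interface are expected to follow KEDA's externalscaler.proto (see the kedacore/external-scalers repository in the references below); the sketch here restates them, and the upstream proto should be treated as authoritative for exact field names and numbering.

message ScaledObjectRef {
    string name = 1;
    string namespace = 2;
    map<string, string> scalerMetadata = 3;   // carries tenant / namespace / metrics_rule from the trigger metadata
}

message IsActiveResponse {
    bool result = 1;
}

message GetMetricSpecResponse {
    repeated MetricSpec metricSpecs = 1;
}

message MetricSpec {
    string metricName = 1;
    int64 targetSize = 2;    // desiredMetricValue in the HPA formula
}

message GetMetricsRequest {
    ScaledObjectRef scaledObjectRef = 1;
    string metricName = 2;
}

message GetMetricsResponse {
    repeated MetricValue metricValues = 1;
}

message MetricValue {
    string metricName = 1;
    int64 metricValue = 2;   // currentMetricValue in the HPA formula
}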

In our group deployment, we could introduce the external scaler in the following way:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: databend-query-group1
  namespace: group1
spec:
  scaleTargetRef:
    name: databend-query-group1 # a deployment
  triggers:
    - type: external-push
      metadata:
        scalerAddress: meta.meta-system.svc.cluster.local:20048
        tenant: "user"
        namespace: "group1"
        metrics_rule: "scaling down policies" # to define multiple rules
    - type: external-push
      metadata:
        scalerAddress: meta.meta-system.svc.cluster.local:20048
        tenant: "user"
        namespace: "group1"
        metrics_rule: "scaling up policies"

To fine-tune fluctuation of the auto-scaled objects, we can follow the documentation: https://keda.sh/docs/2.0/concepts/scaling-deployments/
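
As a rough sketch of that fine tuning (the field names here follow the KEDA 2.0 ScaledObject spec and should be verified against the version we deploy), the ScaledObject could pin down polling, cooldown, replica bounds and the HPA scale-down behavior:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: databend-query-group1
  namespace: group1
spec:
  pollingInterval: 15          # seconds between GetMetrics calls
  cooldownPeriod: 300          # seconds to wait after IsActive turns false before scaling to 0
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                # standard HPA v2 behavior, passed through by KEDA
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60
  scaleTargetRef:
    name: databend-query-group1
  triggers:
    - type: external-push
      metadata:
        scalerAddress: meta.meta-system.svc.cluster.local:20048
        tenant: "user"
        namespace: "group1"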

Ref:
apache/datafusion#586
https://github.com/turbaszek/keda-example/blob/master/keda/mysql-hpa.yaml
https://github.com/kedacore/external-scalers
https://keda.sh/docs/2.0/concepts/external-scalers/

Alternative plan:
Use the metrics API + Prometheus.
Some billing and monitoring service implemented by us.
