forked from kubernetes/autoscaler
Merge pull request kubernetes#462 from epam/performance_tests
Kueue performance tests
Showing 9 changed files with 443 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Clusterloader home directory (checkout https://github.com/kubernetes/perf-tests) | ||
export CL2_HOME_DIR="/Users/johny/perf-tests/clusterloader2" | ||
|
||
# Run the performance test with Kueue (this requires Kueue to be pre-deployed to the cluster) | ||
# or without Kueue | ||
export USE_KUEUE=false | ||
|
||
# Test iterations: | ||
# number-of-small-jobs number-of-medium-jobs number-of-large-jobs job-replica-running-time test-timeout cluster-queue-CPU-quota cluster-queue-memory-quota | ||
export EXPERIMENTS=( | ||
"10 2 0 2s 3m 100 100Gi" | ||
"20 2 0 2s 5m 100 100Gi" | ||
) | ||
|
||
# Kubeconfig file location | ||
export KUBECONFIG="$HOME/.kube/config" | ||
|
||
# Kubernetes kind | ||
export PROVIDER="gke" |
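Each `EXPERIMENTS` entry is a whitespace-separated tuple. As a minimal sketch, the entries could be split into named fields as follows. The field names and their order are assumptions inferred from the template defaults (`CL2_SMALL_JOBS`, `CL2_MEDIUM_JOBS`, `CL2_LARGE_JOBS`); `Experiment` and `parse_experiment` are hypothetical helpers, not part of this commit.

```python
from typing import NamedTuple


class Experiment(NamedTuple):
    # Assumed field order: small, medium, large job counts, then timings, then quotas.
    small_jobs: int
    medium_jobs: int
    large_jobs: int
    job_running_time: str
    test_timeout: str
    cpu_quota: str
    memory_quota: str


def parse_experiment(line: str) -> Experiment:
    """Split one EXPERIMENTS entry into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    small, medium, large, running_time, timeout, cpu, memory = fields
    return Experiment(int(small), int(medium), int(large),
                      running_time, timeout, cpu, memory)


# The two entries from the example environment file.
experiments = [
    parse_experiment(entry)
    for entry in ("10 2 0 2s 3m 100 100Gi", "20 2 0 2s 5m 100 100Gi")
]
```

Rejecting entries with the wrong field count early makes a mistyped tuple fail before any cluster work starts.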
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
*report*/ | ||
prerequisites/cluster-queue.yaml | ||
tmp_manifests/ | ||
.env |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# Kueue Performance Testing | ||
|
||
## Measurements | ||
|
||
### Job startup latency | ||
|
||
How quickly do jobs transition from the `created` state to the `started` state? | ||
This is the time elapsed between `job.CreationTimestamp.Time` and `job.Status.StartTime.Time`. | ||
|
||
High Job startup latency in Kueue is expected when the total quota is not enough to schedule all jobs immediately, because the jobs need to queue. | ||
|
||
### Job startup throughput | ||
|
||
The highest workload admission rate per second, computed over 1-minute windows. | ||
The rate is sampled every 5 seconds (see the subquery section of the [PromQL examples](https://prometheus.io/docs/prometheus/latest/querying/examples/#subquery) for details): | ||
|
||
`max_over_time(sum(rate(kueue_admitted_workloads_total{cluster_queue="{{$clusterQueue}}"}[1m]))[{{$testTimeout}}:5s])` | ||
|
||
This measurement is not accurate if the cluster quota is large enough to schedule all workloads of the test immediately, because Kueue admits all the workloads at once and `kueue_admitted_workloads_total` never increases. In this case, the PromQL query returns 0. | ||
## How to run the test? | ||
|
||
### Prerequisites | ||
|
||
1. Deploy [Kueue](https://github.com/kubernetes-sigs/kueue/blob/main/docs/setup/install.md) | ||
2. Make sure you have `kubectl`, [jq](https://stedolan.github.io/jq/download/), the [Go version](https://github.com/mikefarah/yq) of `yq`, and `go` installed | ||
3. Check out the `Clusterloader2` framework (https://github.com/kubernetes/perf-tests) and build the `clusterloader` binary: | ||
|
||
* change to `clusterloader2` directory | ||
* run `go build -o clusterloader './cmd/'` | ||
|
||
### Run the test | ||
|
||
1. Copy the example environment file to `.env`: | ||
|
||
* `cp .env.example .env` | ||
|
||
2. Edit the environment variables | ||
|
||
| Variable | Description | | ||
| ----------- | ----------- | | ||
| CL2_HOME_DIR | Clusterloader home directory (checkout https://github.com/kubernetes/perf-tests) | | ||
| USE_KUEUE | Run the performance test with Kueue (this requires Kueue to be pre-deployed to the cluster) or without Kueue | | ||
| EXPERIMENTS | Configuration of test iterations (see configuration example in the file) | | ||
| KUBECONFIG | Kubeconfig file location | | ||
| PROVIDER | Kubernetes kind (tested on `gke` only) | | ||
|
||
3. Run the `run-test.sh` file | ||
|
||
### Test results | ||
|
||
Every test execution creates a `report_<timestamp>` directory inside `TEST_CONFIG_DIR` containing a `summary.csv` file with the following metrics: | ||
|
||
* P50 Job Create to start latency (ms) | ||
* P90 Job Create to start latency (ms) | ||
* P50 Job Start to complete latency (ms) | ||
* P90 Job Start to complete latency (ms) | ||
* Max Job Throughput (max jobs/s) | ||
* Total Jobs | ||
* Total Pods | ||
* Duration (s) | ||
|
||
Additionally, the following metrics are added to the results only for reference. Kueue doesn't influence them directly. | ||
|
||
* Avg Pod Waiting time (s) | ||
* P90 Pod Waiting time (s) | ||
* Avg Pod Completion time (s) | ||
* P90 Pod Completion time (s) |
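To make the "Job Create to start latency" metrics concrete, here is a minimal sketch of how a startup latency and its P50/P90 values could be derived from creation and start timestamps. The timestamps are made-up sample data, and the nearest-rank percentile is an assumption chosen for simplicity; the `JobLifecycleLatency` measurement in Clusterloader2 may compute percentiles differently.

```python
import math
from datetime import datetime, timedelta, timezone


def startup_latency_ms(created: datetime, started: datetime) -> float:
    """Latency between job creation and job start, in milliseconds."""
    return (started - created).total_seconds() * 1000.0


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile (an illustrative choice, not the CL2 method)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: ten jobs created together, started 1s..10s later.
created = datetime(2022, 1, 1, tzinfo=timezone.utc)
latencies = [
    startup_latency_ms(created, created + timedelta(seconds=s))
    for s in range(1, 11)
]
p50 = percentile(latencies, 50)
p90 = percentile(latencies, 90)
```

With queueing, later jobs start later, so P90 grows faster than P50 as the quota becomes the bottleneck.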
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
{{$MODE := DefaultParam .MODE "Indexed"}} | ||
{{$LOAD_TEST_THROUGHPUT := DefaultParam .CL2_LOAD_TEST_THROUGHPUT 10}} | ||
|
||
{{$smallJobs := DefaultParam .CL2_SMALL_JOBS 10}} | ||
{{$mediumJobs := DefaultParam .CL2_MEDIUM_JOBS 2}} | ||
{{$largeJobs := DefaultParam .CL2_LARGE_JOBS 0}} | ||
|
||
{{$namespaces := DefaultParam .CL2_NAMESPACES 1}} | ||
|
||
{{$smallJobsPerNamespace := DivideInt $smallJobs $namespaces}} | ||
{{$mediumJobsPerNamespace := DivideInt $mediumJobs $namespaces}} | ||
{{$largeJobsPerNamespace := DivideInt $largeJobs $namespaces}} | ||
|
||
{{$smallJobSize := 5}} | ||
{{$mediumJobSize := 20}} | ||
{{$largeJobSize := 100}} | ||
|
||
{{$jobRunningTime := DefaultParam .CL2_JOB_RUNNING_TIME "30s"}} | ||
|
||
{{$clusterQueue := "default-cluster-queue"}} | ||
{{$localQueue := "local-queue"}} | ||
|
||
{{$testTimeout := DefaultParam .CL2_TEST_TIMEOUT "5m"}} | ||
|
||
{{$namespacePrefix := "queue-test"}} | ||
|
||
{{$useKueue := DefaultParam .CL2_USE_KUEUE false}} | ||
|
||
name: batch | ||
|
||
namespace: | ||
  number: {{$namespaces}} | ||
  prefix: {{$namespacePrefix}} | ||
|
||
tuningSets: | ||
- name: UniformQPS | ||
  qpsLoad: | ||
    qps: {{$LOAD_TEST_THROUGHPUT}} | ||
|
||
steps: | ||
- name: Start measurements | ||
  measurements: | ||
  - Identifier: Timer | ||
    Method: Timer | ||
    Params: | ||
      action: start | ||
      label: job_performance | ||
  - Identifier: WaitForFinishedJobs | ||
    Method: WaitForFinishedJobs | ||
    Params: | ||
      action: start | ||
      labelSelector: group = test-job | ||
  - Identifier: JobLifecycleLatency | ||
    Method: JobLifecycleLatency | ||
    Params: | ||
      action: start | ||
      labelSelector: group = test-job | ||
  - Identifier: GenericPrometheusQuery | ||
    Method: GenericPrometheusQuery | ||
    Params: | ||
      action: start | ||
      metricName: Job (Kueue) API performance | ||
      metricVersion: v1 | ||
      unit: s | ||
      queries: | ||
      - name: total_jobs_scheduled | ||
        query: count(kube_job_info{namespace=~"{{$namespacePrefix}}.*"}) | ||
      - name: total_pods_scheduled | ||
        query: count(kube_pod_info{namespace=~"{{$namespacePrefix}}.*"}) | ||
      - name: avg_pod_running_time | ||
        query: (avg(kube_pod_completion_time{namespace=~"{{$namespacePrefix}}.*"} - kube_pod_start_time{namespace=~"{{$namespacePrefix}}.*"})) | ||
      - name: perc_90_pod_completion_time | ||
        query: quantile(0.90, kube_pod_completion_time{namespace=~"{{$namespacePrefix}}.*"} - kube_pod_start_time{namespace=~"{{$namespacePrefix}}.*"}) | ||
      - name: avg_pod_waiting_time | ||
        query: (avg(kube_pod_start_time{namespace=~"{{$namespacePrefix}}.*"} - kube_pod_created{namespace=~"{{$namespacePrefix}}.*"})) | ||
      - name: perc_90_pod_waiting_time | ||
        query: quantile(0.90, kube_pod_start_time{namespace=~"{{$namespacePrefix}}.*"} - kube_pod_created{namespace=~"{{$namespacePrefix}}.*"}) | ||
      - name: max_job_throughput | ||
        query: max_over_time(sum(rate(kueue_admitted_workloads_total{cluster_queue="{{$clusterQueue}}"}[1m]))[{{$testTimeout}}:5s]) | ||
- name: Sleep | ||
  measurements: | ||
  - Identifier: sleep | ||
    Method: Sleep | ||
    Params: | ||
      duration: 10s | ||
{{if $useKueue}} | ||
- name: Create local queue | ||
  phases: | ||
  - namespaceRange: | ||
      min: 1 | ||
      max: {{$namespaces}} | ||
    replicasPerNamespace: 1 | ||
    tuningSet: UniformQPS | ||
    objectBundle: | ||
    - basename: {{$localQueue}} | ||
      objectTemplatePath: "local-queue.yaml" | ||
      templateFillMap: | ||
        ClusterQueue: {{$clusterQueue}} | ||
{{end}} | ||
- name: Create {{$MODE}} jobs | ||
  phases: | ||
  - namespaceRange: | ||
      min: 1 | ||
      max: {{$namespaces}} | ||
    replicasPerNamespace: {{$smallJobsPerNamespace}} | ||
    tuningSet: UniformQPS | ||
    objectBundle: | ||
    - basename: small | ||
      objectTemplatePath: "job.yaml" | ||
      templateFillMap: | ||
        UseKueue: {{$useKueue}} | ||
        Replicas: {{$smallJobSize}} | ||
        Mode: {{$MODE}} | ||
        Sleep: {{$jobRunningTime}} | ||
        LocalQueue: "{{$localQueue}}-0" | ||
  - namespaceRange: | ||
      min: 1 | ||
      max: {{$namespaces}} | ||
    replicasPerNamespace: {{$mediumJobsPerNamespace}} | ||
    tuningSet: UniformQPS | ||
    objectBundle: | ||
    - basename: medium | ||
      objectTemplatePath: "job.yaml" | ||
      templateFillMap: | ||
        UseKueue: {{$useKueue}} | ||
        Replicas: {{$mediumJobSize}} | ||
        Mode: {{$MODE}} | ||
        Sleep: {{$jobRunningTime}} | ||
        LocalQueue: "{{$localQueue}}-0" | ||
  - namespaceRange: | ||
      min: 1 | ||
      max: {{$namespaces}} | ||
    replicasPerNamespace: {{$largeJobsPerNamespace}} | ||
    tuningSet: UniformQPS | ||
    objectBundle: | ||
    - basename: large | ||
      objectTemplatePath: "job.yaml" | ||
      templateFillMap: | ||
        UseKueue: {{$useKueue}} | ||
        Replicas: {{$largeJobSize}} | ||
        Mode: {{$MODE}} | ||
        Sleep: {{$jobRunningTime}} | ||
        LocalQueue: "{{$localQueue}}-0" | ||
- name: Wait for {{$MODE}} jobs to finish | ||
  measurements: | ||
  - Identifier: JobLifecycleLatency | ||
    Method: JobLifecycleLatency | ||
    Params: | ||
      action: gather | ||
      timeout: {{$testTimeout}} | ||
  - Identifier: WaitForFinishedJobs | ||
    Method: WaitForFinishedJobs | ||
    Params: | ||
      action: gather | ||
      timeout: {{$testTimeout}} | ||
- name: Stop Timer | ||
  measurements: | ||
  - Identifier: Timer | ||
    Method: Timer | ||
    Params: | ||
      action: stop | ||
      label: job_performance | ||
- name: Gather Timer | ||
  measurements: | ||
  - Identifier: Timer | ||
    Method: Timer | ||
    Params: | ||
      action: gather | ||
- name: Sleep | ||
  measurements: | ||
  - Identifier: sleep | ||
    Method: Sleep | ||
    Params: | ||
      duration: 30s | ||
- name: Gather Prometheus measurements | ||
  measurements: | ||
  - Identifier: GenericPrometheusQuery | ||
    Method: GenericPrometheusQuery | ||
    Params: | ||
      action: gather | ||
      enableViolations: true |
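The template derives per-namespace job counts with `DivideInt` and uses fixed job sizes of 5, 20 and 100 replicas. The sketch below reproduces that arithmetic to estimate how many pods a run creates. It assumes `DivideInt` truncates like integer division; `jobs_per_namespace` and `total_pods` are hypothetical helper names.

```python
# Job sizes taken from the template ($smallJobSize, $mediumJobSize, $largeJobSize).
SMALL_JOB_SIZE = 5
MEDIUM_JOB_SIZE = 20
LARGE_JOB_SIZE = 100


def jobs_per_namespace(total_jobs: int, namespaces: int) -> int:
    """Mirror of DivideInt, assuming it truncates toward zero."""
    if namespaces <= 0:
        raise ValueError("namespaces must be positive")
    return total_jobs // namespaces


def total_pods(small: int, medium: int, large: int, namespaces: int = 1) -> int:
    """Pods created across all namespaces for the given job counts."""
    per_namespace = (
        jobs_per_namespace(small, namespaces) * SMALL_JOB_SIZE
        + jobs_per_namespace(medium, namespaces) * MEDIUM_JOB_SIZE
        + jobs_per_namespace(large, namespaces) * LARGE_JOB_SIZE
    )
    return per_namespace * namespaces


# Template defaults: 10 small, 2 medium, 0 large jobs in 1 namespace.
default_pods = total_pods(10, 2, 0)
```

Under the truncation assumption, job counts that are not a multiple of the namespace count lose the remainder: with 3 namespaces, 10 small jobs become 3 per namespace and the 2 medium jobs become 0.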
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
apiVersion: batch/v1 | ||
kind: Job | ||
metadata: | ||
  name: {{.Name}} | ||
  labels: | ||
    group: test-job | ||
  {{if .UseKueue}} | ||
  annotations: | ||
    kueue.x-k8s.io/queue-name: {{.LocalQueue}} | ||
  {{end}} | ||
spec: | ||
  suspend: {{.UseKueue}} | ||
  parallelism: {{.Replicas}} | ||
  completions: {{.Replicas}} | ||
  completionMode: {{.Mode}} | ||
  template: | ||
    metadata: | ||
      labels: | ||
        group: test-pod | ||
    spec: | ||
      containers: | ||
      - name: {{.Name}} | ||
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3 | ||
        args: | ||
        - {{.Sleep}} | ||
        resources: | ||
          requests: | ||
            cpu: "200m" | ||
            memory: "100Mi" | ||
      restartPolicy: Never |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
apiVersion: kueue.x-k8s.io/v1alpha2 | ||
kind: LocalQueue | ||
metadata: | ||
  name: {{.Name}} | ||
spec: | ||
  clusterQueue: {{.ClusterQueue}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
apiVersion: kueue.x-k8s.io/v1alpha2 | ||
kind: ClusterQueue | ||
metadata: | ||
  name: default-cluster-queue | ||
spec: | ||
  namespaceSelector: {} | ||
  resources: | ||
  - name: "cpu" | ||
    flavors: | ||
    - name: default | ||
      quota: | ||
        min: 100 | ||
  - name: "memory" | ||
    flavors: | ||
    - name: default | ||
      quota: | ||
        min: 50Gi |
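The README notes that the throughput measurement is not meaningful when the quota admits every workload immediately. A rough way to anticipate this is to compare the summed pod requests (`200m` CPU and `100Mi` memory per pod in `job.yaml`) with the ClusterQueue quota above. The sketch below does only that simple sum; it ignores everything else Kueue takes into account during admission, and the quantity parsers handle just the suffixes that appear in this commit. All function names are hypothetical.

```python
_MEMORY_FACTORS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_cpu_millis(quantity: str) -> int:
    """Convert a CPU quantity such as '200m' or '100' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


def parse_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '100Mi' or '50Gi' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def fits_quota(pods: int,
               cpu_request: str = "200m",
               memory_request: str = "100Mi",
               cpu_quota: str = "100",
               memory_quota: str = "50Gi") -> bool:
    """True if the summed requests of all pods fit within the quota."""
    cpu_ok = pods * parse_cpu_millis(cpu_request) <= parse_cpu_millis(cpu_quota)
    memory_ok = (pods * parse_memory_bytes(memory_request)
                 <= parse_memory_bytes(memory_quota))
    return cpu_ok and memory_ok


# 90 pods (10 small jobs of 5 replicas plus 2 medium jobs of 20 replicas).
all_admitted_at_once = fits_quota(90)
```

When this check returns `True` for an experiment, the whole run likely fits in the quota at once, which is the situation where the README expects the throughput query to return 0.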
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
apiVersion: kueue.x-k8s.io/v1alpha2 | ||
kind: ResourceFlavor | ||
metadata: | ||
  name: default |