Node Performance Dashboard (node-perf-dash) is a web UI for collecting and analyzing performance test results of Kubernetes nodes. It collects data from Kubernetes node e2e performance tests, stored either on the local file system or in Google Cloud Storage (GCS), and visualizes the data in four dashboards:
- Builds: monitor performance changes across builds
- Comparison: compare performance across different test parameters (e.g. pod number, creation speed, machine type)
- Time series: time series data, including operation tracing probes inside Kubernetes and resource usage changes over time
- Tracing: plot the latency percentiles between any two tracing probes across builds
A running instance of node-perf-dash is available at http://node-perf-dash.k8s.io/
Build node-perf-dash:
make node-perf-dash
Collect data from Google Cloud Storage (GCS):
node-perf-dash --address=0.0.0.0:808 --builds=20 --tracing=true --datasource=google-gcs
Collect data from local test data:
node-perf-dash --address=0.0.0.0:808 --builds=20 --tracing=true --datasource=local --local-data-dir=$MY_TEST_RESULT_PATH
The test results must have the following directory structure:
$MY_TEST_RESULT_PATH/
    latest-build.txt
    build_nr_1/
        build-log.txt
        artifacts/
            test_machine_host_name1/
                kubelet.log
            test_machine_host_name2/
            ...
    build_nr_N/
    ...
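Before pointing node-perf-dash at a local directory, it can help to check that the directory follows the structure above. The following is a minimal sketch, not part of node-perf-dash: the function name `check_result_layout` is hypothetical, and it assumes that every subdirectory of the root is a build directory and that each host directory should contain a `kubelet.log`.

```python
from pathlib import Path


def check_result_layout(root):
    """Return a list of problems found under root; an empty list means the layout matches."""
    root = Path(root)
    if not root.is_dir():
        return [f"{root}: not a directory"]
    problems = []
    if not (root / "latest-build.txt").is_file():
        problems.append("missing latest-build.txt")
    # Assumption: every subdirectory of root is a build directory.
    for build in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (build / "build-log.txt").is_file():
            problems.append(f"{build.name}: missing build-log.txt")
        artifacts = build / "artifacts"
        if not artifacts.is_dir():
            problems.append(f"{build.name}: missing artifacts/")
            continue
        # Assumption: every subdirectory of artifacts/ is a test machine host.
        for host in sorted(p for p in artifacts.iterdir() if p.is_dir()):
            if not (host / "kubelet.log").is_file():
                problems.append(f"{build.name}/{host.name}: missing kubelet.log")
    return problems
```

Calling `check_result_layout("/path/to/results")` returns an empty list when nothing is missing, so the result can be printed or used as a pre-flight check before starting the dashboard.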
Choose the data to display by selecting:
- Job: select the test project (e.g. ci-kubernetes-node-kubelet-benchmark)
- Test: display data for a test by selecting the short test name, or by selecting test options one by one
- Image/machine: select from the available images and machine types (capacity in the format cpu:1core,memory:3.5G)
- Build: periodic benchmark tests run with incrementing build numbers. node-perf-dash collects the latest test data, using the total build count specified by --builds; you can change the range of builds in the dashboard (see https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/benchmark/benchmark-config.yam)
Resource usage (CPU and memory of the kubelet and the runtime) is displayed. For density tests, pod startup latency and creation throughput are also displayed (see https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/density_test.go).
To compare node performance among different tests, click the COMPARE IT button in the upper-right corner of the build page. The test is added to the comparison list on the comparison page. Click LOAD to see the comparison as bar charts (data is averaged over the selected build range).
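The averaging over the selected build range can be pictured with a small sketch. This is an illustration only, not the dashboard's code: the function name `average_over_builds` and the shape of `samples` (build number mapped to a dictionary of metric values) are assumptions.

```python
from statistics import mean


def average_over_builds(samples, first_build, last_build):
    """Average each metric over the builds in [first_build, last_build].

    samples maps a build number to a {metric name: value} dictionary.
    A metric missing from some builds is averaged over the builds that have it.
    """
    selected = [m for b, m in samples.items() if first_build <= b <= last_build]
    names = sorted({name for metrics in selected for name in metrics})
    return {n: mean(m[n] for m in selected if n in m) for n in names}
```

For example, with builds 1 to 3 loaded and the range set to builds 1 and 2, only those two builds contribute to each bar.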
Analyzing time series data is useful for drilling into node performance issues. The page contains operation tracing data from both the test and kubelet operations. It also shows how the resource usage of the kubelet and the runtime changes over time during the test.
Tracing inside the kubelet is done by parsing the kubelet log. The log contains important information such as when the kubelet SyncLoop detects a pod configuration change, when a pod is running, and when the kubelet status manager reports a pod status change to the API server. In the future we plan to use Events as a fixed tracing format instead of relying on ad hoc log lines. See kubernetes/kubernetes#31583 for more details.
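As a rough illustration of log-based tracing, the sketch below extracts timestamped probe events from log lines. It assumes the standard glog/klog line header (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`); the function name `extract_probes` and the message fragments passed in `patterns` are hypothetical and are not the strings node-perf-dash actually matches. Because the header carries no year, the caller supplies one.

```python
import re
from datetime import datetime

# glog/klog header: severity letter, mmdd, time with microseconds, thread id, file:line.
KLOG_LINE = re.compile(
    r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})\s+\d+ [^\]]+\] (.*)$"
)


def extract_probes(lines, patterns, year):
    """Return (probe name, timestamp) pairs for lines whose message contains a pattern.

    patterns maps a probe name to a message fragment (hypothetical, caller-supplied).
    """
    events = []
    for line in lines:
        m = KLOG_LINE.match(line)
        if not m:
            continue  # not a log line with the expected header
        month, day, hh, mm, ss, us = (int(g) for g in m.groups()[:6])
        message = m.group(7)
        for probe, fragment in patterns.items():
            if fragment in message:
                events.append((probe, datetime(year, month, day, hh, mm, ss, us)))
                break  # at most one probe per line
    return events
```

Matching on free-form message text is fragile, since any change to the log wording breaks the probe, which is the motivation for moving to a fixed format.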
Interested in the latency distribution between any two operations? Select two of the operations (probes) shown on the time series page to see the latency percentiles. Note that the dashboard does not match operations belonging to the same pod; it simply assumes that all operations happen in order.
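The in-order assumption can be sketched as follows: sort the timestamps of each probe, pair the i-th start with the i-th end, and compute percentiles over the differences. This is a minimal sketch under stated assumptions, not the dashboard's implementation; the function name `latency_percentiles` is hypothetical and the nearest-rank percentile method is a choice made here for simplicity.

```python
import math


def latency_percentiles(start_times, end_times, percentiles=(50, 90, 99)):
    """Pair the two probes in time order and return {percentile: latency}.

    Operations are not matched per pod: the i-th earliest start is paired with
    the i-th earliest end. Unpaired extra timestamps are ignored.
    """
    starts = sorted(start_times)
    ends = sorted(end_times)
    latencies = sorted(e - s for s, e in zip(starts, ends))
    if not latencies:
        return {}
    result = {}
    for p in percentiles:
        # Nearest-rank: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(latencies)))
        result[p] = latencies[rank - 1]
    return result
```

If pods do not pass through the two probes in the same order, the pairs mix timestamps from different pods, so individual latencies can be skewed even when the overall distribution looks plausible.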