feat: Update device settings to drive better performance (#843)
* feat: Update device settings to drive better performance
In this PR:
1. Add the ability for customers to enable perf optimization in the StorageClass.
2. Add the ability to select a perf profile in the StorageClass.
3. Add scripts to run perf tests against the driver.
4. Add scripts to get SKU/latency for Azure disks.
5. Add unit tests.

* Address review comments

* Address review comments and fix unit tests

* fix unit tests

* Increase test coverage

* Add e2e test for pv perf optimization

* Fix bugs in perf optimization

* fix tests

* fix e2e tests

* fix e2e test

* fix e2e test

* address review comments

* regenerate chart index

* fix chart index

* delete v1.2 index

* fix v2 unit test
abhisheksinghbaghel authored May 27, 2021
1 parent a5907be commit 1af2e64
Showing 41 changed files with 2,953 additions and 73 deletions.
28 changes: 14 additions & 14 deletions charts/index.yaml
@@ -3,7 +3,7 @@ entries:
azuredisk-csi-driver:
- apiVersion: v1
appVersion: v2.0.0-alpha.1
created: "2021-05-21T07:14:15.260811559Z"
created: "2021-05-26T21:40:58.6554752Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 78489f4429e68903d79e59bbba2a0d2d284164ac26068b96176f6da77da5fd6a
name: azuredisk-csi-driver
@@ -12,16 +12,16 @@ entries:
version: v2.0.0-alpha.1
- apiVersion: v1
appVersion: latest
created: "2021-05-21T07:14:15.248528878Z"
created: "2021-05-26T21:40:58.6358679Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: fa28b8a63ae984366bce7b6afddd184aaed4ab3f36b7c16dab1b95b84af6f1d0
digest: c15e1d2a29a77fece047ccf708be578b5a068fc4cfb124941f5543eaa8de839d
name: azuredisk-csi-driver
urls:
- https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/charts/latest/azuredisk-csi-driver-v1.4.0.tgz
version: v1.4.0
- apiVersion: v1
appVersion: v1.3.0
created: "2021-05-21T07:14:15.258805146Z"
created: "2021-05-26T21:40:58.6545028Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 2665483e922a577feb8539ca7f774bc70c945ce490294fd3378f098c2d244dde
name: azuredisk-csi-driver
@@ -30,7 +30,7 @@ entries:
version: v1.3.0
- apiVersion: v1
appVersion: v1.2.0
created: "2021-05-21T07:14:15.257794839Z"
created: "2021-05-26T21:40:58.6534629Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 2bbfe2f9d080f1b3ff10590c7168d05ce026c5a73332b4d48014610a52337808
name: azuredisk-csi-driver
@@ -39,7 +39,7 @@ entries:
version: v1.2.0
- apiVersion: v1
appVersion: v1.1.1
created: "2021-05-21T07:14:15.256744232Z"
created: "2021-05-26T21:40:58.651938Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: dd7066be8f499f6c1a396ab27c0013c09f5a8d8319cc04fbdd480d31107bb851
name: azuredisk-csi-driver
@@ -48,7 +48,7 @@ entries:
version: v1.1.1
- apiVersion: v1
appVersion: v1.1.0
created: "2021-05-21T07:14:15.255773926Z"
created: "2021-05-26T21:40:58.6494581Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 3d2a5189416dd6a43bd3e2097bbe23a8db347b6e1a36c6a43fd59cc9c9633ff3
name: azuredisk-csi-driver
@@ -57,7 +57,7 @@ entries:
version: v1.1.0
- apiVersion: v1
appVersion: v1.0.0
created: "2021-05-21T07:14:15.25488032Z"
created: "2021-05-26T21:40:58.646755Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: d179bc6f338518859b6efdc3b3bed8d06513313e8047563eb4b654b2d417c81e
name: azuredisk-csi-driver
@@ -66,7 +66,7 @@ entries:
version: v1.0.0
- apiVersion: v1
appVersion: v0.10.0
created: "2021-05-21T07:14:15.250650692Z"
created: "2021-05-26T21:40:58.6368514Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 3dbbaca098fe8316de079236598253b5831e8e85fd88b390231d828157d62206
name: azuredisk-csi-driver
@@ -75,7 +75,7 @@ entries:
version: v0.10.0
- apiVersion: v1
appVersion: v0.9.0
created: "2021-05-21T07:14:15.253998214Z"
created: "2021-05-26T21:40:58.6420196Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: a978f3e6ef5d678c3b6512bd8a63277cb4ce40d3f3e34b80370f0c37298824f2
name: azuredisk-csi-driver
@@ -84,7 +84,7 @@ entries:
version: v0.9.0
- apiVersion: v1
appVersion: v0.8.0
created: "2021-05-21T07:14:15.253114408Z"
created: "2021-05-26T21:40:58.6409807Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 1762b832389b4f7a5eab9748127fa6dbb85131485d67bc3fe485bbe86c468128
name: azuredisk-csi-driver
@@ -93,7 +93,7 @@ entries:
version: v0.8.0
- apiVersion: v1
appVersion: v0.7.0
created: "2021-05-21T07:14:15.252242002Z"
created: "2021-05-26T21:40:58.6393335Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: 29e21f686814f46c1edaaaa95ce2d25579ff1aad270c58b774bdb5a89858b8bf
name: azuredisk-csi-driver
@@ -102,11 +102,11 @@ entries:
version: v0.7.0
- apiVersion: v1
appVersion: v0.6.0
created: "2021-05-21T07:14:15.251378497Z"
created: "2021-05-26T21:40:58.6375025Z"
description: Azure disk Container Storage Interface (CSI) Storage Plugin
digest: b11d8dfee371ca7c63a1448ba27c1fd1f032ea33575fefeeb16927fc95d1eeb7
name: azuredisk-csi-driver
urls:
- https://raw.githubusercontent.com/kubernetes-sigs/azuredisk-csi-driver/master/charts/v0.6.0/azuredisk-csi-driver-v0.6.0.tgz
version: v0.6.0
generated: "2021-05-21T07:14:15.24727447Z"
generated: "2021-05-26T21:40:58.6347071Z"
Binary file modified charts/latest/azuredisk-csi-driver-v1.4.0.tgz
Binary file not shown.
@@ -174,7 +174,7 @@ spec:
- name: AZURE_ENVIRONMENT_FILEPATH
value: /etc/kubernetes/azurestackcloud.json
{{- end }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
imagePullPolicy: {{ .Values.image.azuredisk.pullPolicy }}
volumeMounts:
- mountPath: /csi
name: socket-dir
@@ -88,6 +88,7 @@ spec:
- "--endpoint=$(CSI_ENDPOINT)"
- "--nodeid=$(KUBE_NODE_NAME)"
- "--metrics-address=0.0.0.0:{{ .Values.node.metricsPort }}"
- "--enable-perf-optimization={{ .Values.linux.enablePerfOptimization }}"
ports:
- containerPort: 29603
name: healthz
@@ -121,7 +122,7 @@ spec:
- name: AZURE_ENVIRONMENT_FILEPATH
value: /etc/kubernetes/azurestackcloud.json
{{- end }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
imagePullPolicy: {{ .Values.image.azuredisk.pullPolicy }}
securityContext:
privileged: true
volumeMounts:
1 change: 1 addition & 0 deletions charts/latest/azuredisk-csi-driver/values.yaml
@@ -65,6 +65,7 @@ linux:
enabled: true
kubelet: /var/lib/kubelet
distro: debian
enablePerfOptimization: true

windows:
enabled: true
1 change: 1 addition & 0 deletions deploy/csi-azuredisk-node.yaml
@@ -83,6 +83,7 @@ spec:
- "--endpoint=$(CSI_ENDPOINT)"
- "--nodeid=$(KUBE_NODE_NAME)"
- "--metrics-address=0.0.0.0:29605"
- "--enable-perf-optimization=true"
ports:
- containerPort: 29603
name: healthz
@@ -0,0 +1,89 @@
# Add ability to tune azuredisk performance parameters

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Proposal](#proposal)
  - [Perf Profile](#perf-profile)
  - [Scope Of the Change](#scope-of-the-change)
- [Limitations](#limitations)
- [Future Considerations](#future-considerations)
<!-- /toc -->

## Summary

Persistent volumes in Kubernetes are used for a wide variety of stateful workloads.
These workloads have different runtime/IO characteristics, and certain device-level config settings
on the data disk can make a large difference in application performance.

Azure Storage publishes [guidelines](https://docs.microsoft.com/en-us/azure/virtual-machines/premium-storage-performance)
for applications to configure a disk's guest OS settings to drive maximum IOPS and bandwidth.

Azure Disk CSI driver users currently do not have an easy way to tune disk device configuration to
get better performance for their workloads.

This proposal gives users the ability to enable automatic perf optimization of data
disks by tweaking guest OS-level device/disk/driver parameters.
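For context on what "device parameters" means here: on Linux, many per-device block-layer settings are exposed as files under `/sys/block/<device>/queue/`. The sketch below only *reads* a few well-known ones (`scheduler`, `nr_requests`, `read_ahead_kb`, `max_sectors_kb`) to show where such knobs live. It is an illustration under that assumption, not the set of settings the driver or the `Basic` profile actually changes, and `read_queue_settings` is a hypothetical helper name.

```python
import os

# A few common Linux block-queue tunables under /sys/block/<device>/queue/.
# Illustrative only: NOT the list of settings any perf profile changes.
QUEUE_ATTRIBUTES = ("scheduler", "nr_requests", "read_ahead_kb", "max_sectors_kb")


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Return {attribute: current value} for the known queue attributes of `device`.

    Attributes that are missing or unreadable on this kernel/device are skipped.
    """
    settings = {}
    queue_dir = os.path.join(sysfs_root, device, "queue")
    for attr in QUEUE_ATTRIBUTES:
        path = os.path.join(queue_dir, attr)
        try:
            with open(path) as f:
                settings[attr] = f.read().strip()
        except OSError:
            # Missing file or permission problem: skip this attribute.
            continue
    return settings
```

On a node, `read_queue_settings("sdc")` would return the current values for a data disk attached as `sdc` (a placeholder device name).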

## Motivation

As adoption of the Azure Disk CSI driver increases, we will encounter different types
of workloads with different disk IO characteristics. Some may need to drive the data
disks to maximum IOPS, while others may need to issue larger writes and drive maximum throughput.

The Azure Disk CSI driver should evolve to let users tune data disk configuration to get optimal
performance for their workloads.

With this feature, we provide a configurable, hands-free way for users to enable
perf optimization of data disks by enabling a feature in the StorageClass and selecting from
multiple optimization profiles based on their requirements.

Users must opt in to the new behavior, so we do not expect it to break
any existing applications/configurations.

## Proposal

The Azure Disk CSI driver will read one additional parameter from the StorageClass, `perfProfile`.

- *perfProfile*: Users will be able to select a performance profile for their device to match their
workload needs. Initially we will expose the `Basic` profile, which configures the disks for balanced IO and throughput
workloads. To keep this feature disabled, users can omit the `perfProfile` parameter from the StorageClass
or set `perfProfile` to `None`. If `perfProfile` is set to an unknown value, CreateVolume will fail.
See [Limitations](#limitations) for the options available today.
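The parameter rules above (absent or `None` disables the feature, values are case-insensitive, unknown values are rejected) can be sketched as follows. This is a minimal Python illustration of those stated rules only, not the driver's Go implementation; `parse_perf_profile`, `perf_optimization_enabled`, and `InvalidPerfProfileError` are hypothetical names, and the exception stands in for the CreateVolume failure.

```python
# Profiles named in this proposal, stored lowercase for case-insensitive matching.
SUPPORTED_PROFILES = ("none", "basic")


class InvalidPerfProfileError(ValueError):
    """Hypothetical error standing in for the CreateVolume failure."""


def parse_perf_profile(parameters):
    """Return the normalized perfProfile from a StorageClass parameters dict.

    A missing perfProfile is treated the same as "None" (feature disabled).
    """
    raw = parameters.get("perfProfile")
    if raw is None:
        return "none"
    profile = raw.lower()
    if profile not in SUPPORTED_PROFILES:
        raise InvalidPerfProfileError(
            f"unsupported perfProfile {raw!r}; expected one of {SUPPORTED_PROFILES}"
        )
    return profile


def perf_optimization_enabled(parameters):
    """True only when a profile other than None is selected."""
    return parse_perf_profile(parameters) != "none"
```

For example, `parse_perf_profile({"skuName": "Premium_LRS", "perfProfile": "BASIC"})` returns `"basic"`, while `{"perfProfile": "Turbo"}` raises.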

### Perf Profile

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-kubestone-perf-optimized-premium-ssd-csi
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  perfProfile: Basic # available values: None (default), Basic. Values are case-insensitive.
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

### Scope Of the Change

In this iteration we will only enable the feature for StandardSSD and Premium disks.

## Limitations

- This feature is not supported for HDD or UltraDisk right now.
- The current implementation only optimizes disks that use the `storvsc` Linux disk driver.
- Only the `Basic` `perfProfile` is available today, which provides balanced IO and throughput performance.
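The scope and limitations above amount to an eligibility check on the StorageClass `skuName`. A minimal sketch, assuming any `Premium_*` or `StandardSSD_*` SKU (for example `Premium_LRS`, `StandardSSD_LRS`) counts as in scope while `Standard_LRS` (HDD) and `UltraSSD_LRS` do not; the function name is hypothetical and the `storvsc` driver check is deliberately left out.

```python
# Lowercase SKU prefixes treated as in scope (Premium SSD and Standard SSD).
SUPPORTED_SKU_PREFIXES = ("premium_", "standardssd_")


def sku_supports_perf_optimization(sku_name):
    """Return True for Premium and StandardSSD SKUs; False for HDD and UltraDisk."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return sku_name.lower().startswith(SUPPORTED_SKU_PREFIXES)
```

Matching on the lowercase prefix keeps the check independent of redundancy suffixes such as `_LRS` or `_ZRS`.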

## Future Considerations

- We will consider exposing more perf profiles tailored to different IO characteristics.
- We will consider allowing users to create their own perf profile and express their workload's characteristics in
the StorageClass using a CRD or some other configuration.
- We will consider expanding perf optimization to HDD and UltraDisk.
- We will consider expanding perf optimization to other disk drivers such as NVMe.
- We will consider expanding perf optimization to other operating systems such as Windows.
108 changes: 108 additions & 0 deletions hack/calc_device_rand_latency.py
@@ -0,0 +1,108 @@
#!/usr/bin/env python

# Copyright 2015 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import json
import os.path
import shlex
import subprocess
import sys
import logging

logger = logging.getLogger()

def measure_latency(directory, iops, request_size, rw, name):
    # Through experimentation, 1000 was the smallest number of IOs that gave a consistent average
    number_ios = 1000
    # Rate limit the fio command to non-burst IOPS to avoid inaccurate latency measurement due to throttling
    fio_command = [
        '/usr/bin/fio', '--direct=1', '--ioengine=libaio', '--iodepth=1',
        '--time_based=0', f'--name={name}', '--size=10MB',
        '--output-format=json', '--overwrite=1', f'--rw={rw}',
        f'--bs={request_size}k', f'--number_ios={number_ios}',
        f'--rate_iops={iops}', f'--directory={directory}'
    ]
    # For usability, print an approximate expected runtime
    expected_runtime = 2

    logger.info('Running: %s.\nThis will take approximately %d seconds.' % (' '.join(shlex.quote(i) for i in fio_command), expected_runtime))
    return json.loads(subprocess.check_output(fio_command))

def get_latency(directory, iops, request_size, random):
    write = ('rand' if random else '') + 'write'
    read = ('rand' if random else '') + 'read'

    jobname = 'test'
    # The fio filename_format is, by default, $jobname.$jobnum.$filenum (when
    # no other format specifier is given). This script only runs a single job
    # with a single file, so the suffix will always be 0.0
    job_filename = jobname + '.0.0'
    # fio creates the job file inside --directory, so remove any stale copy from there
    try:
        os.remove(os.path.join(directory, job_filename))
    except FileNotFoundError:
        pass

    parsed = measure_latency(directory, iops, request_size, write, jobname)
    write_clat_mean = parsed['jobs'][0]['write']['clat_ns']['mean']

    parsed = measure_latency(directory, iops, request_size, read, jobname)
    read_clat_mean = parsed['jobs'][0]['read']['clat_ns']['mean']

    # Report the worse (higher) of the two mean completion latencies
    winner = max(write_clat_mean, read_clat_mean)

    # convert nanoseconds to seconds
    return winner / 1_000_000_000

parser = argparse.ArgumentParser(description="""
Calculate the read/write latencies an Azure disk is seeing.
""".strip())

parser.add_argument('--maxIops',
                    type=int,
                    required=True,
                    help='Maximum IOPS; fio is rate-limited to this value.')

parser.add_argument('--ioSize',
                    required=True,
                    help='Size of the IOs, in kB, for calculating latency.')

parser.add_argument('--directory',
                    required=True,
                    help='Directory in which reads/writes are made.')

args = parser.parse_args()

try:
    handler = logging.StreamHandler()
    logger.addHandler(handler)
    if not sys.stderr.isatty():
        handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
    logger.setLevel(logging.DEBUG)

    # completion latency of a single sequential direct IO request of size --ioSize
    latency_max_BW = get_latency(args.directory, args.maxIops, args.ioSize, False)
    logger.info(f'Measured {latency_max_BW}s latency for a sequential request of size {args.ioSize}kB for disk at {args.directory}.')

    # completion latency of a single random direct IO request of the smallest expected size (8kB)
    latency_base_rand = get_latency(args.directory, args.maxIops, 8, True)
    logger.info(f'Measured {latency_base_rand}s latency for a random request of size 8kB for disk at {args.directory}.')

except Exception as e:
    logger.critical('An exception was encountered while calculating the desired block device settings. As such, no settings can be recommended', exc_info=e)
    sys.exit(1)
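The script reduces each fio run to `jobs[0][<read|write>]['clat_ns']['mean']` and converts nanoseconds to seconds. The hypothetical helpers below isolate that step so it can be checked without fio or a disk. `sample` is a synthetic dictionary shaped like fio's JSON output, not a real measurement; for brevity it puts both directions in one job, whereas the script reads them from two separate runs. The sketch also shows how the hard-coded `expected_runtime = 2` could instead be estimated from `number_ios` and the `--rate_iops` cap.

```python
NS_PER_SECOND = 1_000_000_000


def mean_clat_seconds(fio_output, direction):
    """Mean completion latency of the first job for 'read' or 'write', in seconds."""
    return fio_output["jobs"][0][direction]["clat_ns"]["mean"] / NS_PER_SECOND


def min_runtime_seconds(number_ios, rate_iops):
    """Approximate lower bound for one rate-limited pass: IO count / IOPS cap."""
    return number_ios / rate_iops


# Synthetic data shaped like `fio --output-format=json` output (one job).
sample = {
    "jobs": [
        {
            "write": {"clat_ns": {"mean": 2_500_000.0}},
            "read": {"clat_ns": {"mean": 1_200_000.0}},
        }
    ]
}

# Like get_latency(), keep the worse (higher) of the two means.
worst = max(mean_clat_seconds(sample, "write"), mean_clat_seconds(sample, "read"))
```

With `number_ios = 1000` and `--maxIops 500`, `min_runtime_seconds(1000, 500)` is 2.0 seconds per pass; the script makes one write pass and one read pass per measurement.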