Update stress test infrastructure docs
Showing 1 changed file with 75 additions and 102 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,162 +1,135 @@ | ||
This directory contains [Azure Bicep](https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/overview) | ||
# Layout | ||
|
||
This directory contains all configuration used for stress test cluster buildout (azure and kubernetes buildout), as well | ||
as a set of common stress test config boilerplate (helm library). | ||
|
||
The `./azure` directory contains [Azure Bicep](https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/overview) | ||
files for deploying Azure resources (mainly [AKS clusters](https://azure.microsoft.com/en-us/services/kubernetes-service/)) | ||
to support stress testing (for dev/test and/or production). | ||
|
||
Azure Bicep comes pre-installed with the Azure CLI, and is a DSL for generating ARM templates. | ||
|
||
The `./kubernetes/stress-infrastructure` directory contains a helm chart for deploying the core services | ||
that must be installed into any stress cluster: chaos-mesh (for chaos) and stress-watcher (for event handling like chaos | ||
resource start and resource group cleanup). | ||
|
||
The `./kubernetes/stress-test-addons` directory contains a [library chart](https://helm.sh/docs/topics/library_charts/) | ||
for use by stress test packages. This common set of config boilerplate simplifies stress test authoring, and makes it | ||
easier to make and roll out config changes to tests across repos by using helm chart dependency versioning. | ||
|
||
|
||
# Dependencies | ||
|
||
- [PowerShell Core](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-core-on-linux?view=powershell-7.1#ubuntu-2004) (if using Linux) | ||
- [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) | ||
- If using app insights, install the az extension: `az extension add --name application-insights` | ||
- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) (if accessing clusters) | ||
- [helm](https://helm.sh) (if installing stress infrastructure) | ||
- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) | ||
- [helm](https://helm.sh) | ||
- [kind](https://github.com/kubernetes-sigs/kind/releases) (if testing locally) | ||
- [Docker](https://docs.docker.com/get-docker/) (if deploying/testing locally) | ||
|
||
# Cluster Deployment Quick Start | ||
|
||
## Deploying a Dev Cluster | ||
# Deploying Cluster(s) | ||
|
||
First, update the `./azure/parameters/dev.json` parameters file with the values marked `// add me`, then: | ||
The cluster-specific configurations can be found at `./azure/parameters/<environment>.json`. | ||
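As a rough sketch of how an environment name maps to its parameters file, assuming the `dev`, `test` and `prod` environments named in this document each have a matching JSON file (the `params_file` helper is hypothetical and is not part of `./provision.ps1`):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an environment name to its parameters file path.
# The accepted names (dev, test, prod) are an assumption based on this README.
params_file() {
  case "$1" in
    dev|test|prod) printf './azure/parameters/%s.json\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

params_file dev
```

Rejecting unknown names early avoids passing a path that does not exist to a later deployment step.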
|
||
``` | ||
az deployment sub create -o json -n <your name> -l westus -f ./azure/main.bicep --parameters ./azure/parameters/dev.json | ||
Almost all stress test infrastructure is local to the cluster resource group, including storage accounts, keyvaults, | ||
log workspaces and the AKS cluster. There is also a set of static resources, including a subscription service principal | ||
and a keyvault containing the credential configuration. These are shared across clusters located in the same subscription | ||
and are provisioned independently of the bicep templates. | ||
|
||
# wait until resource group and AKS cluster are deployed | ||
az aks get-credentials stress-azuresdk -g rg-stress-test-cluster-<group suffix parameter> | ||
``` | ||
Cluster buildout and deployment involves three main steps which are automated in `./provision.ps1`: | ||
|
||
## Deploying a Local Cluster | ||
1. Provision static resources (service principal, role assignments, static keyvault). | ||
1. Provision cluster resources (`main.bicep` entrypoint, standard ARM subscription deployment). | ||
1. Provision stress infrastructure resources into the Azure Kubernetes Service cluster via helm | ||
(`./kubernetes/stress-infrastructure` helm chart). | ||
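The three steps can be sketched as a dry run that only prints commands and executes nothing. This is not the content of `./provision.ps1`; the commands are taken from the manual instructions that appear elsewhere in this document, and the `print_plan` function name, the `stress-test-deploy` deployment name and the `westus` location are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one plausible command sequence per step, runs nothing.
# Assumption: the parameters file lives at ./azure/parameters/<env>.json.
print_plan() {
  local env="$1"
  local params="./azure/parameters/${env}.json"

  echo "# 1. static resources (service principal, static keyvault)"
  echo "az keyvault create -n StressTestSecrets -g rg-StressTestSecrets"
  echo "az ad sp create-for-rbac -n stress-test-provisioner --role Contributor --scopes /subscriptions/<subscription id>"

  echo "# 2. cluster resources (ARM subscription deployment)"
  echo "az deployment sub create -o json -n stress-test-deploy -l westus -f ./azure/main.bicep --parameters ${params}"

  echo "# 3. stress infrastructure (helm)"
  echo "helm repo add chaos-mesh https://charts.chaos-mesh.org"
  echo "helm dependency update ./kubernetes/stress-infrastructure"
  echo "helm install stress-infra -n stress-infra --create-namespace ./kubernetes/stress-infrastructure"
}

print_plan dev
```

Printing the plan first makes it easier to review what would change before touching a shared cluster.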
|
||
NOTE: Chaos-Mesh may not work on all local deployments (e.g. Docker Desktop on Windows via WSL). | ||
It may be easier to test services, manifests and containers locally with KIND, and test chaos | ||
in an Azure AKS cluster (shared or personal). | ||
|
||
``` | ||
# Ensure docker is running | ||
kind create cluster | ||
``` | ||
## Dev Cluster | ||
|
||
## Deploying Stress Infrastructure into Cluster | ||
First, update the `./azure/parameters/dev.json` parameters file with the values marked `// add me`, then run: | ||
|
||
``` | ||
helm repo add chaos-mesh https://charts.chaos-mesh.org | ||
helm dependency update ./kubernetes/stress-infrastructure | ||
helm install stress-infra -n stress-infra --create-namespace ./kubernetes/stress-infrastructure | ||
./provision.ps1 -env dev | ||
``` | ||
|
||
## Test Cluster | ||
|
||
# Development | ||
|
||
Examples detailing the Azure Bicep DSL can be found [here](https://github.com/Azure/bicep/tree/main/docs/examples). | ||
|
||
Bicep also has a [VSCode extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-bicep). | ||
The test cluster is the main ad-hoc cluster made available to SDK developers and partners. Changes to this cluster | ||
should be made carefully and announced in advance in order not to disrupt people's work. | ||
|
||
To validate file changes/compilation: | ||
|
||
``` | ||
az bicep build -f ./azure/main.bicep | ||
``` | ||
|
||
To deploy and access resources: | ||
|
||
./provision.ps1 -env test | ||
``` | ||
# Edit ./azure/parameters/dev.json, replacing // add me values | ||
# Add -c to dry run changes with a chance to confirm | ||
az deployment sub create -o json -n <your name> -l westus -f ./azure/main.bicep --parameters ./azure/parameters/dev.json | ||
|
||
# Copy the relevant outputs from the deployment to ./kubernetes/environments/<environment yaml file> | ||
# for deploying stress tests later on | ||
az deployment sub show -o json -n <your name> --query properties.outputs | ||
## Prod Cluster | ||
|
||
az aks list -g rg-stress-test-cluster-<group suffix parameter> | ||
az aks get-credentials stress-test -g rg-stress-test-cluster-<group suffix parameter> | ||
# Verify cluster access | ||
kubectl get pods | ||
# Install stress infrastructure components | ||
helm repo add chaos-mesh https://charts.chaos-mesh.org | ||
helm dependency update ./kubernetes/stress-infrastructure | ||
helm install stress-infra -n stress-infra --create-namespace ./kubernetes/stress-infrastructure | ||
kubectl get pods --namespace stress-infra | ||
``` | ||
|
||
To access the chaos-mesh dashboard, run the below command then navigate to `localhost:2333` in the browser: | ||
The "prod" cluster is the main cluster used for auto-deployment of checked-in stress tests via the StressTestRelease pipeline. | ||
Currently, new instances of all stress tests across the language repositories are deployed on a weekly cadence. | ||
Changes to the prod cluster should ideally be made around the stress test deployment cycle so as to avoid disruption | ||
of test metrics. | ||
|
||
``` | ||
kubectl port-forward -n stress-infra svc/chaos-dashboard 2333:2333 | ||
./provision.ps1 -env prod | ||
``` | ||
|
||
To remove AKS cluster stress testing resources: | ||
## Local Cluster | ||
|
||
``` | ||
helm uninstall stress-infra --namespace stress-infra | ||
``` | ||
For quick testing of various kubernetes configurations, it can be faster and cheaper to use a local cluster. | ||
Not all components of stress testing work in local clusters, however. If testing these components is necessary, the | ||
recommended action is to spin up a dev cluster. | ||
|
||
To remove Azure resources: | ||
NOTE: Chaos-Mesh may not work on all local deployments (e.g. Docker Desktop on Windows via WSL). | ||
It may be easier to test services, manifests and containers locally with KIND, and test chaos | ||
in an Azure AKS cluster (shared or personal). | ||
|
||
``` | ||
az group delete <resource group name> | ||
az keyvault purge -n <keyvault name> | ||
# Ensure docker is running | ||
kind create cluster | ||
``` | ||
|
||
# Building out the Main/Prod Testing Cluster | ||
|
||
If not already done, enable the relevant preview features in the subscription and CLI: | ||
- [AKS-AzureKeyVaultSecretsProvider](https://docs.microsoft.com/en-us/azure/aks/csi-secrets-store-driver#register-the-aks-azurekeyvaultsecretsprovider-preview-feature) | ||
|
||
## Initializing static identities | ||
|
||
The "official" stress testing clusters rely on a separately created keyvault containing secrets with subscription credentials for stress test resource deployments. | ||
The identities/credentials in these keyvaults can't be created via ARM/Bicep, and should be managed independently of the individual environments. | ||
|
||
To initialize these resources, if they don't exist: | ||
# Development | ||
|
||
``` | ||
az group create rg-StressTestSecrets | ||
az keyvault create -n StressTestSecrets -g rg-StressTestSecrets | ||
az ad sp create-for-rbac -n 'stress-test-provisioner' --role Contributor --scopes '/subscriptions/<subscription id>' | ||
``` | ||
## Bicep templates | ||
|
||
Create an env file with the service principal values created above: | ||
Examples detailing the Azure Bicep DSL can be found [here](https://github.com/Azure/bicep/tree/main/docs/examples). | ||
|
||
``` | ||
AZURE_CLIENT_OID=<app object id> | ||
AZURE_CLIENT_ID=<app id> | ||
AZURE_CLIENT_SECRET=<password/secret> | ||
AZURE_TENANT_ID=<tenant id> | ||
AZURE_SUBSCRIPTION_ID=<subscription id> | ||
``` | ||
Bicep also has a [VSCode extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-bicep). | ||
|
||
Upload it to the static keyvault: | ||
To validate file changes/compilation: | ||
|
||
``` | ||
az keyvault secret set --vault-name StressTestSecrets -f ./<env file> -n public | ||
az bicep build -f ./azure/main.bicep | ||
``` | ||
|
||
## Building Out Stress Test Cluster Resources | ||
## Helm templates | ||
|
||
Various environment configurations are located in `./azure/parameters/<env>.json` to be configured when deploying. | ||
When making changes to `stress-test-addons`, it is easiest to validate them by building one of the | ||
[example projects](https://github.com/Azure/azure-sdk-tools/tree/main/tools/stress-cluster/chaos/examples). | ||
|
||
Deploy the cluster and related components (app insights, container registry, keyvault, access policies, etc.) | ||
First, update the `dependencies` section of the example's `Chart.yaml` file to point to your local changes on disk: | ||
|
||
``` | ||
az deployment sub create -o json -n stress-test-deploy -l westus -f ./azure/main.bicep --parameters ./azure/parameters/test.json | ||
dependencies: | ||
- name: stress-test-addons | ||
version: <latest version on disk in stress-test-addons Chart.yaml> | ||
repository: https://stresstestcharts.blob.core.windows.net/helm/ | ||
repository: file:///<path to azure-sdk-tools repo>/tools/stress-cluster/cluster/kubernetes/stress-test-addons | ||
``` | ||
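The local `repository:` value is a `file://` URL followed by an absolute path. A small sketch of building it from the location of an azure-sdk-tools checkout, assuming a Unix-style absolute path (the `addons_repo_url` helper and the `/home/me/azure-sdk-tools` path are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the local `repository:` value for Chart.yaml
# from the absolute path of an azure-sdk-tools checkout.
addons_repo_url() {
  local repo_root="${1%/}"   # drop a trailing slash, if any
  case "$repo_root" in
    /*) ;;                   # absolute path: OK
    *) echo "expected an absolute path: $1" >&2; return 1 ;;
  esac
  printf 'file://%s/tools/stress-cluster/cluster/kubernetes/stress-test-addons\n' "$repo_root"
}

addons_repo_url /home/me/azure-sdk-tools
```

Because the absolute path already starts with `/`, the result has three slashes after `file:`, matching the form shown above. Windows drive paths are not handled by this sketch.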
|
||
Gain access to the cluster and install the stress infrastructure components: | ||
Then you can test out the template changes by running, in the example stress test package directory: | ||
|
||
``` | ||
az aks get-credentials stress-test -g rg-stress-test-cluster-<group suffix> | ||
helm repo add chaos-mesh https://charts.chaos-mesh.org | ||
helm dependency update ./kubernetes/stress-infrastructure | ||
helm install stress-infra -n stress-infra --create-namespace ./kubernetes/stress-infrastructure | ||
helm template testrelease . | ||
``` | ||
|
||
Update the values in `./kubernetes/stress-test-addons/values.yaml` to match the deployment outputs and check in the changes. | ||
If there are any issues, the helm command will print the errors. If there are no errors, the rendered yaml | ||
may still be an invalid kubernetes manifest, so the example stress test should also be deployed to validate | ||
the full set of changes: | ||
|
||
``` | ||
az deployment sub show -o json -n <your name> --query properties.outputs | ||
# -Login only needs to be run once or if the azure container registry credentials have expired (~24 hours) | ||
<tools repo>/eng/common/scripts/stress-testing/deploy-stress-tests.ps1 -Login | ||
``` | ||
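One possible intermediate check, between `helm template` and a full deployment, is to pipe the rendered output through a client-side dry run. A minimal sketch, assuming `helm` and `kubectl` are installed and that `kubectl` can reach a cluster (a local kind cluster should be enough); the `check_render` name is illustrative, and `testrelease` is the release name used above:

```shell
#!/usr/bin/env bash
# Render the chart in the given directory (default: current directory) and ask
# kubectl to validate the result without creating anything.
# Returns non-zero if either the render or the validation fails.
check_render() {
  local chart_dir="${1:-.}"
  local rendered
  rendered="$(helm template testrelease "$chart_dir")" || return 1
  printf '%s\n' "$rendered" | kubectl apply --dry-run=client -f -
}
```

Run `check_render .` from the example stress test package directory. This does not replace deploying the example: custom resources such as chaos-mesh objects may only validate if their CRDs are installed in the target cluster.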
|
||
For more helm debugging info, see [here](https://helm.sh/docs/chart_template_guide/debugging/). |