This control-plane seed provides the "dial tone" infrastructure services to support the Network Observability solution; however, it can be repurposed for use with other Kubernetes-based solutions. It is built on the Coral platform and largely follows patterns established there.
- `.github/workflows` - Runs a workflow on each push to transform Coral entities into cluster gitops repo YAML to be processed by Flux
- `applications`
  - `<workspace-name>`
    - `ApplicationRegistrations` - defines the `ApplicationRegistrations` for a given workspace (sample)
    - `ManifestDeployments` - defines the `ManifestDeployments` (dialtone services) for a given workspace
- `assignments` - holds the application:cluster assignments after Coral processes the repo
- `clusters` - defines the `Clusters` in your platform (sample)
- `manifests` - holds Kubernetes YAML for use with `ManifestDeployments`
- `templates` - defines the available `ApplicationTemplates` in your platform (sample)
- `workspaces` - defines the `Workspaces` in your platform (sample)
To get started, see the platform setup instructions in the main Coral repo.
Before getting started, please review the official Coral docs, in particular the docs for platform setup and registering a Kubernetes cluster.
The control-plane's CI / CD pipelines use a service principal to authenticate to Azure in order to manage SOPS keys in Azure Key Vault and query cluster credentials to deploy those keys.
Create a service principal:
```sh
az ad sp create-for-rbac --name "<your service principal name>" --role contributor \
  --scopes /subscriptions/<your subscription id>/resourceGroups/<your resource group> \
  --sdk-auth
```
For example:
```sh
$ az ad sp create-for-rbac --name "github-actions" --role contributor \
    --scopes /subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/mission-cloud \
    --sdk-auth
{
  "clientId": "22222222-2222-2222-2222-222222222222",
  "clientSecret": "",
  "subscriptionId": "",
  "tenantId": "",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.us",
  "resourceManagerEndpointUrl": "https://management.usgovcloudapi.net/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.usgovcloudapi.net:8443/",
  "galleryEndpointUrl": "https://gallery.usgovcloudapi.net/",
  "managementEndpointUrl": "https://management.core.usgovcloudapi.net/"
}
```
Save the object produced by this command, as it will be used to populate the AZURE_CREDENTIALS environment variable in the GitHub / GitLab repository.

Note: The service principal and the AKS cluster (which will be created in the next steps) should be created under the same resource group.
Next, assign this service principal permissions to work with secrets on your Azure Key Vault instance.
```sh
az keyvault set-policy --name <your AKV name> \
  --resource-group <your resource group> \
  --spn <your service principal clientId> \
  --key-permissions all \
  --secret-permissions all
```
For example, using the service principal created above:
```sh
az keyvault set-policy --name mission-cloud-kv \
  --resource-group mission-cloud \
  --spn 22222222-2222-2222-2222-222222222222 \
  --key-permissions all \
  --secret-permissions all
```
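As an optional sanity check, you can confirm the policy was applied by inspecting the vault's access policies:

```sh
# List the vault's access policies; an entry for the service principal should appear
az keyvault show --name mission-cloud-kv --query "properties.accessPolicies"
```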
The following environment variables need to be configured on the control-plane repository. Secrets can be created by following the steps below:

- GitHub
  - GitHub Manual Way, or
  - By using the `gh secret` CLI and an `env` file as shown below
- GitLab
  - GitLab Manual Way, or
  - By using the GitLab REST API
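For the GitHub CLI route, a minimal sketch (the repo path and `.env` file name are placeholders; the file would hold `KEY=VALUE` pairs for the variables in the table below):

```sh
# Bulk-load repository secrets from a local dotenv file
gh secret set --repo <your-org>/<your-control-plane-repo> -f .env
```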
NAME | REQUIRED (Y/N) | PURPOSE / EXAMPLE VALUES |
---|---|---|
AZURE_CLOUD | Y | The Azure cloud environment in which your Key Vault instance is deployed (AzureCloud, AzureUSGovernment, etc.). |
AZURE_CREDENTIALS | Y | The service principal credentials created in the Create an Azure Service Principal section, which have access to the Azure Key Vault instance. Assign the entire JSON object produced by `az ad sp create-for-rbac`. |
AKV_NAME | Y | The name of your Azure Key Vault instance. |
SOPS_KEY_NAME | Y | The name of the AKV secret containing the SOPS key. The recommended value is `sops-age`, but any desired name can be used. This secret does not need to exist in AKV; if it doesn't exist, it will be created with this name when the pipeline runs. |
SOPS_PUBLIC_KEY | Y | The AGE public key string used to encrypt SOPS secrets. This value is generated when creating an AGE key pair. |
SS_PUBLIC_KEY | Y | The public key certificate used to encrypt Sealed Secrets using kubeseal. This is either generated automatically by the sealed-secrets-controller or is an existing public / private key pair used when deploying Sealed Secrets. |
GITOPS_PAT | Y | Your PAT with access to the GitOps repository. This should have been automatically created by Coral. |
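For reference, the key material behind SOPS_PUBLIC_KEY and SS_PUBLIC_KEY is typically produced along these lines (file names are placeholders):

```sh
# Generate an AGE key pair; the "Public key:" line is the SOPS_PUBLIC_KEY value
age-keygen -o age.agekey

# Fetch the Sealed Secrets public certificate for SS_PUBLIC_KEY
# (add --controller-name / --controller-namespace if not using the defaults)
kubeseal --fetch-cert > sealed-secrets-cert.pem
```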
After deploying this control-plane, the first step to get started is to register a cluster. Clusters are registered by creating a `kind: Cluster` object as a YAML file in the clusters directory. The following labels should be included in the cluster registration:

- aksClusterName: The name of the AKS cluster in which this cluster will run. This is used to query connection info and credentials when deploying SOPS keys.
- aksClusterResourceGroup: The resource group containing the AKS cluster in which this cluster will run. This is used to query connection info and credentials when deploying SOPS keys.

The name of the YAML file must match the desired cluster name.
For example, if the cluster is named usgovvirginia-1, its registration might look like:

```sh
cat <<EOF > clusters/usgovvirginia-1.yaml
kind: Cluster
metadata:
  name: usgovvirginia-1
  labels:
    cloud: azure
    region: usgovvirginia
    aksClusterName: mission-cloud-aks
    aksClusterResourceGroup: mission-cloud
spec:
  environments:
    - dev
    - prod
EOF
```
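Then commit and push the registration (a typical flow; your branch name may differ):

```sh
git add clusters/usgovvirginia-1.yaml
git commit -m "Register cluster usgovvirginia-1"
git push origin main
```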
After committing these changes, the corresponding files will be created in the gitops repo. More info can be found in the Coral docs on registering a cluster.
Note: The pipeline will fail after committing these changes. That's OK; we haven't finished setting things up. It will succeed after setup is complete.
The `flux-system` namespace needs to be created so SOPS secrets can be deployed and available before Flux attempts to bootstrap the cluster. Create the `flux-system` namespace:

```sh
kubectl create namespace flux-system
```
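For reference, the SOPS decryption key ends up in this namespace as a secret named after SOPS_KEY_NAME. A manual sketch, assuming the `sops-age` name and the `age.agekey` file generated earlier (the CI pipeline normally handles this step):

```sh
kubectl create secret generic sops-age \
  --namespace=flux-system \
  --from-file=age.agekey
```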
Before proceeding, we need to set up environment variables. The main list of required variables is in the Coral docs and should take precedence over any listed here.

```sh
export GITHUB_TOKEN="github-token"  # (eg. ghp_ZsPfZbeefLyeCa8deadEmFVupxAZYT285CjY)
export GITHUB_OWNER="username"      # (eg. contoso)
export GITOPS_REPO="repo-name"      # (eg. cluster-gitops)
export CLUSTER_NAME="cluster-name"  # (eg. azure-eastus2-1) note: will install flux-system to your new cluster
```
Next, we need to generate Istio certificates for securing inter-cluster communications. Instructions can be found in the Inter-Cluster Certificate Management docs.
Next, we need to generate Istio certificates for securing gateway communications. Instructions can be found in the Gateway Certificate Management docs.
The next step is to bootstrap your Kubernetes cluster to sync Flux with your source control repository.
```sh
flux bootstrap github --owner=$GITHUB_OWNER \
  --repository=$GITOPS_REPO \
  --branch=main \
  --path=clusters/$CLUSTER_NAME \
  --personal \
  --network-policy=false
```
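Once bootstrap completes, the standard Flux CLI commands can verify the installation:

```sh
# Validate the Flux components and cluster prerequisites
flux check

# Watch the initial reconciliation of the cluster's Kustomizations
flux get kustomizations --watch
```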
GitHub

- `.github/workflows` - Runs a workflow for GitHub on each push to transform Coral entities into cluster gitops repo YAML to be processed by Flux
- When the new repo is created by `coral init` using this Network Observability control-plane seed, the CI/CD variable `GITOPS_PAT` is created during the repo creation process and is used by the GitHub workflow.

GitLab

- `.gitlab-ci.yml` - Runs a workflow for GitLab on each push to transform Coral entities into cluster gitops repo YAML to be processed by Flux

Note: The GitLab workflow requires a GitLab Runner to be configured. For more info on configuring GitLab Runner, please refer to the GitLab Runner docs.

- Create an Access Token at the group level under group Settings -> Access Tokens
- Use this token's value for the `ACCESS_TOKEN` variable under group Settings -> CI/CD -> Variables (`ACCESS_TOKEN: <value>`)
- Note: Only a group Owner has permission to perform this action
- This variable is automatically inherited by all projects created under the group and can be viewed at Settings -> CI/CD -> Variables -> Group variables (inherited)
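As noted above, the variable can also be created via the GitLab REST API; a sketch, where the host, group ID, and token values are placeholders:

```sh
# Create a group-level CI/CD variable
curl --request POST \
  --header "PRIVATE-TOKEN: <your-group-owner-token>" \
  --form "key=ACCESS_TOKEN" \
  --form "value=<token-value>" \
  "https://gitlab.example.com/api/v4/groups/<group-id>/variables"
```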
Repo action secrets can be modified at Repo -> Settings -> Secrets -> Actions -> New repository secret

NAME | REQUIRED (Y/N) | PURPOSE / EXAMPLE VALUES |
---|---|---|
GITOPS_REPO_NAME | Y | The repo name of the control-plane that was created, e.g. test-control-plane |
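The secret can also be set from the CLI instead of the UI; for example (the repo path is a placeholder):

```sh
gh secret set GITOPS_REPO_NAME \
  --repo <your-org>/<your-control-plane-repo> \
  --body "test-control-plane"
```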
Centralized logging is an important component of any production-grade infrastructure, but it is especially critical in a containerized architecture. If you're using Kubernetes to run your workloads, there's no easy way to find the correct log "files" on one of the many worker nodes in your cluster. Kubernetes will reschedule your pods between different physical servers or cloud instances, so pod logs can be lost, or, if a pod crashes, deleted from disk. Without a centralized logging solution, it is practically impossible to find a particular log file located somewhere on one of the hundreds or thousands of worker nodes. For this reason, any production-grade cluster should have log collector agents configured on all nodes and use centralized storage.
This control-plane leverages the following components to implement centralized logging:
- Fluentbit: Log collection and aggregation
- Elasticsearch: Log storage and search capabilities
- Kibana: Log visualizations
Each component is implemented as a Flux HelmRelease and values are applied via Coral patches.
For more info, please refer to the Centralized Logging docs.
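To make the pattern concrete, here is a hypothetical sketch of what a HelmRelease for Fluent Bit could look like; the chart source, namespaces, and values shown are illustrative assumptions, not this repo's actual manifests:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: fluent-bit
  namespace: logging            # assumed namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: fluent-bit         # upstream fluent/fluent-bit chart (assumption)
      sourceRef:
        kind: HelmRepository
        name: fluent
        namespace: flux-system
  values:
    config:
      outputs: |
        [OUTPUT]
            Name  es
            Match *
            Host  elasticsearch-master
            Port  9200
```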
In order to implement traffic management features such as canary and A/B deployments, the Network Observability control-plane plans to leverage a service mesh. A service mesh is a dedicated infrastructure layer that you can add to your applications; it allows you to transparently add capabilities like observability, traffic management, and security without adding them to your own code. Due to its large feature set and widespread industry adoption, our initial choice is Istio. This spike evaluates Istio, provides an overview of its features and configuration, and compares it to other popular technologies such as Linkerd.
This control-plane leverages Istio to implement service mesh using the following components:
- istio-base: Installs and configures Istio CRDs
- istiod: Installs and configures Istio control-plane
- istio-gateway: Installs and configures Istio ingress gateway
Each component is implemented as a Flux HelmRelease and values are applied via Coral patches.
For more info, please refer to the Service Mesh docs.
This control-plane leverages Prometheus, Grafana, Istio and Zipkin to implement observability.
- Prometheus: Metric collection and persistence
- Grafana: Metric visualization and dashboards
- Istio: Service Mesh and distributed tracing metrics
- Zipkin: Distributed tracing monitoring and visualizations
Each component is implemented as a Flux HelmRelease and values are applied via Coral patches.
For more info, please refer to the Observability and Monitoring docs.
For more info, please refer to the Secret Management docs.
In order to use the control-plane, gateway certificates need to be created and deployed to the cluster.
For more info, please refer to the Certificate Management docs.
The Platform team (Network Observability control-plane) may be contacted by a Service/App team (e.g., the net-obs-stats-generator app service) to get the network-observability dashboard configured for them in the control-plane. Only when the platform team receives such a request does it need to configure service-specific dashboards for the App team. To configure service-specific dashboards for an App team, please refer to the Configure Service Dashboards docs.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.