feat(Azure): add support for workload identity #3111

Closed
112 changes: 112 additions & 0 deletions docs/tutorials/azure.md
@@ -53,6 +53,7 @@ The following fields are used:
* `aadClientID` and `aadClientSecret` are associated with the Service Principal. These are only used with the Service Principal method documented in the next section.
* `useManagedIdentityExtension` - this is set to `true` if you use either the AKS Kubelet Identity or the AAD Pod Identities method documented in the next section.
* `userAssignedIdentityID` - this contains the client ID of the managed identity when using the AAD Pod Identities method documented in the next section.
* `useWorkloadIdentityExtension` - this is set to `true` if you use the Workload Identity method documented in the next section.

The Azure DNS provider expects, by default, that the configuration file is at `/etc/kubernetes/azure.json`. This can be overridden with the `--azure-config-file` option when starting ExternalDNS.
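To illustrate how the fields above map onto a Go type, here is a minimal sketch that decodes an `azure.json` file. The `azureConfig` type and `parseAzureConfig` function are hypothetical and cover only a subset of the fields; the `config` struct in `provider/azure/config.go` remains the authoritative definition. The first command-line argument stands in for `--azure-config-file`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// azureConfig is a hypothetical, trimmed-down mirror of the azure.json fields
// documented above; it is not ExternalDNS's own config struct.
type azureConfig struct {
	SubscriptionID               string `json:"subscriptionId"`
	ResourceGroup                string `json:"resourceGroup"`
	ClientID                     string `json:"aadClientId"`
	ClientSecret                 string `json:"aadClientSecret"`
	UseManagedIdentityExtension  bool   `json:"useManagedIdentityExtension"`
	UseWorkloadIdentityExtension bool   `json:"useWorkloadIdentityExtension"`
	UserAssignedIdentityID       string `json:"userAssignedIdentityID"`
}

// defaultConfigPath is where the provider looks unless --azure-config-file overrides it.
const defaultConfigPath = "/etc/kubernetes/azure.json"

// parseAzureConfig decodes the contents of an azure.json file.
func parseAzureConfig(data []byte) (azureConfig, error) {
	var cfg azureConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return azureConfig{}, fmt.Errorf("invalid azure config: %w", err)
	}
	return cfg, nil
}

func main() {
	path := defaultConfigPath
	if len(os.Args) > 1 {
		// In this sketch, the first argument stands in for --azure-config-file.
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cfg, err := parseAzureConfig(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("useWorkloadIdentityExtension=%v useManagedIdentityExtension=%v\n",
		cfg.UseWorkloadIdentityExtension, cfg.UseManagedIdentityExtension)
}
```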

@@ -63,6 +64,7 @@ ExternalDNS needs permissions to make changes to the Azure DNS zone. There are t
- [Service Principal](#service-principal)
- [Managed Identity Using AKS Kubelet Identity](#managed-identity-using-aks-kubelet-identity)
- [Managed Identity Using AAD Pod Identities](#managed-identity-using-aad-pod-identities)
- [Managed Identity Using Workload Identity](#managed-identity-using-workload-identity)

### Service Principal

@@ -319,6 +321,116 @@ kubectl patch deployment external-dns --namespace "default" --patch \
'{"spec": {"template": {"metadata": {"labels": {"aadpodidbinding": "external-dns"}}}}}'
```

### Managed Identity Using Workload Identity

For this process, we will create a [managed identity](https://docs.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview) that will be used explicitly by the ExternalDNS container. This process is similar to the Pod Identity method, except that the managed identity is associated with a Kubernetes service account.

#### Enable the Workload Identity feature

To enable the [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) preview feature, run the following commands:

```bash
$ az extension add --name aks-preview
$ az extension update --name aks-preview
$ az feature register --namespace "Microsoft.ContainerService" --name "EnableWorkloadIdentityPreview"

# After several minutes, the feature should become registered. You can check its state by periodically running the following command:
$ az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/EnableWorkloadIdentityPreview')].{Name:name,State:properties.state}"

# Once the feature is registered, refresh the registration of the resource provider:
$ az provider register --namespace Microsoft.ContainerService
```

#### Deploy OIDC issuer and Workload Identity services

Once the feature is enabled, update your cluster to deploy the OIDC issuer and the services needed for the [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) feature:

```bash
$ AZURE_AKS_RESOURCE_GROUP="my-aks-cluster-group" # name of resource group where aks cluster was created
$ AZURE_AKS_CLUSTER_NAME="my-aks-cluster" # name of aks cluster previously created

$ az aks update --resource-group ${AZURE_AKS_RESOURCE_GROUP} --name ${AZURE_AKS_CLUSTER_NAME} --enable-oidc-issuer --enable-workload-identity
```

#### Create a managed identity

Create a managed identity:

```bash
$ IDENTITY_RESOURCE_GROUP=$AZURE_AKS_RESOURCE_GROUP # custom group or reuse AKS group
$ IDENTITY_NAME="example-com-identity"

# create a managed identity
$ az identity create --resource-group "${IDENTITY_RESOURCE_GROUP}" --name "${IDENTITY_NAME}"
```

#### Assign rights for the managed identity

Grant the managed identity access to the Azure DNS zone:

```bash
$ AZURE_DNS_ZONE_RESOURCE_GROUP="MyDnsResourceGroup" # name of resource group where dns zone is hosted
$ AZURE_DNS_ZONE="example.com" # DNS zone name like example.com or sub.example.com

# fetch identity client id from managed identity created earlier
$ IDENTITY_CLIENT_ID=$(az identity show --resource-group "${IDENTITY_RESOURCE_GROUP}" \
--name "${IDENTITY_NAME}" --query "clientId" --output tsv)
# fetch DNS id used to grant access to the managed identity
$ DNS_ID=$(az network dns zone show --name "${AZURE_DNS_ZONE}" \
--resource-group "${AZURE_DNS_ZONE_RESOURCE_GROUP}" --query "id" --output tsv)

$ az role assignment create --role "DNS Zone Contributor" \
--assignee "${IDENTITY_CLIENT_ID}" --scope "${DNS_ID}"
```
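The `--scope` passed to `az role assignment create` is the DNS zone's Azure Resource Manager ID. If you prefer not to look it up with `az network dns zone show`, it can be assembled by hand. The hypothetical Go helper below sketches that, assuming a public Azure DNS zone (resource type `Microsoft.Network/dnszones`); Azure resource IDs are case-insensitive, so the casing may differ from what the CLI prints.

```go
package main

import "fmt"

// dnsZoneScope builds the ARM resource ID of a public Azure DNS zone.
// Hypothetical helper: it assumes the standard
// /subscriptions/<id>/resourceGroups/<rg>/providers/<type>/<name> layout.
func dnsZoneScope(subscriptionID, resourceGroup, zone string) string {
	return fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/dnszones/%s",
		subscriptionID, resourceGroup, zone)
}

func main() {
	fmt.Println(dnsZoneScope("00000000-0000-0000-0000-000000000000", "MyDnsResourceGroup", "example.com"))
}
```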

#### Create a configuration file for the managed identity

Create the file `azure.json` with the values from previous steps:

```bash
cat <<-EOF > /local/path/to/azure.json
{
  "subscriptionId": "$(az account show --query id -o tsv)",
  "resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP",
  "useWorkloadIdentityExtension": true
}
EOF
```

Use the `azure.json` file to create a Kubernetes secret:

```bash
$ kubectl create secret generic azure-config-file --namespace "default" --from-file /local/path/to/azure.json
```

#### Create a federated identity credential

A binding between the managed identity and the ExternalDNS service account needs to be set up by creating a federated identity credential:

```bash
$ OIDC_ISSUER_URL="$(az aks show --name "${AZURE_AKS_CLUSTER_NAME}" --resource-group "${AZURE_AKS_RESOURCE_GROUP}" \
--query "oidcIssuerProfile.issuerUrl" --output tsv)"

$ az identity federated-credential create --name "${IDENTITY_NAME}" --identity-name "${IDENTITY_NAME}" \
--resource-group "${IDENTITY_RESOURCE_GROUP}" --issuer "${OIDC_ISSUER_URL}" \
--subject "system:serviceaccount:default:external-dns"
```
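The `--subject` must match the `sub` claim of the projected service account token exactly, and Kubernetes formats that claim as `system:serviceaccount:<namespace>:<name>`. A small sketch with hypothetical helper names that builds and parses that string, which can help when the ExternalDNS namespace or service account name differs from `default`/`external-dns`:

```go
package main

import (
	"fmt"
	"strings"
)

// serviceAccountSubject returns the subject Kubernetes uses for a service
// account token; the federated credential's --subject must equal it.
func serviceAccountSubject(namespace, serviceAccount string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, serviceAccount)
}

// parseSubject is the hypothetical inverse, useful when checking an existing credential.
func parseSubject(subject string) (namespace, name string, ok bool) {
	parts := strings.Split(subject, ":")
	if len(parts) != 4 || parts[0] != "system" || parts[1] != "serviceaccount" {
		return "", "", false
	}
	return parts[2], parts[3], true
}

func main() {
	fmt.Println(serviceAccountSubject("default", "external-dns"))
}
```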

#### Update labels and annotations on ExternalDNS service account

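To instruct the Workload Identity webhook to inject a projected token into the ExternalDNS pod, the service account needs the label `azure.workload.identity/use: "true"` and the annotation `azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>`. With Workload Identity v1.0.0-beta.0 and later, the webhook selects pods through an object selector on the `azure.workload.identity/use: "true"` label, so the ExternalDNS pod itself must carry that label as well: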


@weisdd - Went through the tutorial to test out building AKS cluster with workload identity and getting external DNS fully working only using managed identity. Thanks so much for the hard work on this and really hoping we can get this merged in the near future.

There was one step missing, which is adding the necessary annotations to the pod (not just the service account). A few errors kept showing up about the token missing from the azure.json file, so after some research I found this portion of the docs, which should be included in this tutorial for others who are trying to enable workload identity for their AKS clusters.


The pod label was not a requirement during the original private preview. It was announced/communicated it was going to be required for GA, and was then subsequently rolled out to AKS in the next update (still during preview!). Point being, it wasn't a requirement when those instructions were written 😄

Contributor Author

@slickmode Thanks for the review!
Indeed, the tutorial was written before v1.0.0-beta.0 was released last month, which introduced the need for the extra label to limit the number of pods intercepted by the webhook (through an object selector) and to enforce `failurePolicy: Fail`. As far as I understand, the design for annotations and labels is now finalized: there's already a v1.0.0-rc.0 release, and v1.0.0 is expected in a few weeks' time (hopefully, it won't get delayed).
I'd be happy to update my PR to include the label in the guide, just wasn't sure if there's any chance for the PR to be reviewed.

Contributor Author

@slickmode I've just pushed the updated guide to make sure we won't forget it whenever maintainers get to review the PR.
@pinkfloydx33 thanks for your comments here and in other places, they're always helpful :)


To patch the existing service account, use the following command:

```bash
$ kubectl patch serviceaccount external-dns --namespace "default" --patch \
"{\"metadata\": {\"labels\": {\"azure.workload.identity/use\": \"true\"}, \"annotations\": {\"azure.workload.identity/client-id\": \"${IDENTITY_CLIENT_ID}\"}}}"
```

NOTE: it's also possible to specify (or override) the client ID through the `userAssignedIdentityID` field in `azure.json`.
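The precedence described in the note mirrors `getWIToken` in `provider/azure/config.go`: a non-empty `userAssignedIdentityID` wins, otherwise the `AZURE_CLIENT_ID` variable injected by the webhook is used. A standalone sketch of that rule; the `resolveClientID` name is hypothetical, and the environment lookup is passed in so the function is easy to test.

```go
package main

import (
	"fmt"
	"os"
)

// resolveClientID prefers the client ID from azure.json and falls back to
// AZURE_CLIENT_ID, which is empty when the client-id annotation is not set.
func resolveClientID(userAssignedIdentityID string, getenv func(string) string) string {
	if userAssignedIdentityID != "" {
		return userAssignedIdentityID
	}
	return getenv("AZURE_CLIENT_ID")
}

func main() {
	fmt.Println(resolveClientID("", os.Getenv))
}
```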

If an ExternalDNS pod is already running, restart it so that the changes take effect:
```bash
$ kubectl rollout restart deployment/external-dns
```

## Ingress used with ExternalDNS

This deployment assumes that you will be using nginx-ingress. When using nginx-ingress, do not deploy it as a DaemonSet. Doing so causes nginx-ingress to write the cluster IPs of the backend pods into the Ingress `status.loadBalancer.ingress[].ip` field, which then makes ExternalDNS publish those cluster IPs in DNS instead of the external IP of the nginx-ingress service.
90 changes: 80 additions & 10 deletions provider/azure/config.go
@@ -17,8 +17,10 @@ limitations under the License.
package azure

import (
	"context"
	"fmt"
	"io/ioutil"
	"os"
	"strings"

	"github.com/Azure/go-autorest/autorest/adal"
@@ -29,16 +31,17 @@ import (

// config represents common config items for Azure DNS and Azure Private DNS
type config struct {
	Cloud                        string            `json:"cloud" yaml:"cloud"`
	Environment                  azure.Environment `json:"-" yaml:"-"`
	TenantID                     string            `json:"tenantId" yaml:"tenantId"`
	SubscriptionID               string            `json:"subscriptionId" yaml:"subscriptionId"`
	ResourceGroup                string            `json:"resourceGroup" yaml:"resourceGroup"`
	Location                     string            `json:"location" yaml:"location"`
	ClientID                     string            `json:"aadClientId" yaml:"aadClientId"`
	ClientSecret                 string            `json:"aadClientSecret" yaml:"aadClientSecret"`
	UseManagedIdentityExtension  bool              `json:"useManagedIdentityExtension" yaml:"useManagedIdentityExtension"`
	UseWorkloadIdentityExtension bool              `json:"useWorkloadIdentityExtension" yaml:"useWorkloadIdentityExtension"`
	UserAssignedIdentityID       string            `json:"userAssignedIdentityID" yaml:"userAssignedIdentityID"`
}

func getConfig(configFile, resourceGroup, userAssignedIdentityClientID string) (*config, error) {
@@ -100,6 +103,45 @@ func getAccessToken(cfg config, environment azure.Environment) (*adal.ServicePri
		return token, nil
	}

	// Try to retrieve token with Workload Identity.
	if cfg.UseWorkloadIdentityExtension {
		log.Info("Using workload identity extension to retrieve access token for Azure API.")

		token, err := getWIToken(environment, cfg)
		if err != nil {
			return nil, err
		}

		// adal does not offer methods to dynamically replace a federated token, so we need a wrapper to make sure
		// we're using an up-to-date federated token while requesting an access token.
		// NOTE: There's no RefreshToken in the whole process (in fact, it's absent from AAD responses). An AccessToken
		// can be received only in exchange for a federated token.
		var refreshFunc adal.TokenRefresh = func(ctx context.Context, resource string) (*adal.Token, error) {
			newWIToken, err := getWIToken(environment, cfg)
			if err != nil {
				return nil, err
			}

			// An AccessToken gets populated into a ServicePrincipalToken only when it is refreshed. Normally, that happens
			// implicitly when the first request to manipulate Azure resources is made. Since our goal here is only to
			// receive a fresh AccessToken, we need to make an explicit call.
			// Refreshing results in a call to the OAuth endpoint, during which the federated token is exchanged for an
			// AccessToken. RefreshToken is absent from responses. The caller's context is propagated so cancellation works.
			err = newWIToken.RefreshWithContext(ctx)
			if err != nil {
				return nil, err
			}

			accessToken := newWIToken.Token()

			return &accessToken, nil
		}

		token.SetCustomRefreshFunc(refreshFunc)

		return token, nil
	}

	// Try to retrieve token with MSI.
	if cfg.UseManagedIdentityExtension {
		log.Info("Using managed identity extension to retrieve access token for Azure API.")
@@ -125,3 +167,31 @@ func getAccessToken(cfg config, environment azure.Environment) (*adal.ServicePri

	return nil, fmt.Errorf("no credentials provided for Azure API")
}

// getWIToken prepares a token for a Workload Identity-enabled setup.
func getWIToken(environment azure.Environment, cfg config) (*adal.ServicePrincipalToken, error) {
	// NOTE: all related environment variables are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	oauthConfig, err := adal.NewOAuthConfig(environment.ActiveDirectoryEndpoint, os.Getenv("AZURE_TENANT_ID"))
	if err != nil {
		return nil, fmt.Errorf("failed to retrieve OAuth config: %w", err)
	}

	jwt, err := os.ReadFile(os.Getenv("AZURE_FEDERATED_TOKEN_FILE"))
	if err != nil {
		return nil, fmt.Errorf("failed to read a file with a federated token: %w", err)
	}

	// AZURE_CLIENT_ID will be empty if the azure.workload.identity/client-id annotation is not set.
	// Thus, it's important to offer an optional ClientID override.
	clientID := os.Getenv("AZURE_CLIENT_ID")
	if cfg.UserAssignedIdentityID != "" {
		clientID = cfg.UserAssignedIdentityID
	}

	token, err := adal.NewServicePrincipalTokenFromFederatedToken(*oauthConfig, clientID, string(jwt), environment.ResourceManagerEndpoint)
	if err != nil {
		return nil, fmt.Errorf("failed to create a workload identity token: %w", err)
	}

	return token, nil
}
119 changes: 119 additions & 0 deletions provider/azure/config_test.go
@@ -17,13 +17,19 @@ limitations under the License.
package azure

import (
"encoding/json"
"fmt"
"io"
"io/ioutil"
"net/http"
"net/http/httptest"
"os"
"reflect"
"testing"

"github.com/Azure/go-autorest/autorest/adal"
"github.com/Azure/go-autorest/autorest/azure"
"github.com/stretchr/testify/assert"
)

func TestGetAzureEnvironmentConfig(t *testing.T) {
@@ -66,3 +72,116 @@ func TestGetAzureEnvironmentConfig(t *testing.T) {
		})
	}
}

func populateFederatedToken(t *testing.T, filename string, content string) {
	t.Helper()

	f, err := os.Create(filename)
	if err != nil {
		assert.FailNow(t, err.Error())
	}

	if _, err := io.WriteString(f, content); err != nil {
		assert.FailNow(t, err.Error())
	}

	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}
}

func TestGetAccessTokenWorkloadIdentity(t *testing.T) {
	// Create a file that will be used to store a federated token
	f, err := os.CreateTemp("", "")
	if err != nil {
		assert.FailNow(t, err.Error())
	}
	defer os.Remove(f.Name())

	// Close the file to simplify logic within the populateFederatedToken helper
	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}

	// The initial federated token is never exchanged, so its value doesn't matter yet.
	// However, adal requires a non-empty value to be set.
	populateFederatedToken(t, f.Name(), "random-jwt")

	// Envs are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	t.Setenv("AZURE_TENANT_ID", "fakeTenantID")
	t.Setenv("AZURE_CLIENT_ID", "fakeClientID")
	t.Setenv("AZURE_FEDERATED_TOKEN_FILE", f.Name())

	t.Run("token refresh", func(t *testing.T) {
		// Each federated token (key) should be exchanged for the corresponding access token (value)
		tokens := map[string]string{
			"initialFederatedToken":   "initialAccessToken",
			"refreshedFederatedToken": "refreshedAccessToken",
		}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// FailNow must only be called from the test goroutine, so the handler reports errors without stopping the test.
			if err := r.ParseForm(); !assert.NoError(t, err) {
				return
			}

			w.Header().Set("Content-Type", "application/json")
			receivedFederatedToken := r.FormValue("client_assertion")
			accessToken := adal.Token{AccessToken: tokens[receivedFederatedToken]}

			assert.NoError(t, json.NewEncoder(w).Encode(accessToken))

			// Expected format: http://<server>/<tenant-ID>/oauth2/token?api-version=1.0
			assert.Contains(t, r.RequestURI, os.Getenv("AZURE_TENANT_ID"), "URI should contain the tenant ID exposed through env variable")

			assert.Equal(t, os.Getenv("AZURE_CLIENT_ID"), r.FormValue("client_id"), "client_id should match the value exposed through env variable")
		}))
		defer ts.Close()

		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}

		cfg := config{
			UseWorkloadIdentityExtension: true,
		}

		token, err := getAccessToken(cfg, env)
		assert.NoError(t, err)

		for federatedToken, accessToken := range tokens {
			populateFederatedToken(t, f.Name(), federatedToken)
			assert.NoError(t, token.Refresh(), "Token refresh failed")
			assert.Equal(t, accessToken, token.Token().AccessToken, "Access token should have been set to a value returned by the webserver")
		}
	})

	t.Run("clientID overrides through UserAssignedIdentityID field", func(t *testing.T) {
		cfg := config{
			UseWorkloadIdentityExtension: true,
			UserAssignedIdentityID:       "overriddenClientID",
		}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// FailNow must only be called from the test goroutine, so the handler reports errors without stopping the test.
			if err := r.ParseForm(); !assert.NoError(t, err) {
				return
			}

			w.Header().Set("Content-Type", "application/json")
			accessToken := adal.Token{AccessToken: "abc"}

			assert.NoError(t, json.NewEncoder(w).Encode(accessToken))

			assert.Equal(t, cfg.UserAssignedIdentityID, r.FormValue("client_id"), "client_id should match the value passed through the UserAssignedIdentityID field")
		}))
		defer ts.Close()

		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}

		token, err := getAccessToken(cfg, env)
		assert.NoError(t, err)

		assert.NoError(t, token.Refresh(), "Token refresh failed")
	})
}