feat(Azure): add support for workload identity #3111

Closed
95 changes: 95 additions & 0 deletions docs/tutorials/azure.md
@@ -53,6 +53,7 @@ The following fields are used:
* `aadClientID` and `aadClientSecret` are associated with the Service Principal. These are only used with the Service Principal method documented in the next section.
* `useManagedIdentityExtension` - this is set to `true` if you use either the AKS Kubelet Identity or the AAD Pod Identities method documented in the next section.
* `userAssignedIdentityID` - this contains the client id from the Managed Identity when using the AAD Pod Identities method documented in the next section.
* `useWorkloadIdentityExtension` - this is set to `true` if you use the Workload Identity method documented in the next section.

The Azure DNS provider expects, by default, that the configuration file is at `/etc/kubernetes/azure.json`. This can be overridden with the `--azure-config-file` option when starting ExternalDNS.
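
As a rough illustration of how such a file maps onto the fields listed above, here is a minimal sketch that decodes an `azure.json` payload with Go's standard `encoding/json` package. The `azureConfig` type and `parseAzureConfig` function are hypothetical names used only for this example; they mirror a subset of the documented keys and are not the provider's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// azureConfig mirrors a subset of the documented azure.json fields.
// It is a hypothetical stand-in for illustration, not the provider's own type.
type azureConfig struct {
	SubscriptionID               string `json:"subscriptionId"`
	ResourceGroup                string `json:"resourceGroup"`
	UseWorkloadIdentityExtension bool   `json:"useWorkloadIdentityExtension"`
	UserAssignedIdentityID       string `json:"userAssignedIdentityID"`
}

// parseAzureConfig decodes the JSON contents of an azure.json file.
func parseAzureConfig(data []byte) (azureConfig, error) {
	var cfg azureConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return azureConfig{}, fmt.Errorf("failed to parse azure config: %w", err)
	}
	return cfg, nil
}

func main() {
	// Placeholder values; a real file would carry your own IDs.
	raw := []byte(`{
		"subscriptionId": "00000000-0000-0000-0000-000000000000",
		"resourceGroup": "MyDnsResourceGroup",
		"useWorkloadIdentityExtension": true
	}`)

	cfg, err := parseAzureConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("resource group: %s, workload identity: %t\n",
		cfg.ResourceGroup, cfg.UseWorkloadIdentityExtension)
}
```

Fields that are absent from the file, such as `userAssignedIdentityID` here, simply keep their zero value.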

@@ -63,6 +64,7 @@ ExternalDNS needs permissions to make changes to the Azure DNS zone. There are t
- [Service Principal](#service-principal)
- [Managed Identity Using AKS Kubelet Identity](#managed-identity-using-aks-kubelet-identity)
- [Managed Identity Using AAD Pod Identities](#managed-identity-using-aad-pod-identities)
- [Managed Identity Using Workload Identity](#managed-identity-using-workload-identity)

### Service Principal

@@ -319,6 +321,99 @@ kubectl patch deployment external-dns --namespace "default" --patch \
'{"spec": {"template": {"metadata": {"labels": {"aadpodidbinding": "external-dns"}}}}}'
```

### Managed Identity Using Workload Identity

For this process, we will create a [managed identity](https://docs.microsoft.com//azure/active-directory/managed-identities-azure-resources/overview) that will be explicitly used by the ExternalDNS container. This process is somewhat similar to Pod Identity, except that this managed identity is associated with a Kubernetes service account.

#### Deploy OIDC issuer and Workload Identity services

Update your cluster to install [OIDC Issuer](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer) and [Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster):

```bash
$ AZURE_AKS_RESOURCE_GROUP="my-aks-cluster-group" # name of resource group where aks cluster was created
$ AZURE_AKS_CLUSTER_NAME="my-aks-cluster" # name of aks cluster previously created

$ az aks update --resource-group ${AZURE_AKS_RESOURCE_GROUP} --name ${AZURE_AKS_CLUSTER_NAME} --enable-oidc-issuer --enable-workload-identity
```

#### Create a managed identity

Create a managed identity:

```bash
$ IDENTITY_RESOURCE_GROUP=$AZURE_AKS_RESOURCE_GROUP # custom group or reuse AKS group
$ IDENTITY_NAME="example-com-identity"

# create a managed identity
$ az identity create --resource-group "${IDENTITY_RESOURCE_GROUP}" --name "${IDENTITY_NAME}"
```

#### Assign rights for the managed identity

Grant access to Azure DNS zone for the managed identity:

```bash
$ AZURE_DNS_ZONE_RESOURCE_GROUP="MyDnsResourceGroup" # name of resource group where dns zone is hosted
$ AZURE_DNS_ZONE="example.com" # DNS zone name like example.com or sub.example.com

# fetch identity client id from managed identity created earlier
$ IDENTITY_CLIENT_ID=$(az identity show --resource-group "${IDENTITY_RESOURCE_GROUP}" \
--name "${IDENTITY_NAME}" --query "clientId" --output tsv)
# fetch DNS id used to grant access to the managed identity
$ DNS_ID=$(az network dns zone show --name "${AZURE_DNS_ZONE}" \
--resource-group "${AZURE_DNS_ZONE_RESOURCE_GROUP}" --query "id" --output tsv)

$ az role assignment create --role "DNS Zone Contributor" \
--assignee "${IDENTITY_CLIENT_ID}" --scope "${DNS_ID}"
```

#### Create a configuration file for the managed identity

Create the file `azure.json` with the values from previous steps:

```bash
cat <<-EOF > /local/path/to/azure.json
{
"subscriptionId": "$(az account show --query id -o tsv)",
"resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP",
"useWorkloadIdentityExtension": true
}
EOF
```

Use the `azure.json` file to create a Kubernetes secret:

```bash
$ kubectl create secret generic azure-config-file --namespace "default" --from-file /local/path/to/azure.json
```

#### Create a federated identity credential

A binding between the managed identity and the ExternalDNS service account needs to be set up by creating a federated identity credential:

```bash
$ OIDC_ISSUER_URL="$(az aks show --name "${AZURE_AKS_CLUSTER_NAME}" --resource-group "${AZURE_AKS_RESOURCE_GROUP}" \
  --query "oidcIssuerProfile.issuerUrl" --output tsv)"

# the resource group here is the one that holds the managed identity
$ az identity federated-credential create --name "${IDENTITY_NAME}" --identity-name "${IDENTITY_NAME}" \
  --resource-group "${IDENTITY_RESOURCE_GROUP}" --issuer "${OIDC_ISSUER_URL}" \
  --subject "system:serviceaccount:default:external-dns"
```
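
The `--subject` value above follows the `system:serviceaccount:<namespace>:<name>` form, so it has to match the namespace and name of the service account ExternalDNS actually runs under. A minimal sketch of that composition, with `federatedSubject` being a hypothetical helper name used only for illustration:

```go
package main

import "fmt"

// federatedSubject builds the subject string for a Kubernetes service account,
// following the "system:serviceaccount:<namespace>:<name>" form used above.
func federatedSubject(namespace, serviceAccount string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, serviceAccount)
}

func main() {
	// Prints: system:serviceaccount:default:external-dns
	fmt.Println(federatedSubject("default", "external-dns"))
}
```

If ExternalDNS is deployed to a different namespace or under a different service account name, the subject passed to `az identity federated-credential create` needs to change accordingly.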

#### Update labels and annotations on ExternalDNS service account

To instruct the Workload Identity webhook to inject a projected token into the ExternalDNS pod, the pod needs to have the label `azure.workload.identity/use: "true"` (before Workload Identity 1.0.0, this label was supposed to be set on the service account instead). The service account also needs to have the annotation `azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>`.

To patch the existing service account and deployment, use the following commands:

```bash
$ kubectl patch serviceaccount external-dns --namespace "default" --patch \
"{\"metadata\": {\"annotations\": {\"azure.workload.identity/client-id\": \"${IDENTITY_CLIENT_ID}\"}}}"
$ kubectl patch deployment external-dns --namespace "default" --patch \

IMHO this should be part of the deployment. https://github.com/kubernetes-sigs/external-dns/blob/master/charts/external-dns/templates/deployment.yaml

Add the client_id to the values.yaml and if .Values.provider==azure (provider already exists in the values.yaml) and if the client_id for workload identity was set, add the label to the deployment plus add the annotation to the service account

Contributor Author

@sebader Many months ago, when I was working on this section, I just tried to make it look similar to other sections of the guide (to keep the guide consistent). When looking at the official external-dns chart, it doesn't seem to have much focus on offering something special for the providers external-dns has integrations with. So, I'm not sure if the changes you suggested can get accepted by maintainers. Though, we can simply rely on podLabels and serviceAccount.annotations of the existing chart.

I lean towards sebader in that I don't think patching should be encouraged, but I also lean a bit towards weisdd: explicitly setting podLabels, podAnnotations and serviceAccount.labels is the way to go. Having logic for automatically setting them feels fragile, since Microsoft keeps changing the way things behave.

'{"spec": {"template": {"metadata": {"labels": {"azure.workload.identity/use": "true"}}}}}'
```

NOTE: it's also possible to specify (or override) the client ID through the `userAssignedIdentityID` field in `azure.json`.

Make sure the external-dns pod has restarted so that it picks up the configuration change.
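
Hand-written JSON patches are easy to break with shell quoting. As a minimal sketch, assuming you prefer to generate the patch bodies programmatically, the same two documents can be produced with Go's `encoding/json`. The function names `serviceAccountPatch` and `deploymentPatch` are hypothetical and exist only for this example; the output is meant to be passed to `kubectl patch --patch`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceAccountPatch returns a patch body that sets the client-id annotation
// on a service account.
func serviceAccountPatch(clientID string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				"azure.workload.identity/client-id": clientID,
			},
		},
	}
	return json.Marshal(patch)
}

// deploymentPatch returns a patch body that adds the Workload Identity label
// to the pod template of a deployment.
func deploymentPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{
					"labels": map[string]string{
						"azure.workload.identity/use": "true",
					},
				},
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// Placeholder client ID; use the value of IDENTITY_CLIENT_ID in practice.
	sa, err := serviceAccountPatch("00000000-0000-0000-0000-000000000000")
	if err != nil {
		panic(err)
	}
	dp, err := deploymentPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(sa))
	fmt.Println(string(dp))
}
```

Because the JSON is produced by an encoder, quotes inside keys and values are always escaped correctly, whatever the surrounding shell does.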

## Ingress used with ExternalDNS

This deployment assumes that you will be using nginx-ingress. When using nginx-ingress, do not deploy it as a DaemonSet. Doing so causes nginx-ingress to write the Cluster IP of the backend pods into the ingress `status.loadbalancer.ip` property, and external-dns then writes the Cluster IP(s) to DNS instead of the external IP of the nginx-ingress service.
90 changes: 80 additions & 10 deletions provider/azure/config.go
@@ -17,8 +17,10 @@ limitations under the License.
package azure

import (
	"context"
	"fmt"
	"io/ioutil"
	"os"
	"strings"

	"github.com/Azure/go-autorest/autorest/adal"
@@ -29,16 +31,17 @@ import (

 // config represents common config items for Azure DNS and Azure Private DNS
 type config struct {
-	Cloud                       string            `json:"cloud" yaml:"cloud"`
-	Environment                 azure.Environment `json:"-" yaml:"-"`
-	TenantID                    string            `json:"tenantId" yaml:"tenantId"`
-	SubscriptionID              string            `json:"subscriptionId" yaml:"subscriptionId"`
-	ResourceGroup               string            `json:"resourceGroup" yaml:"resourceGroup"`
-	Location                    string            `json:"location" yaml:"location"`
-	ClientID                    string            `json:"aadClientId" yaml:"aadClientId"`
-	ClientSecret                string            `json:"aadClientSecret" yaml:"aadClientSecret"`
-	UseManagedIdentityExtension bool              `json:"useManagedIdentityExtension" yaml:"useManagedIdentityExtension"`
-	UserAssignedIdentityID      string            `json:"userAssignedIdentityID" yaml:"userAssignedIdentityID"`
+	Cloud                        string            `json:"cloud" yaml:"cloud"`
+	Environment                  azure.Environment `json:"-" yaml:"-"`
+	TenantID                     string            `json:"tenantId" yaml:"tenantId"`
+	SubscriptionID               string            `json:"subscriptionId" yaml:"subscriptionId"`
+	ResourceGroup                string            `json:"resourceGroup" yaml:"resourceGroup"`
+	Location                     string            `json:"location" yaml:"location"`
+	ClientID                     string            `json:"aadClientId" yaml:"aadClientId"`
+	ClientSecret                 string            `json:"aadClientSecret" yaml:"aadClientSecret"`
+	UseManagedIdentityExtension  bool              `json:"useManagedIdentityExtension" yaml:"useManagedIdentityExtension"`
+	UseWorkloadIdentityExtension bool              `json:"useWorkloadIdentityExtension" yaml:"useWorkloadIdentityExtension"`
+	UserAssignedIdentityID       string            `json:"userAssignedIdentityID" yaml:"userAssignedIdentityID"`
 }

func getConfig(configFile, resourceGroup, userAssignedIdentityClientID string) (*config, error) {
@@ -100,6 +103,45 @@ func getAccessToken(cfg config, environment azure.Environment) (*adal.ServicePri
		return token, nil
	}

	// Try to retrieve token with Workload Identity.
	if cfg.UseWorkloadIdentityExtension {
		log.Info("Using workload identity extension to retrieve access token for Azure API.")

		token, err := getWIToken(environment, cfg)
		if err != nil {
			return nil, err
		}

		// adal does not offer methods to dynamically replace a federated token, so we need a wrapper to make sure
		// we're using an up-to-date secret when requesting an access token.
		// NOTE: There's no RefreshToken in the whole process (in fact, it's absent in AAD responses). An AccessToken can be
		// received only in exchange for a federated token.
		var refreshFunc adal.TokenRefresh = func(ctx context.Context, resource string) (*adal.Token, error) {
			newWIToken, err := getWIToken(environment, cfg)
			if err != nil {
				return nil, err
			}

			// An AccessToken gets populated into an spt only when .Refresh() is called. Normally, that happens implicitly when
			// the first request to manipulate Azure resources is made. Since our goal here is only to receive a fresh AccessToken,
			// we need to make an explicit call.
			// .Refresh() itself results in a call to the OAuth endpoint. During the process, a federated token is exchanged for
			// an AccessToken. RefreshToken is absent from responses.
			err = newWIToken.Refresh()
			if err != nil {
				return nil, err
			}

			accessToken := newWIToken.Token()

			return &accessToken, nil
		}

		token.SetCustomRefreshFunc(refreshFunc)

		return token, nil
	}

	// Try to retrieve token with MSI.
	if cfg.UseManagedIdentityExtension {
		log.Info("Using managed identity extension to retrieve access token for Azure API.")
@@ -125,3 +167,31 @@ func getAccessToken(cfg config, environment azure.Environment) (*adal.ServicePri

	return nil, fmt.Errorf("no credentials provided for Azure API")
}

// getWIToken prepares a token for a Workload Identity-enabled setup
func getWIToken(environment azure.Environment, cfg config) (*adal.ServicePrincipalToken, error) {
	// NOTE: all related environment variables are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	oauthConfig, err := adal.NewOAuthConfig(environment.ActiveDirectoryEndpoint, os.Getenv("AZURE_TENANT_ID"))
	if err != nil {
		return nil, fmt.Errorf("failed to retrieve OAuth config: %v", err)
	}

	jwt, err := os.ReadFile(os.Getenv("AZURE_FEDERATED_TOKEN_FILE"))
	if err != nil {
		return nil, fmt.Errorf("failed to read a file with a federated token: %v", err)
	}

	// AZURE_CLIENT_ID will be empty in case azure.workload.identity/client-id annotation is not set
	// Thus, it's important to offer optional ClientID overrides
	clientID := os.Getenv("AZURE_CLIENT_ID")
	if cfg.UserAssignedIdentityID != "" {
		clientID = cfg.UserAssignedIdentityID
	}

	token, err := adal.NewServicePrincipalTokenFromFederatedToken(*oauthConfig, clientID, string(jwt), environment.ResourceManagerEndpoint)
	if err != nil {
		return nil, fmt.Errorf("failed to create a workload identity token: %v", err)
	}

	return token, nil
}
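
Two behaviours of `getWIToken` are worth isolating: the client ID override (a non-empty `UserAssignedIdentityID` wins over `AZURE_CLIENT_ID`), and the fact that the federated token file is read again on every call, so a rotated token is picked up. The following is a standard-library-only sketch of both ideas. `resolveClientID` and `readFederatedToken` are hypothetical names for illustration and are not functions in this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveClientID prefers an explicit override from the config and falls back
// to the value taken from the environment.
func resolveClientID(envClientID, override string) string {
	if override != "" {
		return override
	}
	return envClientID
}

// readFederatedToken re-reads the token file on every call, so a token that
// was rotated on disk is picked up the next time it is needed.
func readFederatedToken(path string) (string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("failed to read federated token file: %w", err)
	}
	return string(b), nil
}

func main() {
	dir, err := os.MkdirTemp("", "wi-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Simulate token rotation by rewriting the same file between reads.
	path := filepath.Join(dir, "token")
	for _, content := range []string{"first-token", "rotated-token"} {
		if err := os.WriteFile(path, []byte(content), 0o600); err != nil {
			panic(err)
		}
		tok, err := readFederatedToken(path)
		if err != nil {
			panic(err)
		}
		fmt.Println(tok)
	}

	fmt.Println(resolveClientID("env-client-id", ""))
	fmt.Println(resolveClientID("env-client-id", "override-client-id"))
}
```

Reading the file lazily, rather than caching its contents at start-up, is what lets the custom refresh function in `getAccessToken` exchange a current federated token each time.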
119 changes: 119 additions & 0 deletions provider/azure/config_test.go
@@ -17,13 +17,19 @@ limitations under the License.
package azure

import (
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"os"
	"reflect"
	"testing"

	"github.com/Azure/go-autorest/autorest/adal"
	"github.com/Azure/go-autorest/autorest/azure"
	"github.com/stretchr/testify/assert"
)

func TestGetAzureEnvironmentConfig(t *testing.T) {
@@ -66,3 +72,116 @@ func TestGetAzureEnvironmentConfig(t *testing.T) {
		})
	}
}

func populateFederatedToken(t *testing.T, filename string, content string) {
	t.Helper()

	f, err := os.Create(filename)
	if err != nil {
		assert.FailNow(t, err.Error())
	}

	if _, err := io.WriteString(f, content); err != nil {
		assert.FailNow(t, err.Error())
	}

	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}
}

func TestGetAccessTokenWorkloadIdentity(t *testing.T) {
	// Create a file that will be used to store a federated token
	f, err := os.CreateTemp("", "")
	if err != nil {
		assert.FailNow(t, err.Error())
	}
	defer os.Remove(f.Name())

	// Close the file to simplify logic within populateFederatedToken helper
	if err := f.Close(); err != nil {
		assert.FailNow(t, err.Error())
	}

	// The initial federated token is never used, so we don't care about the value yet
	// Though, it's a requirement from adal to have a non-empty value set
	populateFederatedToken(t, f.Name(), "random-jwt")

	// Envs are described here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
	t.Setenv("AZURE_TENANT_ID", "fakeTenantID")
	t.Setenv("AZURE_CLIENT_ID", "fakeClientID")
	t.Setenv("AZURE_FEDERATED_TOKEN_FILE", f.Name())

	t.Run("token refresh", func(t *testing.T) {
		// Basically, we want one token to be exchanged for the other (key and value respectively)
		tokens := map[string]string{
			"initialFederatedToken":   "initialAccessToken",
			"refreshedFederatedToken": "refreshedAccessToken",
		}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if err := r.ParseForm(); err != nil {
				assert.FailNow(t, err.Error())
			}

			w.Header().Set("Content-Type", "application/json")
			receivedFederatedToken := r.FormValue("client_assertion")
			accessToken := adal.Token{AccessToken: tokens[receivedFederatedToken]}

			if err := json.NewEncoder(w).Encode(accessToken); err != nil {
				assert.FailNow(t, err.Error())
			}

			// Expected format: http://<server>/<tenant-ID>/oauth2/token?api-version=1.0
			assert.Contains(t, r.RequestURI, os.Getenv("AZURE_TENANT_ID"), "URI should contain the tenant ID exposed through env variable")

			assert.Equal(t, os.Getenv("AZURE_CLIENT_ID"), r.FormValue("client_id"), "client_id should match the value exposed through env variable")
		}))
		defer ts.Close()

		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}

		cfg := config{
			UseWorkloadIdentityExtension: true,
		}

		token, err := getAccessToken(cfg, env)
		assert.NoError(t, err)

		for federatedToken, accessToken := range tokens {
			populateFederatedToken(t, f.Name(), federatedToken)
			assert.NoError(t, token.Refresh(), "Token refresh failed")
			assert.Equal(t, accessToken, token.Token().AccessToken, "Access token should have been set to a value returned by the webserver")
		}
	})

	t.Run("clientID overrides through UserAssignedIdentityID field", func(t *testing.T) {
		cfg := config{
			UseWorkloadIdentityExtension: true,
			UserAssignedIdentityID:       "overriddenClientID",
		}

		ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if err := r.ParseForm(); err != nil {
				assert.FailNow(t, err.Error())
			}

			w.Header().Set("Content-Type", "application/json")
			accessToken := adal.Token{AccessToken: "abc"}

			if err := json.NewEncoder(w).Encode(accessToken); err != nil {
				assert.FailNow(t, err.Error())
			}

			assert.Equal(t, cfg.UserAssignedIdentityID, r.FormValue("client_id"), "client_id should match the value passed through the UserAssignedIdentityID field")
		}))
		defer ts.Close()

		env := azure.Environment{ActiveDirectoryEndpoint: ts.URL, ResourceManagerEndpoint: ts.URL}

		token, err := getAccessToken(cfg, env)
		assert.NoError(t, err)

		assert.NoError(t, token.Refresh(), "Token refresh failed")
	})
}
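
The tests above rely on a fake token endpoint that maps each received `client_assertion` to a canned access token. The same technique can be sketched with the standard library alone, without adal or testify. This is a simplified illustration under stated assumptions: `newFakeTokenServer` and `exchange` are hypothetical names, and the plain-text response is a stand-in for the JSON token document a real endpoint returns.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// newFakeTokenServer answers each known client_assertion with a canned access
// token and rejects anything else with 400 Bad Request.
func newFakeTokenServer(tokens map[string]string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		accessToken, ok := tokens[r.FormValue("client_assertion")]
		if !ok {
			http.Error(w, "unknown assertion", http.StatusBadRequest)
			return
		}
		fmt.Fprintln(w, accessToken)
	}))
}

// exchange posts a federated token as client_assertion and returns the
// access token found in the response body.
func exchange(endpoint, clientID, federatedToken string) (string, error) {
	form := url.Values{}
	form.Set("client_id", clientID)
	form.Set("client_assertion", federatedToken)

	resp, err := http.PostForm(endpoint, form)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("token endpoint returned %s", resp.Status)
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	ts := newFakeTokenServer(map[string]string{
		"initialFederatedToken":   "initialAccessToken",
		"refreshedFederatedToken": "refreshedAccessToken",
	})
	defer ts.Close()

	for _, federated := range []string{"initialFederatedToken", "refreshedFederatedToken"} {
		access, err := exchange(ts.URL, "fakeClientID", federated)
		if err != nil {
			panic(err)
		}
		fmt.Println(federated, "->", access)
	}
}
```

Keying the fake server on the received assertion is what lets a test prove that a refreshed federated token, rather than a cached one, was sent on the second exchange.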