Network restrictions limitations of Azure Storage Account with AKS #1899

Closed
aristosvo opened this issue Oct 13, 2020 · 6 comments
Labels: question, resolution/answer-provided (Provided answer to issue, question or feedback.)

Comments

aristosvo commented Oct 13, 2020

What happened:

We installed Velero with the Azure plugin (https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure) and AAD Pod Identity. The first PoC worked great, until we switched to an Azure Storage Account with IP restrictions that has our AKS outbound static IP in the allow list. Since then, Velero logs the following error:

time="2020-10-13T15:45:40Z" level=error msg="Error listing backups in backup store" backupLocation=default controller=backup-sync error="rpc error: code = Unknown desc = storage: service returned error: StatusCode=403, ErrorCode=AuthorizationFailure, ErrorMessage=This request is not authorized to perform this operation.\nRequestId:<REDACTED>\nTime:2020-10-13T15:45:40.8331243Z, RequestInitiated=Tue, 13 Oct 2020 15:45:39 GMT, RequestId=<REDACTED>, API Version=, QueryParameterName=, QueryParameterValue=" error.file="/go/src/velero-plugin-for-microsoft-azure/velero-plugin-for-microsoft-azure/object_store.go:395" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:175"

What you expected to happen:
It should be possible to restrict connections to Azure Storage from user space by allowing only the AKS outbound IPs. Based on this information I have a feeling for where it might go wrong, but I don't understand why.

How to reproduce it (as minimally and precisely as possible):

resource "azurerm_resource_group" "rg" {
  name     = "${var.context.prefix}-backup-rg"
  location = var.context.location
  tags     = var.context.tags
}

resource "azurerm_storage_account" "sa-backup" {
  name                      = "${var.context.project}${var.context.environment}backupsa"
  resource_group_name       = azurerm_resource_group.rg.name
  location                  = var.context.location
  account_kind              = "BlobStorage"
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  is_hns_enabled            = false
  enable_https_traffic_only = true

  
  // Networking restrictions not working on Azure Storage Accounts in combination with Velero.
  // Issue reported: 
  // - https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/issues/75
  // - https://github.com/Azure/AKS/issues/1899
  // https://docs.microsoft.com/en-us/azure/aks/troubleshooting#error-when-enabling-allow-access-allow-access-from-selected-network-setting-on-storage-account
  // network_rules {
  //   default_action = "Deny"
  //   bypass         = ["None"]
  //   ip_rules       = concat(list(var.security.terraform_client_ip), var.security.ip_rules_storage, list(var.aks_outbound_ip))
  // }
  network_rules {
    default_action = "Allow"
  }

  tags = var.context.tags
}

resource "azurerm_storage_container" "sc" {
  name                  = "${var.context.prefix}-velero-backup-sc"
  storage_account_name  = azurerm_storage_account.sa-backup.name
  container_access_type = "private"
}

// Identity should be in AKS resource group
resource "azurerm_user_assigned_identity" "backup-identity" {
  resource_group_name = var.aks_rg
  location            = var.context.location

  name = "${var.context.prefix}-backup-mi"
  tags = var.context.tags
}

resource "azurerm_role_assignment" "backup-rg-contributor" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Contributor"
  principal_id                     = azurerm_user_assigned_identity.backup-identity.principal_id
  skip_service_principal_aad_check = false
}

resource "azurerm_role_assignment" "aks-resourcegroup-contributor" {
  scope                            = var.aks_rg_id
  role_definition_name             = "Contributor"
  principal_id                     = azurerm_user_assigned_identity.backup-identity.principal_id
  skip_service_principal_aad_check = false
}

resource "helm_release" "velero" {
  name       = "velero"
  namespace  = "velero"
  repository = "https://vmware-tanzu.github.io/helm-charts"
  chart      = "velero"
  version    = "2.13.3"
  set {
    name  = "configuration.provider"
    value = "azure"
  }
  set {
    name = "installCRDs"
    value = "true"
  }
  set {
    name  = "configuration.backupStorageLocation.bucket"
    value = azurerm_storage_container.sc.name
  }
  set {
    name  = "configuration.backupStorageLocation.config.resourceGroup"
    value = azurerm_resource_group.rg.name
  }
  set {
    name  = "configuration.backupStorageLocation.config.storageAccount"
    value = azurerm_storage_account.sa-backup.name
  }
  set {
    name  = "configuration.volumeSnapshotLocation.name"
    value = "default"
  }
  set {
    name  = "configuration.volumeSnapshotLocation.config.resourceGroup"
    value = azurerm_resource_group.rg.name
  }
  set {
    name = "resources.limits.cpu"
    value = "200m"
  }
  set {
    name = "resources.requests.cpu"
    value = "100m"
  }
  set {
    name = "resources.limits.memory"
    value = "1000Mi"
  }
  set {
    name = "resources.requests.memory"
    value = "500Mi"
  }
  set {
    name = "initContainers[0].name"
    value = "velero-plugin-for-microsoft-azure"
  }
  set {
    name = "initContainers[0].image"
    value = "velero/velero-plugin-for-microsoft-azure:main"
  }
  set {
    name = "initContainers[0].volumeMounts[0].mountPath"
    value = "/target"
  }
  set {
    name = "initContainers[0].volumeMounts[0].name"
    value = "plugins"
  }
  set {
    name = "podLabels.aadpodidbinding"
    value = "backup"
  }
  set {
    name = "credentials.existingSecret"
    value = "cloud-credentials"
  }
}

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    1.18.8
  • Size of cluster (how many worker nodes are in the cluster?)
    8
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.)
  • Others:

ghost added the triage label Oct 13, 2020

ghost commented Oct 13, 2020

Hi aristosvo, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@neumanndaniel

@aristosvo Why are you not using VNet service endpoints?

-> https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview

I assume the AKS cluster and the Azure Storage Account are in the same Azure region?
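
A minimal sketch of what the service endpoint suggestion could look like in Terraform, assuming the AKS nodes run in a subnet you manage yourself (BYO VNet). The resource label `aks_nodes`, the variables `vnet_rg` and `vnet_name`, the subnet name, and the address range are hypothetical placeholders and do not come from the issue.

```hcl
# Hypothetical inputs: resource group and name of an existing VNet
# that hosts the AKS node pool.
variable "vnet_rg" {
  type = string
}

variable "vnet_name" {
  type = string
}

# Subnet for the AKS nodes with the Microsoft.Storage service endpoint
# enabled, so the storage account can identify traffic from this subnet.
resource "azurerm_subnet" "aks_nodes" {
  name                 = "aks-nodes"       # placeholder name
  resource_group_name  = var.vnet_rg
  virtual_network_name = var.vnet_name
  address_prefixes     = ["10.240.0.0/16"] # placeholder range

  service_endpoints = ["Microsoft.Storage"]
}
```

The subnet ID (`azurerm_subnet.aks_nodes.id` in this sketch) is what a storage account's virtual network rule would then reference.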

@aristosvo (author)

@neumanndaniel Thanks for the suggestion and it is probably the right solution for this problem 👍 AKS and SA are in the same region, so it definitely should work 🎉

Besides needing a technical solution for this particular problem, I'm also curious, and it would probably help future decision-making if people knew why this isn't working. In my experience it was a bit harder to automate with Terraform without a BYO VNet, for instance, and I'd like to have more options. I've also opened an issue against Velero's Azure plugin and will probably submit a docs PR explaining why this doesn't work, to keep people from banging their heads against the same wall I did. 🤯

neumanndaniel commented Oct 13, 2020

@aristosvo As AKS and Azure Storage are in the same region, you can only use the VNet service endpoints approach.

The network restriction option where you whitelist the public IP doesn't work when both resources are in the same region, because the traffic is handled internally within the region and never leaves the network via the outbound public IP.
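
For illustration, a hedged sketch of how the storage account from the reproduction might be locked down with a virtual network rule instead of the AKS outbound IP. It assumes the AKS node subnet already has the `Microsoft.Storage` service endpoint enabled. The variable `aks_subnet_id` is a hypothetical input that is not part of the original configuration; `var.context` and `var.security` are the reporter's own variables.

```hcl
# Hypothetical input: ID of the subnet the AKS nodes run in.
variable "aks_subnet_id" {
  type = string
}

resource "azurerm_storage_account" "sa-backup" {
  name                      = "${var.context.project}${var.context.environment}backupsa"
  resource_group_name       = azurerm_resource_group.rg.name
  location                  = var.context.location
  account_kind              = "BlobStorage"
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  is_hns_enabled            = false
  enable_https_traffic_only = true

  network_rules {
    default_action = "Deny"
    bypass         = ["None"]

    # Same-region traffic from AKS is matched by subnet, not by public IP.
    virtual_network_subnet_ids = [var.aks_subnet_id]

    # IP rules still cover clients outside the region's internal network,
    # such as the machine running Terraform.
    ip_rules = concat([var.security.terraform_client_ip], var.security.ip_rules_storage)
  }

  tags = var.context.tags
}
```

The AKS outbound IP is left out of `ip_rules` in this sketch because, as explained above, same-region requests never arrive from that address.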

aristosvo changed the title from "Network restrictions on Azure Storage Account not working with AKS" to "Network restrictions limitations of Azure Storage Account with AKS" Oct 14, 2020

ghost added the action-required label Oct 16, 2020

ghost commented Oct 16, 2020

Triage required from @Azure/aks-pm

palma21 added the question and resolution/answer-provided labels Oct 16, 2020

ghost commented Oct 19, 2020

Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.

ghost closed this as completed Oct 19, 2020

ghost locked as resolved and limited conversation to collaborators Nov 18, 2020
This issue was closed.