
[AVM Module Issue]: associatedKeyVaultResourceId parameter for ML workspace causing deployment error when null #3849

Open
DavidSP-Transparity opened this issue Nov 28, 2024 · 6 comments
Labels
Class: Resource Module 📦 This is a resource module Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue Type: Feature Request ➕ New feature or request


@DavidSP-Transparity

Check for previous/existing GitHub issues

  • I have checked for previous/existing GitHub issues

Issue Type?

Bug

Module Name

avm/res/machine-learning-services/workspace

(Optional) Module Version

No response

Description

I have noticed an issue with the resource provider used by this module, which affects consumption of the module (including linked modules).

This appears to affect even the latest resource provider API version, 'Microsoft.MachineLearningServices/workspaces@2024-07-01-preview'.

The deployment error is generic:

```json
{
  "code": "InternalServerError",
  "message": "InternalServerError"
}
```

From extensive testing, the cause appears to be the newly added preview feature that accepts null for the Key Vault property, so that a Microsoft-managed Key Vault is used as the credential store (Microsoft-managed credential store (preview)).

To replicate the issue:

  • Create a Resource Group and deploy the following code, with deployKeyVault set to false. It should deploy successfully.
  • Make no changes and redeploy the code. It should fail with InternalServerError on the project ML workspace resource.
  • Create a new Resource Group, this time with deployKeyVault set to true. It should deploy successfully.
  • Make no changes and redeploy the code. It should redeploy successfully.

```bicep
var sku = 'Standard'
var location = 'uksouth'
var prefix = uniqueString(resourceGroup().id)
var deployKeyVault = false

resource workspaceProject 'Microsoft.MachineLearningServices/workspaces@2024-07-01-preview' = {
  name: 'proj-${prefix}'
  location: location
  properties: {
    friendlyName: 'proj-${prefix}'
    systemDatastoresAuthMode: 'identity'
    hubResourceId: workspaceHub.id
  }
  identity: {
    type: 'SystemAssigned'
  }
  kind: 'Project'
  sku: {
    name: sku
    tier: sku
  }
}

resource workspaceHub 'Microsoft.MachineLearningServices/workspaces@2024-07-01-preview' = {
  name: 'hub-${prefix}'
  location: location
  properties: {
    friendlyName: 'hub-${prefix}'
    storageAccount: storageAccount.outputs.resourceId
    keyVault: deployKeyVault ? keyVault.outputs.resourceId : null // if Key Vault isn't associated, it generates 'InternalServerError' when redeploying the template.
    managedNetwork: {
      isolationMode: 'Disabled'
    }
    publicNetworkAccess: 'Enabled'
    ipAllowlist: []
    workspaceHubConfig: {
      defaultWorkspaceResourceGroup: resourceGroup().id
    }
    enableDataIsolation: true
    systemDatastoresAuthMode: 'identity'
  }
  identity: {
    type: 'SystemAssigned'
  }
  kind: 'Hub'
  sku: {
    name: sku
    tier: sku
  }
}

module keyVault 'br/public:avm/res/key-vault/vault:0.10.2' = if (deployKeyVault) {
  name: take('keyVault-${prefix}-Deployment', 64)
  params: {
    name: 'kv-${prefix}'
    sku: 'standard'
    location: location
    enablePurgeProtection: true
    enableSoftDelete: true
    enableRbacAuthorization: true
    softDeleteRetentionInDays: 7
    networkAcls: {
      defaultAction: 'Allow'
      bypass: 'AzureServices'
    }
    publicNetworkAccess: 'Enabled'
  }
}

module storageAccount 'br/public:avm/res/storage/storage-account:0.14.3' = {
  name: take('storageAccount-st${toLower(replace(prefix, '-', ''))}-Deployment', 64)
  params: {
    name: 'st${prefix}'
    location: location
    skuName: 'Standard_LRS'
    accessTier: 'Cool'
    allowBlobPublicAccess: false
    minimumTlsVersion: 'TLS1_2'
    allowSharedKeyAccess: false
    defaultToOAuthAuthentication: true
    blobServices: {
      automaticSnapshotPolicyEnabled: true
      containerDeleteRetentionPolicyDays: 10
      containerDeleteRetentionPolicyEnabled: true
      deleteRetentionPolicyDays: 9
      deleteRetentionPolicyEnabled: true
      lastAccessTimeTrackingPolicyEnabled: true
    }
    enableHierarchicalNamespace: false
    managedIdentities: {
      systemAssigned: true
    }
    publicNetworkAccess: 'Enabled'
    networkAcls: {
      defaultAction: 'Allow'
      bypass: 'AzureServices'
    }
    requireInfrastructureEncryption: true
    sasExpirationPeriod: '180.00:00:00'
  }
}
```

(Optional) Correlation Id

No response

@DavidSP-Transparity DavidSP-Transparity added Needs: Triage 🔍 Maintainers need to triage still Type: AVM 🅰️ ✌️ Ⓜ️ This is an AVM related issue labels Nov 28, 2024

Important

The "Needs: Triage 🔍" label must be removed once the triage process is complete!

Tip

For additional guidance on how to triage this issue/PR, see the BRM Issue Triage documentation.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Type: Bug 🐛 Something isn't working label Nov 28, 2024
@avm-team-linter avm-team-linter bot added the Class: Resource Module 📦 This is a resource module label Nov 28, 2024
@github-project-automation github-project-automation bot moved this to Needs: Triage in AVM - Module Issues Nov 28, 2024

@DavidSP-Transparity, thanks for submitting this issue for the avm/res/machine-learning-services/workspace module!

Important

A member of the @Azure/avm-res-machinelearningservices-workspace-module-owners-bicep or @Azure/avm-res-machinelearningservices-workspace-module-contributors-bicep team will review it soon!

@cecheta
Member

cecheta commented Dec 2, 2024

Hi @DavidSP-Transparity, thanks for raising this issue.

To me, it seems like this is an issue with the ARM resource provider rather than the avm/res/machine-learning-services/workspace AVM module? If that is the case, it may be better to raise a support request within Azure.

@DavidSP-Transparity
Author


@cecheta - Yes, that's right. Apologies, I thought it would be best to log it here. I'll raise a support request with Azure.

I think it's a good idea to keep this issue open until I get confirmation from Microsoft that it has been resolved; would you agree? I also think it would be good to add a note to the AVM module for anyone using it in the way I described, something like this:

"If you provide a null or empty value for the parameter 'associatedKeyVaultResourceId' to use a Microsoft-managed Key Vault as your credential store (Microsoft-managed credential store (preview)), you may encounter an 'InternalServerError' deployment error when redeploying.

You can track the issue at: GitHub #3849."

Warning

Tagging the AVM Core Team (@Azure/avm-core-team-technical-bicep) because a module owner or contributor has not responded to this issue within 3 business days. The AVM Core Team will attempt to contact the module owners/contributors directly.

Tip

  • To prevent further actions from taking effect, the "Status: Response Overdue 🚩" label must be removed once this issue has been responded to.
  • To avoid this rule being (re)triggered, the "Needs: Triage 🔍" label must be removed as part of the triage process (when the issue is first responded to)!

@microsoft-github-policy-service microsoft-github-policy-service bot added the Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days label Dec 6, 2024
@cecheta
Member

cecheta commented Dec 9, 2024

I will see if I can find time to update the description; in the meantime, a contribution would be welcome!

@cecheta cecheta added Type: Feature Request ➕ New feature or request and removed Needs: Triage 🔍 Maintainers need to triage still Status: Response Overdue 🚩 When an issue/PR has not been responded to for X amount of days Type: Bug 🐛 Something isn't working labels Dec 9, 2024
Projects
Status: Needs: Triage
Development

No branches or pull requests

2 participants