AMLFS provider limit for storage_capacity_in_tb set to 8 - 128TiB, when the value should be 8 - 768TiB. #23406

Closed
1 task done
clfriede opened this issue Sep 28, 2023 · 9 comments · Fixed by #23428

Comments

@clfriede

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.5.7

AzureRM Provider Version

3.73.0

Affected Resource(s)/Data Source(s)

azurerm_managed_lustre_file_system

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.73.0"
    }
  }
}

provider "azurerm" {
  # Configuration options
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "clfriede-terraform-AMLFS-test"
  location = "East US"
}

resource "azurerm_virtual_network" "example" {
  name                = "clfriede-terraform-AMLFS-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

resource "azurerm_subnet" "example" {
  name                 = "clfriede-terraform-AMLFS-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_managed_lustre_file_system" "example" {
  name                   = "clfriede-terraform-AMLFS-cluster"
  resource_group_name    = azurerm_resource_group.example.name
  location               = azurerm_resource_group.example.location
  sku_name               = "AMLFS-Durable-Premium-125"
  subnet_id              = azurerm_subnet.example.id
  storage_capacity_in_tb = 768
  zones                  = ["1"]

  maintenance_window {
    day_of_week        = "Saturday"
    time_of_day_in_utc = "02:00"
  }
}

Debug Output/Panic Output

N/A

Expected Behaviour

Terraform should be able to provision AMLFS clusters between 8 and 768 TiB in size.

Actual Behaviour

Terraform errors out if storage_capacity_in_tb is set to a value greater than 128, stating that the upper limit is 128 TiB.

Steps to Reproduce

terraform apply

Important Factoids

N/A

References

Location showing incorrect values:

https://registry.terraform.io/providers/hashicorp/azurerm/3.73.0/docs/resources/managed_lustre_file_system

@clfriede clfriede changed the title from "AMLFS provider limited to 8 - 128TiB, when the value should be 8 - 768TiB." to "AMLFS provider limit for storage_capacity_in_tb set to 8 - 128TiB, when the value should be 8 - 768TiB." on Sep 28, 2023
@barbisch

As an aside, it would also be great to change the accepted range to 4 - 768 (instead of 8 - 768). The step size for the AMLFS-Durable-Premium-500 sku is 4TiB, so 4 is the smallest value that is valid for any sku.

@WodansSon WodansSon self-assigned this Sep 29, 2023
@WodansSon
Collaborator

Hi @clfriede, thank you for opening this issue. I am more than happy to take a look at the current implementation of this resource within Terraform. However, per the sample configuration and the Microsoft documentation, the storage maximum for the AMLFS-Durable-Premium-125 sku appears to be correct. To get the 768TB capacity, your sku_name field would need to be set to AMLFS-Durable-Premium-40.

image

@barbisch, agreed, per the above documentation the min and max values for the storage capacity appear to be dependent on which sku_name has been selected for the resource.

The documentation also mentions that:

image

So I am not sure how to move forward, as the documented limits can be exceeded on a per-instance basis with an exception from Azure.

That said I will look at the code and see if I can improve the current experience for our customers. 🚀

@blepore

blepore commented Sep 29, 2023

Hi @WodansSon - you're correct that the public docs suggest that 128 is the max. HOWEVER - that is simply our default max. For many reasons, we don't want just any customer with an Azure account to deploy very large filesystems. We do have a workaround for our largest customers - we add a tag to their subscriptions to allow them to go larger. They can go up to 768TiB currently but that will soon (early 2024) be increased to over 1PiB.

Customers will receive an error if they attempt to deploy something too large. It might be easiest for you to remove the validation on the maximum altogether and defer to the resource provider?

Thank you in advance for your support on this. It's truly appreciated!

@clfriede
Author

clfriede commented Sep 29, 2023 via email

@WodansSon
Collaborator

WodansSon commented Sep 30, 2023

@blepore, that makes sense. I will do the following three things to resolve this issue:

  • Remove the upper-bound limit from the storage_capacity_in_tb field
  • Update the storage_capacity_in_tb lower-bound limit to match the documentation (e.g., 48 TB for AMLFS-Durable-Premium-40, 16 TB for AMLFS-Durable-Premium-125, 8 TB for AMLFS-Durable-Premium-250 and 4 TB for AMLFS-Durable-Premium-500)
  • Add a note in the documentation to contact Microsoft Support if more capacity is desired.
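
The provider-side change lives in the Go code of the associated PR, which is not reproduced in this thread. As a rough user-side illustration of the first two points only, a configuration could mirror the per-SKU minimums listed above with a precondition and leave the upper bound to Azure. This is a sketch under assumptions: the local `amlfs_min_capacity_tb` and the two variables are hypothetical names, and the resource group and subnet are the ones from the configuration in the report.

```hcl
locals {
  # Per-SKU minimums in TB, as listed in the comment above.
  amlfs_min_capacity_tb = {
    "AMLFS-Durable-Premium-40"  = 48
    "AMLFS-Durable-Premium-125" = 16
    "AMLFS-Durable-Premium-250" = 8
    "AMLFS-Durable-Premium-500" = 4
  }
}

# Hypothetical input variables, introduced only for this sketch.
variable "sku_name" {
  type    = string
  default = "AMLFS-Durable-Premium-125"
}

variable "storage_capacity_in_tb" {
  type    = number
  default = 768
}

resource "azurerm_managed_lustre_file_system" "example" {
  name                   = "clfriede-terraform-AMLFS-cluster"
  resource_group_name    = azurerm_resource_group.example.name
  location               = azurerm_resource_group.example.location
  sku_name               = var.sku_name
  subnet_id              = azurerm_subnet.example.id
  storage_capacity_in_tb = var.storage_capacity_in_tb
  zones                  = ["1"]

  maintenance_window {
    day_of_week        = "Saturday"
    time_of_day_in_utc = "02:00"
  }

  lifecycle {
    # Lower bound only; no maximum is checked here, so an oversized
    # request is rejected (or allowed) by the Azure resource provider.
    precondition {
      condition     = var.storage_capacity_in_tb >= local.amlfs_min_capacity_tb[var.sku_name]
      error_message = "The storage_capacity_in_tb value is below the documented minimum for the selected SKU."
    }
  }
}
```

The precondition block needs Terraform 1.2 or later, which the 1.5.7 version in the report satisfies.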

Example of Plan with the new code:

Terraform will perform the following actions:

  # azurerm_managed_lustre_file_system.repro will be created
  + resource "azurerm_managed_lustre_file_system" "repro" {
      + id                     = (known after apply)
      + location               = "westeurope"
      + name                   = "AMLFS-repro-cluster"
      + resource_group_name    = "IcM-AMLFS-repro"
      + sku_name               = "AMLFS-Durable-Premium-125"
      + storage_capacity_in_tb = 768
      + subnet_id              = (known after apply)
      + zones                  = [
          + "1",
        ]

      + maintenance_window {
          + day_of_week        = "Saturday"
          + time_of_day_in_utc = "02:00"
        }
    }

NOTICE: The sku_name is AMLFS-Durable-Premium-125 with a storage_capacity_in_tb of 768.

QQ: @blepore, If a customer is granted an exception to the amount of storage capacity they can use, is the increment requirement still valid? In my current code I DO validate the storage capacity increment based on the documented value for the sku.

@barbisch, the associated PR also addresses your concerns with the current resource. The minimum storage_capacity_in_tb value is now based on the sku_name defined in the configuration file instead of a hardcoded value.

@blepore

blepore commented Oct 2, 2023

Thank you so much for the quick action @WodansSon!

Replying to your question -

If a customer is granted an exception to the amount of storage capacity they can use, is the increment requirement still valid?

Yes, these increments are enforced all the way up through 768TiB. It sounds like your validation will be just fine. Thanks again!
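
For readers who want to catch an invalid increment before planning, the increment rule could be approximated on the user side with a variable validation. This is only a sketch: the variable `amlfs` is a hypothetical name, and the increment values are an assumption. The thread confirms a 4TiB step for AMLFS-Durable-Premium-500 only; the other values are assumed here to equal each sku's documented minimum and should be checked against the Microsoft documentation.

```hcl
# Hypothetical object variable, introduced only for this sketch.
variable "amlfs" {
  description = "SKU and capacity for the managed Lustre file system."
  type = object({
    sku_name               = string
    storage_capacity_in_tb = number
  })

  validation {
    # ASSUMPTION: increments per SKU in TB. Only the value for
    # AMLFS-Durable-Premium-500 is stated in the thread.
    # An unknown SKU falls back to 1, so the check passes and the
    # decision is left to the provider.
    condition = var.amlfs.storage_capacity_in_tb % lookup({
      "AMLFS-Durable-Premium-40"  = 48
      "AMLFS-Durable-Premium-125" = 16
      "AMLFS-Durable-Premium-250" = 8
      "AMLFS-Durable-Premium-500" = 4
    }, var.amlfs.sku_name, 1) == 0
    error_message = "The storage_capacity_in_tb value must be a multiple of the increment for the selected SKU."
  }
}
```

The map is written inline because, in the Terraform 1.5.x series used in the report, a variable validation condition can refer only to the variable being validated.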

@WodansSon WodansSon added this to the v3.76.0 milestone Oct 3, 2023
@clfriede
Author

@WodansSon Thank you for your help on this issue! It is greatly appreciated. Would you happen to know when 3.76.0 is scheduled for release? Any information that you could provide would be helpful.

@manicminer
Contributor

Please see the Milestones page for release dates, thanks!
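
Once a release containing the fix is published, the provider constraint from the original report can be raised to pick it up. A minimal sketch, assuming the fix ships in 3.76.0 as the milestone above indicates:

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # ASSUMPTION: 3.76.0 is the first release containing the fix,
      # based on the milestone assigned in this issue.
      version = ">= 3.76.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` selects the newest provider version that satisfies it.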


github-actions bot commented May 4, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2024