Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impossible to manage container root folder in Azure Datalake Gen2 #9425

Closed
dmaterowski opened this issue Nov 23, 2020 · 12 comments · Fixed by #9917
Closed

Impossible to manage container root folder in Azure Datalake Gen2 #9425

dmaterowski opened this issue Nov 23, 2020 · 12 comments · Fixed by #9917

Comments

@dmaterowski
Copy link

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.13.5
+ provider registry.terraform.io/-/azurerm v2.37.0

Affected Resource(s)

  • azurerm_storage_data_lake_gen2_path
  • azurerm_storage_data_lake_gen2_filesystem
  • azurerm_storage_container

Terraform Configuration Files

locals {
    container_name = "examplefs"
}

resource "azurerm_storage_account" "dlg2" {
  name                     = "****"
  resource_group_name      = azurerm_resource_group.x.name
  location                 = azurerm_resource_group.x.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true
}

resource "azurerm_storage_container" "dp" {
  name                  = local.container_name
  storage_account_name  = azurerm_storage_account.dlg2.name
}
# alterantively:
# resource "azurerm_storage_data_lake_gen2_filesystem" "dp" {
#  name               = local.container_name
#  storage_account_id = azurerm_storage_account.dlg2.id
#}


resource "azurerm_storage_data_lake_gen2_path" "root" {
  path               = ""
  filesystem_name    = azurerm_storage_container.dp.name
  storage_account_id = azurerm_storage_account.dlg2.id
  resource           = "directory"

  ace {
    id          = "xxxx-xxxx-xxxx"
    permissions = "rwx"
    scope       = "access"
    type        = "user"
  }

  ace {
    permissions = "---"
    scope       = "access"
    type        = "other"
  }
}

Expected Behaviour

  • Root directory path resource is added to state without manual import
  • ACLs are assigned to the root as per definition

Actual Behaviour

Deploying above definitions throws exception, as the root directory already exists.

Error: A resource with the ID "https://***.dfs.core.windows.net/examplefs/" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_storage_data_lake_gen2_path" for more information.

  on storage.tf line 34, in resource "azurerm_storage_data_lake_gen2_path" "root":
  34: resource "azurerm_storage_data_lake_gen2_path" "root" {

Steps to Reproduce

  1. Ensure identity / SP has Data Owner role
  2. terraform apply

Important Factoids

The root directory "/". This directory is created when a Data Lake Storage Gen2 container is created.

Which means that creating container/filesystem causes the root directory to already exist.

  • Since neither azurerm_storage_data_lake_gen2_filesystem, nor azurerm_storage_container support ACLs it's impossible to manage root-level ACLs without manually importing the root azurerm_storage_data_lake_gen2_path

  • It's also impossible to create the root path without existing container as this fails with

Error: Error creating Path "" in File System "testing" in Storage Account "***": datalakestore.Client#Create: 
Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidUri" Message="The request URI is invalid
  • setting path = "/" is also picked as the same, already existing resource id

References

I'm not sure what is the best expected behvaiour in this situation, because it's a conflicting api design. But in any case, as of now it's impossible to manage the root folder without importing it manually, which is not really an option for a non-trivial number of containers.

Also, the ACLs on root container are quite crucial as all nested access needs Execute rights on whole folder hierarchy starting from root.

Please do let me know if I have missed anything obvious :)

@ross-p-smith
Copy link
Contributor

My work around for the moment - should it help anybody (please note, use the access key to set the acl and not the AAD account: -

resource "null_resource" "set_root_acl" {
  provisioner "local-exec" {
    command    = "az storage fs access set --acl 'user::rwx,user:${var.ace_id}:--x,group::r-x,other::---' -p / -f ${var.file_system} --account-name ${var.storage_account_name} --account-key ${var.storage_access_key}"
    on_failure = fail
  }

  triggers = {
    always_run = azurerm_storage_data_lake_gen2_filesystem.fs.id
  }

  depends_on = [azurerm_storage_data_lake_gen2_filesystem.fs]
}

@BertrandDechoux
Copy link

BertrandDechoux commented Nov 30, 2020

The first design was planning to add two new resources

  • azurerm_storage_data_lake_gen2_path
  • azurerm_storage_data_lake_gen2_path_acl

But then it was decided that it was too complex and not needed. As a consequence, path and acl have been merged into the same resource.

At minimum, the problem could be solved by

  • having two distinct resources : path and acl
  • having a data source for path

Then the root path can be found using the data source in order to target it with the acl resource.

Of course, if this configuration complexity can be avoided with a kind of auto-import of the root dir, why not but I don't know if it is a patten that would be supported by Terraform.

cc @stuartleeks @tombuildsstuff

@stuartleeks
Copy link
Contributor

Thanks @BertrandDechoux. I was having a discussion with @tombuildsstuff and proposed two options:

  1. Add optional ACL support on the azurerm_storage_data_lake_gen2_filesystem resource to allow setting the ACL for the file system root (i.e. allow ace entries on the file system resource)
  2. Add a special case in the azurerm_storage_data_lake_gen2_path to skip the creation for the root path and simply set the ACL (if specified)

@stuartleeks
Copy link
Contributor

As you spotted, the original proposal have path and acl as separate resources and with hindsight that would have avoided this issue. To implement that now would be a breaking change so I'm not sure how viable that is

@BertrandDechoux
Copy link

BertrandDechoux commented Nov 30, 2020

Ok. Both are good.

The only thing is that for 1., I am a bit confused between azurerm_storage_container and azurerm_storage_data_lake_gen2_filesystem.

I assume azurerm_storage_data_lake_gen2_filesystem refers to a newer api than azurerm_storage_container which is probably an inheritance from the blob storage ?

If ACL support is only added to azurerm_storage_data_lake_gen2_filesystem, it implies that users will need to (manually) migrate from one resource type to the other using some kind of removal from the state (?) of the old resource type and then re-import as the new resource type. But I may be missing something, I am not a Terraform expert.

@stuartleeks
Copy link
Contributor

My understanding is that there is some compatibility implemented between containers and file systems. But when working with ADLS2 (i.e. the hierarchical namespace) I have found sticking to the file system APIs/resources works out better. My recollection is that the root folder ownership ended up a bit strange when we used the container approach rather than file system approach on my last project

@stuartleeks
Copy link
Contributor

Maybe it would help to add a note to the docs for azurerm_storage_container that points to azurerm_storage_data_lake_gen2_filesystem as the route to go for Data Lake Gen 2

@ross-p-smith
Copy link
Contributor

In the PR above, I have implemented optional ACL support on the azurerm_storage_data_lake_gen2_filesystem resource to allow setting the ACL for the file system root (i.e. allow ace entries on the file system resource)

@AbsoLouie
Copy link

@stuartleeks @ross-p-smith Is there any work/testing that needs to be done for this PR? This is a feature that will be immediately useful for my use case.

@stuartleeks
Copy link
Contributor

@louiedp3 I think it's really just waiting on review by the project maintainers - is that correct @ross-p-smith ?

@ghost
Copy link

ghost commented Jan 28, 2021

This has been released in version 2.45.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.45.0"
}
# ... other configuration ...

@ghost
Copy link

ghost commented Feb 27, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants