PR #453 breaks the use of DefaultAzureCredential for a Azure Storage Account Gen2 #476

Closed
johschmidt42 opened this issue Oct 8, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@johschmidt42

With #453, in the AzureBlobClient class, an instance of the DataLakeServiceClient class is created if a blob_service_client is passed as a parameter. The Storage Account might not have a hierarchical namespace (datalake). I would expect that the AzureBlobClient does not try to create an instance of the DataLakeServiceClient class.

The error:

  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/some-project/lib/python3.11/site-packages/cloudpathlib/azure/azblobclient.py", line 117, in __init__
    blob_service_client.credential.account_name,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DefaultAzureCredential' object has no attribute 'account_name'

The code:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
from cloudpathlib import AzureBlobClient

# Define your Azure Blob Storage details
account_url = "https://storage-account.blob.core.windows.net"
container_name = "az://container-name"
blob_name = "some/blob.txt"

# Initialize the DefaultAzureCredential
credential = DefaultAzureCredential()

blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)

# Create an AzureBlobClient
client = AzureBlobClient(blob_service_client=blob_service_client)

# Construct the full path to the blob
blob_path = f"{container_name}/{blob_name}"

# Use the client to read the blob
blob = client.CloudPath(blob_path)

if __name__ == "__main__":
    # Read the content of the blob
    with blob.open("r") as file:
        content = file.read()

As a result, 0.19.0 cannot be used to interact with blob data on an Azure Storage Account Gen2 without hierarchical namespace. 0.18.1 works just fine.

@johschmidt42
Author

Not sure if ADLS Gen2 generally works with the normal blob API
@pjbull @jayqi

@jayqi jayqi added the bug Something isn't working label Oct 8, 2024
@jayqi
Member

jayqi commented Oct 8, 2024

@pjbull it looks like this is because we make assumptions about the type of credential instance that blob_service_client has. Is there any reason why we instantiate a new AzureNamedKeyCredential rather than just reusing the existing credential instance with something like credential=blob_service_client.credential?

credential=AzureNamedKeyCredential(
    blob_service_client.credential.account_name,
    blob_service_client.credential.account_key,
),

@pjbull
Member

pjbull commented Oct 8, 2024

Yeah, I am remembering that this is an issue with how the Azure SDK is architected. The credentials attached to the object end up wrapped in some service-specific object, so you can't just pass <service>.credential. There may be a way to get the service-agnostic credentials back again from those objects, but we'd need to track that down.

Here's an example of the error we get when we run the live tests after trying to just pass blob_service_client.credential:

cloudpathlib/azure/azblobclient.py:113: in __init__
    self.data_lake_client = DataLakeServiceClient(
../../miniconda3/envs/cloudpathlib/lib/python3.11/site-packages/azure/storage/filedatalake/_data_lake_service_client.py:109: in __init__
    super(DataLakeServiceClient, self).__init__(parsed_url, service='dfs',
../../miniconda3/envs/cloudpathlib/lib/python3.11/site-packages/azure/storage/filedatalake/_shared/base_client.py:108: in __init__
    self._config, self._pipeline = self._create_pipeline(self.credential, sdk_moniker=self._sdk_moniker, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <azure.storage.filedatalake._data_lake_service_client.DataLakeServiceClient object at 0x10f1dce10>
credential = <azure.storage.blob._shared.authentication.SharedKeyCredentialPolicy object at 0x10f330a90>
kwargs = {'sdk_moniker': 'storage-dfs/12.16.0'}

    def _create_pipeline(
        self, credential: Optional[Union[str, Dict[str, str], AzureNamedKeyCredential, AzureSasCredential, TokenCredential]] = None, # pylint: disable=line-too-long
        **kwargs: Any
    ) -> Tuple[StorageConfiguration, Pipeline]:
        self._credential_policy: Any = None
        if hasattr(credential, "get_token"):
            if kwargs.get('audience'):
                audience = str(kwargs.pop('audience')).rstrip('/') + DEFAULT_OAUTH_SCOPE
            else:
                audience = STORAGE_OAUTH_SCOPE
            self._credential_policy = StorageBearerTokenCredentialPolicy(cast(TokenCredential, credential), audience)
        elif isinstance(credential, SharedKeyCredentialPolicy):
            self._credential_policy = credential
        elif isinstance(credential, AzureSasCredential):
            self._credential_policy = AzureSasCredentialPolicy(credential)
        elif credential is not None:
>           raise TypeError(f"Unsupported credential: {type(credential)}")
E           TypeError: Unsupported credential: <class 'azure.storage.blob._shared.authentication.SharedKeyCredentialPolicy'>

../../miniconda3/envs/cloudpathlib/lib/python3.11/site-packages/azure/storage/filedatalake/_shared/base_client.py:240: TypeError
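
One way to avoid assuming a particular credential type is to branch on what the credential object actually exposes. The sketch below is illustrative only and is not cloudpathlib's code: `NamedKey`, `FakeSharedKeyPolicy`, `FakeTokenCredential`, and `credential_for_data_lake` are hypothetical stand-ins so the example runs without the Azure SDK installed. It mirrors the `get_token` check visible in the traceback above and the `account_name` / `account_key` attributes read in the snippet quoted earlier.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NamedKey:
    """Hypothetical stand-in for a service-agnostic name/key credential."""

    name: str
    key: str


class FakeSharedKeyPolicy:
    """Hypothetical stand-in for the service-specific shared-key wrapper."""

    def __init__(self, account_name: str, account_key: str) -> None:
        self.account_name = account_name
        self.account_key = account_key


class FakeTokenCredential:
    """Hypothetical stand-in for a token credential (anything with get_token)."""

    def get_token(self, *scopes: str) -> str:
        return "fake-token"


def credential_for_data_lake(credential: Any) -> Any:
    """Choose a credential that a second service client could accept."""
    # Token credentials are not tied to one service, so reuse them unchanged.
    if hasattr(credential, "get_token"):
        return credential
    # A shared-key wrapper exposes the name and key; rebuild a plain credential.
    if hasattr(credential, "account_name") and hasattr(credential, "account_key"):
        return NamedKey(credential.account_name, credential.account_key)
    # Reject anything else explicitly rather than failing with AttributeError.
    raise TypeError(f"Unsupported credential: {type(credential)}")
```

With a check like this, a token credential would be forwarded as-is instead of triggering the `AttributeError` from the original report, while the shared-key case keeps its current behavior.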

@pjbull
Member

pjbull commented Oct 8, 2024

@johschmidt42 Could you try passing your account_url and credential straight to AzureBlobClient?

E.g., just remove the creation of the BlobServiceClient explicitly:

from azure.identity import DefaultAzureCredential
from cloudpathlib import AzureBlobClient

# Define your Azure Blob Storage details
account_url = "https://storage-account.blob.core.windows.net"
container_name = "az://container-name"
blob_name = "some/blob.txt"

# Initialize the DefaultAzureCredential
credential = DefaultAzureCredential()

# Create an AzureBlobClient
client = AzureBlobClient(account_url=account_url, credential=credential)

# Construct the full path to the blob
blob_path = f"{container_name}/{blob_name}"

# Use the client to read the blob
blob = client.CloudPath(blob_path)

if __name__ == "__main__":
    # Read the content of the blob
    with blob.open("r") as file:
        content = file.read()

@johschmidt42
Author

johschmidt42 commented Oct 9, 2024

@pjbull

Tried that.

Getting this error:

  File "/Users/XXX/Library/Caches/pypoetry/virtualenvs/some-project/lib/python3.11/site-packages/cloudpathlib/cloudpath.py", line 655, in open
    raise CloudPathIsADirectoryError(
cloudpathlib.exceptions.CloudPathIsADirectoryError: Cannot open directory, only files. Tried to open (az://container-name/some/blob.txt)

@pjbull
Member

pjbull commented Oct 9, 2024

Thanks @johschmidt42, a few follow-ups to help debug:

  • What version of cloudpathlib is in that environment?
  • Does the file you tried to read exist?
  • Does that storage account have hierarchical namespace enabled?

Thanks!
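
For the version question, a quick check from inside the affected environment avoids guessing. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of cloudpathlib.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same interpreter that produced the traceback.
    print(installed_version("cloudpathlib"))
```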

@johschmidt42
Author

johschmidt42 commented Oct 10, 2024

Happy to help! @pjbull

What version of cloudpathlib is in that environment?

0.19.0 (same code works with 0.18.1)

Does the file you tried to read exist?

Yes

Does that storage account have hierarchical namespace enabled?

Actually, it is enabled. Sorry for the confusion.

@johschmidt42 johschmidt42 changed the title from "PR #453 breaks the use of DefaultAzureCredential for a normal Azure Storage Account Gen2 (without hierarchical namespace/DataLake)" to "PR #453 breaks the use of DefaultAzureCredential for a Azure Storage Account Gen2" on Oct 10, 2024
@pjbull
Member

pjbull commented Oct 12, 2024

Repro'd this and debugged. The root cause is the same as #470: get_account_information raises ResourceNotFoundError when we try to check whether HNS is turned on, because whatever credential we happen to have does not have sufficient permissions for that API.

Going to close this as a dupe and consolidate to #470
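
The failure mode described above could be guarded against along the following lines. This is a sketch under stated assumptions, not the fix that shipped: `AccountInfoUnavailable` stands in for the SDK's `ResourceNotFoundError`, `StubServiceClient` and `hns_enabled` are hypothetical names, and the `is_hns_enabled` key is an assumption about the shape of the account information response.

```python
from typing import Any, Dict, Optional


class AccountInfoUnavailable(Exception):
    """Hypothetical stand-in for the SDK's ResourceNotFoundError."""


class StubServiceClient:
    """Hypothetical client; info=None simulates insufficient permissions."""

    def __init__(self, info: Optional[Dict[str, Any]]) -> None:
        self._info = info

    def get_account_information(self) -> Dict[str, Any]:
        if self._info is None:
            raise AccountInfoUnavailable("credential cannot read account info")
        return self._info


def hns_enabled(service_client: Any, default: bool = False) -> bool:
    """Report whether hierarchical namespace is on, falling back to a default."""
    try:
        info = service_client.get_account_information()
    except AccountInfoUnavailable:
        # The credential may lack permission for this API; do not crash.
        return default
    # Assumed key name; treat a missing key the same as the default.
    return bool(info.get("is_hns_enabled", default))
```

Falling back to a default keeps plain blob operations working when the account-level call is not permitted, at the cost of possibly missing HNS-specific behavior on accounts where it is enabled.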

@pjbull
Member

pjbull commented Oct 18, 2024

@johschmidt42 Just released v0.20.0 to PyPI. Please, take it for a spin and see if it fixes these issues for you!

@johschmidt42
Author

@pjbull Great! I just tried the code again with v0.20.0 and it works! Thanks for quickly resolving the issue!

3 participants