Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use custom ENV variables for Azure Storage Remote. #2256

Closed

Conversation

kopytjuk
Copy link

@kopytjuk kopytjuk commented Jul 11, 2019

  • Have you followed the guidelines in our
    Contributing document?

  • Does your PR affect documented changes or does it add new functionality
    that should be documented? If yes, have you created a PR for
    dvc.org documenting it or at
    least opened an issue for it? If so, please add a link to it.


A solution for https://github.com/iterative/dvc/issues/2255

Appending docs: iterative/dvc.org#482

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff @kopytjuk !
Added 1 comment and few suggestions, mostly to match test formatting and how we fill strings in the rest of the project. Though I have one question, do you know, if by any chance, connection string can start with $ sign? If so, we need to adress situation where config contains connection string, and we try to read it as environmental variable.

tests/unit/remote/test_azure.py Outdated Show resolved Hide resolved
tests/unit/remote/test_azure.py Outdated Show resolved Hide resolved
tests/unit/remote/test_azure.py Outdated Show resolved Hide resolved
tests/unit/remote/test_azure.py Show resolved Hide resolved
tests/unit/remote/test_azure.py Show resolved Hide resolved
tests/unit/remote/test_azure.py Show resolved Hide resolved
tests/unit/remote/test_azure.py Show resolved Hide resolved
dvc/remote/azure.py Outdated Show resolved Hide resolved
dvc/remote/azure.py Outdated Show resolved Hide resolved
dvc/remote/azure.py Outdated Show resolved Hide resolved
@kopytjuk
Copy link
Author

kopytjuk commented Jul 12, 2019

Great stuff @kopytjuk !
Added 1 comment and few suggestions, mostly to match test formatting and how we fill strings in the rest of the project. Though I have one question, do you know, if by any chance, connection string can start with $ sign? If so, we need to adress situation where config contains connection string, and we try to read it as environmental variable.

According to Azure documentation a connection string can start with B (BlobEndpoint), Q (Queue...), T, F or S.

@kopytjuk kopytjuk force-pushed the az-connection-string-from-env-var branch from 5bfcb80 to ca58f55 Compare July 12, 2019 15:37
dvc/remote/azure.py Outdated Show resolved Hide resolved
if (
connection_string_from_config is not None
) and connection_string_from_config.startswith("$"):
# read custom environment variable discarding the '$' character
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need $ though? Maybe I'm missing something, but I am not able to find $ used in connection strings at all.

Copy link
Contributor

@efiop efiop Jul 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realised that that ^ didn't come up very clear 😅 I meant that I am not able to find where such trick with env vars is used elsewhere (e.g. in other projects that use azure). Or is it something that you've come up with yourself?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe a little background. When working with Azure, it is handy to save the storage account credentials as connection strings in a environment variables and retrieve them from client libraries (like azure-storage-blob). I explicitly used plural form to highlight the fact that there are multiple storage accounts. Now working with dvc remotes on different storages you need a way to select the right destination of data. The solution with ENV vars would keep the connection string private while showing the name of the ENV variable (which itself encodes the destination with e.g. the storage-account name). Team members can, after setting up their .bash_profile, contribute to projects.

You are right, it is nothing Azure specific - maybe a helper function for other remote types would be handy...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other remotes it might be dangerous. What if someone would want to do something like $PART1_$PART2 ? This env evaluation in config feels like it is too powerful and doesn't really belong in the current form in the config file itself. It is also weird that it will be supported like that for 1 specific config option for 1 specific remote and if we try to support it for other options later, it has a potential to run into conflicts, where we might no longer be able to tell easily if it is an env var that we need to evaluate. But just mixing config and env doesn't feel right in this scenario.

Feels like the proper way to work with this is to use .dvc/config.local, as you've described. You will be sharing your .bash_profile anyway, so why not just share .dvc/config.local privately instead? You can use placeholders in .dvc/config like so too if you want to:

dvc remote modify myremote connection_string MY_PLACEHOLDER

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other remotes it might be dangerous. What if someone would want to do something like $PART1_$PART2 ? This env evaluation in config feels like it is too powerful and doesn't really belong in the current form in the config file itself. It is also weird that it will be supported like that for 1 specific config option for 1 specific remote and if we try to support it for other options later, it has a potential to run into conflicts, where we might no longer be able to tell easily if it is an env var that we need to evaluate. But just mixing config and env doesn't feel right in this scenario.

Feels like the proper way to work with this is to use .dvc/config.local, as you've described. You will be sharing your .bash_profile anyway, so why not just share .dvc/config.local privately instead? You can use placeholders in .dvc/config like so too if you want to:

dvc remote modify myremote connection_string MY_PLACEHOLDER

I get your first point about the power of such parsing functionality - agree on that completely.

On the other hand (your second point) there exists a specific solution with azure where we read a connection string from ENV with AZURE_STORAGE_CONNECTION_STRING - working with one storage account will be fine. Having different storage accounts with different levels of security/back up strategy will be hard (since we distribute write/delete/list connection strings).

Why I do not want to share .dvc/config.local is because you have to do that for each project - plus you have to track who got which token for which project instead of having a list of team members and tokens once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your points, but I'm just not sure that this is the proper way to solve your problem. With things like s3, you would probably just use profiles, right? Does azure have those?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kopytjuk Or maybe file-based auth?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Azure client libraries (at least the official one in Python) supports authentication via SAS tokens or primary key of the storage account.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kopytjuk maybe something like

client = get_client_from_auth_file(StorageManagementClient)
storage_keys = storage_client.storage_accounts.list_keys(resource_group_name, storage_account_name)
storage_keys = {v.key_name: v.value for v in storage_keys.keys}

storage_client = CloudStorageAccount(storage_account_name, storage_keys['key1'])
blob_service = storage_client.create_block_blob_service()

from Azure/azure-sdk-for-python#1310 (comment) ?

@kopytjuk kopytjuk force-pushed the az-connection-string-from-env-var branch from ca58f55 to 89059b6 Compare July 14, 2019 10:45
@efiop
Copy link
Contributor

efiop commented Aug 13, 2019

Closing as stale.

@efiop efiop closed this Aug 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants