[FEATURE REQ] azure-storage-blob-nio: set default configuration #23653
Comments
Hi @droazen are you able to use these environment variables in your app: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/identity/azure-identity#environment-variables |
Thanks @joshfree. That will be helpful to get a read on. I think I might also have to do some work to pick up on those or at least document their existence in our javadocs |
@joshfree The request is for |
@droazen Stepping back a bit. My understanding of nio usage is that typically on startup, |
@rickle-msft We never instantiate the What we can do in some cases, however, is call static configuration methods (that don't involve any instantiation) to set defaults on startup. We do this with Google's NIO library in our main toolkit GATK to adjust the default timeouts and a few other settings, via the static method Thanks @rickle-msft and @joshfree for your time, and for considering this request! |
I see.
I think this is the information I was looking for. I was trying to figure out what the entry point was into this system because a fileSystem has to be created at some point. This method eventually calls into the Can you explain a little more about the static method Google is using? If it's already able to pull creds from the environment, why use this method on top of that? Why not just have it pull all the necessary configs from the environment while it's at it? |
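For readers following the entry-point question, here is a minimal sketch of how the JDK itself dispatches `Paths.get(URI)`: the `file` scheme goes to the default provider, any other scheme is matched against the installed providers, and an unmatched scheme fails with `FileSystemNotFoundException`. The `example-cloud` scheme is a made-up placeholder, not the scheme of any real plugin, and the class name is illustrative only.

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathsGetDispatch {

    /** Describes what Paths.get(URI) does for the given URI on this JVM. */
    static String resolve(String uriText) {
        URI uri = URI.create(uriText);
        try {
            // No FileSystem is instantiated by the caller here; the JDK looks up
            // a provider by scheme and asks it for the Path.
            Path path = Paths.get(uri);
            return "resolved by " + path.getFileSystem().provider().getScheme();
        } catch (FileSystemNotFoundException e) {
            // Thrown when no installed provider handles the scheme, or when the
            // provider has no open file system for this URI.
            return "no file system: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A local file URI is always handled by the default provider.
        System.out.println(resolve(Paths.get("reads.bam").toUri().toString()));
        // "example-cloud" is a hypothetical scheme with no provider installed.
        System.out.println(resolve("example-cloud://container/reads.bam"));
    }
}
```

This is why an application that only ever calls `Paths.get()` depends on the provider either having a file system open already or being able to open one on demand.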
@rickle-msft The reason we pull the auth info from the environment, but not other settings like timeouts, etc., is because for things like timeouts we want to bake sensible defaults into our toolkit that are appropriate for genomic-scale workloads and will work well for most of our users. Additionally, the environment variables with the user's credentials are created automatically on Google Cloud instances, and so most of the time GCP authentication "just works" for our users without any extra steps. This is a big deal for us, since most of our users are scientists who are not always able to deal with technical issues when things don't work out-of-the-box. For the Azure NIO plugin, I'm curious as to what additional configuration info typically needs to be specified apart from the credentials? |
Sure. We can definitely make it work! I'm just trying to get a full understanding of how it all fits together so we can give a consistent experience. Here is a list of all the config options that are typically expected to be passed in at config time, but we can pull this information from another source. Given our discussion on credentials from the environment, you can probably ignore what it says about setting that information in the config map. But I think the ones relevant for this part of the discussion are: Thoughts? |
@rickle-msft Thanks, this is all very helpful in clarifying the request in this ticket! I guess my ultimate wishlist would look like this:
Even just # 1 on its own would be massively useful to us. With all 3 features I think we'd achieve parity with Google's NIO plugin and be able to use the Azure plugin equally seamlessly in our toolkit. Many thanks for your help! |
@droazen Just wanted to ack this. We've had some other work come up spontaneously that is fairly urgent, which is why I haven't kept up here, but I will circle back to work on adding these features as soon as possible. |
@rickle-msft Thanks for the update! Please let me know if there's any further information I can provide on this ticket. |
@droazen Wanted to give you an update that I am beginning work on this item. Apologies for the delay and thank you for your patience. I'll start working on the first part |
@droazen What authentication method are you using? TokenAuth (RBAC), SasToken, or SharedKey? And as far as the interface for the static global configuration method, would passing a map with everything besides the credentials be reasonable? |
@rickle-msft Thanks for the update, glad you are able to look at this ticket again! To answer your questions:
|
Oh wow. Yeah, that's a good update to have. There are some SDK-wide changes coming to how the SDKs interact with the environment, so I was going to have to wait for that, but if you can pass everything into that method, then I can start working on it right away. I can also ensure that Paths.get() opens a file system with the defaults if none is present. Since we'll be able to authenticate via this config map as well, SAS tokens and shared keys (account-level access) are both already supported there |
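The "open a file system with the defaults if none is present" behaviour can be sketched with nothing but the JDK. The example below uses the built-in zip provider (`jar:` URIs) as a stand-in for a cloud provider, purely so that it runs without extra dependencies; the helper names are hypothetical and this is not the Azure plugin's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

class DefaultingFileSystems {

    /** Returns the already-open file system for the URI, or opens one with the given defaults. */
    static synchronized FileSystem getOrOpen(URI uri, Map<String, ?> defaults) {
        try {
            return FileSystems.getFileSystem(uri);
        } catch (FileSystemNotFoundException notOpenYet) {
            try {
                // Fall back to the default configuration instead of failing.
                return FileSystems.newFileSystem(uri, defaults);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Builds a jar: URI for a zip file in a fresh temp directory (the file does not exist yet). */
    static URI tempZipUri() {
        try {
            Path zip = Files.createTempDirectory("nio-demo").resolve("demo.zip");
            return URI.create("jar:" + zip.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = tempZipUri();
        // "create" is an option of the JDK zip provider; it plays the role of the defaults map.
        Map<String, String> defaults = Collections.singletonMap("create", "true");
        FileSystem first = getOrOpen(uri, defaults);
        FileSystem second = getOrOpen(uri, defaults);
        System.out.println("same instance: " + (first == second));
    }
}
```

The `synchronized` keyword keeps two threads from racing to open the same file system, which would otherwise surface as a `FileSystemAlreadyExistsException` in one of them.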
@droazen Question for you on this. Since you are calling into |
@rickle-msft We'll certainly always have a scheme and path in our URIs, but since our URIs are user input I'm not certain whether they will always contain a query string with the endpoint. We're mainly concerned here with not placing unnecessary / onerous requirements on our users -- is it easy for (potentially non-technical) Azure users to obtain full URIs to their files including the endpoint? If the endpoint were missing, could the default / statically-configured account info be used, or is this field something that would necessarily vary with each individual URI? |
I see. It's kind of a yes and no answer here. It is, by my estimation, very easy for a customer to get a full URI to their file or the endpoint to their account. Both are readily available in the Azure Portal. I would go so far as to say it would be harder for a non-technical user to get a reference (URI or otherwise) to a given blob that doesn't contain the domain/full endpoint than it would be for them to get one with the information we're looking for. What I will say is that the URI the portal gives is in a different format from the one NIO tends to work with (and I don't think this is unique to us, if I had to guess), so it might take a bit of manipulation on the part of your application to rearrange some things. Thoughts? The endpoint will be constant per file system, though it is possible to have multiple open file systems. I think the idea here was that Paths.get() accepts a URI and you're passing a map to set the global defaults, which are the required parameters for creating a file system that we already have code in place to handle. I propose that we continue on this track for now and that, if you find it is unreasonably complicated for your customers to gather the requisite information, we can look into adding an extra endpoint option to the config map and remove the need for it to be present on the URI. Thoughts? |
We did recently add a utility method to |
@rickle-msft If the endpoint is constant per filesystem, could we add a "default endpoint" attribute into the static config map that, if set, the code falls back upon when a URI is missing the endpoint? That would alleviate this issue for us completely. Otherwise I foresee a lot of unnecessary pain/confusion for our users :) Would this be reasonably simple to implement, or are there structural issues in the existing code that would make it difficult? |
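A minimal sketch of the fallback being proposed, assuming the endpoint travels in an `endpoint` query parameter as discussed above. The `EndpointResolver` class, the `example` scheme, and the host names are all hypothetical and exist only to illustrate the lookup order: URI first, configured default second, error otherwise.

```java
import java.net.URI;
import java.util.Optional;

class EndpointResolver {
    private final String defaultEndpoint; // null when no default is configured

    EndpointResolver(String defaultEndpoint) {
        this.defaultEndpoint = defaultEndpoint;
    }

    /** Extracts a non-empty "endpoint" query parameter from the URI, if present. */
    static Optional<String> endpointFromQuery(URI uri) {
        String query = uri.getQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("endpoint")) {
                String value = pair.substring(eq + 1);
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /** Prefers the endpoint on the URI and falls back to the configured default. */
    String resolve(URI uri) {
        return endpointFromQuery(uri).orElseGet(() -> {
            if (defaultEndpoint == null) {
                throw new IllegalArgumentException(
                        "URI has no endpoint and no default endpoint is configured: " + uri);
            }
            return defaultEndpoint;
        });
    }

    public static void main(String[] args) {
        EndpointResolver resolver = new EndpointResolver("https://default.example.invalid");
        System.out.println(resolver.resolve(URI.create("example:///container/reads.bam")));
        System.out.println(resolver.resolve(
                URI.create("example:///container/reads.bam?endpoint=https://other.example.invalid")));
    }
}
```

Failing loudly when neither source provides an endpoint keeps a misconfigured setup from silently talking to the wrong account.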
I think we can add that. It's less a matter of technical difficulty and more a matter of design principles. We try not to add methods and options eagerly because it's easier to add than take away, hence my slight reluctance to add the option. This may seem a bit odd initially, but what if the method was That said, are you intending to use the overload of |
@rickle-msft A signature like We are using the |
@droazen Apologies again for the delay. There's been some internal back and forth about the best way to support setting default configurations, which has held up progress here. Some folks on the team are concerned that a static method for setting global configuration opens the door to problems such as concurrent or repeated sets, which we'd like to avoid. Because of that, we've explored two possibilities. The first, which we think is simpler and therefore preferable, is to have all of these configurations (creds and options) set in the environment. This mitigates both of the concerns I mentioned, as the values would be checked once upon startup and cached thereafter. There's also a standardized way of accessing the environment across the entire Azure SDK that we can leverage. If this is conceptually fine with you, I can share a brief sample to confirm before we proceed. The other option, which is more advanced and maybe overkill but also ultimately more flexible, is to use SPI (Service Provider Interfaces) to load a provider that serves us these default values. I know we've haggled back and forth over this multiple times, but at this point does putting everything in the environment work for you technically? |
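To make the "checked once upon startup and cached thereafter" idea concrete, here is a rough sketch using the initialization-on-demand holder idiom, which gives a thread-safe, set-exactly-once value without a public setter. The variable names `EXAMPLE_NIO_MAX_RETRIES` and `EXAMPLE_NIO_TIMEOUT_SECONDS`, the fallback values, and the `NioDefaults` class are invented for illustration; they are not the names the Azure SDK reads.

```java
import java.time.Duration;
import java.util.Map;

final class NioDefaults {
    final int maxRetries;
    final Duration timeout;

    private NioDefaults(int maxRetries, Duration timeout) {
        this.maxRetries = maxRetries;
        this.timeout = timeout;
    }

    /** Parses defaults from an environment-like map and fails fast on malformed values. */
    static NioDefaults fromEnvironment(Map<String, String> env) {
        int retries = parseInt(env, "EXAMPLE_NIO_MAX_RETRIES", 3);
        int timeoutSeconds = parseInt(env, "EXAMPLE_NIO_TIMEOUT_SECONDS", 60);
        return new NioDefaults(retries, Duration.ofSeconds(timeoutSeconds));
    }

    private static int parseInt(Map<String, String> env, String name, int fallback) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalStateException("Invalid value for " + name + ": " + raw, e);
        }
    }

    // The JVM runs this initializer at most once, on first access, under its own
    // class-initialization lock, so there is no window for concurrent or repeated sets.
    private static final class Holder {
        static final NioDefaults INSTANCE = fromEnvironment(System.getenv());
    }

    /** Returns the process-wide defaults, reading the environment on first use only. */
    static NioDefaults get() {
        return Holder.INSTANCE;
    }
}
```

Keeping `fromEnvironment` separate from `System.getenv()` means the parsing rules can be unit-tested with a plain map, while `get()` still reads the real environment exactly once.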
@rickle-msft From a technical standpoint, putting everything in the environment would work for us, yes. My only concern with that approach has to do with changes over time to the set of environment variables that get checked. With a static configuration method, the compiler will tell us if a setting changed. With environment variables across multiple upgrades of the NIO library over several years, is there a convenient mechanism we can use to automatically check whether something changes (i.e., an environment variable gets renamed / added / removed), so that we can implement a CI test that will fail for us in this case? Thanks for continuing to follow up on this request! |
@droazen Sure thing. And thank you for continuing to work with us on getting the best experience out. I hear your concerns. Once we GA, the Azure SDK has a policy of no breaks, so we would not be removing or renaming any existing options. Even if we choose to deprecate one, it will continue to function. Additions would be non-breaking, so they wouldn't interfere with existing options anyway. Additionally, because we'll be loading these options on startup to cache for the lifetime of the application, if there's an error in your configuration, your test infrastructure should also see that as soon as it loads the provider. How does that sound to you? |
@rickle-msft Sounds good to me! I was mainly worried about the case where we set a variable X in the environment, and then the library one day stops checking for that particular variable and silently continues along without error, but with the wrong value for, e.g., the number of retries. But if there's a policy of no removals then this should not be a problem! |
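One way a CI test could catch the "variable X is silently no longer read" case is to record which names the configuration loader actually looks up and compare them with the names that were set. The sketch below assumes the loader accepts an injectable lookup function, which is an assumption and not something the thread confirms; `RecordingEnvironment` and the variable names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class RecordingEnvironment implements Function<String, String> {
    private final Map<String, String> values;
    private final Set<String> read = new TreeSet<>();

    RecordingEnvironment(Map<String, String> values) {
        this.values = values;
    }

    /** Looks up a variable and remembers that it was requested. */
    @Override
    public String apply(String name) {
        read.add(name);
        return values.get(name);
    }

    /** Names that were set but never looked up, i.e. settings that were silently ignored. */
    Set<String> unreadNames() {
        Set<String> unread = new TreeSet<>(values.keySet());
        unread.removeAll(read);
        return unread;
    }
}
```

A test would pass the recording lookup to the loader, then fail if `unreadNames()` is non-empty after configuration has been loaded.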
@rickle-msft Any updates on this ticket? |
@droazen Unfortunately the last few months I had something cut in line for priority. That's been pretty all consuming but hopefully wrapping up in the next couple weeks. If all goes well, I should be able to pick this back up once the other item is complete. Apologies that there have been further delays on this. |
Hi @droazen, we deeply appreciate your input into this project. Regrettably, this issue has remained inactive for over 2 years, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support. |
The NIO plugin for Google Cloud Storage (https://github.com/googleapis/java-storage-nio) has a convenient feature that lets you authenticate transparently via environment variables, instead of passing in credentials explicitly. You just run `gcloud auth application-default login`, and then the NIO plugin gets the credentials automatically from the user's environment.
It would be great if the NIO plugin for Azure had an equivalent feature. This is helpful for large projects that use the NIO subsystem to support multiple clouds, where propagating authentication info for specific clouds down the stack would require cumbersome special-casing that is not always easy (or even possible).