Using KeyVault FlexVolume with Autoscaler #183
What you are asking for is exactly what a DaemonSet should do, and the KeyVault FlexVolume installation configures a DaemonSet. Does it work fine for CPU instances provisioned by autoscaling? Maybe the issue is just with the DaemonSet node selector. It should work fine as is on AKS. Are you only having trouble on GKE? If you checked the labels on the GPU instances, do they match the node selector? Make sure to also use the GKE-specific FlexVolume mount path.
I'm seeing another error; I think it's related to permissions though. Will report back.
Although the labels are matching and I have edited the .yaml file accordingly, I'm currently not seeing it on the GPU node, and I think it won't work either way because the secretRef is not set automatically.
EDIT: I have logged in to the GKE GPU node and checked whether the FlexVolume driver exists:
This of course shows that there's no FlexVolume driver installed.
Hey @berndverst, I have been testing one of those two issues: the CPU problem with the missing tenantId. I have finally figured out the root cause! While manually running the
As this is a GKE node, I cannot install jq. Any ideas how this can be fixed?
EDIT: The GKE node uses Chromium OS and, as far as I can tell, it's not possible to install jq over there, but I might be wrong on that one. My quick idea to solve this would be to add a small Python script:
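The script itself is not included in the thread; as an illustration only, a minimal sketch of the kind of helper being described might look like the one below, assuming a flat (single-level) JSON input such as the FlexVolume options and mirroring the raw output of `jq -r`. The file name, argument order, and error handling are assumptions, not the author's actual code.

```python
#!/usr/bin/env python
# Hypothetical stand-in for `jq -r '.<key>'` on nodes where jq is unavailable.
# Usage: parse_json.py '<json string>' <key>
import json
import sys


def main():
    if len(sys.argv) != 3:
        sys.stderr.write("usage: parse_json.py '<json>' <key>\n")
        sys.exit(1)

    data = json.loads(sys.argv[1])
    value = data.get(sys.argv[2])
    if value is None:
        sys.stderr.write("key not found: %s\n" % sys.argv[2])
        sys.exit(1)

    # Print the raw value, as `jq -r` would for a flat JSON object.
    print(value)


if __name__ == "__main__":
    main()
```

For example, `python parse_json.py '{"tenantid": "xyz"}' tenantid` would print `xyz`.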
This will allow us to ensure that once
Example:
Then, in the kv binary I'd add a check:
If there's no plan to use a multi-dimensional (nested) JSON, then it might be worth considering removing jq. What do you think? I can make a PR if this solution seems fair enough.
Hey @berndverst
I have finally found the problem. The GPU nodes contain a taint:
https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
This means that in order for keyvault-flexvol to work, one will have to update the DaemonSet spec with a matching toleration.
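The exact taint is not quoted in the thread, so the following is only an illustrative sketch of what such a toleration on the installer DaemonSet's pod spec could look like; a key-less `operator: Exists` toleration matches every taint (including GKE's GPU taint) and can be narrowed to a specific key and effect if preferred.

```yaml
# Illustrative sketch: let the installer DaemonSet schedule onto tainted
# (e.g. GPU) nodes by tolerating all taints. Scope this down to a specific
# key/effect if you only want to cover the GPU taint.
spec:
  template:
    spec:
      tolerations:
      - operator: "Exists"
```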
I will add this to the PR as a comment.
Hi @ritazh, hope you are well :) Can you look at this issue or PR #186 for removing the reliance on jq? Also, it would be good to provide some instructions for adding the toleration to the DaemonSet spec in case someone wants to run the FlexVolume driver on GPU nodes as well. Alternatively, provide an installer yaml for CPU instances and one for CPU + NVIDIA GPU instances. I'm not sure that a comment in the yaml alone is obvious enough.
Honestly, I think that by default the installation should work on all types of nodes, i.e. the toleration in the installer yaml should be uncommented by default.
@Shaked I suppose it doesn't really matter anymore. I just learned that mounting KeyVault will be done differently in the future. Kubernetes moves fast as you know! Check this out:
You'll need to use the Service Principal option if you are not running on AKS / AKS Engine clusters in Azure.
I will definitely check it out, thank you! I still think that it will take time to migrate from the current implementation of keyvault-flexvol to the new one, and that it is easier to upgrade its version than to migrate and put our resources into it. Therefore, I'd still be happy if this PR were accepted. Do you know whether the new implementation will allow importing secrets from AKV directly into ENV vars instead of mounting them as files?
@ritazh and team are the ones you have to convince :)
@Shaked sorry for the delay and thank you for the PR. Will review and test it soon!
Describe the request
Hey, I'm currently using GKE and AKS (for different purposes). On GKE, I have set the autoscaler to run a minimum of 0 GPU nodes in order to save money.
The problem is that once a GPU node comes up, the keyvault-flexvol installation is not available on it.
If there were a way to let the autoscaler know that it needs to install keyvault-flexvol (or any other .yaml), that would solve the problem.
Explain why Key Vault FlexVolume needs it
It would allow developers to use KeyVault together with the autoscaler in order to share secrets.
For example, this would be very powerful when using kubeflow or azureml.
Describe the solution you'd like
Not sure if this is possible, but provision the node when it comes up, the same way NVIDIA does with its drivers for GPU nodes.
Describe alternatives you've considered
Additional context