Garden not refreshing auth tokens #2330

Closed
graemechristie opened this issue Mar 29, 2021 · 25 comments
Labels
bug priority:high High priority issue or feature provider/k8s

Comments

@graemechristie

Bug

Garden does not refresh auth tokens in the same manner that kubectl does. When using garden with AKS and RBAC set up, the token in the kubeconfig file will expire and garden does not fetch a new one (as kubectl, for example, does). I need to manually run kubectl get pods or some other arbitrary kubectl command, and then garden will continue working. This issue looks to have been reported several years ago and acknowledged, but then closed as stale:

#1043

Current Behavior

When using AKS and a cluster with RBAC enabled, auth tokens will time out and garden commands will return

image

Expected behavior

When an auth token has expired, garden will acquire a new one using the refresh mechanism that kubectl uses.

Reproducible example

Workaround

We need to manually run an arbitrary kubectl command when we get the above error message.

Suggested solution(s)

Update your k8s client code to refresh auth tokens when they expire.

Additional context

Your environment

  • OS: macOS
  • How I'm running Kubernetes: AKS

garden version

0.12.19

@edvald
Collaborator

edvald commented Mar 29, 2021

Thanks @graemechristie. I think we might need to update our k8s client library (which we didn't write ourselves), I sort of suspect this has been fixed upstream. I'll take a look.

@ITHedgeHog
Contributor

If it does, @edvald, you can close #2311 too.

@edvald
Collaborator

edvald commented Mar 29, 2021

Perhaps I could trouble you to give the build a spin, to see if it does the job? https://app.circleci.com/pipelines/github/garden-io/garden/8935/workflows/e6fe4f59-061f-49a3-b638-73ab20964af3/jobs/114125/artifacts

The first four artifacts are the packaged builds for the PR. Just need to curl and extract the appropriate archive and run the binary from there, instead of the installed one. Note: on macOS you might bump into permission issues if you download via the browser, better to curl/wget from the terminal.

@ITHedgeHog
Contributor

Grabbing it now

thsig pushed a commit that referenced this issue Mar 31, 2021
@ITHedgeHog
Contributor

@edvald Sorry for the delay, still getting this when in use:

StatusCodeError from Kubernetes API - 401 - {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}

That is with garden 0.12.20

@ITHedgeHog
Contributor

Sorry, I should clarify: it works initially, but leaving garden dev running eventually returns that error message.

@edvald
Collaborator

edvald commented Apr 2, 2021

Ah I see, that does clarify. That we can probably handle on our side by automatically retrying the requests. I'll need to inspect their code but I suspect the client library doesn't do that sort of thing out of the box.
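
For readers following along, the retry idea described above could look roughly like the sketch below. It is not Garden's actual fix: `withAuthRetry`, `refreshCredentials`, and the `statusCode` field are hypothetical names chosen for illustration, and it assumes the failing call throws an error that carries the HTTP status.

```typescript
// Hypothetical error shape: assumes the thrown error exposes the HTTP status code.
interface StatusError {
  statusCode?: number
}

// Run `request`. On a 401, call `refreshCredentials` and try again,
// at most `maxRetries` times. Any other error is rethrown untouched.
// Both callbacks are placeholders for whatever the real client exposes.
async function withAuthRetry<T>(
  request: () => Promise<T>,
  refreshCredentials: () => Promise<void>,
  maxRetries = 1
): Promise<T> {
  let attempt = 0
  while (true) {
    try {
      return await request()
    } catch (err) {
      const status = (err as StatusError).statusCode
      if (status !== 401 || attempt >= maxRetries) {
        throw err
      }
      attempt++
      await refreshCredentials()
    }
  }
}
```

Capping the retries matters here: if the refresh itself cannot produce a valid token, the 401 should surface to the user rather than loop forever.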

@edvald edvald added bug priority:medium Medium priority issue or feature provider/k8s labels Apr 2, 2021
@edvald edvald self-assigned this Apr 2, 2021
@eysi09 eysi09 added priority:high High priority issue or feature and removed priority:medium Medium priority issue or feature labels Apr 20, 2021
@ITHedgeHog
Contributor

I've just tested this on Garden 0.12.21

This is after a reboot of my development system: I open PowerShell and run garden dev (without running kubectl version beforehand).

garden_dev_error.txt

Obviously, if I run kubectl version first the token is there - it doesn't look like it requests one at all currently?

@eysi09
Collaborator

eysi09 commented Apr 29, 2021

So this happens on the first run of garden dev? And presumably other Garden commands as well, right?

We bumped the K8s client lib to the latest version before this release (from 1.14.0 to 1.14.3) to make sure that being on an older version wasn't causing the original issue. I wonder if a regression was introduced on their end between releases.

I did scan the commits in K8s client lib but didn't see anything unusual.

@ITHedgeHog
Contributor

Yep, I have just restarted my machine again and validated that garden build|test|validate exhibit the same behaviour.

@ITHedgeHog
Contributor

I see that in go this is the bit that makes it work https://github.com/kubernetes/client-go/tree/f6ce18ae578c8cca64d14ab9687824d9e1305a67/plugin/pkg/client/auth/azure

Looking through the repo for the node package you use I came across this: kubernetes-client/javascript#358

It looks like it might be a known issue that is currently frozen.

Let me know if I'm way off base with this assumption?

@eysi09
Collaborator

eysi09 commented May 3, 2021

Interesting, that issue makes me wonder why it worked in the first place.

But we're currently working on a fix on our end that catches these kinds of errors and refreshes the token. It does so via the K8s client API though, so the assumption is that that works in the first place.

And just to be sure, this works on Garden v0.12.20? You can grab it from here: https://github.com/garden-io/garden/releases/tag/0.12.20 while we work on a fix, but it doesn't have dev-mode. Also, you might want to download it via curl if you're on macOS, because otherwise it will complain about downloaded binaries.

If this version works for you, it suggests we should downgrade the K8s client lib back to 1.14.0. I'll also open an issue in their repo on the topic.

@ITHedgeHog
Contributor

My apologies, my initial comment that v0.12.20 worked first time was incorrect. I've not been able to verify that on either my work machine or my home machine (which has been rebuilt since the initial testing).

garden does not get the correct token unless I run kubectl version first.

@eysi09
Collaborator

eysi09 commented May 14, 2021

Just to be sure I'm heading in the right direction here, is this the RBAC functionality from Azure that you believe is causing the issues: https://docs.microsoft.com/en-us/azure/aks/manage-azure-rbac

(cc @graemechristie, @ITHedgeHog)

@graemechristie
Author

Yes @eysi09, that is the feature we are using with Azure AKS (although I'm not sure if the OpenID/token expiry issue is Azure specific, or if the OpenID standard for tokens in Kubernetes is an open standard/feature utilised by AKS). I had assumed the latter, as kubectl seems to have native support for it.

@graemechristie
Author

Looking at #1043, it seems this also affects other services on AWS, or anything using kubectl exec plugins like heptio-authenticator-aws. So it seems the AKS support in kubectl is via a "kubelogin" plugin (presumably this: https://github.com/Azure/kubelogin). E.g. my kubeconfig uses:

image

@eysi09 eysi09 assigned eysi09 and unassigned edvald May 27, 2021
@twelvemo
Collaborator

We reproduced the issue internally. However, when I used kubelogin and followed these instructions https://github.com/Azure/kubelogin#azure-cli-token-login-non-interactive I did not face any problems. After not having interacted with the cluster for more than 2 hours, I was able to deploy with garden. @eysi09 did the same but without using kubelogin and ran into the missing auth token issue.
@graemechristie your kubeconfig looks pretty much like mine with kubelogin, but you are still experiencing auth issues with garden?
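
To compare setups like the two described above, a small helper can report which auth mechanism a kubeconfig user entry relies on. This is a rough sketch under assumptions: the thread does not confirm that the failing setup used the older `auth-provider` stanza, and `describeAuth`, `KubeconfigUser`, and the sample entries are hypothetical. Only the `exec`, `auth-provider`, and `token` field names come from the kubeconfig format itself.

```typescript
// Minimal shape of one `users[].user` entry from an already-parsed kubeconfig.
interface KubeconfigUser {
  token?: string
  exec?: { command: string; args?: string[] }
  "auth-provider"?: { name: string; config?: Record<string, string> }
}

type AuthKind = "exec-plugin" | "auth-provider" | "static-token" | "other"

// Classify the entry by the first auth mechanism it declares.
function describeAuth(user: KubeconfigUser): AuthKind {
  if (user.exec) return "exec-plugin"
  if (user["auth-provider"]) return "auth-provider"
  if (user.token) return "static-token"
  return "other"
}

// Illustrative entries only; a real kubelogin entry also carries args, omitted here.
const withKubelogin: KubeconfigUser = { exec: { command: "kubelogin" } }
const legacyAzure: KubeconfigUser = { "auth-provider": { name: "azure" } }

console.log(describeAuth(withKubelogin)) // prints "exec-plugin"
console.log(describeAuth(legacyAzure)) // prints "auth-provider"
```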

@graemechristie
Author

That makes sense @twelvemo - I probably did not understand the role of kubelogin in respect to those tokens. I will give this a go but I suspect it will resolve our issue.

@twelvemo
Collaborator

Sounds good! Would love to hear if it works @graemechristie, then we can put this in our docs.

@eysi09
Collaborator

eysi09 commented Jun 17, 2021

cc @ITHedgeHog ☝️

We'll also add an entry to our Troubleshooting guide on this (assuming it works).

@ITHedgeHog
Contributor

Thanks, @eysi09 @twelvemo I'm giving it a try today.

@ITHedgeHog
Contributor

@eysi09 @twelvemo That appears to have resolved it for me, I've been working for 5 hours non-stop now without having to run any manual kubectl commands.

@eysi09
Collaborator

eysi09 commented Jun 24, 2021

That's great news! I'll leave this open and flag it as a discussion for now so that it's visible to others. We'll also need to add this to our docs.

@eysi09
Collaborator

eysi09 commented Jun 24, 2021

See: #2453

@thsig
Collaborator

thsig commented Jun 7, 2022

The fix/workaround has been documented now, so I think we can safely close this issue.

@thsig thsig closed this as completed Jun 7, 2022
6 participants