automatically remove user registry secrets #333

Closed · rokroskar opened this issue Jun 10, 2020 · 6 comments · Fixed by #435

Labels: devops, enhancement (New feature or request)

Comments

@rokroskar (Member)

In #327 we introduced a mechanism to use the user's own authentication token for pulling images from private repositories. We want to minimize the amount of time these credentials are left lying around, so we need some mechanism for removing them.

Some ideas:

  • remove secrets older than x minutes/hours (sketched below)
  • use a naming scheme that creates one secret per session launch; the secret can then be cleaned up immediately after the container has finished spawning
  • others?
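
As a rough sketch of the age-based idea, a small cull script with the Python kubernetes client could look like the following; the `renku` namespace and the `component=image-pull-secret` label selector are illustrative assumptions, not something that exists today:

```python
# Hypothetical age-based cull of image-pull secrets; the namespace and
# the label selector are illustrative assumptions.
from datetime import datetime, timedelta, timezone

from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

MAX_AGE = timedelta(minutes=30)  # the "x minutes/hours" from the idea above
now = datetime.now(timezone.utc)

secrets = v1.list_namespaced_secret(
    "renku", label_selector="component=image-pull-secret"
)
for secret in secrets.items:
    if now - secret.metadata.creation_timestamp > MAX_AGE:
        v1.delete_namespaced_secret(secret.metadata.name, "renku")
```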
@olevski (Member) commented Oct 12, 2020

@ableuler after our brief discussion today, here are our options to tackle this:

A. Add a preStop hook on the pod that runs the jupyterhub server to delete the token when the pod shuts down

  • Pros:
    • Only a small amount of code is needed; it can be as simple as a call to the k8s API from the pod to delete the specific secret (see the sketch after this list)
    • It happens automatically exactly when we need it, regardless of how the pod is destroyed (whether because of inactivity or because the user shut it down through the UI)
  • Cons:
    • The kubernetes API is not accessible from within the pod under the new egress policy on the user pods (feat: restrict user pod egress #430)
    • Even if the k8s API were reachable, we would need to inject a service account with the proper permissions into the pod so that it could authenticate with the k8s API and delete the image pull secret. By default no such service account is injected, and one does not currently exist. To make this fully operational I think we would have to create a service account and role binding for every user. The problem is that we do not have a separate namespace per user, so we cannot create a Role that gives a user access only to their own secrets. Unless we decide to put every user's jupyterhub pods in a separate namespace, I think this option is not possible.
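
For illustration only, since the cons above rule this out in practice: the preStop cleanup could be a script along these lines, assuming the pod had a suitably scoped service account and knew its secret's name via an environment variable (both of which are hypothetical):

```python
# Illustrative only: what the preStop cleanup could look like if the pod
# had API access and a suitably scoped service account (neither exists
# today, per the cons above).
import os

from kubernetes import client, config

config.load_incluster_config()  # assumes a mounted service account token
v1 = client.CoreV1Api()

# Hypothetical env vars that the spawner would inject at pod creation.
secret_name = os.environ["IMAGE_PULL_SECRET_NAME"]
namespace = os.environ["POD_NAMESPACE"]

v1.delete_namespaced_secret(secret_name, namespace)
```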

B. Add a cronjob that looks through the secrets and deletes old ones that are not tied to a running jupyterhub pod

  • Pros:
    • One cronjob takes care of all users in a specific deployment
    • The above-mentioned issues with changing the egress policy, injecting a service account into the pod and adding Roles are fully avoided, because the cronjob does not operate from inside the jupyterhub user pod.
  • Cons:
    • There will be some delay before unused secrets are deleted. E.g. the cronjob could run every hour and delete secrets that are older than X hours and have no actively running pod associated with them, so after a user's jupyterhub pod is deleted it will take some time for that user's image pull secret to be removed (see the sketch after this list).
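
A sketch of what that cleanup pass could look like with the Python kubernetes client; the namespace, the label names and the one-hour cutoff are all illustrative assumptions:

```python
# Illustrative cleanup pass for option B: delete image-pull secrets that
# are older than a cutoff AND have no pending or running pod left that
# might still need them.
from datetime import datetime, timedelta, timezone

from kubernetes import client, config

config.load_incluster_config()
v1 = client.CoreV1Api()

NAMESPACE = "renku"            # assumed deployment namespace
MAX_AGE = timedelta(hours=1)   # the "X hours" threshold mentioned above
now = datetime.now(timezone.utc)

secrets = v1.list_namespaced_secret(
    NAMESPACE, label_selector="component=image-pull-secret"
)
for secret in secrets.items:
    if now - secret.metadata.creation_timestamp <= MAX_AGE:
        continue
    # Hypothetical "username" label shared by the secret and the user's pod.
    user = (secret.metadata.labels or {}).get("username")
    if user:
        pods = v1.list_namespaced_pod(
            NAMESPACE, label_selector=f"username={user}"
        )
        # Keep the secret while a pod is still starting up or running;
        # the credentials may not have been used yet.
        if any(p.status.phase in ("Pending", "Running") for p in pods.items):
            continue
    v1.delete_namespaced_secret(secret.metadata.name, NAMESPACE)
```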

Let me know what you think. Hopefully I did not miss any major considerations in writing this up. I tried to access the k8s API from a running pod, and the requests would time out as long as the egress networkPolicy from #430 was active. When I delete that egress policy, I can successfully reach the k8s API.

I am not sure if this warrants a further/wider discussion about how much access to the k8s API we give to users and how we control/restrict this.

@ableuler (Contributor)

Thanks @olevski for laying out the options. I think option B is better. I don't mind secrets being around a bit longer than needed as long as they are eventually cleaned up.

@rokroskar (Member, Author)

Under option A there is also the possibility of the secret hanging around far longer than necessary: it is only needed at launch time, and the time between launch and the preStop hook being executed could be days or even weeks. A cron job running at ~30-minute intervals, on the other hand, means there is always a clear worst-case window.

@ableuler (Contributor)

@rokroskar We actually discussed this: we definitely need the secret until the pod has been assigned to a node (which can take a while when resources are insufficient). Do you know whether k8s will currently reschedule a user pod on a different node in case of a node failure?

@rokroskar (Member, Author)

Right, so the secret culling process has to check whether the pod that might need the secret is actually up and running, to avoid removing the secret before the credentials have actually been used.

AFAIK if the node fails the pod is gone.

@ableuler (Contributor)

> AFAIK if the node fails the pod is gone.

True - I actually hope so, because all the ephemeral disk space would be gone anyway...
