
Considerations of max user session lifetime #3015

Closed · consideRatio opened this issue Aug 23, 2023 · 6 comments

consideRatio (Contributor) commented Aug 23, 2023

I just saw some straggling user pods that have been around for a month or so, hogging a total of 160 CPU/month. That could have been averted if, for example, we declared an upper bound on user session length.

I suggest we evaluate whether we want to develop a way to help community champions decide if they want to use a safeguard like this, whether we want to introduce a default in existing clusters, and whether we want to set a default for new clusters.

The jupyterhub-idle-culler, as configurable via the JupyterHub Helm chart, provides this as a simple option. See the notes about the --max-age flag.
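For reference, a minimal sketch of what enabling this could look like in a zero-to-jupyterhub-k8s-style values file. The key names assume that chart's `cull` section; the numbers are illustrative, not 2i2c's actual defaults:

```yaml
cull:
  enabled: true
  timeout: 3600    # seconds of inactivity before an idle server is culled
  every: 300       # how often (in seconds) the culling check runs
  maxAge: 86400    # illustrative hard cap (24h): cull a server this old
                   # even if active; maps to the culler's --max-age flag
```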

colliand (Contributor) commented:

Thanks @consideRatio. I like your suggestion.

Cloud costs are a concern among hub champions. Some users (anecdotally, this is rare) launch long-running jobs that persist through intervals of little UX interaction.

I suggest that 2i2c implement, as the default, an aggressive culling strategy designed to minimize cloud costs. We should write documentation that describes the default and provides guidance on how hub champions can override it, at the additional risk of cloud cost explosions. Partnerships should inform hub champions about the default and the override options during the service launch process. With this approach, 2i2c makes opinionated choices we think will lower cloud costs without troubling most usage scenarios, while offering flexibility whose added risk is explicitly described as the champion/community's responsibility.

consideRatio (Contributor, Author) commented:

Assuming there is consensus that we want to introduce a new behavior to existing clusters, the biggest blocker in my mind is communication.

In software projects, one can cut a new major release and document a breaking change in the changelog; the users of the software can upgrade to the new version, but are in a sense responsible for reading the changelog before they do.

I think we lack an equivalent to this for the service we provide. I opened #3017 about this.

yuvipanda (Member) commented:

Conservatively, let's just start with a week-long max pod duration. That should cut out the worst offenders. I agree communication needs work, but currently nobody is assigned to it, and I don't want us to block having some limit on that.
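A week-long cap like this could be wired up roughly as below. This is a sketch, not the configuration from this thread: the service name and the jupyterhub_config.py registration shown in the trailing comment are illustrative assumptions, while --timeout and --max-age are real jupyterhub-idle-culler options (both in seconds).

```python
# Sketch: building the command for a jupyterhub-idle-culler service
# that enforces a one-week maximum server age.
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800

culler_args = [
    "python3", "-m", "jupyterhub_idle_culler",
    "--timeout=3600",                 # cull after 1 hour of inactivity
    f"--max-age={ONE_WEEK_SECONDS}",  # cull regardless of activity
]

# In jupyterhub_config.py this would be registered roughly as:
# c.JupyterHub.services = [{"name": "idle-culler", "command": culler_args}]
```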

colliand (Contributor) commented:

I love the way @consideRatio considers the impact of changes on our users. Thanks, Erik! I want 2i2c to have optimized best practices for communicating changes to our partner communities. That said, we are a small team and have to do what we can with the resources available, so the guidance from @yuvipanda seems wise here. I'll write more on the policy issue.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 28, 2023
@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Aug 28, 2023
@damianavila damianavila moved this to In Progress ⚡ in Sprint Board Aug 28, 2023
@yuvipanda yuvipanda removed their assignment Sep 7, 2023
jmunroe (Contributor) commented Oct 5, 2023

PR #3042 is now unblocked.

We next modify the documentation:

https://github.com/2i2c-org/docs/blob/f7b2b87676eeeef55b4762ab0d1321b3c3720bcf/admin/howto/control-user-server.md?plain=1#L90

Would something like

Unless configured otherwise, a user's server can run for a maximum of 7 days continuously. This is to prevent unintended long-running workflows from accidentally consuming cloud resources. This limit can be configured for a particular hub upon request to [email protected].

be sufficient?

I think this should be included in an overall training workshop we offer our hub champions. I don't see it as significant enough for a direct email blast to all hub champions. If we did have a monthly newsletter ('2i2c Updates' or something), we could highlight this change in that kind of document. (But that's out of scope for this particular issue.)

damianavila (Contributor) commented:

I think 2i2c-org/docs#193 closes this one. Feel free to reopen if you disagree.
