
Considerations of max user session lifetime #3015

Closed · consideRatio opened this issue Aug 23, 2023 · 6 comments

consideRatio (Contributor) commented Aug 23, 2023

I just saw some straggling user pods that have been around for a month or so, hogging a total of 160 CPU/month. That could have been averted if, for example, we declared an upper bound on user session length.

I suggest we evaluate whether we want to develop a way to help community champions decide if they want to use a safeguard like this, whether we want to introduce a default in existing clusters, and whether we want to set a default for new clusters.

The jupyterhub-idle-culler, as configurable via the JupyterHub Helm chart, provides this as a simple option. See the notes about the --max-age flag.
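For reference, a minimal sketch of what enabling this could look like in a zero-to-jupyterhub-k8s-style values file. The key names assume that chart's `cull` section; the numbers are illustrative, not 2i2c's actual defaults:

```yaml
cull:
  enabled: true
  timeout: 3600    # seconds of inactivity before an idle server is culled
  every: 300       # how often (in seconds) the culling check runs
  maxAge: 86400    # illustrative hard cap (24h): cull a server this old
                   # even if active; maps to the culler's --max-age flag
```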

colliand (Contributor) commented:

Thanks @consideRatio. I like your suggestion.

Cloud costs are a concern among hub champions. Some users (anecdotally, this is rare) launch long-running jobs that persist through intervals of little UX interaction.

I suggest that 2i2c implement, as the default, an aggressive culling strategy designed to minimize cloud costs. We should write documentation that describes the default and provides guidance on how hub champions can override it, at the additional risk of cloud cost explosions. Partnerships should inform hub champions about the default and the override options during the service launch process. With this approach, 2i2c makes opinionated choices we think will lower cloud costs without troubling most usage scenarios, while offering flexibility whose added risk is explicitly described as the champion/community's responsibility.

consideRatio (Contributor, Author) commented:

Assuming there is consensus that we want to introduce a new behavior to existing clusters, the biggest blocker in my mind is communication.

In software projects, one can cut a new major release and document a breaking change in the changelog; the users of the software can upgrade to the new version, but are in a sense responsible for reading the changelog before they do.

I think we lack an equivalent to this for the service we provide. I opened #3017 about this.

yuvipanda (Member) commented:

Conservatively, let's just start with a week-long max pod duration. That should cut out the worst offenders. I agree communication needs work, but currently nobody is assigned to it, and I don't want us to block having some limit on that.
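A week-long cap like this could be wired up roughly as below. This is a sketch, not the configuration from this thread: the service name and the jupyterhub_config.py registration shown in the trailing comment are illustrative assumptions, while --timeout and --max-age are real jupyterhub-idle-culler options (both in seconds).

```python
# Sketch: building the command for a jupyterhub-idle-culler service
# that enforces a one-week maximum server age.
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # 604800

culler_args = [
    "python3", "-m", "jupyterhub_idle_culler",
    "--timeout=3600",                 # cull after 1 hour of inactivity
    f"--max-age={ONE_WEEK_SECONDS}",  # cull regardless of activity
]

# In jupyterhub_config.py this would be registered roughly as:
# c.JupyterHub.services = [{"name": "idle-culler", "command": culler_args}]
```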

colliand (Contributor) commented:

I love the way @consideRatio considers the impact of changes on our users. Thanks, Erik! I want 2i2c to have optimized best practices for communicating changes to our partner communities. That said, we are a small team and have to do what we can with the resources available, so the guidance from @yuvipanda seems wise here. I'll write more on the policy issue.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 28, 2023
@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Aug 28, 2023
@damianavila damianavila moved this to In Progress ⚡ in Sprint Board Aug 28, 2023
@yuvipanda yuvipanda removed their assignment Sep 7, 2023
jmunroe (Contributor) commented Oct 5, 2023

PR #3042 is now unblocked.

We next modify the documentation:

https://github.com/2i2c-org/docs/blob/f7b2b87676eeeef55b4762ab0d1321b3c3720bcf/admin/howto/control-user-server.md?plain=1#L90

Would something like

Unless configured otherwise, a user's server can run for a maximum of 7 days continuously. This is to prevent unintended long-running workflows from accidentally consuming cloud resources. This limit can be configured for a particular hub upon request to [email protected].

be sufficient?

I think this should be included in an overall training workshop we offer our hub champions. I don't see it as significant enough for a direct email blast to all hub champions. If we did have a monthly newsletter ('2i2c Updates' or something), we could highlight this change in that kind of document. (But that's out of scope for this particular issue.)

damianavila (Contributor) commented:

I think 2i2c-org/docs#193 closes this one. Feel free to reopen if you disagree.
