Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/configure kernel launch terminate on events #1383

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

OrenZ1
Copy link

@OrenZ1 OrenZ1 commented Jun 2, 2024

This pull request adds the option to configure kernel container events which should cause a termination of the kernel launch process.
The user can now configure "unresolvable" events, to prevent a long wait for the timeout when such events occur.
It also allows the users to set a different timeout for some events, in example when a user knows an event may resolve after a small wait, but if it didn't, they want to terminate the kernel launch process.

@OrenZ1
Copy link
Author

OrenZ1 commented Jun 2, 2024

Discussed in #1357

@kevin-bates kevin-bates added enhancement kubernetes kernels docker gateway-provisioners Has application for gateway-provisioners (EG 4.0) labels Jun 3, 2024
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @OrenZ1 - this looks good. I'm wondering if this particular option shouldn't be documented - at which point we could add references to expected event types and reasons.

Also, once merged, it would be awesome to get this into the similar location within gateway_provisioners.

kernel_pod_events = self.get_container_events()
for event in kernel_pod_events:
if (
event.type in self.kernel_launch_terminate_on_events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the event type and reason need to be matches, is there a doc reference we could add that enumerates the various values that could be expected? That said, I suppose folks will need to be reactive (vs. proactive) in this sense and will have the event information when troubleshooting, but it still might be helpful to have a link or two, or add the enumeration directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a docstring to handle_pending_kernel method in container.py, as well as a more detailed configuration help. I found it hard to find a detailed list of Kubernetes events by type and reason, so I added an enumeration which may differ in different platforms (or even versions of K8S itself). I agree that folks will need to be reactive, but I did also add a link to a guide I found helpful with finding the correct event type and reason.

@OrenZ1
Copy link
Author

OrenZ1 commented Jun 24, 2024

Hi! Any update on this?

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this option should have a presence in the Operator's Guide somewhere but can't really find a good location. The only way folks will know about this right now is looking at the code. Thinking we should at least add it into Command-line Options in the RemoteMappingKernelManager section.

@OrenZ1
Copy link
Author

OrenZ1 commented Jun 25, 2024

Added presence in the Command-Line Options in the RemoteMappingKernelManager section.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thank you @OrenZ1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker enhancement gateway-provisioners Has application for gateway-provisioners (EG 4.0) kernels kubernetes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants