Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel handshaking pattern proposal #66

Merged
merged 14 commits into from
Jun 26, 2023

Conversation

JohanMabille
Copy link
Member

@JohanMabille JohanMabille commented Jan 5, 2021

Kernel Handshaking pattern

The current implementation of Jupyter client makes it responsible for finding available ports and pass them to a new starting kernel. The issue is that a new process can start using one of these ports before the kernel has started, resulting in a ZMQError when the kernel starts. This is even more problematic when spawning a lot of kernels in a short laps of time, because the client may find available ports that have already been assigned to another kernel.

A workaround has been implemented for the latter case, but it does not solve the former one.

This proposal aims to fix this issue for good, with a handshaking pattern.

Voting from @jupyter/software-steering-council

@kevin-bates
Copy link
Member

FWIW - this is what Enterprise Gateway does. However, I'm in the process of updating it to use a single response address to which the kernel (launchers) respond, rather than having a different response address for each kernel invocation. The reason for the update is that by having a single response address across all invocations, it introduces the possibility of having the server in one cluster, and the kernel in another (e.g., server running in Kubernetes, while the kernel is running in a Hadoop YARN cluster) because the (single) response address can be published.

BTW, although "local" kernels in EG do not use this approach, we have recommended the configuration of "distributed" kernels in loopback just to work around the timing issues you are trying to address since distributed kernels will go through this machinery despite the fact that they are indeed local to the server.

The other major difference is that EG utilizes kernel launchers which (preferably) embed the target kernel so as to avoid any kernel modifications to the dozens of kernel implementations.

Since this proposal requires updates to the kernels themselves, one suggestion I would make is to not overload the use of the connection file parameter and introduce a separate, and explicit, parameter that is mutually exclusive. I think that would help in troubleshooting and maintenance.

I would also suggest that the server (client in the above phrasing) be the thing that produces the connection-file (following receipt of the port information via the socket) not the kernel since the latter (incorrectly) assumes that kernels are always local.

@JohanMabille
Copy link
Member Author

However, I'm in the process of updating it to use a single response address to which the kernel (launchers) respond, rather than having a different response address for each kernel invocation. T

It's the same in this proposal, the client opens a single socket that will be used by all launched kernels to communicate their port information.

one suggestion I would make is to not overload the use of the connection file parameter and introduce a separate, and explicit, parameter that is mutually exclusive.

I'm not sure that we can specify mutually exclusive arguments in the kernelspec.

I would also suggest that the server (client in the above phrasing) be the thing that produces the connection-file (following receipt of the port information via the socket) not the kernel since the latter (incorrectly) assumes that kernels are always local.

Indeed that would be definitely better, thanks for the suggestion!

@kevin-bates
Copy link
Member

Thanks for the response Johan - glad to see this is a single response address proposal as well.

one suggestion I would make is to not overload the use of the connection file parameter and introduce a separate, and explicit, parameter that is mutually exclusive.

I'm not sure that we can specify mutually exclusive arguments in the kernelspec.

Ok. This overload approach probably fits better with backward-compatibility support.

If the new field "kernel_handshake" is missing from the kernelspec, the clients should emit a warning, and assume that the new mechanisms is not supported, and recommend that the key is added to the kernelspec.

Should we just let the kernel vendors add kernel_handshake when installing the kernelspec and leave it at that. Seems like if folks are using kernels that have no plans on being updated, this will cause some unnecessary noise.

Is kernel_handshake a boolean? If so, perhaps users could manually add it - with a value of False - to indicate that no warning should be produced. This approach would also allow supporting implementations to easily fallback in case issues are encountered with the new approach.

@blois
Copy link

blois commented Jan 5, 2021

Are domain sockets a viable approach for you to avoid port conflicts?

@jasongrout
Copy link
Member

Are domain sockets a viable approach for you to avoid port conflicts?

I was going to ask about Windows, but it seems that Windows now has domain sockets?

According to https://github.com/tokio-rs/tokio/issues/2201:

Windows 10 introduced support for unix sockets in late 2017 to Insiders builds, and the spring 2018 general Windows 10 release.

@MSeal
Copy link
Contributor

MSeal commented Jan 5, 2021

Some context and thread to implement a version of this proposal: jupyter/jupyter_client#487

I shamefully did not find time to work on the implementation before going on paternity leave, but I think it has many of the details about why and how different scenarios could be handled.

@JohanMabille
Copy link
Member Author

Are domain sockets a viable approach for you to avoid port conflicts?

I don't think we want to drop support for TCP protocol. Besides, you may want to connect to a kernel from a distant client.

@JohanMabille
Copy link
Member Author

Should we just let the kernel vendors add kernel_handshake when installing the kernelspec and leave it at that. Seems like if folks are using kernels that have no plans on being updated, this will cause some unnecessary noise.

We want kernels to update to the new mechanism so that we can drop the old one in the long run.

@blois
Copy link

blois commented Jan 6, 2021

I don't think we want to drop support for TCP protocol. Besides, you may want to connect to a kernel from a distant client.

The random port choice right now does not really work for distant clients- how common are those issues? I'm not saying that TCP should be dropped, just wondering if IPC or domain sockets can be used for your scenario to avoid the need for larger changes. We've been using IPC for this for quite a while now.

@JohanMabille
Copy link
Member Author

The random port choice right now does not really work for distant clients

This is orthogonal to distant client. I mean, you could start a kernel and a client on a machine A, and want to connect to that kernel from a client running on a machine B. In that case, you cannot use IPC sockets.

how common are those issues

In the context of Voilà, we noticed up to 30% failure (i.e. notebooks not displayed).

@Carreau
Copy link
Member

Carreau commented Jan 6, 2021

isn't there a missing fields that should be added to the connection file ?

and send the connection information to the client.

It's the same in this proposal, the client opens a single socket that will be used by all launched kernels to communicate their port information.

How will the client know which kernel is communicating its ports ? do you need like a kernel_handshake and a kernel_token?

@kevin-bates
Copy link
Member

How will the client know which kernel is communicating its ports ? do you need like a kernel_handshake and a kernel_token?

EG sends the kernel_id to the kernel and it is returned to handle interleaved launches/responses.

I guess I still have a concern about using the connection file to convey the response address as it immediately eliminates remote kernels. I think the response address (and whatever else is required) needs to be an argument to the kernel - just as the connection filename is.

Also, are there plans for encrypting the response?

@Carreau
Copy link
Member

Carreau commented Jan 6, 2021

EG sends the kernel_id to the kernel and it is returned to handle interleaved launches/responses.

How does it sends it ?

I guess I still have a concern about using the connection file to convey the response address as it immediately eliminates remote kernels.

I'm not sure I understand why this eliminates remote kernel. If the response adresse is 192.131.21.12:7737 remote kernel can still ping back to the server; or is there something I misunderstood ?

I'm also not sure what the difference is with the response adresse being part of the connection file – or the kernel argument changes.

Also, are there plans for encrypting the response?

At that point I'm not sure we can encrypt but at least we may want to authenticate maybe.

@kevin-bates
Copy link
Member

EG sends the kernel_id to the kernel and it is returned to handle interleaved launches/responses.

How does it sends it ?

As an explicit argument to the launcher: https://github.com/jupyter/enterprise_gateway/blob/master/etc/kernelspecs/spark_python_kubernetes/kernel.json#L21-L22

I'm not sure I understand why this eliminates remote kernel. If the response adresse is 192.131.21.12:7737 remote kernel can still ping back to the server; or is there something I misunderstood ?

Currently, it sounds like the connection file will contain the response address, but the file is local - so the file's contents are not available to the remote kernel and, as a result, it won't know to respond to 192.131.21.12:7737 - all it knows is the name of the connection file (which isn't very helpful and won't exist).

Also, are there plans for encrypting the response?

At that point I'm not sure we can encrypt but at least we may want to authenticate maybe.

EG also sends a public key (via another parameter) that is used by the kernel (launcher) to encrypt an AES key (to encrypt the response). On return, the AES key is decrypted using the private key and then the payload is decrypted using the AES key.

I know this is all probably more than you want but I'm just sharing experiences (and trying to ensure our functionality can somehow continue to exist in some manner).

@JohanMabille
Copy link
Member Author

JohanMabille commented Jan 6, 2021

How will the client know which kernel is communicating its ports ? do you need like a kernel_handshake and a kernel_token?

Indeed, the client should add a kernel_token to the connection file so that it can identifies which kernel is communicating its ports.

I guess I still have a concern about using the connection file to convey the response address as it immediately eliminates remote kernels. I think the response address (and whatever else is required) needs to be an argument to the kernel - just as the connection filename is.

If you pass the reponse address as an argument to the kernel, and if that kernel is remote, you need to send the response address from the client to the remote machine that will launch the kernel. It's the same with the connection file, you need to send the connection file to the remote machine where it can be dumped. In both cases, you need an intermediate actor on the remote machine.

Since the kernel command line alreay accepts a conneciton file, and since we cannot specify mutually exclusive arguments in the kernelspec, I think we should stick keep using the conneciton file.

At that point I'm not sure we can encrypt but at least we may want to authenticate maybe.

Indeed, since the connection file contains the signature shceme and the key.

I know this is all probably more than you want but I'm just sharing experiences (and trying to ensure our functionality can somehow continue to exist in some manner).

Thanks for sharing, this is much apreciated! I don't think this change will break your functionality (although there might be subtlety that I missed): the kernel is still started with a connection file, the only difference is that the kernel will communicate back its port to the client. The connection file still has to be transferred from the client machine to the kernel machine if the kernel is remote (via a gateway or any other intermediate agent).

@kevin-bates
Copy link
Member

The connection file still has to be transferred from the client machine to the kernel machine if the kernel is remote (via a gateway or any other intermediate agent).

Not true. With EG we communicate with a resource manager (kubernetes, docker swarm, hadoop YARN, etc.) to start a launcher that embeds the kernel. The launcher will ultimately produce a connection file to be consumed by its associated kernel (and local to the kernel) but all interaction between the server (client) and the launcher (kernel) takes place via arguments and the launcher code is responsible for encrypting and sending the response (thereby preventing any updates to the associated kernel). For the container-based resource managers, the arguments are passed via environment variables. No file transfers are necessary - particularly because it's not even known at the time of launch on which node the resource manager will decide to run the launcher/kernel.

At any rate, thanks for working on this and listening.

@JohanMabille
Copy link
Member Author

The launcher will ultimately produce a connection file to be consumed by its associated kernel

What does prevent you from using the mechanism you already have to pass the additional arguments (i.e. client address and dedicated socket port + kernel token) to the launcher and having it dumping the connection file with these new parameters?

@kevin-bates
Copy link
Member

That's not that big of a deal - although I was hoping (assuming I could move to this proposal) not having to open the "connection file" to extract the pieces as arguments (and believe it's better for the sake of maintainability to have them separate anyway).

If I'm understanding things, here's what the launcher would then need to do.

  1. Since the connection file argument is overloaded, the launcher will first need to determine if it's talking to a kernel that understands the handshake because what the kernelspec on the server states may not match the kernel installed on the remote node. Not sure how to best do this. For the sake of argument, let's "trust" that the configuration is proper and that if the kernelspec contains the "handshake" entry, then the remote kernel is "handshake-aware".
  2. If the launcher is talking to a "handshake-aware" kernel, yes, it would produce a "connection file" which the kernel then responds against. However, that response will be in a completely different format (non-encrypted for one) and sent to a different listener - and EG will need to figure out how to interact with that.
  3. If the launcher is talking to a "legacy" kernel (that doesn't know about handshakes - although I'm not sure how that's determined), it would do as it does today: produce the ports, encrypt the response and send it to a different listener - one that knows how to decrypt the response, and start the kernel.

I think it's imperative really, that a) the response parameters be conveyed via a different argument than the connection file and b) that kernels updated to use this approach still retain today's behavior - where the connection file contains the actual connection information. That way, applications that have already addressed port conflict issues can still operate and items listed above will no longer apply.

I think at this point, it would be best to meet online where we can probably communicate more interactively. I am willing to adjust for timezone differences - preferring mornings (Pacific TZ). I suspect it's very late for you right now and I appreciate you taking this extra time. If you want to discuss things tomorrow, just ping me in your afternoon. I'm usually at my desk by 7 am PST.

FWIW, I have a proposal (in my head for too long) similar to kernel providers but built into jupyter_client where the Popen process interactions are abstracted. I think if that layer were responsible for handling the connection - making KernelManager really just an application-level entity, then we could easily support IPC, TCP (with handshake), TCP (remote), Kubernetes, etc. as part of the base jupyter framework. This is essentially what EG does with process-proxies but there's no reason this framework can't be made available to any jupyter_client application.

@Carreau
Copy link
Member

Carreau commented Jan 7, 2021

I think we should decompose the what to pass to the kernel and the how, I think we agree on what.

A warning against CLI parameter; you do not want to pass security sensitive items via command line arguments as those might be leaked in process list to other users; hence why usually you want an env variable or a file.

Many CVEs, on notebook server were due to open chrome with a URL having a token and this being leaked to other process/users even non-privileged;

Side note; non-handshake aware kernel could be wrapped in a mini handshake aware proxy on remote host to hide details, might have to jump through hoops, but doable.

@SylvainCorlay
Copy link
Member

In my opinion, passing the handshake port in the connection file is the appropriate route.

  • other information than port numbers must be passed (tcp/ipc, the message signature scheme and key, and the client address). All of it except for the channel/handshake port is common with the current connection file.
  • the kernelspec can only have one command to start the kernel, so it should be the same regardless of the handshake mechanism if we need to support both.

In my opinion, @JohanMabille's proposal is the appropriate course of action to fix this issue. Note that it has to be fixed as the current kernel startup is inherently broken.

We would be happy to connect to discuss this in greater detail.

@JohanMabille
Copy link
Member Author

JohanMabille commented Jan 7, 2021

because what the kernelspec on the server states may not match the kernel installed on the remote node. Not sure how to best do this.

This (the kernelspec not matching the installed kernel) looks like a deployment problem independent from this JEP. This mismatch could already exist regarding the arguments that a kernel accepts for instance.

If the launcher is talking to a "handshake-aware" kernel, yes, it would produce a "connection file" which the kernel then responds against. However, that response will be in a completely different format (non-encrypted for one) and sent to a different listener - and EG will need to figure out how to interact with that.

What does prevent you from having a proxy between the client and the kernel and passing the address of this proxy in the connection file?

I think it's imperative really, that a) the response parameters be conveyed via a different argument than the connection file

I still don't understand why this would be problematic since you already need to "transfer" the information somehow (transfering the connection file or passing data to a resource manager and then generating the connection file from this data on the remote machine is an implementation detail).

FWIW, I have a proposal (in my head for too long) similar to kernel providers but built into jupyter_client where the Popen process interactions are abstracted. I think if that layer were responsible for handling the connection - making KernelManager really just an application-level entity, then we could easily support IPC, TCP (with handshake), TCP (remote), Kubernetes, etc. as part of the base jupyter framework

This is specific to jupyter_client, which is a Python implementation of a client in the Jupyter ecosystem. I think the specification should remain agnostic to the languages.

I think at this point, it would be best to meet online where we can probably communicate more interactively

I agree. What's the best medium to ping you?

@kevin-bates
Copy link
Member

Matthias is correct, we should focus on the what instead of the how. I'm sorry this has diverged but I'm just trying to make these two implementations (EG's and this proposal) as compatible as possible.

What's the best medium to ping you?

Thanks. Please use my email address associated with my GH profile.

@JohanMabille
Copy link
Member Author

@kevin-bates I've just updated the JEP based on the discussion we had today and the feedback from @Carreau . Please let me know if I missed something.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks for your patience and collaboration Johan!

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

My only points left are:

  • I think JEP for Optional Features #92 would be a better tool to indicate feature support, rather than making it unconditional on protocol version number. But I'm okay without it, since it hasn't happened yet.
  • It's still unclear to me if a JEP like this would need the level of detail of an actual spec update, that is:
    • what does a connection file look like for a handshake startup?
    • what do the registration request/reply messages look like, exactly?
      I think it's clear what information should be there, and I think it's okay to save that level of detail to the implementations, but I'm not sure folks agree on that.


## Proposed Enhancement

We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formely:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formely:
We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formally:


We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formely:

- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be refered as the registration socket.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be refered as the registration socket.
- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be referred as the **registration socket**.


- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be refered as the registration socket.
- When starting a new kernel, the launcher passes the connection information for this socket to the kernel.
- The kernel starts, find free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connect to the registration socket and send the connection information to the client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The kernel starts, find free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connect to the registration socket and send the connection information to the client.
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat, and iopub sockets. It then connects to the registration socket and sends the connection information to the client.


The way the launcher passes the connection information for the registration socket to the kernel should be similar to that of passing the ports of the kernel socket in the current connection pattern: a connection file that can be read by local kernels or sent over the network for remote kernels (although this requires a custom kernel provisioner or "nanny"). This connection file should also contain the signature scheme and the key.

The kernel should not expect the registration socket to exist after it has sent its registration (i.e. it can be closed). Therefore, the kernel should disconnect from the registration socket right after it has sent its connection information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it wait for a response to acknowledge receipt? That could allow kernels to shut themselves down if the handshake never completes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'll amend the JEP.


### Remarks

This pattern is **NOT** a replacement for the current connection pattern. It is an additional one and kernels will have to implement both of them to be conformant to the Juyter Kernel Protocol specification. Which pattern should be used for the connection if decided by the kernel launcher, depending on the information passed in the initial connection file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This pattern is **NOT** a replacement for the current connection pattern. It is an additional one and kernels will have to implement both of them to be conformant to the Juyter Kernel Protocol specification. Which pattern should be used for the connection if decided by the kernel launcher, depending on the information passed in the initial connection file.
This pattern is **NOT** a replacement for the current connection pattern. It is an additional one and kernels will have to implement both of them to be conformant to the Jupyter Kernel Protocol specification. Which pattern should be used for the connection is decided by the kernel launcher, depending on the information passed in the initial connection file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we had #92 to indicate support, rather than purely a protocol version number requiring both implementations.


Although this enhancement requires changing all the existing kernels, the impact should be limited. Indeed, most of the kernels are based on the kernel wrapper approach, or on xeus.

Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners.
Most of the clients are based on `jupyter_client`. Therefore, the changes should only be limited to this repository or external kernel provisioners and additional client libraries, e.g. in other languages.

@JohanMabille
Copy link
Member Author

JohanMabille commented May 25, 2023

I think #92 would be a better tool to indicate feature support, rather than making it unconditional on protocol version number. But I'm okay without it, since it hasn't happened yet.

I don't think we want this connection pattern to be optional since it's fixing a "bug" in the protocol. For me, optional features should be reserved for:

  • features really hard to implement (if not impossible because of non existing dependencies), such as a debugger
  • protocol extensions that might not be relevant for every kernel

Besides, the Optional features (at least in the current proposition) are discovered through the kernel_info_request / kernel_info_reply messages, so after the client has connected to the kernel (but I agree that we could rephrase the proposal to add the optional features to the kernelspec instead of the kernel_info message).

@minrk
Copy link
Member

minrk commented May 25, 2023

I don't think we want this connection pattern to be optional since it's fixing a "bug" in the protocol.

Fair enough.

@echarles
Copy link
Member

echarles commented Jun 11, 2023

I discussed this JEP with @JohanMabille during JupyterCon and I said I will free time to go into more details. I am planning to do that next week and will report findings ASAP.

@Zsailer
Copy link
Member

Zsailer commented Jun 12, 2023

I discussed this JEP with @JohanMabille during JupyterCon and I said I will free time to go into more details. I am planning to do that next week and will report findings ASAP.

Thank you, @echarles! I appreciate you taking the time to review this more closely. This extremely valuable since you have more expertise than most in this area. ❤️

I think it's important to acknowledge that this PR has reached enough votes to be approved. I think we should still keep this vote open until after your review; unless you find a major glaring issue that would require us to revote, we can consider this approved (since it's been open for >2 years).

That said, any questions, comments, (small) concerns you have, Eric, it would be great to have them documented here so we can have written record on things to consider during implementation.

Assuming there is no major issues found, I propose we plan to merge this during next week's SSC meeting.


We propose to implement a handshaking pattern: the client lets the kernel find free ports and communicate them back via a dedicated socket. It then connects to the kernel. More formally:

- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be referred as the **registration socket**.
Copy link
Member

@echarles echarles Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanMabille Would the port registration socket change on each kernel launch, or is it static and the socket remains the same for each kernel launch?

(in other words, do we have as many registration sockets as kernels, or only one registration socket for all the kernels)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two options are theoretically possible (thus the wording "The kernel should not expect the registration socket to exist after it has received the acknowledge receipt (i.e. it can be closed)" in the JEP), although I guess that in most of the implementations we would keep a single socket as long as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec should be as clear as possible. If both options are possible, it should be stated clearly in words. Further questions:

  1. In your many registration socket option, what is the added value of this JEP? The initial issue was the availability of the 5 ports. If a registration port needs to be found on each kernel launch, we don't win a lot and are still subject to failure.
  2. In your single registration socket option, what process is responsible to maintain that socket? Would it be some code in jupyter-client or will each of the "client" (jupyter-server, jupyter-console...) be responsible for that? (I except you answer to be "hey this is an implementation detail...", I hope it will not be)

Copy link
Member Author

@JohanMabille JohanMabille Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec should be as clear as possible. If both options are possible, it should be stated clearly in words.

This can be done in the PR to the doc of the protocol and in the protocol schema. Letting both options opened in the JEP should be enough.

In your many registration socket option, what is the added value of this JEP? The initial issue was the availability of the 5 ports. If a registration port needs to be found on each kernel launch, we don't win a lot and are still subject to failure.

The current pattern is:
1] the client finds free ports by opening sockets on them
2] the clients closes the sockets
3] The clients communicates these ports to the kenrel
4] the kernel tries to open sockets on these ports.

So the problem is that there is a delay between finding free ports and actually opening sockets for using them. With the handshake pattern, the situation is totally different, even in the "many registration socket option": the kernel finds the free ports by opening sockets on them and then does not close them.

In your single registration socket option, what process is responsible to maintain that socket? Would it be some code in jupyter-client or will each of the "client" (jupyter-server, jupyter-console...) be responsible for that? (I except you answer to be "hey this is an implementation detail...", I hope it will not be)

The process responsible for maintaining that socket is the process that is responsible for launching the kernel, whatever it is, since it is the one passing the registration socket info to the kernel upon launch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to write that registration socket lifetime should not be in the spec, as it's an implementation detail irrelevant to kernels, but then I thought about kernel restarts. Presumably restarting kernels need to re-register their (possibly new) sockets, so the registration socket should stay open (or reopen at the same port - not always available) IF kernel restarts are expected. For example:

The client MAY close the registration socket after completing a kernel's registration.
The client MAY use a single registration socket for multiple kernels.
To restart a kernel will require the registration socket again, so the client SHOULD keep the registration socket open while it expects restarts to be possible, OR open a new socket and pass the new registration socket URL to the new process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey sorry, I missed this one. I will update the JEP accordingly.

### Remarks

This pattern is **NOT** a replacement for the current connection pattern. It is an additional one and kernels will have to implement both of them to be conformant to the Jupyter Kernel Protocol specification. Which pattern should be used for the connection is decided by the kernel launcher, depending on the information passed in the initial connection file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohanMabille Do we need to specify that coexisting launcher and kernels not supporting both patterns should be possible. e.g. old launcher (single-mode) should be able to launch new kernels (dual-mode)

Similarly, how would.a dual-mode launcher know if the kernel is single- or dual-mode? Is there a "capabilities" JEP for that. I guess it is mandatory for the launcher to know if the kernel support the new handshaking pattern.

Copy link
Member Author

@JohanMabille JohanMabille Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in the JEP:

The kernel specifies whether it supports the handshake pattern via the "kernel_protocol_version" field in the kernelspec:

  • if the field is missing, or if its value if less than 5.5, the kernel supports passing ports only.
  • if the field value is >=5.5, the kernel supports both mechanisms.

So old launchers can start new kernels, and new launchers will still be able to start old kernels. Reading the kernelspec is enough to know whether a kernel supports both mechanisms or the older one only.

@minrk
Copy link
Member

minrk commented Jun 15, 2023

I just had another thought - how does this interact with things like jupyter qtconsole --existing? Other frontends need to be able to discover and connect to kernels, but after this they have no access to that information. The connection file doesn't provide enough information to connect or retrieve the connection info. Maybe the kernel SHOULD write out its 'real' connection info to a new connection file?

@JohanMabille
Copy link
Member Author

Maybe the kernel SHOULD write out its 'real' connection info to a new connection file?

Good catch, I think we discussed this at some point, but this is definitely missing in the current wording of the JEP, I'll update it.

@echarles
Copy link
Member

  • I am -1 as the level of details of this JEP is way too sparse to avoid confusion and lacks context as definitions. More percolation sounds like it is still needed based important aspects were still surfacing those last days and were not taken into account before.
  • I am -1 as security impacts and paragdim change are simply negated (the kernel calling back the client). In case of jupyter console from a laptop with a remote kernel, this JEP enforces a mandatory connection from the xerver to the laptop. The server being potential anywhere, the firewalls (corporate or laptop) have now to be wide open. An important paragraph about that is missing.
  • I am -1 about the discussion form as many questions have been answered as "this is an implementation detail", or simply by copy/pasting the sentence of the JEP which was questioned without further context and deeper answer. This is not the level of quality and inclusion I was excepting from a collaborative technical discussion. This attitude is just perfect to silent people and generate incomplete and bad JEPs.

This gives me -3, hence my negative vote on this JEP.


- The kernel launcher is responsible for opening a dedicated socket for receiving connection information from kernels (channel ports). This socket will be referred as the **registration socket**.
- When starting a new kernel, the launcher passes the connection information for this socket to the kernel.
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connects to the registration socket and sends the connection information to the registration socket.
Copy link
Member

@martinRenou martinRenou Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. It then connects to the registration socket and sends the connection information to the registration socket.
- The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. The connection ports information will be referred as the **connection information**. It then connects to the registration socket and sends the connection information to the kernel launcher through that socket.

Copy link
Member

@martinRenou martinRenou Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we formally define the socket message format for the ports information? Or is it irrelevant to a JEP?

Copy link
Member Author

@JohanMabille JohanMabille Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the length of the discussion here, it has been decided to discuss the format of the messages in the PR updating the protocol (similarly to what was done when adding the debugger, where we described the global approach in the JEP, and discussed the details in the PR to the protocol).

@echarles
Copy link
Member

I have joined the SSC call yesterday call and we have discussed this issue. I will try to summarise my point-of-vue as most of US participants were on holiday yesterday (this just my perspective and does not represent any consensus, as we have not found any).

  • There is a well-defined process to vote on the JEPs, enough time has been given to collect the vote on this, so it is OK to just proceed and take a decision today based on the information of the above thread.
  • I said I find the usage of "umbrella words" (I said yesterday "buzz words", maybe "fancy words" is ad-hoc qualification also) like "abstraction layer", "technical details" misleading and preventing a sane an open technical discussion. Those words throw fog to readers and are shortening the real deep discussion. I can say from this thread that "a" discussion has happened, but has been "cut" by usage of such wordings.
  • An example of ambiguity I gave is the notion of "launcher" - In case of remote kernel, are we talking about the local or remote launcher.
  • Talking about the kernel callback The gist of my issue with the JEP is that making official the callback will be hard to remove it after assuming there is a technical solution that does not require it.
  • Answering Sylvain's question "what would make me happy": (1) Brainstorm to find a technical solution that would not need a socket callback as well for the local and remote kernels (this may not be possible, but if we do not try, we will never know). This is equivalent to putting everything back on the table as Sylvain said. I believe this is the thing to do instead of having baby steps going to a potentially wrong direction (2) Formalize any outcome in more details in a JEP (I gave the Internet RFC as example, although we don't need to go so far).

I hope this gives a bit more perspective to my negative vote.

@fcollonval
Copy link
Contributor

The vote is now closed with the results:

In favor: 7 (Itay, Paul, Johan, Min, Rick, Sylvain, Zach)
Against: 1 (Éric)
Abstention: 1 (Frédéric)
No vote: 2 (Isabela, Carol)

--> In light of those results, this JEP is accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.