Prevent two kernels to have the same ports #490
Conversation
Looks good! PS: FYI, it does not solve the 'notebook server restart with open notebooks' issue; the kernels do not seem to restart properly. Note that this PR was not an attempt to solve that.
Great! My only comment is about not affecting the ipc case (which should also have no race condition issues), but otherwise +1.
Thanks for your review, @minrk!
Prevent two kernels to have the same ports assigned in MultiKernelManager
This looks good to me!
Hold on! This PR was submitted 7 hours ago and merged after 6 hours! At a minimum this needs to be configurable so it can be turned off. In addition, kernel managers don't necessarily know (nor should they) whether the kernel they are managing is local or remote (where this "workaround" doesn't apply)! As a result, this configurability needs to be dynamic, in the sense that the decision as to whether or not to locate/register the ports should be made on each instance. Ideally this code should move into the Kernel Manager, or ask the Kernel Manager whether it should be done at all. The same goes for the shutdown case.
Note that this PR does not implement the idea mentioned in #487; it only avoids the issue of using port numbers multiple times. This is merely a bugfix, right?
This PR is merely a workaround; it does not implement the newly proposed handshake mechanism.
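For context, the handshake pattern referenced here can be sketched roughly as follows. This is a hedged illustration, not the actual design in #487: the function name, the fixed count of five channels, and the idea of a registration step are assumptions. The key point is that the kernel binds its own sockets to port 0, so the OS assigns free ports atomically and no other process can grab them first.

```python
import socket

def kernel_side_bind():
    """Sketch: the kernel binds its sockets to port 0 itself; the OS
    picks free ports atomically, eliminating the find-then-bind race."""
    sockets = []
    for _ in range(5):  # shell, iopub, stdin, control, hb
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("127.0.0.1", 0))
        sockets.append(s)
    # In a real handshake, the kernel would now report these port
    # numbers back to its manager over a pre-agreed channel.
    ports = [s.getsockname()[1] for s in sockets]
    return sockets, ports
```

Because the sockets stay bound until the kernel releases them, the reported ports are guaranteed distinct and cannot be stolen in the meantime, which is why this is the "clean" fix compared with guessing ports up front.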
That's not the point. This breaks clients of jupyter_client and needs to provide a means for subclasses that don't want this behavior to not have it.
How does it break it? I understand this does make sense in the case of a remote kernel, but it does not seem to me that the behavior changes much.
I am referring to remote kernels. So if I'm reading this correctly, this will unconditionally create five local ports when the kernel manager instance is created and register those ports with the MultiKernelManager. Then, when the connection information is returned from the remote kernel, those port-based members will be set relative to the remote ports (performed directly or indirectly via the kernel manager instance). On shutdown, none of the originally allocated ports will be removed (nor closed) because the port is relative to the remote kernel and, again if I understand correctly, SO_LINGER doesn't time out. In addition, if the remote port happens to coincide with some other kernel's actual port, that port will be removed from the managed set - although because it's still in use by the other kernel, that's probably not a big deal. So I guess the only issue is that 5 local ports are leaked for every remote kernel instance, until the number of remote kernels started exhausts the ports on the local server and/or the server hangs seemingly indefinitely because the infinite loop takes forever to locate "unused" ports. I hope I have something wrong.
Those 5 sockets are temporary; they are only used for finding an available port on the machine where the kernel manager runs. This is the exact same behavior as before. This behavior is definitely useless in the case of a remote kernel, but it should not be a problem. The only time it would be a problem is when the machine on which the kernel manager runs starts running out of available ports.
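The port-finding-plus-bookkeeping behavior being discussed can be sketched like this. This is an illustrative sketch, not the actual jupyter_client code: the names `currently_used_ports`, `find_available_port`, and `release_port` are made up for the example.

```python
import socket

# In-memory record of ports already handed out to kernels, so two
# kernels managed by the same process never receive the same port.
currently_used_ports = set()

def find_available_port(ip="127.0.0.1"):
    """Ask the OS for a free port, skipping ports we already handed out."""
    while True:
        # A temporary socket bound to port 0 makes the OS pick a free
        # port; the socket is closed immediately and only the number kept.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((ip, 0))
            port = sock.getsockname()[1]
        if port not in currently_used_ports:
            currently_used_ports.add(port)
            return port

def release_port(port):
    """On kernel shutdown, make the port number reusable again."""
    currently_used_ports.discard(port)
```

This illustrates both sides of the debate: the temporary sockets are short-lived (no lasting local binding), but the set only tracks what *this* process handed out, which is meaningless for remote kernels and cannot see ports taken by other applications.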
@martinRenou - thank you for the explanation. I guess I wasn't following the temporal nature of things. I guess the issue would be that, because the set is not reflective of actual in-use ports, the while loop may eventually fail to find available ports, since ports it does find are falsely reported as in use by the set. Not sure how "random" the port selection really is.
What I understand from the socket module documentation is that it is OS-dependent. The probability of being stuck in an infinite loop is not 0, I agree. But I suppose it is unlikely to happen, because it is rare that the port you look for is already taken. It happens enough to be annoying (the kernel does not start), but not enough to get stuck in this loop indefinitely.
Ok. Enterprise Gateway is long-running and multi-tenant, so the likelihood increases over the single-user server scenario. If you could make this optional on a per-instance basis, that would be greatly appreciated.
The clean solution to this is the handshake mechanism proposed by @JohanMabille in #487, which is also a common pattern for this problem. Since the clean solution would be a major change, we went with this workaround instead. I am not worried about the infinite loop, because this behavior was already in connect.py, as mentioned by Martin.
On the contrary, if your usage scenario has more load and more likelihood of people stepping on each other's toes, you probably want this fix even more. Note that the previous behavior caused ZMQ failures almost all the time when creating 5 kernels at once.
Thanks @SylvainCorlay. Because we already do what #487 proposes, we don't need this workaround. In fact, we have clients that run our "remote" model in loopback mode just to avoid the ZMQ failure issue.
@kevin-bates we will be adding a flag so that the new behavior can be disabled before the next release.
@SylvainCorlay - thank you. It would be great if that flag were per-instance. For example, whether to allocate and register the ports should be decidable for each kernel instance.
I presume a trait would be fine?
Yes - thank you.
In terms of code design, it does not seem right to me to put such a flag there. @kevin-bates, why do you want a per-instance flag? Why is one single flag not enough?
My understanding was that @kevin-bates wants the trait in MultiKernelManager, which makes sense.
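The per-instance opt-out discussed above could look roughly like this. This is a plain-Python sketch (the real change, landed in PR #492, uses a traitlets trait); the names `cache_ports`, `pre_start_kernel`, and `RemoteKernelManager` are illustrative assumptions, not the actual jupyter_client API.

```python
class KernelManager:
    # Class-level default; subclasses (or individual instances) can
    # override it. The real implementation would expose this as a
    # configurable traitlets Bool.
    cache_ports = True

    def pre_start_kernel(self):
        if self.cache_ports:
            return "local ports reserved"   # placeholder for port caching
        return "port caching skipped"       # nothing to reserve locally

class RemoteKernelManager(KernelManager):
    # A remote-kernel subclass (as in the Enterprise Gateway scenario)
    # opts out, since reserving local ports is meaningless for it.
    cache_ports = False
```

Making it an instance attribute (or trait) rather than a single global flag is what allows local and remote kernels to coexist under one MultiKernelManager.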
I opened PR #492.
I've submitted my responses via a review on #492 - thank you for including me on that.
This pull request has been mentioned on the Jupyter Community Forum; there might be relevant details there.
Prevent two kernels to have the same ports assigned in MultiKernelManager.
This is a workaround for #487, and it should prevent voila-dashboards/voila#408 most of the time.

This is only a workaround: it will not prevent `ZMQError` 100% of the time. Because it is only an in-memory watchdog, it cannot prevent another application from taking those ports.
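As a rough illustration of why the in-memory set cannot give a 100% guarantee: there is a window between discovering a free port and the kernel actually binding it, and any other process can take the port during that window. All names here are hypothetical; this just demonstrates the race.

```python
import socket

def pick_free_port():
    """Find a free port the same way the workaround does: bind to
    port 0, read the assigned port, then close the socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = pick_free_port()              # the in-memory set now "owns" this...
intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
intruder.bind(("127.0.0.1", port))   # ...but another application can
# still bind it first, so the kernel's later bind on this port would
# fail (EADDRINUSE / ZMQError) despite the bookkeeping.
```

This is exactly the gap the handshake mechanism of #487 closes: when the kernel binds port 0 itself, there is no window in which the chosen port is unbound.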