Prevent two kernels to have the same ports #490
Conversation
Looks good! PS: FYI, it does not solve the 'notebook server restart with open notebooks' issue; the kernels do not seem to restart properly. Note that this PR was not an attempt to solve that.
Great! My only comment is about not affecting the ipc case (which should also have no race condition issues), but otherwise +1.
Thanks for your review, @minrk!
Prevent two kernels to have the same ports assigned in MultiKernelManager
This looks good to me!
Hold on! This PR was submitted 7 hours ago and merged after 6 hours! At a minimum this needs to be configurable so it can be turned off. In addition, kernel managers don't necessarily know (nor should they) whether the kernel they are managing is local or remote (where this "workaround" doesn't apply)! As a result, this configurability needs to be dynamic, in the sense that the decision as to whether or not to locate/register the ports should be made on each instance. Ideally this code should move into the Kernel Manager, or ask the Kernel Manager whether it should be done at all. The same goes for the shutdown case.
Note that this PR does not implement the idea mentioned in #487; it only avoids the issue of using port numbers multiple times. This is merely a bugfix, right?
This PR is merely a workaround; it does not implement the newly proposed handshake mechanism.
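For context, the handshake pattern referenced here can be sketched roughly as follows. This is a hedged illustration, not the actual design in #487: the function name, the fixed count of five channels, and the idea of a registration step are assumptions. The key point is that the kernel binds its own sockets to port 0, so the OS assigns free ports atomically and no other process can grab them first.

```python
import socket

def kernel_side_bind():
    """Sketch: the kernel binds its sockets to port 0 itself; the OS
    picks free ports atomically, eliminating the find-then-bind race."""
    sockets = []
    for _ in range(5):  # shell, iopub, stdin, control, hb
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("127.0.0.1", 0))
        sockets.append(s)
    # In a real handshake, the kernel would now report these port
    # numbers back to its manager over a pre-agreed channel.
    ports = [s.getsockname()[1] for s in sockets]
    return sockets, ports
```

Because the sockets stay bound until the kernel releases them, the reported ports are guaranteed distinct and cannot be stolen in the meantime, which is why this is the "clean" fix compared with guessing ports up front.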
That's not the point. This breaks clients of jupyter_client and needs to provide a means for subclasses that don't want this behavior to not have it.
How does it break it? I understand this does make sense in the case of a remote kernel, but it does not seem to me that the behavior changes much.
I am referring to remote kernels. So if I'm reading this correctly, this will unconditionally create five local ports when the kernel manager instance is created and register those ports with the MultiKernelManager. Then, when the connection information is returned from the remote kernel, those port-based members will be set relative to the remote ports (performed directly or indirectly via the kernel manager instance). On shutdown, none of the originally allocated ports will be removed (nor closed) because the port is relative to the remote kernel and, again if I understand correctly, SO_LINGER doesn't time out. In addition, if the remote port happens to coincide with some other kernel's actual port, that port will be removed from the managed set - although because it's still in use by the other kernel, that's probably not a big deal. So I guess the only issue is that 5 local ports are leaked for every remote kernel instance, until the number of remote kernels started exhausts the ports on the local server and/or the server hangs seemingly indefinitely because the infinite loop takes forever to locate "unused" ports. I hope I have something wrong.
Those 5 sockets are temporary; they are only used for finding an available port on the machine where the kernel manager runs. This is the exact same behavior as before. This behavior is definitely useless in the case of a remote kernel, but it should not be a problem. The only time it would be a problem is when the machine on which the kernel manager runs starts running out of available ports.
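The port-finding-plus-bookkeeping behavior being discussed can be sketched like this. This is an illustrative sketch, not the actual jupyter_client code: the names `currently_used_ports`, `find_available_port`, and `release_port` are made up for the example.

```python
import socket

# In-memory record of ports already handed out to kernels, so two
# kernels managed by the same process never receive the same port.
currently_used_ports = set()

def find_available_port(ip="127.0.0.1"):
    """Ask the OS for a free port, skipping ports we already handed out."""
    while True:
        # A temporary socket bound to port 0 makes the OS pick a free
        # port; the socket is closed immediately and only the number kept.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((ip, 0))
            port = sock.getsockname()[1]
        if port not in currently_used_ports:
            currently_used_ports.add(port)
            return port

def release_port(port):
    """On kernel shutdown, make the port number reusable again."""
    currently_used_ports.discard(port)
```

This illustrates both sides of the debate: the temporary sockets are short-lived (no lasting local binding), but the set only tracks what *this* process handed out, which is meaningless for remote kernels and cannot see ports taken by other applications.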
@martinRenou - thank you for the explanation. I guess I wasn't following the temporal nature of things. I guess the issue would be that, because the set is not reflective of actual in-use ports, the while loop may eventually fail to find available ports, since ports it does find are falsely reported as in use by the set. Not sure how "random" the port selection really is.
What I understand from the socket module documentation is that it is OS-dependent. The probability of being stuck in an infinite loop is not 0, I agree. But I suppose it is unlikely to happen, because it is rare that the port you look for is already taken. It happens enough to be annoying (the kernel does not start), but not enough to get stuck in this loop indefinitely.
Ok. Enterprise Gateway is long-running and multi-tenant, so the likelihood increases over the single-user server scenario. If you could make this optional on a per-instance basis, that would be greatly appreciated.
The clean solution to this is the handshake mechanism proposed by @JohanMabille in #487, which is also a common pattern for this problem. Since the clean solution would be a major change, we went with this workaround instead. I am not worried about the infinite loop, because this behavior was already in connect.py, as mentioned by Martin.
On the contrary, if your usage scenario has more load and more likelihood of people stepping on each other's toes, you probably want this fix even more. Note that the previous behavior caused ZMQ failures almost all the time when creating 5 kernels at once.
Thanks @SylvainCorlay. Because we already do what #487 proposes, we don't need this workaround. In fact, we have clients that run our "remote" model in loopback mode just to avoid the ZMQ failure issue.
@kevin-bates we will be adding a flag so that the new behavior can be disabled before the next release.
@SylvainCorlay - thank you. It would be great if that flag were per-instance. For example, whether to allocate and register the ports should be decidable for each kernel instance.
I presume a trait would be fine?
Yes - thank you.
In terms of code design, it does not seem right to me to put such a flag there. @kevin-bates, why do you want a per-instance flag? Why is one single flag not enough?
My understanding was that @kevin-bates wants the trait in MultiKernelManager, which makes sense.
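The per-instance opt-out discussed above could look roughly like this. This is a plain-Python sketch (the real change, landed in PR #492, uses a traitlets trait); the names `cache_ports`, `pre_start_kernel`, and `RemoteKernelManager` are illustrative assumptions, not the actual jupyter_client API.

```python
class KernelManager:
    # Class-level default; subclasses (or individual instances) can
    # override it. The real implementation would expose this as a
    # configurable traitlets Bool.
    cache_ports = True

    def pre_start_kernel(self):
        if self.cache_ports:
            return "local ports reserved"   # placeholder for port caching
        return "port caching skipped"       # nothing to reserve locally

class RemoteKernelManager(KernelManager):
    # A remote-kernel subclass (as in the Enterprise Gateway scenario)
    # opts out, since reserving local ports is meaningless for it.
    cache_ports = False
```

Making it an instance attribute (or trait) rather than a single global flag is what allows local and remote kernels to coexist under one MultiKernelManager.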
I opened PR #492.
I've submitted my responses via a review on #492 - thank you for including me on that.
This pull request has been mentioned on the Jupyter Community Forum; there might be relevant details there.
Prevent two kernels to have the same ports assigned in MultiKernelManager.
This is a workaround for #487, and it should prevent voila-dashboards/voila#408 most of the time.

This is only a workaround: it will not prevent `ZMQError` 100% of the time. Because it is only an in-memory watchdog, it cannot prevent another application from taking those ports.
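As a rough illustration of why the in-memory set cannot give a 100% guarantee: there is a window between discovering a free port and the kernel actually binding it, and any other process can take the port during that window. All names here are hypothetical; this just demonstrates the race.

```python
import socket

def pick_free_port():
    """Find a free port the same way the workaround does: bind to
    port 0, read the assigned port, then close the socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = pick_free_port()              # the in-memory set now "owns" this...
intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
intruder.bind(("127.0.0.1", port))   # ...but another application can
# still bind it first, so the kernel's later bind on this port would
# fail (EADDRINUSE / ZMQError) despite the bookkeeping.
```

This is exactly the gap the handshake mechanism of #487 closes: when the kernel binds port 0 itself, there is no window in which the chosen port is unbound.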