Zombie Process Fix #34

MSeal · 2020-03-09T00:00:58Z

This PR (probably) fixes both the dead / zombie process issues that come up when calling nbclient in nested subprocesses and the memory leak when running nbclient repeatedly in the same process (might fix nteract/papermill#478). Both were caused by zmq processes not being cleaned up: the move to nbclient left them around by not deleting the kernel objects after execution. The cleanup process is now much more robust to a number of failure modes and will attempt a direct kill call on the child process if graceful shutdown is not achieved.
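As a rough illustration of the "graceful shutdown, then direct kill" fallback described above, here is a minimal sketch using only the standard library. It is not the code in this PR: `shutdown_child` and `grace_seconds` are hypothetical names, and a plain `subprocess.Popen` child stands in for the kernel process.

```python
import subprocess
import sys


def shutdown_child(proc, grace_seconds=1.0):
    """Ask a child process to exit, and kill it if it does not exit in time.

    Hypothetical helper for illustration; nbclient's cleanup goes through
    the kernel manager rather than a raw Popen object.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped

    proc.terminate()  # graceful request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # direct kill when the graceful path did not finish
        proc.wait()  # reap the child so no zombie entry is left behind
    return proc.returncode


# A child that would otherwise outlive the caller.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = shutdown_child(child)
```

The final `wait()` matters as much as the `kill()`: a killed child that is never waited on stays in the process table as a zombie.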

Also fixed a number of async issues / warnings that were cropping up. Downstream libraries were relying on some functions being synchronous, so I renamed a number of functions to async_ and used loop calls to avoid compatibility issues for the next release.

It was really hard to write meaningful unit tests for most of these. I made some notebooks that recreated complex setups that caused process issues (or emitted a warning that would do so in some circumstances) and cleaned things up until the warnings and hanging processes were no longer around. Unfortunately, I don't think there's an easy way to encode that into unit tests.

Needed for moving forward with #31
Fixes runtime issues, but not all test warnings for #33

davidbrochart · 2020-03-09T08:17:19Z

nbclient/client.py

+                try:
+                    # For AsyncKernelManager
+                    loop.run_until_complete(self.km.shutdown_kernel(now=True))
+                except TypeError:
+                    self.km.shutdown_kernel(now=True)


Should we check if self.km.shutdown_kernel is a coroutine, instead of using try/except?
Also, we might want to have an async version of _cleanup_kernel, which is used in both blocking and async functions.

I can change to that pattern if you prefer.
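For reference, a minimal sketch of the up-front check suggested in this thread. One thing the diff implies: with a blocking manager, the `try` branch has already called `shutdown_kernel` once before `run_until_complete` raises `TypeError` on its `None` return value, so the `except` branch calls it a second time. The manager classes and `cleanup_kernel` below are hypothetical stand-ins, not jupyter_client or nbclient code.

```python
import asyncio
import inspect


class BlockingManager:
    """Hypothetical stand-in for a blocking kernel manager."""

    def __init__(self):
        self.shutdown_calls = 0

    def shutdown_kernel(self, now=False):
        self.shutdown_calls += 1


class AsyncManager:
    """Hypothetical stand-in for an AsyncKernelManager."""

    def __init__(self):
        self.shutdown_calls = 0

    async def shutdown_kernel(self, now=False):
        self.shutdown_calls += 1


def cleanup_kernel(km):
    # Decide which path to take before calling, instead of catching
    # TypeError after the blocking method has already run once.
    if inspect.iscoroutinefunction(km.shutdown_kernel):
        loop = asyncio.new_event_loop()
        try:
            loop.run_until_complete(km.shutdown_kernel(now=True))
        finally:
            loop.close()
    else:
        km.shutdown_kernel(now=True)


blocking, asynchronous = BlockingManager(), AsyncManager()
cleanup_kernel(blocking)
cleanup_kernel(asynchronous)
```

Each manager sees exactly one `shutdown_kernel` call with this pattern.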

davidbrochart · 2020-03-09T08:18:48Z

nbclient/client.py

@@ -330,22 +380,47 @@ def start_kernel_manager(self):
        if self.km.ipykernel and self.ipython_hist_file:
            self.extra_arguments += ['--HistoryManager.hist_file={}'.format(self.ipython_hist_file)]

-        self.km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
+        # Support AsyncKernelManager
+        if inspect.iscoroutinefunction(self.km.start_kernel):


We could also use asyncio.iscoroutinefunction, but I guess this is fine.

Based on https://stackoverflow.com/questions/36076619/test-if-function-or-method-is-normal-or-asynchronous the asyncio is probably a better choice to cover more cases. I'll change em over.

davidbrochart · 2020-03-09T08:20:15Z

nbclient/client.py


        self.kc = self.km.client()
        self.kc.start_channels()
        try:
            await self.kc.wait_for_ready(timeout=self.startup_timeout)
        except RuntimeError:
-            self.kc.stop_channels()
-            self.km.shutdown_kernel()
+            self._cleanup_kernel()


Here we might use an async version of _cleanup_kernel, since we are in an async function.

👍 will do

davidbrochart · 2020-03-09T08:21:30Z

nbclient/client.py

        try:
            await yield_(None)  # would just yield in python >3.5
        finally:
-            self.kc.stop_channels()
-            self.kc = None
+            self._cleanup_kernel()


Here we might use an async version of _cleanup_kernel, since we are in an async function.
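A sketch of what awaiting an async cleanup from the `finally` block could look like. It assumes Python 3.7+ for `contextlib.asynccontextmanager` (the diff above uses `yield_` from a backport instead), and `FakeClient`, `Executor`, and `_async_cleanup_kernel` are hypothetical names for illustration only.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeClient:
    """Hypothetical stand-in for a kernel client with open channels."""

    def __init__(self):
        self.channels_open = True

    def stop_channels(self):
        self.channels_open = False


class Executor:
    def __init__(self):
        self.kc = None

    async def _async_cleanup_kernel(self):
        # Hypothetical async counterpart of _cleanup_kernel.
        if self.kc is not None:
            self.kc.stop_channels()
            self.kc = None

    @asynccontextmanager
    async def setup_kernel(self):
        self.kc = FakeClient()
        try:
            yield self.kc
        finally:
            # Runs on normal exit and when the body raises.
            await self._async_cleanup_kernel()


async def main():
    executor = Executor()
    try:
        async with executor.setup_kernel() as kc:
            raise RuntimeError("cell failed")
    except RuntimeError:
        pass
    return executor.kc, kc.channels_open


state = asyncio.run(main())
```

Because the cleanup is awaited inside the coroutine, no nested `run_until_complete` call is needed on the already-running loop.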

davidbrochart · 2020-03-09T08:25:53Z

nbclient/exceptions.py

+    """
+    A custom exception used to indicate that the exception is used for cell
+    control actions (not the best model, but it's needed to cover existing
+    behvior without major refactors).


Typo behvior -> behavior.

davidbrochart · 2020-03-09T08:28:15Z

nbclient/client.py

+            self.async_wait_for_reply(msg_id, cell=cell)
+        )
+
+    async def async_wait_for_reply(self, msg_id, cell=None):


Should it be _async_wait_for_reply for name consistency with _wait_for_reply?

I was thinking of exposing async_wait_for_reply as public so it could be used in contracts for extensibility, since papermill ended up needing the blocking version to extend behavior. If I'm doing that, I should probably commit to it and also make wait_for_reply public, or revert the decision overall and stick with _async_wait_for_reply.

davidbrochart · 2020-03-09T08:31:47Z

Thanks for this PR @MSeal.
I am wondering if we should have a blocking and an asynchronous directory, as it is for jupyter_client, to clearly separate the two implementations?

MSeal · 2020-03-10T04:57:15Z

Thanks for this PR @MSeal.
I am wondering if we should have a blocking and an asynchronous directory, as it is for jupyter_client, to clearly separate the two implementations?

We could do that (or rather have two files; asyncclient.py and client.py would be simple enough given the lack of directory structure here). The only downside is that we may end up repeating some logic that we're currently wrapping in run_until_complete. It would be a bit cleaner, and we could have two distinct classes that clearly separate async from blocking concerns... Might need a base class to hold common synchronous code patterns. What's your preference @davidbrochart ?

Also might be a couple days before responding on this PR fyi

davidbrochart · 2020-03-10T17:29:02Z

I've made some refactoring along this line in #37.

MSeal · 2020-03-19T00:20:41Z

Assuming #37 merges, do you still want me to separate the class into blocking / non-blocking versions, or should I finish providing lower-level blocking and non-blocking functions in each for downstream extensions to use?

davidbrochart · 2020-03-19T10:55:18Z

I think what we now have in #37 with blocking methods using run_sync on async methods is fine.
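The pattern described here, blocking methods produced by wrapping async ones, can be sketched roughly as follows. This is not the `run_sync` from #37: it uses `asyncio.run`, which only works when no event loop is already running in the calling thread, and `Client`, `async_execute`, and `execute` are hypothetical names.

```python
import asyncio
import functools


def run_sync(async_method):
    """Build a blocking wrapper around an async function (illustrative only)."""

    @functools.wraps(async_method)
    def wrapper(*args, **kwargs):
        # Drive the coroutine to completion on a fresh event loop.
        return asyncio.run(async_method(*args, **kwargs))

    return wrapper


class Client:
    async def async_execute(self, value):
        await asyncio.sleep(0)  # placeholder for real async work
        return value * 2

    # The blocking API reuses the async implementation, so the logic
    # lives in one place.
    execute = run_sync(async_execute)


result = Client().execute(21)
```

Downstream code that expects a synchronous call can keep using `execute`, while async callers await `async_execute` directly.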

davidbrochart · 2020-03-26T09:28:26Z

I believe this is blocking the release. Do you need help @MSeal ?

MSeal · 2020-03-26T18:56:29Z

I've got time to take a pass tonight on updating. Should be able to more quickly respond / finish up the release now that some other threads are resolved.

MSeal · 2020-03-27T04:10:29Z

See how those changes look. Had to do more than I expected unfortunately so the diff is longer

davidbrochart · 2020-03-27T07:32:11Z

nbclient/util.py

+async def await_or_block(func, *args, **kwargs):
+    """Awaits the function if it's an asynchronous function. Otherwise block
+    on execution.
+    """
+    if asyncio.iscoroutinefunction(func):
+        return await func(*args, **kwargs)
+    else:
+        result = func(*args, **kwargs)
+        # Mocks mask that the function is a coroutine :/
+        if isinstance(result, Coroutine):
+            return await result
+        return result


In jupyter_server, we will have something similar that we call ensure_async. We use it as a wrapper on a function call when we don't know if this function is async or not. I prefer that over passing the function object and its parameters to await_or_block and let it make the call. I will try it and maybe make a PR on your zombieProcFix branch.

I'm ok with that change, thanks for helping
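A minimal sketch of the call-site style described above, where the wrapper receives the result of the call rather than the function and its arguments. This is an assumption about the general shape, not a copy of the jupyter_server helper; `blocking_is_alive` and `async_is_alive` are hypothetical.

```python
import asyncio
import inspect


async def ensure_async(obj):
    """Await obj if it is awaitable, otherwise return it unchanged."""
    if inspect.isawaitable(obj):
        return await obj
    return obj


def blocking_is_alive():
    return True


async def async_is_alive():
    return True


async def main():
    # The caller makes the call itself; ensure_async only inspects the result.
    first = await ensure_async(blocking_is_alive())
    second = await ensure_async(async_is_alive())
    return first, second


results = asyncio.run(main())
```

Checking the returned object instead of the function also covers the mock case noted in the `await_or_block` diff, where the function does not look like a coroutine function but still returns a coroutine.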

Replace await_or_block with ensure_async

MSeal · 2020-03-30T20:12:09Z

I haven't gotten a chance to dig into the failures -- Looks like something is trying to await a non-async call for at least one of them.

davidbrochart · 2020-03-30T20:18:15Z

I sent a PR on your branch: MSeal#2

Zombie proc fix

MSeal · 2020-03-30T20:21:15Z

:doh: -- I added that fork as a watched repo so I see those changes in a timely manner going forward.

MSeal · 2020-03-30T20:49:41Z

I think we may need to drop 3.5 support if async keeps hanging with it :/. The same failures are occurring in the nbconvert port to use nbclient with python 3.5. End of life is around the corner (2020-09-13), so maybe we should just pull the trigger now and drop it.

@captainsafia @willingc I checked some known images / VMs that were pinned to 3.5, and those I know about have moved the defaults forward to 3.6 or 3.7. Are there internals you know of that depend heavily on 3.5? If we drop support here / in jupyter_client (where I think the issue is also manifesting), it would mean papermill, nbconvert, and scrapbook would have to follow to 3.6+.

captainsafia · 2020-03-31T02:47:01Z

Are there internals you know of that depend heavily on 3.5? If we drop support here / in jupyter_client (where I think the issue is also manifesting), it would mean papermill, nbconvert, and scrapbook would have to follow to 3.6+.

Based on my inquiries, it seems like most people are on 3.6 already.

MSeal · 2020-03-31T17:27:22Z

I asked around a bit more to confirm the change would generally be accepted. I'm going to go forward with dropping 3.5 support so we don't mislead people into failing processes.

MSeal · 2020-03-31T17:55:44Z

@davidbrochart Good with a merge / release?

davidbrochart

Looks great, let's release!

davidbrochart · 2020-03-31T19:17:30Z

BTW, do you need help in the future for the releases?

MSeal · 2020-03-31T19:32:24Z

@davidbrochart That would be great. I'll get this one, as PyPI permissions are more restrictive than GitHub permissions for the jupyter project.

I would really like to have more python developers doing release management across the projects so I'm not a bottleneck. I'll get that conversation going about promoting PyPI permissions to more people with the core team.

MSeal · 2020-03-31T22:41:58Z

@davidbrochart Actually, if you wanted to help with this one, we could use an announcement to discourse and the jupyter mailing list. You did the lion's share of the async work and it'd be great to publicize the changes in the release there. Up to you, I can also write it up.

davidbrochart · 2020-04-01T08:02:56Z

@MSeal sure, let me try.

MSeal requested review from choldgraf and davidbrochart March 9, 2020 00:00

MSeal mentioned this pull request Mar 9, 2020

adding a binder example #7

Merged

MSeal force-pushed the zombieProcFix branch from 83d18bf to f044a7b Compare March 9, 2020 00:12

davidbrochart reviewed Mar 9, 2020

MSeal mentioned this pull request Mar 19, 2020

Close notebook after execution nteract/papermill#479

Closed

Fixed a number of async issues and enhanced process cleanup

b3a9169

MSeal force-pushed the zombieProcFix branch from f044a7b to b3a9169 Compare March 27, 2020 04:01

Fixed tox issues

fb2285a

davidbrochart reviewed Mar 27, 2020

davidbrochart and others added 4 commits March 27, 2020 08:52

Replace await_or_block with ensure_async

ca653f1

Merge pull request #1 from davidbrochart/zombieProcFix

f502655

Replace await_or_block with ensure_async

Merge remote-tracking branch 'upstream/master' into zombieProcFix

a92f697

Replace is_alive mock with monkeypatch

0367a7d

Merge pull request #2 from davidbrochart/zombieProcFix

4d78e3c

Zombie proc fix

Fixed flake8 issue

2be7ee2

davidbrochart mentioned this pull request Mar 31, 2020

nbclient.client.setup_kernel context manager does not shutdown kernel on exit #42

Closed

Removed python 3.5 support

e91436b

MSeal mentioned this pull request Mar 31, 2020

Replaced execute preprocessor internals with nbclient calls jupyter/nbconvert#1211

Merged

davidbrochart approved these changes Mar 31, 2020

MSeal merged commit 2d05335 into jupyter:master Mar 31, 2020

davidbrochart mentioned this pull request Apr 1, 2020

Use jupyter_client's AsyncKernelManager jupyter-server/jupyter_server#191

Merged
