Kernel frequently restarts during sessions #1037
This still occurs unpredictably. One day it will work and another day it will not. I can't even pin down the differences between the contexts where it works and where it does not. This feels like some sort of memory leak or handshaking failure with ZMQ on the Julia side. There are two obvious immediate precursors: either a cell appears to run forever and Jupyter presumably terminates the kernel for exceeding some reply-time threshold, or the kernel simply restarts. Would the log messages from the terminal session where Jupyter starts be helpful? Here are two suspicious-looking things:
And
Could the mismatch in protocol versions be causing this?
See the troubleshooting section of the manual. I haven't seen this problem myself.
I added a message that extending Jupyter's timeout for kernels to 20 seconds seemed to solve the problem when attempting to run Plots for the first time. I am sorry to report that this no longer works. It would appear that success or failure is very nearly random. The only solution is to reboot my Mac. Since the process relies on some version of ZeroMQ, a memory-related issue seems a likely culprit; I still suspect some kind of memory leak. Try using Plots and see if that doesn't cause the kernel to die for you while Plots spins compiling for the first time. I'll soon be switching away from Plots (much as I truly like it) and will see if it is the culprit, but it is the only thing that seems to kill the Julia kernel in otherwise quite elaborate notebooks.
So, the problem appears to be memory usage. Either Julia itself, IJulia, or Jupyter appears to have a massive memory leak.
So, it turns out that Jupyter itself sets a memory limit called NotebookApp.max_buffer_size. The default is 536870912 bytes, roughly 537 MB, or half a gig. While Julia shows roughly 983 MB at rest, not all of that counts against the Jupyter notebook--only the data that gets created (and the space to hold code). I don't know how much Julia with packages consumes at rest, but let's say it's around 650 MB. That means I have consumed around 330 MB. It is not clear how, when varinfo() reports much less. But there could be much higher transient usage, as my code does a lot of allocation, especially when preparing a plot. I'll look at that and see what dumb thing I am doing. For the meantime, I'll increase Jupyter's limit and report back.
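For anyone else hitting the same limit: it can be raised when starting the server. This is just a sketch using the standard Jupyter configuration mechanism; the 1073741824 (1 GiB) value is an arbitrary example, not a recommendation.

```
# one-off, from the shell:
jupyter notebook --NotebookApp.max_buffer_size=1073741824

# or persistently, in ~/.jupyter/jupyter_notebook_config.py:
c.NotebookApp.max_buffer_size = 1073741824  # 1 GiB instead of the 512 MiB default
```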
Last report for a while. I gave Jupyter 1 GB of memory. The kernel stays running longer. I ran my code several times and memory usage grows: from the first run to the final run, Jupyter's resource usage increases by about 470 MB. I ran Sys.free_memory() a couple of times and it reports free memory reduced by over 500 MB, so both are in the same ballpark. I will build some diagnostics into the notebook to track this more carefully. I will also run the same sequence (easy, because I use Jupytext, so I have a Julia script for the notebook) in the REPL and see how free memory changes. Granted, there can be other things happening behind my back (like email coming in); I doubt that accounts for much, but I'll make sure to stop all the obvious apps within my control. Comparing the notebook runs and the REPL runs will (maybe) highlight whether Julia has the memory leak (doubtful, methinks, but of course the REPL doesn't have an arbitrary ceiling like the Jupyter notebook) or the combination of IJulia/notebook.
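The diagnostics I have in mind are nothing fancier than the sketch below. `free_mb` and `report_memory` are made-up helper names, and on macOS `Sys.free_memory()` only gives a rough picture because the OS caches file data aggressively.

```julia
# Rough memory diagnostic to drop into a notebook cell (hypothetical helpers).
free_mb() = Sys.free_memory() / 2^20      # free physical memory, in MiB
live_mb() = Base.gc_live_bytes() / 2^20   # bytes the Julia GC considers live, in MiB

function report_memory(label::AbstractString)
    println(label, ": free=", round(free_mb(); digits=1),
            " MiB, GC live=", round(live_mb(); digits=1), " MiB")
end

report_memory("before run")
# ... run the simulation / plotting cells here ...
report_memory("after run")
```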
Well, since I started this, I am going to close it. The trouble seems almost random. I thought allowing a kernel a longer timeout would fix it. Sometimes yes; sometimes no. Then I thought it was a memory problem, so I gave Jupyter a very large memory buffer. Sometimes yes; sometimes no.

Then I added Sys.free_memory() calls to the notebook to see what was happening to available memory--could there be a memory leak? Well, this is not such a good way to find real free memory on macOS. macOS is rather greedy at grabbing memory to cache things it thinks it may need to reload. It will release that cached file memory as needed, but queries to free memory may or may not reflect that, depending on how the query runs. Plus, there are always things happening in the background. And regardless of the amount of free memory, sometimes the notebook ran; sometimes it didn't.

Then I thought Plots with GR was the problem (because the kernel almost always hung on running a function that created a plot), so I switched to the plotly backend, thinking that might reduce the load/compile time, the memory footprint, or simply be more reliable on Apple Silicon. Well, that sort of worked for a while. Then it didn't. Then the cell where I run the simulation started hanging the kernel. That was a first--it hadn't happened in months and months.

So, all of this is so arbitrary and inconsistent that there is really no way anyone can diagnose it. I think the problem is a mix of Apple Silicon builds and Intel builds. Julia 1.7.2 offered a "beta" native Apple Silicon build, but maybe some components of Jupyter are not native, even though there is now a beta of Python 3.x that is. What about ZeroMQ? So, I threw in the towel and went back to the Intel builds of Julia for macOS--using the stable release 1.7.3. I deleted my install of Julia and all the packages and started over. Now I can run all of the cells of the notebook over and over. It never hangs.

So, that is the price of the "bleeding edge": there will be hard things to debug. Not sure what the best tool would be--probably Valgrind. So, closing this.
This has been reported many times and usually attributed to configuration errors.
Let me describe:
~/Library/Jupyter/kernels/julia-4-threads-1.7
and kernel.json is
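(The actual contents are not reproduced here, but for context an IJulia kernelspec typically looks roughly like the sketch below; the paths are placeholders for the local Julia and IJulia installation, not the file from this machine, and `JULIA_NUM_THREADS=4` just matches the kernel name above.)

```json
{
  "display_name": "Julia 4 threads 1.7",
  "argv": [
    "/path/to/julia",
    "-i",
    "--color=yes",
    "/path/to/.julia/packages/IJulia/<hash>/src/kernel.jl",
    "{connection_file}"
  ],
  "language": "julia",
  "env": { "JULIA_NUM_THREADS": "4" }
}
```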
The connection between JupyterLab (really the notebooks) and the Julia kernel is frequently lost. This outcome is highly inconsistent. Sometimes it will run for hours; sometimes it fails the first time it is used.
Basically, I think the in-memory pipe from the notebook process to Julia fails to make a successful connection after a few (3-4) invocations. Hunches I have tested:

- Launching JupyterLab with `jupyter lab` from the command line. This seemed to work once, and then this, too, experienced the same failure.
- Launching via `using IJulia; jupyterlab()` with NO instance of VS Code running.

I suspect the investigation will need to be done from the Julia side and might take core debugging, which is a bear, since it's unlikely the Python folks will give this much concern, though perhaps the Jupyter maintainers will be more supportive.
It is tempting to say it must be my Julia code. It is a very big package with a moderate amount of data. But, as suspect as much of my code is, I can run everything in a terminal session over and over with no hangups at all.
In fact, my current approach--to get some benefit from notebooks--is to run a Jupytext .jl script file based on the notebook using VS Code's cell-by-cell execution, for example:
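Something along these lines (a made-up fragment, not the actual notebook; the `# %%` markers are Jupytext's percent format, which the VS Code Julia extension treats as runnable cells):

```julia
# %% [markdown]
# Illustrative Jupytext percent-format script; each `# %%` marker starts a cell.

# %%
using Plots

# %%
result = run_simulation(params)   # hypothetical function from the notebook's package

# %%
plot(result)                      # stepped through with shift-option-enter
```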
This works OK, using shift-option-enter to step through executing cells. I first set up an external Julia REPL because the internal Julia REPL can encounter problems with VS Code's integrated terminal. (I really do like VS Code--best game in town--but it takes them a while to get new features fully stabilized.)
It would be really nice to get this sorted out, and I'll help with my limited skills (in low-level coding). Notebooks are helpful for sequential testing of code sequences (to see intermediate results), for documenting results, and for documenting "how-to's".
And, no, Pluto is not the answer. I've tried it, and it's still a bit too simplistic and imposes too many restrictions--chiefly one variable outcome per cell--for it to be practical. I also think "pure Julia" is not always the best answer. It is good that some tools can support multiple language implementations--this is to be encouraged.