-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Shared memory for NBLAST #87
base: master
Are you sure you want to change the base?
Conversation
- creates scores array from shared memory buffer - fills array with scores
Could we use a SharedMemoryManager to handle the cleanup? The context manager form should make it fairly convenient. Doing a single copy of the shared numpy array at the end (to get it out of the shared buffer before it's unlinked) should be a negligible overhead. I'd probably write a wrapper class around that with a |
Yes context manager + making a copy of the array would solve that issue. I was thinking mostly of scenarios where the NBLAST score matrix is almost larger than the available memory in which case a copy might tip it over the edge. Btw: here is a benchmark of NBLASTing the 25k hemibrain neurons against each other. I did cheat by only running a small fraction of the pairwise comparisons (hence this test finished after ~5mins) but the array sizes should have been the same as in a real NBLAST. Not sure how much I trust the profiler though: these drops in memory consumption are rather odd - are those spin dumps? |
If you don't have the memory to copy the score matrix in memory then you can just do whatever you want to it inside the context manager, I suppose. If you're that tight on memory then you definitely want to be able to manually free the memory (by jumping out of a context manager) rather than waiting for GC to get it in |
This looks like it would be a good fit: https://pypi.org/project/shared-ndarray2/ , but it's py3.8+ only. Those memory drops might be the garbage collection running, I guess? |
You could also avoid the copies all being done at the end by writing them in to the dataframe as they come out: scores = pd.DataFrame(np.zeros((len(query_dps), len(target_dps)),
dtype=this.dtype),
index=query_dps.id, columns=target_dps.id)
with ProcessPoolExecutor(max_workers=n_cores) as pool:
# Each nblaster is passed to its own process
futures = [pool.submit(this.multi_query_target,
q_idx=this.queries,
t_idx=this.targets,
scores=scores) for this in nblasters]
for f in concurrent.futures.as_completed(futures):
res = f.result()
scores.loc[res.index, res.columns] = res.values I have a suspicion |
Had a look at |
I've voiced my opinions on bumping the minimum version 😅 It doesn't break existing versions of navis, which people are welcome to keep using on their old python versions (this can be noted in the docs); it just means that new improvements and features in navis can make use of new improvements and features in python, rather than the library having to work around limitations which have already been solved in the language. Most of the scientific ecosystem does seem to follow NEP29 so it's not an outlandish stance for navis to take. |
For what it's worth: the main hold up was reticulate catching up and it is now at least using 3.8 - I think they also made it easy to install newer versions with e.g. pyenv. Happy to bump to 3.8 but to be honest except for the shared memory, 3.8 doesn't really add much :D |
Just raised a PR on reticulate rstudio/reticulate#1185 which (I think) explicitly runs tests against python 3.8, 3.9, and 3.10; passes for me https://github.com/clbarnes/reticulate/actions/runs/2090867132 ! |
Nice! Wait... so what are they running their tests against now? |
Seems to be whatever python happens to be lying around in the CI environment. Fortunately a certain subset of the python ABI doesn't change very quickly so I wasn't expecting any issues with the newer versions, but it just didn't look like they were tested explicitly. |
Reticulate now explicitly tests against 3.8-10! |
Note to self: we might be better off using Minimal example: >>> import os
>>> import psutil
>>> import ctypes
>>> import numpy as np
>>> from multiprocessing.sharedctypes import RawArray
>>> from multiprocessing import Process
>>> # Initialize the RawArray
>>> ARRAY_SHAPE = (10_000, 10_000)
>>> ARRAY_SIZE = int(np.prod(ARRAY_SHAPE))
>>> shared_array = RawArray(ctypes.c_double, ARRAY_SIZE)
>>> # Wrap shared array as numpy array in parent thread
>>> shared_array_np = np.ndarray(ARRAY_SHAPE, dtype=np.float64, buffer=shared_array)
>>> def test_write(buff):
>>> shared_array = np.ndarray(ARRAY_SHAPE, dtype=np.float64, buffer=buff)
>>> shared_array[0, :10] = 1 # write a few values
>>> # Pass the shared array to the child process
>>> p = Process(target=test_write, args=(shared_array,))
>>> p.run()
>>> del p # delete process to remove reference to shared_array
>>> print(shared_array_np)
array([[1., 1., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
>>> # Check memory footprint of parent process
>>> mem_info = psutil.Process(os.getpid()).memory_full_info()
>>> print(f"Resident Set Size: {mem_info.rss / 1e9:.2f}")
Resident Set Size: 0.84Gb
>>> # Importantly, we can delete the buffer variable and the numpy array is just fine
>>> del shared_array
>>> import gc
>>> gc.collect()
>>> print(shared_array_np)
array([[1., 1., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
>>> # Memory is also still reserved
>>> mem_info = psutil.Process(os.getpid()).memory_full_info()
>>> print(f"Resident Set Size: {mem_info.rss / 1e9:.2f}")
Resident Set Size: 0.84Gb
>>> # Only after actually deleting the `shared_array_np` is the memory released
>>> del shared_array_np
>>> mem_info = psutil.Process(os.getpid()).memory_full_info()
>>> print(f"Resident Set Size: {mem_info.rss / 1e9:.2f}")
Resident Set Size: 0.04Gb Note: when running the above in iPython/Jupyter, memory might not be released right away because of a reference to the array/buffer in the history (check |
With larger NBLASTs, memory consumption becomes an issue. During the NBLAST each child process holds only the scores it works on in memory. On completion, however, that data has to be sent to the parent process which takes time and causes a memory spike (I'm pretty sure it makes a copy). Subsequently, we need to stitch the scores into one big matrix which again takes time and memory.
This PR is work in progress but the basic idea is this:
multiprocessing.shared_memory.SharedMemory
to reserve some memoryAdvantages:
nblast_smart
)Disadvantages:
The memory management is a bit more annoying because unlike your run-of-the-mill numpy array the
SharedMemory
needs to be explicitly released by callingSharedMemory.close()
. If one doesn't do this before exiting the session, there is aUserWarning
. For now I solved this by hooking intoatexit
but I'm pretty sure that if the user simply deletes the array/dataframe (i.e. something likedel scores
) the memory is not freed. That bit needs a bit of exploration - perhaps we can implement some custom__del__
method or something.I have so far only implemented this for
navis.nblast
- all other functions in this branch still use the old approach. Some quick and dirty benchmarks show some decent speed up, in particular for large NBLASTs. Haven't done proper memory profiling yet.@clbarnes any thoughts on this? In particular on the somewhat opaque issue with memory not being freed.
Code reviews welcome (the whole NBLAST module is becoming fairly complex)!