Memory manager design (for eviction and garbage collection) #102
Comments
This can also happen at any internal state. It just means that the Python reference count has dropped to zero.
The number of Reserved Tasks that reference it is the stronger necessary condition for safety, given that the proposed model cannot change scheduling decisions or add dependencies to in-flight tasks.
I'm not sure what you mean by instance here. The PArray object will be alive at the scope that is spawning the tasks since it needs to be listed in IN/OUT/INOUT.
Let's only call evict when the next task wouldn't be able to fit.
The classical hashmap -> linked list LRU implementation for each device should be sufficient.
Maybe here you could check only SHARED PArrays first, since that is the state replicas are usually in.
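A minimal sketch of the per-device LRU idea, combined with the "only evict when the next task wouldn't fit" policy. This is an illustration, not the project's implementation: `DeviceLRU`, `touch`, `remove`, and `evict_until_fits` are hypothetical names, and sizes are tracked as plain byte counts. Python's `OrderedDict` stands in for the hashmap plus doubly linked list.

```python
from collections import OrderedDict


class DeviceLRU:
    """Per-device LRU of evictable buffers (hypothetical helper for illustration)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # OrderedDict is a hash map backed by a doubly linked list:
        # O(1) lookup, O(1) move-to-end, O(1) pop from the front.
        self._entries = OrderedDict()  # key -> size in bytes

    def touch(self, key, nbytes):
        """Record a use of `key`, inserting it if it is new."""
        if key in self._entries:
            self._entries.move_to_end(key)
        else:
            self._entries[key] = nbytes
            self.used_bytes += nbytes

    def remove(self, key):
        """Drop `key` without evicting, e.g. when it is garbage collected."""
        nbytes = self._entries.pop(key, None)
        if nbytes is not None:
            self.used_bytes -= nbytes

    def evict_until_fits(self, needed_bytes):
        """Evict least recently used entries only while the next task would not fit."""
        evicted = []
        while self._entries and self.used_bytes + needed_bytes > self.capacity_bytes:
            key, nbytes = self._entries.popitem(last=False)
            self.used_bytes -= nbytes
            evicted.append(key)
        return evicted
```

With this shape, the memory manager would call `evict_until_fits` with the next task's memory requirement and get back an empty list whenever the task already fits, so nothing is evicted eagerly.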
The reason why I mentioned the state was b/c the PArray runtime holds one Python reference. The runtime releases that reference only when the coherency state becomes INVALID. (There are some side cases, like slicing.)
Agree. I was also on board with you but just explained it incorrectly. So, it should be tasks in a phase after the mapping phase.
We should differentiate the instance of
Cool
Cool
Do you have any idea on this? I am stuck with this.
Gotcha. Thanks! I find this terminology confusing. To me the whole distributed object is the PArray and the internal data are just cupy/numpy buffers.
I don't know. Eviction tasks seem expensive, but they are a solution. Probably the easiest and most flexible. My other suggestions from our chat were: (1) the thread that is running the mm (probably the scheduler thread?) calls into Python via a callback and performs the move. This has the disadvantage of blocking scheduler execution during the copy call (& 2 GIL accesses). (2) Any worker that wakes up to run a task checks an eviction workqueue before/after its execution and performs the update.
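Option (2) could look roughly like the sketch below. All names here (`EvictionQueue`, `request`, `drain`, `run_task`) are hypothetical, and `evict_fn` is a placeholder for whatever actually moves the data. The point is only that the scheduler enqueues requests without blocking, and a worker performs them around the task it was going to run anyway.

```python
import queue


class EvictionQueue:
    """Work queue of pending eviction requests (hypothetical helper)."""

    def __init__(self):
        self._pending = queue.Queue()  # thread-safe FIFO

    def request(self, target):
        """Called by the memory manager; never blocks on the copy itself."""
        self._pending.put(target)

    def drain(self, evict_fn):
        """Perform every eviction currently queued; return how many ran."""
        count = 0
        while True:
            try:
                target = self._pending.get_nowait()
            except queue.Empty:
                return count
            evict_fn(target)
            count += 1


def run_task(task_fn, evictions, evict_fn):
    """Worker body: service evictions before and after running the task."""
    evictions.drain(evict_fn)
    result = task_fn()
    evictions.drain(evict_fn)
    return result
```

A tradeoff of this design is that evictions only make progress when some worker wakes up, so a request can sit in the queue while all workers are idle.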
This is a good policy suggestion!
I think the problem was I didn't know how to call evict() from the C++ side. I discussed this with Yineng and we designed an implementation. I will write it up in the next thread.
LGTM, thanks!
This is implemented in #100.
Terminology:
Garbage collection:
Eviction: