RFC: Shared-memory parallel computing #4580
Conversation
Great to see this. My main observation is that this has a lot of overlap with DArrays, and should maybe be a feature of that type.

Yep, but with one important exception: you can guarantee that all participating processes see the same memory.

That said, I'm supportive of the idea of merging (fewer types is better for everyone), if a good way forward exists.
It is probably good enough to have common interfaces and a common ArrayDist type (I almost called it Distribution, but...)
Sounds good. For the benefit of others who might want to review this, perhaps some additional explanation is in order. This is set up to make it basically trivial to handle embarrassingly-parallel problems (although it's more flexible than that). You create a `SharedArray`, and then you can manipulate it just as an ordinary array (with what should be the same performance), etc. There's no special implementation of these operations. When you want to execute a parallel operation, you use `pcall`. The arguments can be anything, and in particular you can have as many (or as few) shared arrays in that list as you want. There are major performance advantages to making sure that large inputs are shared arrays, since they do not need to be copied to the workers. To take advantage of parallel processing, the function you pass to `pcall` should operate on just its own portion of the array. For quickly-running operations, you get some performance benefit by bypassing the ordinary synchronization methods with the busy-wait variant `pcall_bw`. The bottom line: my hope is that for many purposes this should be a reasonable substitute for SMP multithreading, without most of the dangers. See also #1802.
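The core property this design relies on, that a write made by one process is immediately visible to the others without any copying, can be illustrated outside Julia as well. Below is a small Python sketch (Python is used only because this thread later compares the design to Python's multiprocessing; the `shared_memory` module and POSIX `fork` stand in for the proposed `SharedArray` machinery, and the sketch is POSIX-only):

```python
import os
from multiprocessing import shared_memory

# One byte of shared memory, plus an ordinary (non-shared) buffer.
shm = shared_memory.SharedMemory(create=True, size=1)
plain = bytearray(1)

pid = os.fork()
if pid == 0:
    shm.buf[0] = 42   # shared mapping: the parent sees this write
    plain[0] = 42     # copy-on-write page: invisible to the parent
    os._exit(0)
os.waitpid(pid, 0)

# Only the write into the shared segment survived the process boundary.
seen = (shm.buf[0], plain[0])
shm.close()
shm.unlink()
```

After the child exits, `seen` is `(42, 0)`: the shared byte changed, the ordinary one did not, which is exactly the "no copying, immediately visible" behavior a `SharedArray` provides.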
This is certainly the way to go until we do real multi-threading. It would be great to work in some of these ideas.
I guess what some people call "real" multithreading I call "incredibly unreliable and difficult" multithreading. |
IMHO, we should support "incredibly unreliable and difficult" multithreading at some point, if only because "performance" is one of the main drivers of Julia. And we should provide all the tools we can to people who can deal with the said complexity of multithreading and want to squeeze out every bit of performance. If I were to design a large-scale system for reliability, I am fully with you that a message-passing model is the way to go. But since a) we benchmark ourselves against folks building high-performance stuff in C, and b) CPUs are cramming in more and more cores, we should provide a means for developers to leverage those cores without copying copious amounts of data between processes. An alternative, though, may be a model wherein the API is still message passing but the runtime itself is multi-threaded.
bump |
To make it work across platforms, there are two issues:

Perhaps both of those functions should be moved to a more appropriate home. Those are the only problems that I know of (aside from writing documentation, etc.). But perhaps others might have objections that run more deeply? If so, this might be a good time to voice them.
I should add that some googling has not turned up an OS X variant of /dev/shm.
That would be difficult on a Mac. I am sure it requires superuser access, and more so, those are pretty intrusive changes for Julia to make on someone's system.

Right. It would have to be something we'd do once, when Julia is installed (presumably by making the changes at that time).
Have you looked into shm_open?
I haven't actually tried this, but wouldn't simply using shm_open and shm_unlink work the same way without requiring /dev/shm? As Unix domain sockets can be used to pass file descriptors around, one might try the following strategy:
Essentially, I'm trying to say that a nicer equivalent to Python's multiprocessing, with some convenience tools on top, looks doable without mucking with Julia's internals, and that's already an 80% solution. With proper internals-mucking, you might later be able to do "better threads than 'real threads'" with the same basic approach, which should be a 100% solution (there's always ccall if one really absolutely must have 'real' threads).
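The strategy described above (create a shared segment, unlink its name immediately, and hand the file descriptor to peers over a Unix-domain socket) rests on ordinary POSIX file semantics. The sketch below uses Python for brevity, an unlinked temp file in place of `shm_open` (an open descriptor keeps the storage alive either way), and `socket.send_fds`/`recv_fds` (Python 3.9+, Unix-only) for descriptor passing:

```python
import mmap, os, socket, tempfile

# Create a region and immediately unlink its name (the analogue of
# shm_open followed by shm_unlink): the open fd keeps the storage
# alive, but no name is left behind if the process crashes.
fd, path = tempfile.mkstemp()
os.unlink(path)
os.ftruncate(fd, 4096)

# Hand the descriptor to a peer over a Unix-domain socket pair.
# In a real system the peer would be a separate process.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
socket.send_fds(a, [b"shm"], [fd])
msg, fds, flags, addr = socket.recv_fds(b, 16, 1)
peer_fd = fds[0]

# Both descriptors refer to the same storage: a write through one
# mapping is visible through the other.
view1 = mmap.mmap(fd, 4096)
view2 = mmap.mmap(peer_fd, 4096)
view1[:5] = b"hello"
shared = bytes(view2[:5])
```

Here `shared` comes back as `b"hello"` through the second descriptor, confirming that the peer ends up with a live mapping of the same segment without any name ever persisting in the filesystem.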
I believe that is what @amitmurthy is doing in the PTools.jl package.
This is great feedback. I didn't even know about shm_open. I've been mulling this over. I like playing soccer, and can often dribble around several opponents, but sometimes you realize it's time to pass the ball to a teammate. This is such a time. There are quite a few people who know much more about this particular topic than I do, and in any event I have weeks' worth of work (which will stretch to months when coupled with real life) to do on HDF5, Images, and ImageView. Anyone who wants to see this feature in 0.3 is invited to have a go. If this code is a useful starting point, it's already in this branch.
The renaming of the branch is certainly not necessary! |
@timholy , my modified branch using shm_open and shm_unlink is at https://github.com/amitmurthy/julia/tree/amitm/sharedarrays .
Please do check it out and merge it in. @RauliRuohonen, @JeffBezanson.
Amit, thanks so much for picking this ball up! As far as I can tell your changes look fine, so I merged it.

@RauliRuohonen: since you know Python's multiprocessing, and also the things about it you don't like and would like to change, you're probably the best one to tackle these issues. (If you are new to Julia/GitHub development, see the CONTRIBUTING.md file.) Otherwise, from my perspective the only thing missing is Windows support. In the absence of a restructuring from @RauliRuohonen, and given Amit's "seal of approval" of the overall design, I may be able to at least take a stab at implementing that over the next couple of weeks.
@timholy, I am implementing this idea of yours in DArray itself. Give me a couple of days...

@amitmurthy, of course!

@timholy Should we close this in light of @amitmurthy's new PR on the topic?
This is 0.3 material or beyond.
A recent mailing-list discussion, and particularly a comment by Jason Riedy, pointed out a potential strategy for shared-memory parallelism that circumvents some of the problems with other strategies. I've fleshed that basic idea out here, hopefully in a way that provides a fairly nice Julian interface.
This is an RFC for several important reasons: at the moment it's Linux-only, there are API issues to consider, I haven't integrated it into the build process of Julia (it's easier to test and tweak this way, if someone else wants to do so), there are no docs, etc.
Noteworthy features: compared with `pfork`, this has a more modest overhead. On my machine `pfork` has approximately 500ms minimum latency even for a trivial computation. Here, the `pcall` form is limited by the event queue, and on my machine seems to have a minimum latency of about 40ms. The busy-wait version `pcall_bw` gets that down to less than 1ms per process.

In my view the main API issue to consider is how to handle the return value(s). As an example of the conceptual challenge, many Julia functions have a return like `fill!()`, where the filled array is returned---this is nice because then you can chain together function calls very naturally. However, serializing that output will result in a huge performance hit. So the convention here is to return two things: the full output of the function call on process id 1 (which requires no serialization), and the output of each other process as an array of remote references. In cases where the output is already being stored in a pre-allocated `SharedArray`, none of these outputs matter anyway, because such results are immediately available in the array that exists in process id 1.

CC @amitmurthy.
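The return-value convention above exists because shipping results back through serialization is exactly what shared memory lets you avoid. The same pattern, where each worker fills its own slice of a pre-allocated shared array in place and returns nothing, can be sketched in Python (used here only as a stand-in for the proposed `SharedArray`/`pcall` API; `os.fork` makes the sketch POSIX-only):

```python
import os, struct
from multiprocessing import shared_memory

# Pre-allocate a shared array of 8 doubles, analogous to a
# pre-allocated SharedArray.
n = 8
shm = shared_memory.SharedMemory(create=True, size=n * 8)

# Each worker fills its own disjoint chunk in place and returns
# nothing, so no results need to be serialized back to process 1.
pids = []
for start, stop in ((0, n // 2), (n // 2, n)):
    pid = os.fork()
    if pid == 0:
        for i in range(start, stop):
            struct.pack_into("d", shm.buf, i * 8, float(i))
        os._exit(0)
    pids.append(pid)
for pid in pids:
    os.waitpid(pid, 0)

# The results are immediately visible in the parent's mapping.
result = [struct.unpack_from("d", shm.buf, i * 8)[0] for i in range(n)]
shm.close()
shm.unlink()
```

After both workers exit, `result` holds `0.0` through `7.0` with nothing ever copied or serialized between processes, which is the case the RFC describes as "none of these outputs matter anyway".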