WIP: Multithreading Experiment #6741
Interesting. |
Thanks Jeff. Could you give me a hint as to which function performs the actual code generation? Allocation are the functions in You are right that it does not seem feasible to reduce the interaction with the runtime to zero. But my impression is that far less interaction happens in a "hot loop" than, for instance, in Python. And please note that I am well aware that this work is pretty naive. On the other hand, one can learn a lot about the Julia internals when debugging race conditions :-) |
@tknopp, ages ago I started treading down a similar path, implementing direct wrapping of
|
Thanks Tim. |
Ok, locking |
The reason to use C++11 threads was that it should be cross platform. But I have just seen that libuv provides cross platform threads and mutexes so this seems to be a better solution. |
Vaguely related: https://groups.google.com/d/msg/julia-dev/GTzZlo4Rhrk/ezZhx2wlLnAJ. The approach here is to use a work thread that has no interaction with the Julia runtime to do C work. The tricky part is from Julia, generating a function that can get arguments from a buffer and call a C function to do the actual work, without invoking any GC or code compilation – these generated wrapper functions only use |
Thanks Stefan, I seem to have missed that post. Maybe I should use the I was kind of surprised that my naive approach worked, and although it is certainly still broken, I think that we should proceed here. The tricky part is to determine which functions need locks and which do not. I currently don't see any crashes in my test program when using It would also be great if someone has a function in Base in mind that would greatly benefit from multithreading. In my test I used a matrix-vector multiplication, which is however not so CPU bound. Cheated a little by using Int128 though :-) |
Ported this to libuv threads/mutex so that it does not rely on C++11. Should hopefully satisfy Travis now. |
If nothing in Base comes immediately to mind, Images would be a great test case. |
Yes, I am actually looking for 3 test cases. I have made some progress with c) and have found a setting where PS: c) is of course just meant as an intermediate test case :-) |
Cc: @stevengj |
The nested locks I had for |
I believe thread-local gcstack would be ok. You would have to modify the GC to walk every thread's stack instead of the current one only (see the use of gcstack in gc.c). |
@carnaval: The code is quite messy in its current state because I have tried something here and there. I do not have a lot of experience with garbage collectors, so I am not entirely sure whether I can get this done alone. My idea for the thread-local gcstack would be to use a map of pointers:
and in How to handle garbage collection is indeed an interesting issue. In the first round I would simply disable it. |
Using a |
Indeed, thanks. I was not aware of TLS. It seems that libuv has support for it (https://github.com/joyent/libuv/blob/master/include/uv.h#L2202) |
I just realized that if I disable the GC while the threads are running, I could also just let JL_GC_PUSH/POP be no-ops. |
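The thread-local gcstack idea from the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses POSIX thread-specific keys instead of libuv's TLS API, `frame_t` is a hypothetical stand-in and not Julia's real GC frame layout, and all function names (`gcstack_register`, `gc_push`, `gc_pop`, `gc_count_roots_all_threads`) are made up for this sketch. Each thread keeps its own stack head in TLS, and a small registry lets a collector walk every thread's stack instead of only the current one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a GC frame; NOT Julia's real frame layout. */
typedef struct frame {
    struct frame *prev;
    int nroots;
} frame_t;

enum { GC_MAX_THREADS = 16 };

static pthread_key_t gcstack_key;
static pthread_once_t gcstack_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static frame_t **registry[GC_MAX_THREADS]; /* one stack head per thread */
static int nregistered = 0;

static void make_key(void) { pthread_key_create(&gcstack_key, NULL); }

/* Store the calling thread's stack head in TLS and in the shared registry. */
void gcstack_register(frame_t **head)
{
    pthread_once(&gcstack_once, make_key);
    *head = NULL;
    pthread_setspecific(gcstack_key, head);
    pthread_mutex_lock(&registry_lock);
    assert(nregistered < GC_MAX_THREADS);
    registry[nregistered++] = head;
    pthread_mutex_unlock(&registry_lock);
}

/* Push a frame onto the calling thread's own stack (no lock needed). */
void gc_push(frame_t *f, int nroots)
{
    frame_t **head = pthread_getspecific(gcstack_key);
    f->nroots = nroots;
    f->prev = *head;
    *head = f;
}

void gc_pop(void)
{
    frame_t **head = pthread_getspecific(gcstack_key);
    *head = (*head)->prev;
}

/* What a collector would do: walk every registered thread's stack.
   Assumes the other threads are paused while this runs. */
int gc_count_roots_all_threads(void)
{
    int total = 0;
    pthread_mutex_lock(&registry_lock);
    for (int t = 0; t < nregistered; t++)
        for (frame_t *f = *registry[t]; f != NULL; f = f->prev)
            total += f->nroots;
    pthread_mutex_unlock(&registry_lock);
    return total;
}

static frame_t *demo_heads[2]; /* static storage for the demo's stack heads */

static void *demo_worker(void *result)
{
    frame_t f;
    gcstack_register(&demo_heads[1]);
    gc_push(&f, 3);
    *(int *)result = gc_count_roots_all_threads();
    gc_pop();
    return NULL;
}

/* Main holds 2 roots, the worker holds 3; the worker scans both stacks. */
int gcstack_demo(void)
{
    frame_t f;
    int total = 0;
    pthread_t tid;
    nregistered = 0;              /* reset so the demo can be rerun */
    gcstack_register(&demo_heads[0]);
    gc_push(&f, 2);
    int rc = pthread_create(&tid, NULL, demo_worker, &total);
    assert(rc == 0);
    (void)rc;
    pthread_join(tid, NULL);      /* main is blocked here during the scan */
    gc_pop();
    return total;
}
```

In this sketch the push and pop operations touch only the calling thread's data, so they need no lock; only registration and the full scan take the registry mutex.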
Some more findings / improvements. I added some missing locks around the Locking code generation seems to be quite tricky. I had a situation where one thread was in An alternative that is currently committed is to lock |
I have been looking into the So I really need to understand what all this task switching is about. |
Regardless, it is good that you mention it in the 'docs'. Wouldn't it be good to have a critical section macro? For example, within a thread one could write
    @critical println("hello")
to be converted to
    lock(someMutex)
    println("hello")
    unlock(someMutex)
though there is an issue: where to initialize the mutex. Does Julia have something like static variables? Something probably equally or more important would be to have something like
    @atomic a += 2
I assume LLVM provides a way to indicate that. I am thinking of the OpenMP constructs I frequently use that make a difference when working with multithreading. |
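For comparison, here is a minimal C sketch of the two constructs asked about above, assuming POSIX threads and C11 atomics rather than anything in this PR. The names `critical_lock`, `critical_worker` and `critical_demo` are hypothetical. A statically initialized mutex sidesteps the question of where to initialize the lock, and `atomic_fetch_add` plays the role of the proposed `@atomic a += 2`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Static initialization: no separate init call is needed at runtime. */
static pthread_mutex_t critical_lock = PTHREAD_MUTEX_INITIALIZER;
static long guarded_total = 0;        /* protected by critical_lock */
static atomic_long atomic_total = 0;  /* updated with atomic operations */

static void *critical_worker(void *arg)
{
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        /* Critical section: lock, update, unlock. */
        pthread_mutex_lock(&critical_lock);
        guarded_total += 2;
        pthread_mutex_unlock(&critical_lock);

        /* Atomic update: no lock, a single indivisible read-modify-write. */
        atomic_fetch_add(&atomic_total, 2);
    }
    return NULL;
}

/* Run up to 8 workers and return the mutex-guarded total. */
long critical_demo(int nthreads, long iters)
{
    pthread_t tid[8];
    if (nthreads > 8) nthreads = 8;
    guarded_total = 0;
    atomic_store(&atomic_total, 0);
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&tid[t], NULL, critical_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    /* Both counters saw the same number of increments. */
    assert(guarded_total == atomic_load(&atomic_total));
    return guarded_total;
}
```

With 4 threads and 1000 iterations each, both counters end at 8000. Without the lock or the atomic operation, the plain `+= 2` could lose updates.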
Another approach I've used in C is to implement some kind of message queue, and do all printing from the main thread. |
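A rough sketch of that message-queue approach, assuming POSIX threads; the queue type, its capacity and all function names are hypothetical and chosen only for illustration. Worker threads never print themselves. They post text to a bounded queue guarded by a mutex and two condition variables, and the main thread is the only one that takes messages off the queue and writes them out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

enum { QUEUE_CAP = 64, MSG_LEN = 64 };

typedef struct {
    char msg[QUEUE_CAP][MSG_LEN];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} msg_queue_t;

static msg_queue_t print_queue = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER
};

/* Called from worker threads: copy the text into the queue. */
void queue_post(msg_queue_t *q, const char *text)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    int tail = (q->head + q->count) % QUEUE_CAP;
    snprintf(q->msg[tail], MSG_LEN, "%s", text);
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called from the main thread: block until a message is available. */
void queue_take(msg_queue_t *q, char *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    snprintf(out, MSG_LEN, "%s", q->msg[q->head]);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}

static void *print_worker(void *arg)
{
    char text[MSG_LEN];
    snprintf(text, MSG_LEN, "hello from worker %d", *(int *)arg);
    queue_post(&print_queue, text);
    return NULL;
}

/* Start up to 8 workers; the main thread does all the printing. */
int print_demo(int nworkers)
{
    pthread_t tid[8];
    int id[8];
    char line[MSG_LEN];
    int printed = 0;
    if (nworkers > 8) nworkers = 8;
    for (int t = 0; t < nworkers; t++) {
        id[t] = t;
        int rc = pthread_create(&tid[t], NULL, print_worker, &id[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nworkers; t++) {
        queue_take(&print_queue, line);
        puts(line);               /* I/O happens only on the main thread */
        printed++;
    }
    for (int t = 0; t < nworkers; t++)
        pthread_join(tid[t], NULL);
    return printed;
}
```

The order in which the lines appear depends on thread scheduling, but each message is printed exactly once and never interleaved with another.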
I have already tried to lock I am quite new to coroutines, so while I understand the concept, it is not entirely clear to me when task switches can happen in Julia code. So I am unsure which approach to take:
|
You're also not allowed to do I/O from anything but the main thread because the uv interface is not thread safe. |
Note that |
Ah thanks @loladiro. I had not yet tried to do I/O in a thread so have not seen that. This kind of restricts us to computational things which is probably ok. My dream was a little to also use threads when doing GUI programming where one usually also does I/O in threads to let the GUI still be responsive. |
You might try turning off COPY_STACK in task.c. Ever since libuv gained thread pools, I thought it had become (more) thread-aware. |
task.c does not compile when I undef |
Ok I got I have further fixed some locks that should now make it possible to run (raw) threads interactively from the REPL (gc has to be disabled!). With that I was able to spawn a thread that was periodically incrementing an array value and see this in action from the REPL. |
and not just an unspecific error message
I improved the exception handling a little. Before, only an unspecific exception was rethrown in the main thread (i.e. a flag: there was an exception in a thread). Now the actual exception is thrown. TLS is very useful for this stuff. |
@tknopp Sounds great, I will give it a try later this week or next week, will keep you posted. |
I am closing this as the |
ah wrong button |
Yes, threads is the branch now. |
This is an experiment on running Julia code in multiple threads (see #1790). While the code is certainly pretty broken, I would still like to get feedback on whether the approach might be feasible.
The idea is to first precompile the function that is supposed to be run in the thread. To this end I use jl_get_specialization, which seems to do what I want. The hope is that the interaction with the Julia environment is minimal after precompilation, so that race conditions are less likely to happen.
I have implemented two functions, parapply and parapply_jl, that take a function that is supposed to be run several times, with the last argument being an index that covers a predefined range. While parapply implements the scheduling in C, parapply_jl implements it in Julia and exploits a direct thread API that is exposed to Julia. For the examples I have tried, parapply works while parapply_jl crashes sometimes.
While this is a pretty naive solution, the real work is now to determine the places where race conditions can occur. Any help on this would be really appreciated.
This was developed on OSX with clang. For gcc one needs other compiler flags to enable C++11 threads.
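To make the scheduling idea concrete, here is a minimal sketch of how a parapply-style function could split an index range across threads in C. It is not the code in this PR: it assumes POSIX threads instead of C++11 or libuv threads, it calls a plain C callback instead of a precompiled Julia function, and the names `parapply_sketch`, `index_fn` and `chunk_t` are hypothetical. The split is static, meaning each thread gets one contiguous chunk of the range.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type: user data plus the loop index as last argument. */
typedef void (*index_fn)(void *data, long i);

typedef struct {
    index_fn fn;
    void *data;
    long first;  /* first index of this chunk (inclusive) */
    long last;   /* end of this chunk (exclusive) */
} chunk_t;

static void *run_chunk(void *p)
{
    chunk_t *c = (chunk_t *)p;
    for (long i = c->first; i < c->last; i++)
        c->fn(c->data, i);
    return NULL;
}

/* Run fn(data, i) for every i in [start, stop) on up to 64 threads. */
void parapply_sketch(index_fn fn, void *data, long start, long stop, int nthreads)
{
    enum { PARAPPLY_MAX_THREADS = 64 };
    pthread_t tid[PARAPPLY_MAX_THREADS];
    chunk_t chunk[PARAPPLY_MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > PARAPPLY_MAX_THREADS) nthreads = PARAPPLY_MAX_THREADS;
    long n = stop > start ? stop - start : 0;
    for (int t = 0; t < nthreads; t++) {
        chunk[t].fn = fn;
        chunk[t].data = data;
        chunk[t].first = start + n * t / nthreads;
        chunk[t].last = start + n * (t + 1) / nthreads;
        int rc = pthread_create(&tid[t], NULL, run_chunk, &chunk[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
}

/* Example callback: each index writes only its own slot, so no lock is needed. */
static void square_into(void *data, long i)
{
    long *out = (long *)data;
    out[i] = i * i;
}

/* Fill out[0..n) with squares using 4 threads and return their sum. */
long parapply_demo(long n)
{
    long out[256] = {0};
    if (n > 256) n = 256;
    parapply_sketch(square_into, out, 0, n, 4);
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += out[i];
    return sum;
}
```

Because every index writes to a distinct array element, the callback itself needs no synchronization. The race conditions discussed in this thread arise when the callback also touches shared runtime state such as the allocator, the GC stack or code generation.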