067dc93 to 8beb5d6
So this has been pretty painful.
This passes CuArrays tests again. In summary, we now have a couple of LLVM passes that rewrite IR to make it GPU-compatible:
This reminds me of
Similar indeed. A proper fix seems tricky, though: LLVM would need to reason about the thread mask and only return when all threads are active (assuming this issue is caused by exiting a function during divergence, although shared memory seems to be involved too). CuArrays' mapreduce appears to trigger this quite easily, so let's hope that helps us spot changes in upstream irgen (i.e., patterns in Julia-generated IR that produce a CFG ptxas doesn't like) before we run into this again.
Apparently LLVM managed to optimize many `throw`s away, which breaks now that we don't inline everything anymore. Let's try to work around some.