Composable multithreading with OpenBLAS #1067

Open
giordano opened this issue May 3, 2024 · 3 comments
Labels
multithreading Base.Threads and related functionality

Comments

@giordano
Contributor

giordano commented May 3, 2024

Julia ships with OpenBLAS, which is our default libblastrampoline backend. OpenMathLib/OpenBLAS#4577 (a continuation of OpenMathLib/OpenBLAS#2255 by @stevengj), expected in the not-yet-released OpenBLAS v0.3.28, introduced callbacks to multithreading backends in OpenBLAS, which should enable composable multithreading with Julia. I'm opening this issue to discuss if and how we want to take advantage of that. Bear in mind that by default our BLAS calls go through libblastrampoline, which is backend-agnostic, and that may complicate writing generic code. I think @jpsamaroo had some ideas involving Threads.@threads-powered callbacks or ScopedValue.

Also, ref JuliaLang/julia#43984.

@giordano giordano added the multithreading Base.Threads and related functionality label May 3, 2024
@jpsamaroo
Member

Regarding composability, I think the first "obvious" thing to do is to implement our callback with Threads.@spawn (or Threads.@threads, if that makes more sense) so that BLAS jobs are spawned as Julia tasks by default. Of course, we want to retain compatibility with BLAS.set_num_threads, so we can cap the number of tasks we spawn at the global setting (or, with BLAS.set_num_threads(1), spawn no tasks at all and execute the jobs directly).
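
To make the idea above concrete, here is a minimal pure-Julia sketch of the scheduling policy only. It does not touch OpenBLAS: `run_blas_jobs`, `dojob`, and `max_tasks` are hypothetical names, with `dojob(i)` standing in for the per-job function the BLAS backend would hand to the callback and `max_tasks` standing in for the global thread setting. It also assumes the jobs are independent and never wait on one another.

```julia
# Hypothetical helper (not an existing API): run `numjobs` independent jobs
# using at most `max_tasks` Julia tasks.
function run_blas_jobs(dojob, numjobs::Integer; max_tasks::Integer = Threads.nthreads())
    ntasks = clamp(max_tasks, 1, max(numjobs, 1))
    if ntasks == 1
        # Analogue of set_num_threads(1): spawn nothing, run inline on the caller.
        foreach(dojob, 1:numjobs)
        return nothing
    end
    # Split the job indices into at most `ntasks` contiguous chunks, one task each.
    chunks = Iterators.partition(1:numjobs, cld(numjobs, ntasks))
    @sync for chunk in chunks
        Threads.@spawn foreach(dojob, chunk)
    end
    return nothing
end

# Usage: each job writes to its own slot, so no synchronization is needed.
results = zeros(Int, 8)
run_blas_jobs(i -> (results[i] = i^2), 8; max_tasks = 4)
```

Chunking keeps the number of spawned tasks bounded by the setting rather than by the number of jobs; spawning one task per job would be an equally reasonable first experiment.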

However, one additional feature that could be quite nice is making this parallelism configurable with ScopedValues. In particular, there are utilities like Threads.@threads and libraries like Dagger which already provide their own multi-threaded parallelism, which would do better without BLAS being automatically multithreaded. In such cases, it would be ideal if user code (which may or may not use BLAS) could be wrapped in a with block that would allow disabling BLAS multithreading for operations executed within an already-multithreaded context. This could allow for overall performance gains by better utilizing known data locality and allowing libraries like Dagger to accurately track where code is executing. It should be quite straightforward to add such a mechanism using ScopedValues, and should have very minimal overhead.

Aside: How this will work for non-OpenBLAS setups, or considering libblastrampoline, is not something I have really thought about. I think we can consider the ScopedValue support to be a hint for the desired amount of internal parallelism, which would allow a graceful degradation back to the global setting when the BLAS library doesn't support this kind of mechanism.
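
A rough sketch of what the ScopedValue hint could look like, assuming Julia 1.11 or newer for `Base.ScopedValues`. Everything named here (`BLAS_THREADS_HINT`, `effective_blas_threads`, `backend_supports_hint`) is hypothetical and not an existing LinearAlgebra API; `global_setting` stands in for the value of BLAS.get_num_threads(). Capping the hint at the global setting is one possible policy, not a settled design.

```julia
using Base.ScopedValues  # Julia 1.11+

# Hypothetical hint; `nothing` means "no hint given, use the global setting".
const BLAS_THREADS_HINT = ScopedValue{Union{Nothing,Int}}(nothing)

# Decide how many tasks a BLAS call should use in the current dynamic scope.
# `backend_supports_hint` stands in for "the loaded BLAS accepts a threading
# callback"; when it is false we degrade to the global setting.
function effective_blas_threads(global_setting::Int; backend_supports_hint::Bool = true)
    hint = BLAS_THREADS_HINT[]
    if hint === nothing || !backend_supports_hint
        return global_setting
    end
    return min(hint, global_setting)
end

# No hint in scope: the global setting wins.
outer = effective_blas_threads(8)

# Inside an already-multithreaded region, ask BLAS to stay serial.
# Tasks spawned within the `with` block inherit the scoped value.
inner = with(BLAS_THREADS_HINT => 1) do
    fetch(Threads.@spawn effective_blas_threads(8))
end

# A backend without callback support ignores the hint.
fallback = with(BLAS_THREADS_HINT => 1) do
    effective_blas_threads(8; backend_supports_hint = false)
end
```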

@stevengj
Member

stevengj commented May 3, 2024

To start with, it would be nice to just hack this in by calling the new openblas_set_threads_callback_function function, without worrying too much about portability or API, and run some experiments to see how well it scales with Julia threads vs. its own threads.
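
Such an experiment might look roughly like the sketch below. The C signatures in the comment are my reading of the OpenBLAS header and should be checked against cblas.h before use. The `try_register_julia_callback` and `julia_threads_callback` names are hypothetical, and the use of the library's symbol suffix (for ILP64 builds) is an assumption. Registration is process-wide and experimental, so the sketch only defines the function and does not call it.

```julia
using LinearAlgebra, Libdl

# Assumed C declarations (verify against cblas.h of your OpenBLAS build):
#   typedef void (*openblas_dojob_callback)(int thread_num, void *jobdata, int dojob_data);
#   typedef void (*openblas_threads_callback)(int sync, openblas_dojob_callback dojob,
#       int numjobs, size_t jobdata_elsize, void *jobdata, int dojob_data);
#   void openblas_set_threads_callback_function(openblas_threads_callback callback);

# Run each OpenBLAS job on its own Julia task and wait for all of them.
# The `sync` flag is ignored here: this sketch always waits.
function julia_threads_callback(sync::Cint, dojob::Ptr{Cvoid}, numjobs::Cint,
                                elsize::Csize_t, jobdata::Ptr{Cvoid}, dojob_data::Cint)
    @sync for i in 0:(numjobs - 1)
        Threads.@spawn ccall(dojob, Cvoid, (Cint, Ptr{Cvoid}, Cint),
                             i, jobdata + i * elsize, dojob_data)
    end
    return nothing
end

# Look for the setter in every library loaded through libblastrampoline and
# register the callback with the first one that exports it.
# Returns `true` if a callback was registered, `false` otherwise.
function try_register_julia_callback()
    for lib in BLAS.get_config().loaded_libs
        sym = "openblas_set_threads_callback_function" * lib.suffix
        fptr = Libdl.dlsym(lib.handle, sym; throw_error = false)
        fptr === nothing && continue
        cb = @cfunction(julia_threads_callback, Cvoid,
                        (Cint, Ptr{Cvoid}, Cint, Csize_t, Ptr{Cvoid}, Cint))
        ccall(fptr, Cvoid, (Ptr{Cvoid},), cb)
        return true
    end
    return false
end
```

Calling `try_register_julia_callback()` and then timing a large matrix multiply with different `julia -t` settings would be one way to compare Julia tasks against OpenBLAS's own threads.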

@stevengj
Member

stevengj commented May 7, 2024

At this point, I'm told that the composable backend is still somewhat buggy: it can deadlock because it still assumes that its parallel tasks are scheduled onto separate threads that can synchronize with each other via spinlocks. See OpenMathLib/OpenBLAS#4418 (comment), and feel free to chip in if you have better suggestions for fixing that.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024