Using multiple threads/cores on OSX #18
Comments
It should be possible and it should be working. Could try to run the function It returns 0 if linking against OpenMP works and 1 if not. |
It is, but you need to compile with something other than the default Mac toolchain, since that one does not work (I think it has a non-standard implementation or something similar). I am personally using another module to check for it automatically here: https://github.com/samuelstjean/spams-python/blob/master/setup.py#L114 It seems this was also added to a dedicated package that does just that, if it's cleaner to use: https://github.com/astropy/extension-helpers |
Thanks @samuelstjean, I'll look into what you coded, we should reuse it.
|
I am personally using the older version, directly stolen from astropy, so just including this one file could be an idea. If stuff breaks we would have to fix it though, or copy an updated one, so it should not require too much work if it works as is and nobody touches it. |
I asked them about releasing on PyPI more often (c.f. astropy/extension-helpers#38). Depending on their answer, we will either directly integrate their file as you did or just use |
Sorry for the delay. I just tested this version, but it does not work: only 1 core is used. I found that I could fix the issue with the following changes:

    ['cc', '-fopenmp', '-o',

to

    ['cc', '-Xpreprocessor', '-fopenmp', '-lomp', '-o',

and

    if check_openmp() == 0:
        cc_flags.append('-fopenmp')
        link_flags.append('-fopenmp')

to

    if check_openmp() == 0:
        cc_flags.append('-Xpreprocessor')
        cc_flags.append('-fopenmp')
        link_flags.append('-lomp') |
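The manual patch above amounts to choosing OpenMP flags by compiler. Here is a minimal sketch of that choice, assuming Apple clang on macOS and a gcc-style compiler everywhere else. The helper name `openmp_flags` is hypothetical and is not part of spams or extension-helpers.

```python
import sys


def openmp_flags(platform=None):
    """Return (compile_flags, link_flags) for building with OpenMP.

    Hypothetical helper that mirrors the manual patch from this thread.
    Assumption: Apple clang on macOS, a gcc-style compiler elsewhere.
    """
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        # Apple clang: hand -fopenmp to the preprocessor and link libomp explicitly.
        return ["-Xpreprocessor", "-fopenmp"], ["-lomp"]
    # gcc-style toolchains accept -fopenmp for both compiling and linking.
    return ["-fopenmp"], ["-fopenmp"]


cc_flags, link_flags = openmp_flags()
print(cc_flags, link_flags)
```

On macOS this still requires libomp to be installed somewhere the compiler and linker can find it.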
@daducci Thanks for the report. I'll run some tests (and report to https://github.com/astropy/extension-helpers if needed). |
It seems someone else also found it (astropy/extension-helpers#40). My guess is that those flags are for gcc, whereas Apple uses clang as the default compiler. For a very long time their version had issues of some kind, and people had to install a third-party gcc to get things working (using Homebrew, for example). The easy way here seems to be to let extension-helpers do the job and pass the correct flag for the correct compiler. We could hack it in somehow instead, but that sounds like trouble in the future, since it would have to work properly on both new and old versions at the same time. |
I am digging a bit on this. As long as
[1] See
|
I am trying to fix I'll keep you updated (I will propose a PR when it is working). Update: PR on the way astropy/extension-helpers#42 |
@daducci could you tell me if installing the modified version of
|
No, in my case only 1 core is used. :-( Multiple cores are used only when installing the version with the manual patch I suggested a few days ago. Dunno why this one does not work, as all the modifications in there seem to match mine. |
@daducci OK thanks, I'll continue digging. Since the manual patch works, I hope we are not too far from finding the solution! |
@daducci I think the issue was found (astropy/extension-helpers#42 (comment)). You are using the new Apple CPUs if I recall correctly ? (hence not Intel-based, hence a different location for |
Yes, I am! Sorry, I didn't think Apple would change paths/etc... based on the vendor of its CPUs! |
No problem, we will find a solution, the person suggested one in his response. |
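For context on the path difference, here is a small sketch assuming libomp was installed with Homebrew in its default location. Homebrew's default prefix is `/opt/homebrew` on Apple Silicon and `/usr/local` on Intel Macs. The helper name `homebrew_libomp_flags` is hypothetical, and a non-default install would need the output of `brew --prefix libomp` instead.

```python
import platform


def homebrew_libomp_flags(machine=None):
    """Return (include_flags, link_flags) for a Homebrew-installed libomp.

    Hypothetical helper. Assumption: Homebrew sits in its default prefix,
    /opt/homebrew on arm64 and /usr/local on x86_64.
    """
    # platform.machine() reports "arm64" for a native Apple Silicon Python;
    # a Python running under Rosetta reports "x86_64" instead.
    machine = platform.machine() if machine is None else machine
    prefix = "/opt/homebrew" if machine == "arm64" else "/usr/local"
    libomp = f"{prefix}/opt/libomp"
    return [f"-I{libomp}/include"], [f"-L{libomp}/lib", "-lomp"]


print(homebrew_libomp_flags("arm64"))
print(homebrew_libomp_flags("x86_64"))
```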
@daducci Could you tell me the output of Edit: and also |
|
@daducci could you retry? (Maybe uninstall everything first, or try in a clean environment.)
On my macOS VM, multi-threading seems to work. |
Hi @gdurif, sorry for the long delay, I have been off for Easter. Unfortunately, it doesn't work; I can make it work only when using the "manual trick" from a previous message. Here is the code snippet I use to benchmark:

    import time
    import spams
    import numpy as np
    from tqdm import tqdm

    # data generation
    np.random.seed(0)
    X = np.asfortranarray(np.random.normal(size=(100,200)))
    X = np.asfortranarray(X / np.tile(np.sqrt((X*X).sum(axis=0)),(X.shape[0],1)))
    D = np.asfortranarray(np.random.normal(size=(100,1000)))
    D = np.asfortranarray(D / np.tile(np.sqrt((D*D).sum(axis=0)),(D.shape[0],1)))

    tic = time.time()
    for i in tqdm(range(500)):
        alpha = spams.lasso( X, D=D, lambda1=0.15, numThreads=-1 )
    print( f"{time.time() - tic:.2f}s")

If needed, we can chat one of these days and run some tests in real time. Just let me know if this can help! |
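One way to tell from a benchmark like this whether several cores are actually busy, sketched here with the standard library only: compare the process CPU time with the wall-clock time. `time.process_time()` sums CPU time over all threads of the process, so a ratio near 1 suggests a single core and a clearly larger ratio suggests several. The names `cpu_to_wall_ratio` and `busy` are hypothetical, and `busy` is only a stand-in workload, not the spams call.

```python
import time


def cpu_to_wall_ratio(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds, cpu / wall)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, cpu / wall


def busy(n=2_000_000):
    # Stand-in single-threaded workload; swap in the real call to measure it.
    return sum(i * i for i in range(n))


wall, cpu, ratio = cpu_to_wall_ratio(busy)
print(f"wall={wall:.2f}s cpu={cpu:.2f}s ratio={ratio:.1f}")
```

Wrapping the `spams.lasso` loop in a function and passing it to `cpu_to_wall_ratio` would give the same measurement for the real workload.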
Hi @daducci |
Hi guys, what about resuming this discussion to try getting to the bottom of it? |
Looks like it will be complicated for now, since building on arm is done by cross-compiling with the thing I am using, and that means homebrew pulls stuff for x64 instead. It does build without putting in the libs, so people might be able to work around it by installing openblas themselves. |
Well, if we wait a bit, there should normally be some buildbots running directly on arm64 for Macs. As of now, this means it is impossible to install a prebuilt BLAS unless we compile our own each time, but I would not do that, because building from source means we would need to run the tests and fix every little bug ourselves. Hopefully this will fix it: pypa/cibuildwheel#1204. As of now, I am using the conda version on Windows, the CentOS version on Linux and the Homebrew version on Mac to avoid that, but that means we won't have a BLAS version for Mac M1 until someone else makes one for me. With a bit of luck, the M1 version may be able to use the libs from the x64 version, since it apparently emulates things behind the scenes for you, so if those builds work we could do that (and normally it should be multithreaded, but I don't have a Mac, so...). |
Well, I tried some new ARM Mac builds, but I don't have access to such a machine to check whether everything works as intended. They are built with an emulation layer, so I cannot test them on the build machines either. It should be fine if you want to give them a try before putting them back here: https://github.com/samuelstjean/spams-python/releases
Is it possible to use multiple threads/cores under OSX? It works on Linux, but I'm not able to use this feature on the Mac.