Not able to run native tests of neanderthal successfully #127
Even adding the
|
So it seems to me that neither with the latest global MKL nor latest |
Using older
|
Same without MKL installed:
|
What I don't really get is the whole .1 and .2 suffix to libmkl_rt.so in newer versions. I understand that it's a versioning thing, but the official documentation for these newer versions (https://www.intel.com/content/dam/develop/external/us/en/documents/onemkl-developerguide-linux.pdf) explicitly states `libmkl_rt` as the build dependency, exactly what I was always using to build neanderthal-mkl... I guess that I'll have to see how to re-build neanderthal against the latest MKL, and distribute that version as the "official" one. This would probably require users to upgrade their MKL to the recent one, too. |
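The suffix scheme described above can be illustrated with stand-in files. This is a sketch only: the directory, the file contents, and the direction of the symlinks here are assumptions for illustration, not taken from an actual MKL installation.

```shell
# Illustrative layout only: recent oneMKL ships a versioned libmkl_rt and
# symlinks the plain name to it, so the unversioned `libmkl_rt` build
# dependency from the documentation keeps resolving at link time.
demo=/tmp/mkl-suffix-demo
mkdir -p "$demo" && cd "$demo"
touch libmkl_rt.so.2                    # stand-in for the real library file
ln -sf libmkl_rt.so.2 libmkl_rt.so.1    # stand-in compatibility link
ln -sf libmkl_rt.so.2 libmkl_rt.so      # the unversioned name the docs refer to
ls -l libmkl_rt.so*
```

On a real installation, `ls -l` in the MKL lib directory shows which of these names are symlinks and where they point.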
I have seen that the ".1" and ".2" variants of libmkl_rt.so symlink to each other in the latest version. |
Doing this finds libmkl_rt.so, but it fails on something else; see the comment below.
|
If I do not use "bytedeco", I get another error:
|
This version of MKL (l_onemkl_p_2022.1.0.223.sh) installs its libraries to /opt/intel/oneapi/mkl/2022.1.0/lib/intel64
|
and indeed, by setting LD_LIBRARY_PATH to "/opt/intel/oneapi/mkl/2022.1.0/lib/intel64", it seems to find it. |
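For reference, setting and checking that path might look like the following; the version segment of the directory is taken from this thread and should be adjusted to match the actual installation.

```shell
# Prepend the oneAPI MKL directory to the dynamic linker's search path.
export LD_LIBRARY_PATH="/opt/intel/oneapi/mkl/2022.1.0/lib/intel64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# Confirm the directory is actually on the path:
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -F '/opt/intel/oneapi/mkl'
```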
I give up at this point in time. Probably some "old" setup which "today" cannot be re-created anymore (as the library versions are gone). |
It seems to me that your MKL distribution misses it. It should be, and usually is, automatically in the right place, but it might be broken in some installations, as people create countless variations. |
I am just looking at that. It is true that my Docker image has fewer things than a "normal OS". |
I have now switched to installing MKL as a Debian package; it does not make things better:
|
Ok, I finally found a working solution, in the form of a Dockerfile:
Very simple, even. |
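The working Dockerfile itself is not quoted in this thread. A minimal hypothetical sketch along the lines discussed here (Ubuntu base, MKL from the apt repositories) could look like the following; the package selection is an assumption, not the author's actual file.

```shell
# Write a hypothetical Dockerfile; the actual working one is not shown in the
# thread, so treat this as a starting-point template only.
cat > /tmp/Dockerfile.mkl <<'EOF'
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
# intel-mkl from the Ubuntu repositories also registers libmkl_rt.so with the
# dynamic linker, so no LD_LIBRARY_PATH tweaking is needed inside the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jdk leiningen intel-mkl && \
    rm -rf /var/lib/apt/lists/*
EOF
grep -c '^FROM' /tmp/Dockerfile.mkl
```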
But in my view it is not true, as the instructions here suggest, that adding a MKL distribution jar ([org.bytedeco/mkl-platform-redist "2020.3-1.5.4"]) as your project's dependency is enough ("Neanderthal will use the native CPU MKL binaries from that jar automatically, so you don't need to do anything else"). This does fail:
with
So I would say it requires "a lot of luck" for adding "[org.bytedeco/mkl-platform-redist "2020.3-1.5.4"]" and "doing nothing else" to actually work. |
It seems to me that the installation of "intel-mkl" via "apt" does more than only putting the required ".so" files somewhere (which is all the bytedeco jar can do). |
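One concrete thing a system package can do that a jar cannot is register the library directory with the dynamic linker. The sketch below uses illustrative paths under /tmp so it runs without root; a real package's postinst writes under /etc/ld.so.conf.d/ with its own file names.

```shell
# A package's postinst would write a config under /etc/ld.so.conf.d/ and then
# run ldconfig to rebuild /etc/ld.so.cache; demoed here in /tmp.
conf=/tmp/demo-ld.so.conf.d/mkl.conf
mkdir -p "$(dirname "$conf")"
echo '/opt/intel/mkl/lib/intel64' > "$conf"
# ldconfig   # after this, dlopen("libmkl_rt.so") finds the library with no
#            # LD_LIBRARY_PATH set, which is what a jar cannot arrange
cat "$conf"
```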
I always recommend installing intel mkl globally as this is what I use. Everything else is something that people ask me to support and I am trying to satisfy these demands as much as I can. Any help in that regard is always welcome, but there the ground moves from time to time. |
The stuff that you see is needed only for building native dependencies, which is what I need. For using neanderthal, only the visibility of the appropriate .so files should be enough (I've tested this multiple times on multiple OSes, but who knows ;) |
One way to address this is to try to maintain a single Docker image for the Clojure Data Science community. It is set up to allow the R and Python bindings to Clojure to work out of the box. I know that the Clojure community is not a very big fan of Docker-based development, but maybe it is worth extending the above Docker image to explicitly support neanderthal, and therefore deep diamond, out of the box. What do you think? I could give it a go and try to set up all the needed stuff for |
Of course it would be good to have it as an option. I don't use docker, but some people certainly prefer it, so I don't see how sharing 3rd party setups could hurt. It would be best if you could set it up as a github repo, and link it here. |
@behrica Hi Carsten, I have the needed MKL libs for Linux, Mac, and Win that I created for installation for Saite. They are all in compressed archives. These have always worked for me across various machines and OS versions (only Intel Mac - no new Arm stuff) and Win10 for Windows. For Linux and (Intel) Mac, aerosaite, the self-installing uberjar variant, comes with scripts for running it that set up the paths for the MKL. This, too, has always worked for various users. Aerosaite automatically downloads and installs the MKL libs to a local directory relative to the .saite home directory. BUT, you could manually grab these if you wish and install them in some similar location that makes sense for you. I am unsure about how to automatically set the path for Win (someone recently gave me an idea of what it should be, so maybe the next release of the Win scripts will have that as well). I'm not sure if your setup is 'special' in some way that would keep this from working, but it may be worth a try. As I say, this has always worked. The scripts are in the resources folder at the aerosaite GitHub (link above). |
Hi @jsa-aerial that is really helpful. Maybe we can make this or some more focused standalone version of this an official recommendation for people that for some reason or another can't make the official vendor binaries work on their system? |
Just a quick note: Neanderthal's MKL dependency does not need any installation other than the lib files being in any location where the appropriate OS looks for shared libraries. Even copy/paste works. |
@jsa-aerial I do agree that we should have more "instructions" / variants to get MKL installed (and deep-diamond working). "Working" I measure by having all the deep-diamond tests passing. Even for non-Docker users, reading the Dockerfile can be useful. It is nearly working... It would be very helpful if somebody with more knowledge of CUDA / OpenCL / Linux would have a look. The Dockerfile can be built as usual with docker build. Currently I get this error, and I am not sure what to try next.
Could somebody help out with this? |
As you see in the Dockerfile, I settled on CUDA 11.4; with 11.6 I had even weirder issues and did not come "this far". That won't work, as ClojureCUDA is tied to a specific CUDA version that should be installed on your machine in addition to the Nvidia drivers. This is currently 11.6.1. Additionally, Deep Diamond requires Nvidia's cuDNN too. On Arch Linux, both are available as packages (cuda and cudnn) through pacman. On other systems, they are fairly widely available, and Nvidia offers click-through installers too on their main website. Please see the details at the ClojureCUDA web page. |
The question is which precise shared libraries it needs. It seems to me that, depending on how MKL is installed, different libraries get installed in the appropriate places. And I had issues with wrong GLIBC versions and so on. |
This means your setup should be ok. Your only implementation is Nvidia's, which supports OpenCL 1.2. OpenCL 3 is basically 1.2 repackaged, and OpenCL 2 has the most features, but is left as a vestige, as Nvidia and Apple sabotaged it. Complicated, I know... |
Thanks, that helps.
It happens during "lein test", I suppose during the tests which are using OpenCL. So do I assume correctly that "opencl-3.0-1.5.7" requires "OpenCL 3.0" (or at least more than 2.1)? |
It means that javacpp has some problems finding OpenCL on your system. However, note that Neanderthal/ClojureCL does not use javacpp for that, but another unrelated library. Javacpp's dependency on OpenCL is probably coincidental, as I don't directly use it, and the javacpp dnnl library tries to load it on its own (there is an old solved issue at the javacpp GitHub that might give more info). If the Neanderthal OpenCL tests pass, it means everything should be ok with your system's OpenCL. Why does javacpp have problems? My hunch is that your Docker setup misses something, but I can't be sure since I don't use Docker. |
My Docker image is an Ubuntu 20.04 image. So it is Ubuntu in most regards. I would like to promote usage of 'neanderthal', but its installation (or better said, that of its dependencies MKL and CUDA / OpenCL) is a gigantic hurdle. I am thinking as well that 'Docker' is the only way out, but that view is not shared by lots of people, unfortunately. I think that "maintaining and publishing" a Dockerfile and image with a "working deep-diamond", where the user only needs to type "docker run --gpus all xxxx", is important in this. I thought I could do this on my own, but I think this is not the case. I know too little about CUDA, OpenCL and "extending Java with native code" in order to bring this forward myself. The installation instructions are too general to allow me to work further on the Dockerfile efficiently. |
I propose to go a step back: I work on a "minimal" Dockerfile (Ubuntu-based) which has only the goal of setting up MKL, CUDA, and OpenCL to get the "neanderthal" test suite working inside of it. Maybe I could contribute that Dockerfile to the "neanderthal" GitHub. I am not sure if I can get it "working" by myself, but maybe we could collaborate on it in some form, at least by "reviewing" it and trying to see if I do something which cannot work. What do you think? |
Yes, sure. Fortunately, Neanderthal is a Java library, so it does not care whether it runs in Docker or wherever else. As for the GitHub, that's why I think the best home for the Docker setup is a separate GitHub repository. I understand that it looks overwhelming, but I believe it is mostly because you're trying to fit together 10 moving parts, of which you don't have experience with half. In reality, it is MUCH simpler: For Neanderthal MKL to work, you ONLY need the MKL .so files somewhere on your LD_LIBRARY_PATH. That's it. If other software using MKL works (pytorch or whatever), Neanderthal should work. For the Neanderthal CUDA backend, you ONLY need CUDA properly installed from Nvidia. If other CUDA-based software works, Neanderthal should (assuming you're not using some third-party package system such as anaconda etc. that sets up its own local CUDA). For OpenCL it's similar... Basically, there should not be any specific requirement by Neanderthal et al. other than having vanilla installations of these technologies as prescribed by their vendors, or simpler. I would definitely recommend either following the setup recommended in Getting Started until you understand these moving parts, or at least following @jsa-aerial's Saite setup, which seems to help in this regard. |
Frankly, if you want something that just works automatically "out of the box", aerosaite is the quickest and easiest route. Certainly for Linux users this is pretty much guaranteed to work. For CPU. I think you are being naive about putting something together for automatic GPU use. There you are up against all the issues of getting the GPU usable, completely aside from Neanderthal/DeepDiamond. There are just way too many variations, requirements and dependencies. |
... and, of course, for GPU computing to work, you'd have to have recent vendor drivers installed properly. That, usually, is not automatic anyway. |
That sounds like a reasonable/good idea. Suggestion on how to proceed? |
I'm not familiar with how Saite works, so I don't know precisely, but is there a way to provide the basic MKL and/or CUDA distribution without the other parts of Saite, and even without Neanderthal? Anyway, it might be a good option for people who can't or don't want to follow my official guides to have the scripts you provide as an option, and if it works sufficiently predictably, we can link to your repository as an option from the getting started guide. The only drawback I see is that it would make users read the guide even less, and it would appear more complicated.
Perhaps if I had written "the user has to copy these 7 .so files to folder X, must add this folder to LD_LIBRARY_PATH, and must restart the shell", it would have been simpler. Instead, I opted to write a more versatile guide with all popular options, and users, being impatient, get lost in the sea of choices... |
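Those hypothetically simpler instructions could be sketched as follows. The folder name is an arbitrary example, and the copy step is shown commented out because the source location depends on where the MKL archive was unpacked.

```shell
# 1. Put the MKL .so files into one folder of your choosing.
MKL_DIR="$HOME/opt/mkl-libs"
mkdir -p "$MKL_DIR"
# cp /path/to/unpacked/mkl/*.so* "$MKL_DIR"/

# 2. Put that folder on LD_LIBRARY_PATH. Here the export line is written to a
#    snippet you would source from your shell profile, then restart the shell.
echo "export LD_LIBRARY_PATH=\"$MKL_DIR:\$LD_LIBRARY_PATH\"" > /tmp/mkl-env.sh
cat /tmp/mkl-env.sh
```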
This could be. But is this even true when using Docker? Have you tried it? Or does Docker at least help? Or the Dockerfile could at least be "parametrized" (so not assuming one fixed setup for every situation, but a template), so that the Dockerfile is a "base or template" which a user can then modify, which is hopefully easier than "installing from scratch". |
For MKL, the links I quoted above satisfy this - they are just (g)zipped archives of the necessary sharable libs for each platform. That's it. So, no Saite and no Neanderthal and no DeepDiamond. For the reasons I mentioned above, I decided to not support GPU, because it depends on way more than the base platform just to get the GPU itself working for computation. Basically, in that case you are on your own for getting and installing the correct drivers and any other requirements.
That sounds fine - the scripts for Linux and (Intel) Mac have worked fine for several users - out of the box. If you are not using Saite, you would just need to grab the bits for running your stuff. These things are very small, as there is, in fact, very little that needs to be done.
Yes, that would be a drawback - any black box route will keep people from understanding what is really going on.
Maybe you can have a "TL;DR" section where you state this and then refer others to the details? |
I have seen this in Python land: "pip install tensorflow-gpu" worked for me out of the box. |
Of course it is true when using Docker: Docker is not some magic thing that somehow automatically knows what type of GPU (vendor, model, version), how many, and what the drivers are and whether they are properly installed. You'd have to have a Docker image for all the combinations. Myself, I don't much like Docker, but I understand those who do... |
Agreed, but I would hope that over time all vendors will produce "one driver" which works for all their GPUs. Then we could have a parameterized Dockerfile which just takes the "vendor". So I still think that only a few people maintaining a Dockerfile would need to know all the nifty details, while the majority of users could just "use" the Dockerfile or image. Similar to the JVM abstraction. |
As far as I know, pip install tensorflow-gpu does not install CUDA; it expects CUDA to be available on your system. Exactly as Neanderthal. But the difference is that Neanderthal will throw an exception if you call the absent CUDA backend, while tensorflow might automatically fall back to the default engine, whatever it is? OTOH, conda does (AFAIK) install CUDA, but an internal one. I could do that, if you committed to my (hypothetical) proprietary environment, such as conda. You still have to make sure that the right GPU drivers are present. |
@blueberry One more confusing point in the instructions is the required (or workable) CUDA version. From my experience it does not work, for example, to use CUDA 11.4 with "[org.jcuda/jcuda "11.6.1"]" (which we get by default). I just had this case and got an ugly one.
Explicitly downgrading to "[org.jcuda/jcuda "11.4.1"]" solved it. So it seems that the "versions of native libraries" and the "Clojure/Java dependencies" need to match more precisely than the instructions suggest (at least from my understanding). Again, the only "more user friendly" way to help users with this that I can see is Docker, which can be set up so that it "freezes" both the native libraries and deps.edn in a known state (at least for documentation purposes). |
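The version-matching step described above can be sketched as follows. The nvcc output line is a stand-in for a real installation, and the exact JCuda patch version still has to be looked up (for CUDA 11.4, the thread reports that "11.4.1" worked).

```shell
# Parse the CUDA release out of `nvcc --version`-style output, so the JCuda
# coordinate in project.clj can be pinned to the matching release.
nvcc_line='Cuda compilation tools, release 11.4, V11.4.152'   # stand-in line
cuda_ver=$(printf '%s\n' "$nvcc_line" | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p')
echo "Installed CUDA release: $cuda_ver"
echo "Pin JCuda to a matching artifact, e.g. [org.jcuda/jcuda \"$cuda_ver.1\"]"
```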
Each version of the Neanderthal CUDA backend is tied to the CUDA version specified in its dependency on JCuda. So, for the latest version, it is 11.6. If it says 11.4, that's because I forgot to update the docs. |
Thanks. The ClojureCUDA documentation currently says this, which seems to say "any CUDA 11.x":
I hope these comments help; if not, let me know... |
I updated the docs of ClojureCUDA to clarify this. You might use any CUDA version with ClojureCUDA. However, if the CUDA version on your system does not match the one that ClojureCUDA depends on in project.clj, you have to specify an explicit dependency on the matching JCuda version in YOUR project.clj. It's similar for Neanderthal, but it might be that a very outdated CUDA does not support all the features that I use, and breaks at will. Ditto for DD, but I don't expect old CUDA versions to work successfully. |
I would say there is zero chance of this happening. Not a small chance, but no chance. There are too many legitimate reasons for them to not do this. |
And it DOES (give or take a detail or two), *but* you have to state that explicit version, and the versions in your project.clj have to match the version installed on your machine. If you specify 11.4 in project.clj while you install whatever CUDA is shipped with Arch (11.7 currently, I believe), it will not work. Which brings us to one detail: if you do this today, the default JCuda version that Neanderthal/ClojureCUDA uses is 11.6. You have to have that on your OS. CUDA 11.7 is not supported yet (although it might now be what Arch installs by default). |
As far as I can see, your system has CUDA 11.6.1, which is exactly what is expected, so you should not change any default. 11.4.1 generally shouldn't work on your machine (or if it does, it's more luck than anything else). |
Yeah, one reason for me to insist on Docker is "multiple computers". |
This is a known gotcha in JCuda. Your system has an old GLIBC. Not your Arch Linux, that one is up-to-date, but your Docker-provided system, which is, if I remember correctly, an Ubuntu one, v20-or-so. That one ships with a somewhat older GLIBC. The trouble with GLIBC is that its version is so fundamentally hard-coded in your environment that it's very difficult to use another one; you have to use the one provided by your system. And your system provides an old one, which breaks JCuda, which was compiled with a recent one. Your Arch Linux should work, am I correct? Native dependencies are tricky ;) |
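The mismatch described above can be diagnosed by comparing the GLIBC version the system provides with the newest GLIBC version symbol a prebuilt binary references. Both numbers below are stand-ins so the sketch is self-contained; the comments show where the real values would come from.

```shell
# Stand-ins: Ubuntu 20.04 ships GLIBC 2.31 (check with `ldd --version`), and
# 2.34 here represents the newest GLIBC_x.y symbol referenced by the prebuilt
# JCuda .so, which could be inspected with something like:
#   objdump -T libJCudaDriver-*.so | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -1
system_glibc=2.31
required=2.34
# sort -V compares dotted version numbers; if the system version is not the
# newest of the two, the binary needs a newer GLIBC than the system has.
newest=$(printf '%s\n%s\n' "$required" "$system_glibc" | sort -V | tail -1)
if [ "$newest" = "$system_glibc" ]; then
    verdict="ok"
else
    verdict="too old"
fi
echo "system GLIBC $system_glibc, binary wants $required: $verdict"
```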
Yes, it is about that. Took me a while to figure it out. |
Fortunately, you can help solve it. That would require that you build JCuda on your (older) system; these binaries will then work on newer systems too! Please check out this issue:
I created PR #128 |
While we discussed
uncomplicate/deep-diamond#15
the issue that Neanderthal no longer finds libmkl_rt.so (even when globally installed) came up as a separate issue.
I prepared a Dockerfile which exposes the issue; maybe it's useful.