Regressions in MKL 2025.0 #83
Comments
Only the SYCL components of MKL need glibc 2.28, as it is required by the DPC++ runtime. I defer to @mkrainiuk for the remaining issues. |
That constraint was fixed, and the same 75/95 failures now also appear completely without any change to the compilers (logs). @mkrainiuk, please advise what's going on here or how we can fix it. |
Looks like oneMKL might have some API changes, adding @sknepper for confirmation. |
Thanks for the response!
In the blas metapackage we only build the tests from https://github.com/Reference-LAPACK/lapack/ and run them against the various blas implementations. The MKL packages themselves aren't built in conda-forge, they're only repackaged, so I cannot offer logs on that. Presumably they should be available somewhere Intel-internally?
Not sure if my info there is incorrect or out of date, but didn't MKL use to build both ILP64 & LP64 symbols into the same library? |
In these logs, it looks like Linux was successful while Windows had failures. Am I understanding the logs correctly, @h-vetinari ? As Maria said, these "Parameter x was incorrect on entry to" errors often relate to incorrect configuration of the LP64/ILP64 interfaces. Selected domains provide API extensions with the _64 suffix (for example, SGEMM_64) for supporting large data arrays in the LP64 library, which enables the mixing of data types in one application. |
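To illustrate why an LP64/ILP64 mismatch tends to surface as "Parameter x was incorrect on entry to" errors, here is a minimal sketch in Python. It is not MKL code; it only assumes that Fortran-style BLAS arguments are passed by reference, so a callee expecting 64-bit integers reads 8 bytes where the caller stored a 32-bit value. The helper name `read_as_int64` and the neighbouring value 5 are made up for the illustration.

```python
import struct


def read_as_int64(n32: int, neighbour32: int) -> int:
    """Reinterpret a 32-bit integer plus the 4 bytes that follow it as one 64-bit integer.

    This mimics what an ILP64 routine would see when an LP64 caller passes
    a pointer to a 32-bit integer (little-endian layout assumed).
    """
    raw = struct.pack("<ii", n32, neighbour32)  # two adjacent 32-bit ints
    (value,) = struct.unpack("<q", raw)         # read the same 8 bytes as int64
    return value


# The caller stores N = 3; the next 4 bytes in memory happen to hold 5.
print(read_as_int64(3, 5))   # 21474836483, not 3
# Only if the adjacent bytes are zero does the callee see the intended value.
print(read_as_int64(3, 0))   # 3
```

A dimension argument that arrives as 21474836483 instead of 3 would fail the routine's argument checks, which is consistent with the kind of error message quoted above.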
Yes, the linux issue has been resolved in #84, all the remaining problems are on windows.
So far we haven't been actively distinguishing (that I know of) which integer model we use for MKL (though we do for OpenBLAS for example). So the answer is probably whatever Reference-LAPACK (3.9 resp. 3.11) does by default on windows. How would I be able to set this correctly? Just define |
Maybe not a direct answer, but there is a tool from Intel to figure out the proper linker arguments: |
Thanks. This suggests to link So far, we've only needed to point to Is that not sufficient anymore, presumably? |
One other thought I had - there are some known issues on AMD Windows, which will be fixed in an upcoming patch release (oneMKL 2025.0.1). Was this run on an AMD or Intel system? |
I think azure pipelines has various CI agents in their pool, but most are intel AFAIK (Skylake X or so). OTOH, the fact that it's reproducible exactly across 4+ runs also means that it's either independent of the CPU architecture, or that it's happening on all of the agents that we happened to draw. |
In general, based on experience with those pipelines, you can expect a ratio of around 90/10 Intel/AMD. |
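As a rough back-of-the-envelope check of the argument above, taking the 90/10 split at face value and assuming (this is an assumption, not something the pipeline guarantees) that each run draws its agent independently:

```python
def prob_all_same_vendor(share: float, runs: int) -> float:
    """Probability that `runs` independent draws all land on a vendor with the given pool share."""
    return share ** runs


p_all_intel = prob_all_same_vendor(0.9, 4)  # all four runs on Intel agents
p_all_amd = prob_all_same_vendor(0.1, 4)    # all four runs on AMD agents

print(round(p_all_intel, 4))  # 0.6561
print(round(p_all_amd, 4))    # 0.0001
```

Under these assumptions, four identical runs are quite compatible with having only ever drawn Intel agents, and it is very unlikely that all of them ran on AMD.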
Any updates here? @mkrainiuk @ZzEeKkAa @sknepper @napetrov @oleksandr-pavlyk |
@h-vetinari To select the required interface, there are some environment variables that can be defined. Check out this page: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-windows/2025-0/using-the-single-dynamic-library.html From above:
By default, mkl_rt linking enables LP64 interfaces and Intel OpenMP threading. I am new to conda-forge testing - can you please provide more details about how Netlib lapack is used here? How does netlib-lapack work with MKL here? |
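For reference, a minimal sketch of how the interface could be pinned explicitly when launching a test binary that links against mkl_rt. The environment variables `MKL_INTERFACE_LAYER` and `MKL_THREADING_LAYER` are the ones described on the Intel page linked above; the helper name `mkl_rt_env` and the executable name in the usage comment are hypothetical, and this is not how the feedstock currently runs its tests.

```python
import os

# Interface layers handled by this sketch; see the Intel documentation
# for the full list of accepted values on each platform.
VALID_INTERFACES = {"LP64", "ILP64"}


def mkl_rt_env(interface="LP64", threading=None, base=None):
    """Return a copy of the environment with the mkl_rt layer selection set.

    `base` defaults to the current process environment; it is never modified.
    """
    if interface not in VALID_INTERFACES:
        raise ValueError(f"unsupported interface layer: {interface!r}")
    env = dict(os.environ if base is None else base)
    env["MKL_INTERFACE_LAYER"] = interface
    if threading is not None:
        env["MKL_THREADING_LAYER"] = threading  # e.g. "SEQUENTIAL"
    return env


# Usage (hypothetical test executable name):
#   import subprocess
#   subprocess.run(["xlintsts.exe"], env=mkl_rt_env("LP64", "SEQUENTIAL"))
```

Setting the layer explicitly would at least rule out an unintended default as the cause of the failures.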
Hi! :)
Everything is public, you just need to click on the link at the end of a PR and then Do note that Azure will delete the logs of PRs after a month, so if you see something like "cannot be found", we'll just have to rerun things.
As I pointed out, we don't use the link advisor. We want to link to the actual library (or libraries) that contain the symbols, behind whatever amount of indirection or symlinks happens. It's fine by us if the SOVERSION of that DLL changes occasionally, that's not the issue. Of course I don't mind changing the link setup if necessary, but that's why I was asking what changed.
The blas setup in conda-forge is somewhat unusual. By default, every project (needing BLAS/LAPACK) will compile against the netlib API & ABI, but because we've set up the other BLAS flavours to conform to that same ABI, we can switch out the BLAS implementation based on user choice upon installation, without having to recompile the artefact that got built (or without having to build everything multiple times). To validate that this process works, the blas-feedstock will build the test suite from netlib, link it against whatever BLAS flavour (in this case MKL), and then run the tests to see that everything is working (more details). This is the part that started failing since MKL 2025.0 |
Gentle ping @vmalia @mkrainiuk @ZzEeKkAa, and happy new year! 🥳 |
In addition to the question whether mkl now really requires __glibc >=2.28 on linux, I tested MKL 2025.0 against the test suite from netlib lapack, and it seems there are some substantial test failures.
the simplest upgrade runs into constraints with libhwloc, see here (--> not the fault of this feedstock per se, but cannot test)
testing MKL 2025.0 against LAPACK 3.9.0 together with the switch to flang yields 75/95 failures (logs):
testing MKL 2025.0 against LAPACK 3.11.0 (together with the switch to flang) also yields 75/95 failures (logs):
The reason why I'm almost certain that it's unrelated to the switch to flang, is that MKL 2024.2 + flang only has the following failures (logs):
The errors roughly look as follows
Perhaps this is related to some linkage issue? Was something changed w.r.t. the compiler setup for MKL 2025.0 that could have affected the symbol names?
CC @ZzEeKkAa @Alexsandruss @oleksandr-pavlyk @isuruf