-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[[email protected]] Add patch to fix CGEMM/CTRMM/DNRM2 on Apple M1/M2 #8022
Conversation
The patch is not applying cleanly: https://buildkite.com/julialang/yggdrasil/builds/8000#018d4684-0eea-427e-a169-f7f3059fa192/622-12767 |
yes, let me try to reconstruct the patch |
I'm just learning the ropes here, but I think the issue might be that we also need the M2 patch from commit caa29451, since otherwise the patch does not apply properly to the 0.3.23 release |
Yes, but at this point it's easier to have a single cumulative patch. Something like (untested!) should work diff --git a/cpuid_arm64.c b/cpuid_arm64.c
index 809f48e95a..e586f9a3c2 100644
--- a/cpuid_arm64.c
+++ b/cpuid_arm64.c
@@ -267,8 +267,9 @@ int detect(void)
}
#else
#ifdef __APPLE__
- sysctlbyname("hw.cpufamily",&value,&length,NULL,0);
- if (value ==131287967|| value == 458787763 ) return CPU_VORTEX; //A12/M1
- if (value == 3660830781) return CPU_VORTEX; //A15/M2
+ sysctlbyname("hw.cpufamily",&value64,&length64,NULL,0);
+ if (value ==131287967|| value == 458787763 ) return CPU_VORTEX;
#endif
return CPU_ARMV8;
#endif
diff --git a/kernel/arm64/cgemm_kernel_8x4.S b/kernel/arm64/cgemm_kernel_8x4.S
index 24e08a646a..f100adc7af 100644
--- a/kernel/arm64/cgemm_kernel_8x4.S
+++ b/kernel/arm64/cgemm_kernel_8x4.S
@@ -49,7 +49,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define pCRow3 x15
#define pA x16
#define alphaR w17
-#define alphaI w18
+#define alphaI w19
#define alpha0_R s10
#define alphaV0_R v10.s[0]
diff --git a/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S b/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
index 29a68ff227..2c63925be2 100644
--- a/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
+++ b/kernel/arm64/cgemm_kernel_8x4_thunderx2t99.S
@@ -49,7 +49,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define pCRow3 x15
#define pA x16
#define alphaR w17
-#define alphaI w18
+#define alphaI w19
#define alpha0_R s10
#define alphaV0_R v10.s[0]
diff --git a/kernel/arm64/ctrmm_kernel_8x4.S b/kernel/arm64/ctrmm_kernel_8x4.S
index 5c08273975..e8f1d8cf30 100644
--- a/kernel/arm64/ctrmm_kernel_8x4.S
+++ b/kernel/arm64/ctrmm_kernel_8x4.S
@@ -49,10 +49,10 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define pCRow3 x15
#define pA x16
#define alphaR w17
-#define alphaI w18
-#define temp x19
-#define tempOffset x20
-#define tempK x21
+#define alphaI w19
+#define temp x20
+#define tempOffset x21
+#define tempK x22
#define alpha0_R s10
#define alphaV0_R v10.s[0]
diff --git a/kernel/arm64/dznrm2_thunderx2t99.c b/kernel/arm64/dznrm2_thunderx2t99.c
index e342b0b63f..6077c85dd1 100644
--- a/kernel/arm64/dznrm2_thunderx2t99.c
+++ b/kernel/arm64/dznrm2_thunderx2t99.c
@@ -27,7 +27,7 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include "common.h"
-
+#include <float.h>
#include <arm_neon.h>
#if defined(SMP)
@@ -404,7 +404,8 @@ FLOAT CNAME(BLASLONG n, FLOAT *x, BLASLONG inc_x)
#else
nrm2_compute(n, x, inc_x, &ssq, &scale);
#endif
- if (fabs(scale) <1.e-300) return 0.;
+ volatile FLOAT sca = fabs(scale);
+ if (sca < DBL_MIN) return 0.;
ssq = sqrt(ssq) * scale;
return ssq; |
Seems reasonable to pull that in as well. It is not surprising that we need more than one patch. |
I've added the M2 patch together with the actual fix, including the history information, and it seems to be passing. Thanks to both for your guidance. |
Not quite. You'll need to make a PR for Julia against branch https://github.com/JuliaLang/julia/tree/backports-release-1.10 to update the build of OpenBLAS, see for example instructions at JuliaLang/julia#52762 (comment). In your case you can skip second point because you aren't changing the upstream version. |
Should we be using
|
We are, see PATCH 6/8 (line 187) |
Ah thanks. I didn't realize that the patches are stacked on top of each other. |
Alright, I hope I did that correctly. (Perhaps the PR title should be changed). |
…uliaPackaging#8022) * add patch openblas#4003 to v0.3.23 * include e0f8b4fe in patch
Ref JuliaLang/LinearAlgebra.jl#1051
This should bring the fix from OpenMathLib/OpenBLAS#4003 to Julia 1.10