Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure in LinearAlgebra/lapack.jl in comparing LAPACK.sytri! to inv for an almost-symmetric matrix #1101

Open
jishnub opened this issue Oct 20, 2024 · 9 comments
Labels
ci Continuous integration

Comments

@jishnub
Copy link
Collaborator

jishnub commented Oct 20, 2024

In the aarch64-apple-darwin job https://buildkite.com/julialang/julia-master/builds/41270#0192a51c-3afb-4a6f-a53f-3c3b9f537ed4,
there is

The global RNG seed was 0xf3fab14bb37f52ec47cdcf1f85e908bd.
Error in testset LinearAlgebra/lapack:
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-d3cd3f4a3a/share/julia/stdlib/v1.12/LinearAlgebra/test/lapack.jl:441
  Expression: (triu(inv(A)), triu(LAPACK.sytri!('U', B, ipiv)), rtol = eps(cond(A)))
   Evaluated: Float32[1.1648309 0.25645927  -0.82741505 0.3340336; 0.0 -0.6714291  0.5142702 -0.2888328;  ; 0.0 0.0  0.086971946 0.14950116; 0.0 0.0  0.0 0.38216618]  Float32[1.164831 0.25645974  -0.82741505 0.3340336; 0.0 -0.6714287  0.5142697 -0.2888328;  ; 0.0 0.0  0.08697438 0.1495005; 0.0 0.0  0.0 0.3821662] (rtol=1.9073486e-6)
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-d3cd3f4a3a/share/julia/test/runtests.jl:100
ERROR: A test has failed. Please submit a bug report (https://github.com/JuliaLang/julia/issues)
including error messages above and the output of versioninfo():
Julia Version 1.12.0-DEV.1426
Commit d3cd3f4a3aa (2024-10-19 14:08 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_INSTALL_DIR = julia-d3cd3f4a3a
  JULIA_SHELL = /bin/bash
  JULIA_CPU_TARGET = generic;apple-m1,clone_all
  JULIA_TEST_MAXRSS_MB = 3800
  JULIA_CMD_FOR_TESTS = julia-d3cd3f4a3a/bin/julia .buildkite/utilities/timeout.jl julia-d3cd3f4a3a/bin/julia
  JULIA_TEST_VERBOSE_LOGS_DIR = /private/var/tmp/agent-tempdirs/default-honeycrisp-HL2F7YQ3XH.0/tmp/jl_JXug2a
  JULIA_IMAGE_THREADS = 4
  JULIA_BINARYDIST_FILENAME = julia-d3cd3f4a3a-macaarch64
  JULIA_CPU_THREADS = 4
  JULIA_NUM_THREADS = 1
  JULIA_VERSION = 1.12.0-DEV
  JULIA_TEST_IS_BASE_CI = true

Unfortunately, I am not being able to replicate it locally using the same seed but on a different platform (x86_64-linux-gnu). The issue seems to be related to floating-point accuracy.

@jishnub jishnub added the ci Continuous integration label Oct 20, 2024
@giordano
Copy link
Contributor

The failure reported above is reproducible for me on both aarch64-darwin and aarch64-linux with Base.runtests("LinearAlgebra/lapack"; seed=0xf3fab14bb37f52ec47cdcf1f85e908bd), I guess it depends on the OpenBLAS kernel used.

@giordano
Copy link
Contributor

giordano commented Oct 20, 2024

Minimal reproducer should be

using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))

which fails for me also on x86_64-linux. BTW, the testset does fail for me on

julia> versioninfo()
Julia Version 1.12.0-DEV.1431
Commit fc40e629b1d (2024-10-18 19:33 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)

This system is using Cooperlake OpenBLAS core type:

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Cooperlake MAX_THREADS=512"

For what is worth:

using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
c1 = triu(inv(A));
c2 = triu(LAPACK.sytri!('U',B,ipiv));
@show norm(c1 - c2) / max(norm(c1), norm(c2))
@show eps(cond(A))

results in

norm(c1 - c2) / max(norm(c1), norm(c2)) = 2.119341f-6
eps(cond(A)) = 1.9073486f-6

@jishnub
Copy link
Collaborator Author

jishnub commented Oct 20, 2024

Locally, I obtain

julia> using LinearAlgebra

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];

julia> A = A + transpose(A); #symmetric!

julia> B = copy(A);

julia> B,ipiv = LAPACK.sytrf!('U',B);

julia> isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))
true
norm(c1 - c2) / max(norm(c1), norm(c2)) = 8.0597636f-7
eps(cond(A)) = 1.9073486f-6
julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"

julia> versioninfo()
Julia Version 1.12.0-DEV.1438
Commit e08280a24fb (2024-10-20 01:04 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = subl

@aravindh-krishnamoorthy
Copy link
Contributor

aravindh-krishnamoorthy commented Oct 24, 2024

Adding a few top level tests, which basically make the same LAPACK calls:

julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=512"

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!
julia> B = bunchkaufman(A);
julia> R = bunchkaufman(A,true); # rook pivoting

julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
false
julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true
julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true
  • Btw, in which version / configuration does this test pass? Haswell?

@jishnub
Copy link
Collaborator Author

jishnub commented Oct 25, 2024

Yes, the test passes for me on Haswell:

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];

julia> A = A + transpose(A); #symmetric!

julia> B = bunchkaufman(A);

julia> R = bunchkaufman(A,true); # rook pivoting

julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
true

julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true

julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true

@aravindh-krishnamoorthy
Copy link
Contributor

Yes, the test passes for me on Haswell:

Thank you very much for the confirmation! Could you kindly compare your outputs to mine in the JLD2 file below?

The JLD2 file was generated in Julia as follows:

julia> using JLD2
julia> save("56255-1.jld2",Dict("A" => A, "B" => B, "R" => R, "inv(B)" => inv(B), "inv(A)" => inv(A), "inv(R)" => inv(R),"cond(A)" => cond(A), "eps(cond(A))" => eps(cond(A))))

And its contents can be inspected as follows:

julia> using JLD2
julia> L = load("56255-1.jld2")
Dict{String, Any} with 8 entries:
  "cond(A)"      => 31.674
  "B"            => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[0.858494 -0.220169  0.218334 0.84004
  "A"            => Float32[1.03302 1.13542  1.28012 1.48514; 1.13542 0.437221  0.911591 0.804944;  ; 1.28012 0.9115
  "inv(R)"       => Float32[1.16483 0.256459  -0.827415 0.334034; 0.256459 -0.671429  0.51427 -0.288833;  ; -0.82741
  "inv(B)"       => Float32[1.16483 0.25646  -0.827415 0.334034; 0.25646 -0.671429  0.51427 -0.288833;  ; -0.827415 
  "inv(A)"       => Float32[1.16483 0.256459  -0.827415 0.334033; 0.256459 -0.671429  0.51427 -0.288833;  ; -0.82741
  "R"            => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[1.92042 0.663282  1.42713 0.84004; 1
  "eps(cond(A))" => 1.90735f-6

julia> L["cond(A)"]
31.67395f0

Although in my case isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A))) is true, I suspect that the problem might lie in B already, i.e., SSYTRF. Will be interesting to know (+ I've never looked at an S- version in detail so far).

JLD2 File
  • Copy the below text to clipboard.
  • Issue $ uudecode | bunzip2 > 56255-1.jld2
  • Paste the clipboard contents into the console.
  • Load the .jld2 file as indicated above.
begin-base64 644 -
QlpoOTFBWSZTWVWJmgEABwj/////////////////////////v//f///7////
//3//+//4AmW+AAAFVxbePZR3s3FuXZ7bnIanhoiAptEyNNU9PREeUb0U9Am
1MJkw1HijI9T1PUG0nqbUaeRPUeoD0gxM1Hqfqaj0aaEGxqnpD0ZT1PKbU0G
1Bp5Ro2psp6EekbEmnpPJBjKeUNNPSeo8UGkppoFP1R6g8mjKflBPU009NR+
pMj1NAG1ANND1DQPUAAejUNDTQGgA0A9QNAGQAAANANGgAANAPUBoDQ0BoBq
aUGGlMaKbCmh6jajTQepkHqNB6R6j1AAGhoADQABoADQAAAAAAAA0GgAAGgA
AAAAAAGpoEKT8pHlPUNDGppo0ND1AA0ADQAA0AAAAGg0AADQAGQAPUAD1AAA
AAAAAAAA0AEAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaBJEIR
kRo0lPTek2oU3qbJT9U080k9GpoGelPRtU/TVMT9SehGmTIPFNpP1T1HqNNl
PUaYQbSHqeUGgeoAaPU0yDQwIYRpkGyT1HqAyGmgA9I8o0izhLSh6C7BrpFk
NrY1gICwGCoAP+DzWJCTVImKVMJeoN6s3iW0OmpGkqxB4RIKflWyAPwYCjRa
QbRy8Bk3TaNzzxqMaHxMBPmGCfgCBjZ/IorScIzCoxc2czjBg312gnx8eFrA
c4xTNO0uepbTPqOReBpmKARPGAX9bh8XgX8xE82wLkIGexGZwEYPH7OBYZIF
hMAi9AZAwyQZPlrlXIxkkZQdUBRdjnKs7hTSSFqGFPd0F5POw1pqSZh6K4zY
1WJM8khFS1IYt1YcgRNOIdbDrXVo1yegTJAi8EDAhUEjjDHCwBJIkhgEGHO4
cddcBXcCnCtyCi7odDYwNHeXM/n5qUphhMUicALINnAGSkyAZhIC+TIoEyEI
p1QDgDwTgk5O5rkzMjkmV1ZxNUVVSJ2BK43LteDBjbd1Y02RKwBSBMkF6MwY
2NBV2GaXNVp7QSZJVE0IpnGURsQnhBAdAwIxAC9M974UYbGcWnIkMNPvWDQC
GIIYGTIUyaBBwBxMIvGEwwk1A7auaIXSVRM8mEQGGOLEFyERMbASGQMyBTsk
v6Xr72cHRWQMBV9p1AK7v47oHEyQtaySBxkhhYSc3eD9IUAV8wbqs4iAUV4C
gIdxCjTVNKWoD1h6c2BkG2FNqtC2aoGUGNywNYvrwbioEGnYGHciFr7fadv2
+ug/Wbn3LdY9ytUVfg778LbsOFa4dtFsruHILERCBU2CBNBRiCYgG+CK+7Lw
RZJflitWoJyEBA8UdKpERMgYlJM4nWMZmt5XaTlC+ryRYLPfilNU5gELIRD/
uOfnv9HJWLf7CFTRemHMJsf54t/1fCD/AautRijBUwmzrOhmRHAeHzL94F6a
t5uUGd3OVZ4Qa/ic/TEKATB+ouBhhDzNktMbXITdCU1C8wYBbiWg0QFqklMz
p/H7TqSi1vNSmY9dy/OmqRmgMcAM2hWYGO1IDRSa7Ps8Slppm7fglvGR1IMG
QlAZRUTcDfc1WAtMBwQ4mVr/FuYM4B6nprJxPo2OcFfSNzrN34eWF8uu2uSA
IjHMYQF2AmVVUwL3GZs+KYX8JJIMtN3NN6UxHy0WAFx4ZXZChecpeIuILoD8
n18Oxc935rD35XqDb/jDyHSi6MPDySfCCv/GAqpjIuHcXvmwGgpKDj3aOzRV
gtw02hdnzLCR8FaJzDCamZd7sjJ8NsMufYgbaQr4NEqKBO02SkJ/wuIXtww8
bv/XBoqy6vNrsxgHBRJDTXhxq1r9ot0bqV+9sH0iM1lBImArFvfEDoEoEn10
PTlR9xUHi2xm+YG9dKuHPNY0D6WQsdeVC2T9XpChBTxVd82ij6cIxI1wxOOC
alnndhcWLdVlMb8gRSNjr+JyL5QgU2ZbgAnDqKepQ0zMXbF0EXkDNNUxyBa/
LwGzJV5404XkAGXDK97onplC2SoVMwZYAdpFGokPKHUamShaci1kFp7Zd1Ea
ZP8NYUMqIom6MeZwgdvULQIs8UPvqTPuMHGuIIt+a9V4J055GDQXseMTI0Qp
obGHewiP3MyWg6/yIYPEPUIEQ9DpfZCG3YwkE2f6PH8sUG1qfwkSR21oQSZO
FOThfGYhxi2I66jKVDmqK4UuBMY1flaKqQqS9Jbw9GGJk+6UI8duse/j4ygP
mX8ZHTmHJ8UAvojBeSvHDC/ELbuIeQLRQhdbOe3tLOUNwYda4sgWtv9G492y
mJh3y2pbnSGor/pd9EqKJs2JhX9yPSDIRpP4p63RXR+kJNhIUZX33F32rKoQ
G5qtrwKEzOijDIx6vc6CGbE0QJ01FKRLCRxQWk83aoFQTNzF0aSUHNKZJyKY
uiVE+FKUyH4IerlYU3Jko6MJd8PqlCyaaeOXJvswXll5oI6HRz+Xx5eZSimH
5IyhWYEAF60H6AlXIX6CqvnAXW/qq69WjAVLYjlEAU4qspr3I5A7JZdz+ee1
VMREOk7mRoAIJMCIa1wDAYtUY8mzIEPCsotMOoE18CmeEyTwEfdxhy+FGax4
31+bRrII/LpbvR0lr+OhVMWCkDAS6IsPRE30DVIQQxF+JNHuRNpUkYulhRzI
drx8eF2YtLZbAkbzZlgM6Fo2Xivgw4ZSsA1Sf5vwUwzc81chE5ip3NuVvwQW
IV0Zky1QRWttwFJmaz/UxZHxYgFworb0aE6eyAhPJHxzLp3hQFVikwvkpbvO
JNqcrOi5YECDyogsQDOGolKTwyIfhRxSvOpXlmQfMX4l9c8dkbuuQZ6GqMo7
ciC9mULEYktWA2GaQH3T4EOyNJMNozeyMmcJsBaQaChGFd06NJE46KZiEQ85
5NAr3h7t+HkWastxMQxBeJ2QoWUC5KIGdE84mD3nhnVGfDSfRnGCkOC1piu9
5TbQBjVMKPItCxMQQTBI1IDrIQrou8xgwSAYhrKdx1kjmE6eWo9FKJDoGhAq
I2cBoOiAkcP1lnQF8iEkIrEBEmiKgsgTxAkTDBRpDebCTPoINqWtjMxhQWcF
aU28eQANLY7uXQbwPC2OOe5wrZrBh42GCRVp78rGSZekRtK2jrg6sQ/5k/Qd
/WxJp52+Rb8I3HUfZ/S7HfUHwOjqPBAzMLI0gbL4SYxU7vQFLc4t+wdOHs5F
rrwKBkgzE1YJhS7BGq2FZOtKgNZOd+jqvh1N8w3tkB9yz6izF3ngH0ezcgcJ
q9ApKPx8CQJm0irpM2y0y5jtQdPAPl4SY+V7eh/T1XrC6IXuA78Mpjpv50I8
LMXyRfKDIsYJp9Y686V9WGi2/Djnj2ux8vYzeXygoUbJvvAhK9J+CFHZehlB
E7Ku/1g2YRDYzbb+0Z2icpFDW284wZEyzJxxE+PH1JUF3gJC8kaD7AcBA4Hr
ZmrR4oYkid21YHAKbcmIxSA+b5FWAtqouQMaWzdN0VGsYMFQEQmL2U0CsoV7
qqtXbJvpoRgxg5ZVqO5FNtQofvA63mODGoPJCjzbKPNh3f6N6c8G6x6EwhsS
CRpsHPhSXVR/WVFFH+xPNareU36YwahEpvo3Txmd98cyYuNYhFyCVLWZq0xc
RJ1IL+F7qMrN937+rjnObBz4QhVJpkrFgjI2poQbYF/Lc1kRyowncCSOcE6U
CIniRQThgiJx2FS0MpIjtXUn1TFABXE1fMuM1fzKBcJ7aDEPb1FmgBIhq10z
pSzyAQKNj1Otl/XYhgaoQqHhfJe4ifnwncToNejISamyAUAv67M4a47TArXc
BBwMVZwHNNwNKlHJLdL1Tj1UuRkjeEeZtEDwlcRK9rKVgLaye5bsGMG0ZRPJ
Vei4eAMbVXirfy6r0+eAtBPEFgRVPFYpCkPjOnE5wqeq4ZpWAlTosbYP9CjY
ohnjFHNZYBJCfkhogIUCy57bBEaDrXHa6s6WAZmIO8f4EROiyH5g9vRxuY/8
AYyxdEGX2X3NxbNeFJZGr+NuAfYazut16/eAszhv48+eKF6tuJvm3EuTUMU6
gmOchHWZYBZJqAGho/2EDHLAcYDHesbnbbUVOpKGG1RwD1CbRb0NfmBrQ1Qa
eq0K0HLZu3hc2rnrg+wF718ac/RtAaIWfSnA/2dTL2n9aTtWKuZwklP0DA0k
gGehtnfUtqUrVjknIsIa37MY0P3qrj/i7kinChIKsTNAIA==
====

@jishnub
Copy link
Collaborator Author

jishnub commented Oct 29, 2024

Here's what I obtain:

Comparison between matrices
julia> L["cond(A)"] == cond(A)
true

julia> L["A"] == A
true

julia> L["inv(A)"] == inv(A)
false

julia> L["inv(A)"] - inv(A)
10×10 Matrix{Float32}:
 -1.19209f-7  -2.38419f-7   1.78814f-7  -1.78814f-7   5.96046f-8  -2.68221f-7  -1.19209f-7   2.98023f-7   4.17233f-7   0.0
 -1.19209f-7   1.78814f-7   1.19209f-7   0.0         -1.78814f-7   1.49012f-7  -2.98023f-8  -2.08616f-7  -5.96046f-8   2.98023f-8
  1.19209f-7   1.19209f-7  -2.08616f-7   2.98023f-8   5.96046f-8   1.71363f-7   4.47035f-8  -1.19209f-7  -1.19209f-7  -1.3411f-7
 -2.98023f-8  -1.78814f-7   2.98023f-8   1.19209f-7   5.96046f-8  -5.96046f-8  -5.96046f-8   5.96046f-8  -5.30854f-8   2.98023f-8
  5.96046f-8  -1.78814f-7   2.98023f-8   2.98023f-8   5.96046f-8  -2.38419f-7  -5.96046f-8   1.93715f-7   1.19209f-7   5.96046f-8
 -7.45058f-9   1.19209f-7   0.0         -5.96046f-8  -5.96046f-8   1.19209f-7   2.98023f-8  -9.68575f-8  -5.96046f-8   0.0
  5.96046f-8   8.9407f-8   -7.45058f-8   8.9407f-8    0.0         -2.98023f-8   0.0         -8.9407f-8   -1.19209f-7   8.9407f-8
  5.96046f-8  -1.19209f-7  -5.96046f-8   5.96046f-8   2.98023f-8  -2.23517f-8  -2.98023f-8   5.02914f-8   2.98023f-8   4.47035f-8
  1.19209f-7   5.96046f-8  -1.78814f-7  -4.93601f-8   1.19209f-7   2.98023f-7   5.96046f-8  -8.9407f-8   -1.49012f-7  -2.08616f-7
 -5.96046f-8  -5.96046f-8   2.98023f-8   0.0         -1.49012f-8   5.96046f-8  -4.47035f-8   1.49012f-8   5.96046f-8   0.0

julia> L["B"].U - B.U
10×10 Matrix{Float32}:
 0.0  -1.78814f-7   2.98023f-7  0.0         -1.19209f-7  -2.38419f-7  -1.49012f-7  -2.98023f-7   4.47035f-8  0.0
 0.0   0.0         -1.19209f-7  2.98023f-8   1.78814f-7   1.19209f-7   1.19209f-7   1.19209f-7   1.49012f-8  0.0
 0.0   0.0          0.0         5.96046f-8   5.96046f-8   0.0         -1.19209f-7  -1.19209f-7   1.19209f-7  0.0
 0.0   0.0          0.0         0.0         -2.38419f-7  -1.19209f-7   0.0         -1.19209f-7  -5.96046f-8  0.0
 0.0   0.0          0.0         0.0          0.0          7.17118f-8  -5.96046f-8  -2.98023f-8   2.98023f-8  0.0
 0.0   0.0          0.0         0.0          0.0          0.0         -1.19209f-7  -1.19209f-7   1.19209f-7  0.0
 0.0   0.0          0.0         0.0          0.0          0.0          0.0          1.19209f-7   1.19209f-7  0.0
 0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0
 0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0
 0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0

julia> L["B"].D - B.D
10×10 Tridiagonal{Float32, Vector{Float32}}:
 3.57628f-7   0.0                                                                            
 0.0         -4.76837f-7  0.0                                                                 
             0.0         0.0  0.0                                                             
                        0.0  9.53674f-7   0.0                                                 
                            0.0         -5.96046f-8  0.0                                      
                                        0.0         4.76837f-7  0.0                           
                                                   0.0         0.0  0.0                       
                                                              0.0  5.96046f-8   0.0           
                                                                  0.0         -5.96046f-8  0.0
                                                                              0.0         0.0

julia> L["inv(B)"] - inv(B)
10×10 Matrix{Float32}:
 -4.76837f-7   1.19209f-7   8.34465f-7  -5.96046f-8  -7.7486f-7    7.45058f-9   5.36442f-7  -1.04308f-6   1.13249f-6  -7.15256f-7
  1.19209f-7   3.57628f-7  -2.08616f-7   0.0         -1.78814f-7   4.47035f-8   8.9407f-8   -2.98023f-8  -2.38419f-7   5.96046f-8
  8.34465f-7  -2.08616f-7   1.66893f-6   2.98023f-8  -6.55651f-7  -9.08971f-7  -2.38419f-7  -2.92063f-6   1.84774f-6  -6.25849f-7
 -5.96046f-8   0.0          2.98023f-8   5.96046f-8   5.96046f-8  -5.96046f-8  -2.98023f-8  -1.49012f-7   1.46218f-7   5.96046f-8
 -7.7486f-7   -1.78814f-7  -6.55651f-7   5.96046f-8   5.36442f-7   4.76837f-7   2.98023f-7   1.72853f-6  -7.15256f-7   8.9407f-8
  7.45058f-9   4.47035f-8  -9.08971f-7  -5.96046f-8   4.76837f-7   3.57628f-7  -1.78814f-7   1.57952f-6  -1.19209f-6   4.76837f-7
  5.36442f-7   8.9407f-8   -2.38419f-7  -2.98023f-8   2.98023f-7  -1.78814f-7  -4.17233f-7  -5.96046f-8  -2.98023f-7   3.12924f-7
 -1.04308f-6  -2.98023f-8  -2.92063f-6  -1.49012f-7   1.72853f-6   1.57952f-6  -5.96046f-8   5.24521f-6  -3.12924f-6   1.08778f-6
  1.13249f-6  -2.38419f-7   1.84774f-6   1.46218f-7  -7.15256f-7  -1.19209f-6  -2.98023f-7  -3.12924f-6   1.78814f-6  -7.7486f-7
 -7.15256f-7   5.96046f-8  -6.25849f-7   5.96046f-8   8.9407f-8    4.76837f-7   3.12924f-7   1.08778f-6  -7.7486f-7    5.36442f-7

julia> L["inv(R)"] - inv(R)
10×10 Matrix{Float32}:
  0.0          2.08616f-7  -5.96046f-8   3.8743f-7    2.98023f-8  -5.02914f-7   0.0          1.19209f-7  -1.78814f-7   2.68221f-7
  2.08616f-7   2.38419f-7  -1.19209f-7   1.78814f-7   0.0         -1.78814f-7  -2.98023f-8  -7.45058f-8  -2.38419f-7   5.96046f-8
 -5.96046f-8  -1.19209f-7   1.04308f-7  -2.38419f-7  -5.96046f-8   1.56462f-7   4.47035f-8  -1.19209f-7   2.38419f-7  -1.3411f-7
  3.8743f-7    1.78814f-7  -2.38419f-7   1.78814f-7   1.49012f-7  -5.96046f-8  -1.78814f-7   2.98023f-8  -3.51109f-7   1.78814f-7
  2.98023f-8   0.0         -5.96046f-8   1.49012f-7   1.19209f-7  -1.19209f-7  -6.70552f-8   1.49012f-7  -1.19209f-7   4.47035f-8
 -5.02914f-7  -1.78814f-7   1.56462f-7  -5.96046f-8  -1.19209f-7   3.57628f-7   1.49012f-7  -1.49012f-8   3.27826f-7  -5.96046f-8
  0.0         -2.98023f-8   4.47035f-8  -1.78814f-7  -6.70552f-8   1.49012f-7   5.96046f-8  -1.19209f-7   2.98023f-8   5.96046f-8
  1.19209f-7  -7.45058f-8  -1.19209f-7   2.98023f-8   1.49012f-7  -1.49012f-8  -1.19209f-7   1.49012f-7  -1.19209f-7   5.96046f-8
 -1.78814f-7  -2.38419f-7   2.38419f-7  -3.51109f-7  -1.19209f-7   3.27826f-7   2.98023f-8  -1.19209f-7   3.20375f-7  -5.96046f-8
  2.68221f-7   5.96046f-8  -1.3411f-7    1.78814f-7   4.47035f-8  -5.96046f-8   5.96046f-8   5.96046f-8  -5.96046f-8  -2.38419f-7

julia> L["R"].U - R.U
10×10 Matrix{Float32}:
 0.0  0.0  1.78814f-7   1.86265f-7  -2.79397f-8   1.49012f-7   0.0          0.0         0.0  0.0
 0.0  0.0  5.96046f-8  -2.38419f-7   0.0         -1.19209f-7   2.98023f-8  -5.96046f-8  0.0  0.0
 0.0  0.0  0.0         -4.47035f-8   5.21541f-8   0.0          3.72529f-8   0.0         0.0  0.0
 0.0  0.0  0.0          0.0          8.3819f-8    8.9407f-8    5.96046f-8   7.45058f-9  0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0          8.9407f-8    0.0         0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0         -5.96046f-8   0.0         0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0          0.0          0.0         0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0          0.0          0.0         0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0          0.0          0.0         0.0  0.0
 0.0  0.0  0.0          0.0          0.0          0.0          0.0          0.0         0.0  0.0

julia> L["R"].D - R.D
10×10 Tridiagonal{Float32, Vector{Float32}}:
 -1.19209f-7   0.0                                                                      
  0.0         -2.38419f-7   0.0                                                          
              0.0         -5.96046f-8   0.0                                              
                          0.0         -5.96046f-8  0.0                                   
                                      0.0         7.45058f-8   0.0                       
                                                 0.0         -2.98023f-8  0.0            
                                                             0.0         0.0  0.0        
                                                                        0.0  0.0  0.0    
                                                                            0.0  0.0  0.0
                                                                                0.0  0.0

@aravindh-krishnamoorthy
Copy link
Contributor

Here's what I obtain:

Comparison between matrices

Thank you very much again, @jishnub. Please see below for some general points and some points on your above comparison.

Regarding your comparison, I see (analysis below) that when the deltas to BunchKaufman's D and U are applied on my PC, the inversion seems to satisfy the current rtol=eps(cond(A)), indicating that the issue might lie in (X)SYTRF already. NOTE that there are no tests in test/lapack.jl presently for (X)SYTRF. Tests are only for the combination with (X)SYTRI.

Analysis of the comparison
# Original A matrix from above
julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!

# Load the JLD file as well as define the deltas (Haswell to SkylakeX).
julia> using LinearAlgebra, JLD2
julia> L = load("56255-1.jld2");
julia> B = L["B"];
julia> dU = Matrix{Float32}([
       0.0  -1.78814f-7   2.98023f-7  0.0         -1.19209f-7  -2.38419f-7  -1.49012f-7  -2.98023f-7   4.47035f-8  0.0
        0.0   0.0         -1.19209f-7  2.98023f-8   1.78814f-7   1.19209f-7   1.19209f-7   1.19209f-7   1.49012f-8  0.0
        0.0   0.0          0.0         5.96046f-8   5.96046f-8   0.0         -1.19209f-7  -1.19209f-7   1.19209f-7  0.0
        0.0   0.0          0.0         0.0         -2.38419f-7  -1.19209f-7   0.0         -1.19209f-7  -5.96046f-8  0.0
        0.0   0.0          0.0         0.0          0.0          7.17118f-8  -5.96046f-8  -2.98023f-8   2.98023f-8  0.0
        0.0   0.0          0.0         0.0          0.0          0.0         -1.19209f-7  -1.19209f-7   1.19209f-7  0.0
        0.0   0.0          0.0         0.0          0.0          0.0          0.0          1.19209f-7   1.19209f-7  0.0
        0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0
        0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0
        0.0   0.0          0.0         0.0          0.0          0.0          0.0          0.0          0.0         0.0]);
julia> dD = Matrix{Float32}([
       3.57628f-7   0.0          0    0            0           0           0    0            0           0
        0.0         -4.76837f-7  0.0   0            0           0           0    0            0           0
         0           0.0         0.0  0.0           0           0           0    0            0           0
         0            0          0.0  9.53674f-7   0.0          0           0    0            0           0
         0            0           0   0.0         -5.96046f-8  0.0          0    0            0           0
         0            0           0    0           0.0         4.76837f-7  0.0   0            0           0
         0            0           0    0            0          0.0         0.0  0.0           0           0
         0            0           0    0            0           0          0.0  5.96046f-8   0.0          0
         0            0           0    0            0           0           0   0.0         -5.96046f-8  0.0
         0            0           0    0            0           0           0    0           0.0         0.0]);

# The original algorithm fails the current rtol
julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
false                                                                                                                                                                                                                                                   julia> norm(inv(B)-inv(A))
8.977805f-6

# However, applying the diagonal and U deltas passes the test!
julia> B1 = BunchKaufman(B.LD + dD + dU, B.ipiv, 'U', true, false, 0);
julia> isapprox(inv(B1), inv(A); rtol=eps(cond(A)))
true
julia> norm(inv(B1)-inv(A))
4.343118f-6

Outcome:

  • Unfortunately, I could not yet decide if I should look further at lapack.jl and SSYTRF to isolate the problem or recommend increasing the rtol value analogous to (X)GELSD/Y.
  • Given that isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A))) is true, the tolerance is likely too tight.
  • Computing an rtol bound for Bunch-Kaufman decomposition-based inversion and utilizing it to revise the current rtol is perhaps the best way to go.
  • I might have to think a bit further on this.

@aravindh-krishnamoorthy
Copy link
Contributor

aravindh-krishnamoorthy commented Nov 7, 2024

Extending the analysis in [1] given below for inversion, it seems that the required rtol can be significantly higher than eps(cond(A)).

The following script (for Gaussian matrices) seems to agree with this.

function worstN(N)
R = zeros(N,1)
E = zeros(N,1)
B = zeros(Bool,N,1)
for i in 1:N
    A = (X->(X+X')/2)(Float32.(randn(4,4)))
    A1 = inv(A)
    A2 = inv(bunchkaufman(A))
    R[i] = norm(A1-A2)/max(norm(A1), norm(A2))
    E[i] = eps(cond(A))
    B[i] = isapprox(A1, A2; rtol=eps(cond(A)))
end
maximum(R./E)
end

for i = 1:100 display(worstN(1000000)); end

Note: Also applies to Float64.

[1] N. J. Higham, "Stability of the Diagonal Pivoting Method with Partial Pivoting," Report no. 265, https://www.netlib.org/lapack/lawnspdf/lawn105.pdf

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration
Projects
None yet
Development

No branches or pull requests

3 participants