Add generic fallback for Base.LinAlg.BLAS: scal!, blascopy! ... #351
I think we have most of these already, but with different naming, e.g. …
Thank you, I should have searched better. It is even faster than the BLAS translation:

```julia
julia> @benchmark Base.LinAlg.BLAS.blascopy!(length(A), A, 1, B, 1)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     907
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     126.00 ns (0.00% GC)
  median time:      128.00 ns (0.00% GC)
  mean time:        133.54 ns (0.00% GC)
  maximum time:     291.00 ns (0.00% GC)

julia> @benchmark copy!(A, B)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     973
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     73.00 ns (0.00% GC)
  median time:      74.00 ns (0.00% GC)
  mean time:        78.62 ns (0.00% GC)
  maximum time:     458.00 ns (0.00% GC)
```

I assumed BLAS was the fastest thing.
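For the strided cases (`incx`/`incy` ≠ 1) that a plain `copy!` does not cover, a generic fallback of the kind this issue asks for might look like the sketch below. The name `generic_blascopy!` is illustrative only, not an existing Base function:

```julia
# Hypothetical generic fallback for blascopy! on arbitrary AbstractVectors.
# Mirrors the BLAS semantics: copy n elements of x into y, stepping by
# incx and incy respectively. Name and signature are illustrative only.
function generic_blascopy!(n::Integer, x::AbstractVector, incx::Integer,
                           y::AbstractVector, incy::Integer)
    ix, iy = 1, 1
    for _ in 1:n
        y[iy] = x[ix]
        ix += incx
        iy += incy
    end
    return y
end
```

With `incx == incy == 1` this reduces to the plain element-wise copy benchmarked above.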
Hello, I think this might need reopening:

```julia
julia> @benchmark Base.LinAlg.BLAS.scal!(length(A), 1.0, A, 1)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     990
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     42.00 ns (0.00% GC)
  median time:      44.00 ns (0.00% GC)
  mean time:        45.65 ns (1.66% GC)
  maximum time:     1.78 μs (96.25% GC)

julia> @benchmark scale!(1.0, A)
BenchmarkTools.Trial:
  samples:          6602
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     623.05 μs (0.00% GC)
  median time:      752.92 μs (0.00% GC)
  mean time:        753.82 μs (0.00% GC)
  maximum time:     1.04 ms (0.00% GC)
```
Scaling with 1 takes the fast path in BLAS.
Yes. This is pretty stupid and needs some cleanup.

```julia
julia> @benchmark scale!(A, 1.0)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     997
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     19.00 ns (0.00% GC)
  median time:      19.00 ns (0.00% GC)
  mean time:        19.35 ns (0.00% GC)
  maximum time:     72.00 ns (0.00% GC)

julia> @benchmark scale!(1.0, A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     2.50 μs (0.00% GC)
  median time:      2.53 μs (0.00% GC)
  mean time:        2.55 μs (0.00% GC)
  maximum time:     10.84 μs (0.00% GC)
```
Oh. That is interesting.
@lopezm94 You probably know, but what @KristofferC meant is that BLAS checks whether the scaling factor is one and simply returns if it is. For all other values (i.e. all real use cases), Julia is much closer to BLAS speed and should even be faster for small arrays. In any case, we should make sure that either none or both …
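The early-exit behaviour described above can be sketched as a runnable generic version. This is an illustration, not the actual Base code: `scale_fastpath!` is a made-up name, and it uses the modern `where` method syntax rather than the `{T<:...}` form used elsewhere in this thread:

```julia
# Sketch of a scale! with BLAS-style special cases. When s == 1 nothing
# needs to be done, which is why benchmarking with a factor of 1.0
# makes the BLAS path look artificially fast.
function scale_fastpath!(X::AbstractArray{T}, s::Number) where {T}
    s == 1 && return X                  # fast path: scaling by one is a no-op
    s == 0 && return fill!(X, zero(T))  # fast path: just zero the array
    @inbounds for i in eachindex(X)
        X[i] *= s
    end
    return X
end
```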
```diff
diff --git a/base/linalg/dense.jl b/base/linalg/dense.jl
index b0fbe81..a4324e8 100644
--- a/base/linalg/dense.jl
+++ b/base/linalg/dense.jl
@@ -10,6 +10,8 @@ const ASUM_CUTOFF = 32
 const NRM2_CUTOFF = 32
 function scale!{T<:BlasFloat}(X::Array{T}, s::T)
+    s == 0 && return fill!(X, zero(T))
+    s == 1 && return X
     if length(X) < SCAL_CUTOFF
         generic_scale!(X, s)
     else
@@ -18,6 +20,8 @@ function scale!{T<:BlasFloat}(X::Array{T}, s::T)
     X
 end
+scale!{T<:BlasFloat}(s::T, X::Array{T}) = scale!(X, s)
+
 scale!{T<:BlasFloat}(X::Array{T}, s::Number) = scale!(X, convert(T, s))
 function scale!{T<:BlasComplex}(X::Array{T}, s::Real)
     R = typeof(real(zero(T)))
```

maybe
Seems reasonable.
In cases where one wants to use BLAS functions to reduce memory allocation, it is annoying that they are only defined for Arrays of BlasFloat types. Adding custom fallback operators inside packages is becoming increasingly common for me; see IterativeSolvers.jl#79 and InplaceOps#9.

A change has already been made for axpy! (see #5189), but I think it is now time for the rest of the functions, if it's OK for me to open a PR about this.
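As an illustration of the kind of generic fallback being proposed here, a scal! for arbitrary AbstractVectors might look like the following. The name `generic_scal!` is hypothetical; the actual naming and placement would be decided in the PR:

```julia
# Hypothetical generic fallback for BLAS.scal!: scale n elements of x
# in place, stepping by incx. Works for any element type supporting *,
# with no restriction to BlasFloat or to Array.
function generic_scal!(n::Integer, a::Number, x::AbstractVector, incx::Integer)
    i = 1
    for _ in 1:n
        x[i] *= a
        i += incx
    end
    return x
end
```

Like the BLAS version, it allocates nothing and mutates its argument, so it can serve as a drop-in fallback when the fast BLAS method does not apply.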