
Add generic fallback for Base.LinAlg.BLAS: scal!, blascopy! ... #351

Closed
lopezm94 opened this issue Jul 30, 2016 · 9 comments

Comments

@lopezm94

In cases where one wants to use BLAS functions to reduce memory allocation, it is annoying that they are only defined for Arrays of BlasFloat types. Adding custom fallback operators inside packages is becoming increasingly common for me; see IterativeSolvers.jl#79 and InplaceOps#9.

A change has already been made for axpy! (see #5189), but I think it is now time for the rest of the functions. Is it OK for me to open a PR about this?

@lopezm94 lopezm94 changed the title Add generic fallback for Blas.LinAlg: scal!, blascopy! ... Add generic fallback for Blas.LinAlg.BLAS: scal!, blascopy! ... Jul 30, 2016
@lopezm94 lopezm94 changed the title Add generic fallback for Blas.LinAlg.BLAS: scal!, blascopy! ... Add generic fallback for Base.LinAlg.BLAS: scal!, blascopy! ... Jul 30, 2016
@andreasnoack
Member

I think we have most of these already, just under different names: e.g. scale! and copy! should cover the functionality of BLAS.scal! and blascopy!, and they have generic fallbacks. We couldn't come up with a new name for axpy!, so in that case we use the BLAS name.
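For anyone landing here later, a minimal sketch of the generic equivalents (assuming the Julia 0.5-era Base API, where these functions live in Base and Base.LinAlg):

```julia
A = rand(100); B = similar(A)

copy!(B, A)                   # generic; covers BLAS.blascopy!(length(A), A, 1, B, 1)
scale!(A, 2.0)                # generic fallback; covers BLAS.scal!(length(A), 2.0, A, 1)
Base.LinAlg.axpy!(2.0, A, B)  # keeps the BLAS name: B += 2.0 * A, in place
```

For Arrays of BlasFloat element types these hit optimized paths (BLAS or a memmove-style copy); for other element types they fall back to generic loops.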

@lopezm94
Author

lopezm94 commented Jul 30, 2016

Thank you, I should have searched better. The generic method is even faster than the BLAS call:

julia> @benchmark Base.LinAlg.BLAS.blascopy!(length(A),A,1,B,1)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     907
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     126.00 ns (0.00% GC)
  median time:      128.00 ns (0.00% GC)
  mean time:        133.54 ns (0.00% GC)
  maximum time:     291.00 ns (0.00% GC)

julia> @benchmark copy!(A,B)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     973
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     73.00 ns (0.00% GC)
  median time:      74.00 ns (0.00% GC)
  mean time:        78.62 ns (0.00% GC)
  maximum time:     458.00 ns (0.00% GC)

I had assumed BLAS was always the fastest option.

@lopezm94
Author

lopezm94 commented Aug 1, 2016

Hello, I think this might need reopening. copy! is fine, but scale! doesn't seem to dispatch to BLAS.scal! when possible:

julia> @benchmark Base.LinAlg.BLAS.scal!(length(A),1.0,A,1)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     990
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  16.00 bytes
  allocs estimate:  1
  minimum time:     42.00 ns (0.00% GC)
  median time:      44.00 ns (0.00% GC)
  mean time:        45.65 ns (1.66% GC)
  maximum time:     1.78 μs (96.25% GC)

julia> @benchmark scale!(1.0,A)
BenchmarkTools.Trial: 
  samples:          6602
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     623.05 μs (0.00% GC)
  median time:      752.92 μs (0.00% GC)
  mean time:        753.82 μs (0.00% GC)
  maximum time:     1.04 ms (0.00% GC)

@KristofferC
Member

Scaling with 1 takes the fast path in BLAS.

@andreasnoack
Member

Yes. This is pretty stupid and needs some cleanup.

julia> @benchmark scale!(A,1.0)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     997
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     19.00 ns (0.00% GC)
  median time:      19.00 ns (0.00% GC)
  mean time:        19.35 ns (0.00% GC)
  maximum time:     72.00 ns (0.00% GC)

julia> @benchmark scale!(1.0,A)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     2.50 μs (0.00% GC)
  median time:      2.53 μs (0.00% GC)
  mean time:        2.55 μs (0.00% GC)
  maximum time:     10.84 μs (0.00% GC)

@KristofferC
Member

Oh. That is interesting.

@lopezm94 lopezm94 reopened this Aug 1, 2016
@andreasnoack
Member

@lopezm94 You probably know this, but what @KristofferC meant is that BLAS checks whether the scaling factor is one and, if so, returns immediately. For all other values (i.e. all real use cases), Julia is much closer to BLAS speed and should even be faster for small arrays. In any case, we should make sure that either none or both scale! methods call BLAS when beneficial. If we want to track that here, we should adjust the issue title.
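A fairer comparison would use a non-unit scalar, so BLAS cannot take its early-return path; a sketch (array size and scalar chosen arbitrarily here, using BenchmarkTools `$` interpolation to avoid benchmarking global-variable access):

```julia
using BenchmarkTools

A = rand(10^5)
# With s = 1.1 there is no trivial fast path, so this measures the real kernels.
# (Repeated scaling grows the values across samples, which is harmless for timing.)
@benchmark Base.LinAlg.BLAS.scal!(length($A), 1.1, $A, 1)
@benchmark scale!($A, 1.1)
```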

@KristofferC
Member

diff --git a/base/linalg/dense.jl b/base/linalg/dense.jl
index b0fbe81..a4324e8 100644
--- a/base/linalg/dense.jl
+++ b/base/linalg/dense.jl
@@ -10,6 +10,8 @@ const ASUM_CUTOFF = 32
 const NRM2_CUTOFF = 32

 function scale!{T<:BlasFloat}(X::Array{T}, s::T)
+    s == 0 && return fill!(X, zero(T))
+    s == 1 && return X
     if length(X) < SCAL_CUTOFF
         generic_scale!(X, s)
     else
@@ -18,6 +20,8 @@ function scale!{T<:BlasFloat}(X::Array{T}, s::T)
     X
 end

+scale!{T<:BlasFloat}(s::T, X::Array{T}) = scale!(X, s)
+
 scale!{T<:BlasFloat}(X::Array{T}, s::Number) = scale!(X, convert(T, s))
 function scale!{T<:BlasComplex}(X::Array{T}, s::Real)
     R = typeof(real(zero(T)))

maybe

@andreasnoack
Member

Seems reasonable.
