Classes of unary operations in broadcast over sparse arrays #18309
Comments
Worth benchmarking whether the current implementation is any different performance-wise than a simpler implementation that combines classes 1 and 2 and calls `dropzeros!` afterwards.
I'd be in favor of getting rid of the first class, for the reasons you mention.
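A rough sketch of what such a merged class-one/class-two kernel might look like (this is not the actual `Base.SparseArrays` code; the function name and structure are illustrative only): apply the function to the stored nonzeros with no per-element zero check, so the inner loop admits `@simd`, and leave expunging of numerical zeros to an explicit `dropzeros!` by the caller.

```julia
# Illustrative sketch only, not the actual Base.SparseArrays implementation.
# Assumes f preserves the element type of A's stored values.
function broadcast_unary_z2z_sketch(f, A::SparseMatrixCSC)
    # reuse A's structure; only the stored values change
    B = SparseMatrixCSC(A.m, A.n, copy(A.colptr), copy(A.rowval), similar(A.nzval))
    @simd for k in eachindex(A.nzval)
        @inbounds B.nzval[k] = f(A.nzval[k])  # no branch, no expunging of exact zeros
    end
    return B
end

# Callers wanting the old class-one behavior would opt in explicitly:
# dropzeros!(broadcast_unary_z2z_sketch(sin, A))
```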
Benchmarks below. tl;dr: (1) For most operations defined in class one (fourteen trigonometric and hyperbolic functions), merging class one into class two would improve default performance (i.e. when not expunging numerical zeros) by up to a few percent and degrade performance by less than that when also calling `dropzeros!`. (2) For `round` (non-pathological case), the merge improves default performance substantially but regresses by roughly 35-40% when `dropzeros!` is also called. (3) For the pathological case (`floor` over entries that mostly map to zero), the merge regresses significantly whether or not `dropzeros!` is called.

Thoughts on whether to trade the performance reductions for the greater behavioral consistency, reduced implementation complexity, and moderate performance improvements? Thanks!

```julia
using BenchmarkTools

for (fn, desc) in (
        (sin, ", representative of most functions in class one"),
        (round, ", representative of remaining functions in class one (non-pathological case)"),
        (floor, ", representative of the pathological case") )
    println("Results for $(string(fn))$(desc)")
    for N in [10^2, 10^3, 10^4]
        # scale entries out of (0, 1) except for floor, for which entries in (0, 1)
        # all map to zero (the pathological case)
        A = sprand(N, N, 0.1) * (fn == floor ? 1 : 10)
        nz2zbench = @benchmark Base.SparseArrays._broadcast_unary_nz2z_z2z($fn, $A)
        nz2nzbench = @benchmark Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A)
        nz2nzdzbench = @benchmark dropzeros!(Base.SparseArrays._broadcast_unary_nz2nz_z2z($fn, $A))
        println("--> For size-$(size(A)) sparse matrix with $(nnz(A)) entries")
        println("-- --> nz2nz versus nz2z")
        println(judge(median(nz2nzbench), median(nz2zbench)))
        println("-- --> nz2nz with dropzeros! versus nz2z")
        println(judge(median(nz2nzdzbench), median(nz2zbench)))
    end
    println()
end
```

yields

```
Results for sin, representative of most functions in class one
--> For size-(100,100) sparse matrix with 956 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -8.31% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +4.27% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 100181 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -4.44% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +0.43% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10002843 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -3.93% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +0.86% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
Results for round, representative of remaining functions in class one (non-pathological case)
--> For size-(100,100) sparse matrix with 1017 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -25.20% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +35.54% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99922 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -33.23% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +40.89% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 9998682 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: -18.28% => improvement (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +40.83% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
Results for floor, representative of the pathological case
--> For size-(100,100) sparse matrix with 944 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: +25.27% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +89.14% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(1000,1000) sparse matrix with 99952 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: +247.06% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +407.48% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
--> For size-(10000,10000) sparse matrix with 10001131 entries
-- --> nz2nz versus nz2z
BenchmarkTools.TrialJudgement:
time: +399.42% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
-- --> nz2nz with dropzeros! versus nz2z
BenchmarkTools.TrialJudgement:
time: +495.62% => regression (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
```
The implementation of `broadcast` over `SparseMatrixCSC`s separates unary functions into three classes (with disjoint logic): (1) unary functions that map zeros to zeros and may map nonzeros to zeros; (2) unary functions that map zeros to zeros and nonzeros to nonzeros; and (3) unary functions that map both zeros and nonzeros to nonzeros.

In the first class, when a nonzero in the original sparse matrix maps to exact zero under, e.g., `sin` or `round`, the implementation expunges that entry.

On the one hand, this behavior seems like a holdover from the aggressive-stored-zero-expunging era, now inconsistent with behavior elsewhere. Additionally, this behavior comes at the cost of a branch and additional computation on every iteration of an inner loop, the inability to apply an `@simd` decoration on that inner loop, and some additional code.

On the other hand, in some cases, for example `broadcast`ing `round` over a sparse matrix with uniformly or standard normally distributed entries, a substantial fraction of the nonzeros might map to zero, and expunging those zeros may be advantageous for some applications (though perhaps not for others).

Should this first class still exist? Thoughts? Best!
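As a concrete, hypothetical (untested) illustration of the trade-off described above, using `floor` so that a stored 0.5 maps to exact zero: the stated `nnz` outcome assumes the class-one expunging behavior described in this issue and a Julia version from around this era.

```julia
# Sketch of the behavioral difference at issue (assumes sparse broadcast dispatches
# through the class-one kernel described above).
A = sparse([1, 2], [1, 2], [0.5, 1.5])  # two stored nonzeros
B = broadcast(floor, A)                 # floor(0.5) == 0.0
nnz(B)  # 1 with class-one expunging; 2 (one stored zero) if classes one and two were merged
# With merged classes, callers wanting the old behavior would call dropzeros!(B) explicitly.
```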
Ref. #17265.