Fix `v[i] = -0.0` and support `reinterpret` #296
Codecov Report

```
@@            Coverage Diff             @@
##             main     #296      +/-   ##
==========================================
+ Coverage   93.71%   93.74%   +0.02%
==========================================
  Files          12       12
  Lines        7433     7431       -2
==========================================
  Hits         6966     6966
+ Misses        467      465       -2
==========================================
```
Is it really useful to allow

`@test m[1, 1] === -0.0` looks really bad as `iszero(-0.0)` is true :/
AbstractSparseArray <: AbstractArray, so operations like broadcasted addition with a scalar are supported even though they produce dense results. reinterpret falls into the same category: an operation that is defined for all other AbstractArrays and produces a dense result. Both broadcasted addition with a scalar and reinterpret could, with specialized code here, produce sparse results. Nevertheless, I see a clear hierarchy of sparse result > dense result > error and don't think that the possibility of producing a sparse result should get in the way of removing an unnecessary error. The reason this error exists is because of behavior problems, not because of conversion to a dense array (source). This PR fixes the behavior problem. Variable default values would let both these operations cleanly produce sparse results (and allow IEEE compliance with non-infinite floating point results), but that is not for this PR.
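For concreteness, here is a sketch of the generic fallback this PR re-enables, assuming Base's `reinterpret(::Type, ::AbstractArray)` method now applies to sparse arrays (the specific values below are illustrative):

```julia
using SparseArrays

v = sparsevec([1, 3], [1.0, 2.0], 4)

# With the sparse-specific error removed, the generic Base method applies
# and returns a (dense-indexed) ReinterpretArray view over the vector:
r = reinterpret(UInt64, v)

@assert r[1] == reinterpret(UInt64, 1.0)  # bit pattern of a stored value
@assert r[2] == reinterpret(UInt64, 0.0)  # bit pattern of an implicit zero
```

The result is dense in the sense that every index is materialized on access, which is exactly the "dense result" case discussed above.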
How do you feel about the current behavior?

```julia
julia> x = sprand(5, .5)
5-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  0.00352086
  [3]  =  0.111171
  [5]  =  0.223453

julia> x .= -0.0
5-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  -0.0
  [3]  =  -0.0
  [5]  =  -0.0

julia> x .=== -0.0
5-element SparseVector{Bool, Int64} with 3 stored entries:
  [1]  =  1
  [3]  =  1
  [5]  =  1

julia> collect(x .=== -0.0)
5-element Vector{Bool}:
 1
 0
 1
 0
 1
```
Looks like intended behavior? The rule is: do not create new entries unless they are non-zero, but keep existing entries in the array unless `dropzeros!` is called.
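A minimal sketch of that rule, using the stdlib `SparseArrays` (exact handling of `-0.0` depends on whether this PR's change is applied, so plain `0.0` is used here):

```julia
using SparseArrays

x = sparsevec([1, 3], [1.0, 2.0], 5)

x[2] = 0.0          # assigning zero to an unstored index creates no entry
@assert nnz(x) == 2

x[1] = 0.0          # assigning zero to a stored index keeps the entry...
@assert nnz(x) == 2 # ...as an explicit stored zero

dropzeros!(x)       # ...until dropzeros! removes explicit stored zeros
@assert nnz(x) == 1
```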
i also think @dkarrasch spent a bit of time getting
This is the rule in the publicly documented API: "In some applications, it is convenient to store explicit zero values in a SparseMatrixCSC. These are accepted by functions in Base (but there is no guarantee that they will be preserved in mutating operations)."
This implementation uses `v !== zero(eltype(sparsecollection))`. It's also worth noting that, for static arrays, `x !== zero(x)` is faster than the existing `SparseArrays._isnotzero`:

```julia
julia> @btime SparseArrays._isnotzero(x) setup=(x = @SVector rand(20))
  6.143 ns (0 allocations: 0 bytes)
true

julia> @btime x !== zero(x) setup=(x = @SVector rand(20))
  4.909 ns (0 allocations: 0 bytes)
true

julia> @btime x !== zero(x) setup=(x = @SVector zeros(20))
  1.781 ns (0 allocations: 0 bytes)
false

julia> @btime SparseArrays._isnotzero(x) setup=(x = @SVector zeros(20))
  1.915 ns (0 allocations: 0 bytes)
false
```

More importantly, actual benchmarks indicate a small (negligible) speedup from this PR:

```julia
julia> function f(n, m)
           s = spzeros(SVector{10, Float64}, n, n)
           for _ in 1:m
               s[rand(eachindex(s))] = @SVector rand(10)
           end
           out = 0
           for _ in 1:m
               out += sum(s[rand(eachindex(s))])
           end
           out
       end
f (generic function with 1 method)

julia> @benchmark f(1_000, 10_000) # master
BenchmarkTools.Trial: 119 samples with 1 evaluation.
 Range (min … max):  27.345 ms … 43.278 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.643 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.044 ms ±  3.999 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▇█▂
  ▃▅▇████▄▃▃▁▃▁▁▁▄▄▃▃▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▄▃▄ ▃
  27.3 ms        Histogram: frequency by time        43.2 ms <

 Memory estimate: 3.51 MiB, allocs estimate: 19.

julia> @benchmark f(1_000, 10_000) # PR
BenchmarkTools.Trial: 169 samples with 1 evaluation.
 Range (min … max):  26.936 ms … 52.152 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.515 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   29.270 ms ±  2.905 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▆█▆▂
  ▂▁▆█████▅▆▅▃▃▃▂▃▃▂▂▃▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▃▁▁▁▁▁▁▂▁▁▁▂ ▂
  26.9 ms        Histogram: frequency by time        41.5 ms <

 Memory estimate: 3.51 MiB, allocs estimate: 19.

julia> versioninfo()
Julia Version 1.10.0-DEV.27
Commit 90934ff730 (2022-11-19 21:56 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.3.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0
```
Addressing the case of multiple zeros more reliably is the point. A classic assumption folks make about arrays is that if you set an element to a value and then check back later (without any intervening writes), you will see the exact same value you set initially. This PR makes SparseArrays conform to that behavior without a substantial runtime cost.
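A sketch of the read-back property in question, assuming this PR's behavior (on earlier versions the assignment below was silently dropped):

```julia
using SparseArrays

x = spzeros(3)
x[1] = -0.0

# With this PR, the value read back is identical to the value written:
@assert x[1] === -0.0  # pre-PR: x[1] === 0.0, the assignment was discarded
@assert nnz(x) == 1    # -0.0 is now a stored entry, since -0.0 !== 0.0
```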
But in many cases it would probably be better to get an error for that addition. And if you really want a dense result, then why use a sparse array to begin with? I wouldn't say that `reinterpret` is in the same category here, because that would always give a non-sparse-looking result, so that is always bad.
I suppose there are two things at play here: fixing
Not super specific to this PR, but my take is that
This is how the GraphBLAS spec works and how I intend the Finch.jl-supported ecosystem to work as well. I don't think we can do this here until v2.0, though.
Branching off the `reinterpret` question into a larger discussion on operations that produce dense results: #307
I agree. Right now you have to jump through hoops to insert stored entries (for example, adding and subtracting some value at the same index...).
Triage approves. Triage spent a while discussing the objection that there could be a major performance regression if someone counts on assigned zeros being discarded, and briefly discussed the objection that someone may have counted on the old semantics of zero assignment, but found no sensible use case that either concern would break.
The reasoning about "sensible use cases" is this: the potential issue would be if you're assigning a lot of -0.0 values into a sparse array and counting on them being discarded in order to not blow up memory; but if you're doing that, then you have a performance problem anyway, since you're doing O(n*m) work even if you weren't using O(n*m) storage. Basically, reasonable sparse matrix code shouldn't be relying on assignments not being stored in the first place.
backport-1.9 because of JuliaLang/julia#48079
What is the purpose/use case for
#289 from the OP is one such example.
No, you wouldn't be able to use the result of
"anything useful" is quite a broad umbrella. Three potentially useful things I can imagine doing are taking a random sample with
The strong justification that motivated the introduction of this error—incorrect behavior—is now gone. In any case, it should be possible to implement an efficient specialization for `reinterpret`.
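To illustrate, here is a hypothetical sketch of such a specialization. This is not this PR's implementation: `sparse_reinterpret` is an invented name, and it relies on the internal `SparseArrays.nonzeroinds`. It is only semantically valid when the element sizes match and the zero bit pattern of the source type reinterprets to the zero of the target type, so the index structure can be reused and only the stored values reinterpreted.

```julia
using SparseArrays

# Hypothetical helper, not part of SparseArrays: reinterpret only the
# stored values, reusing the sparsity structure. Assumes that
# sizeof(T) == sizeof(S) and that zero(S) reinterprets to zero(T).
function sparse_reinterpret(::Type{T}, v::SparseVector{S}) where {T,S}
    sizeof(T) == sizeof(S) || throw(ArgumentError("element sizes must match"))
    SparseVector(length(v),
                 copy(SparseArrays.nonzeroinds(v)),     # reuse index structure
                 collect(reinterpret(T, SparseArrays.nonzeros(v))))
end
```

This does O(nnz) work instead of the O(length) work of a dense-indexed `ReinterpretArray` view.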
The backport to `1.9` failed.

To backport manually, run these commands in your terminal:

```shell
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.9 1.9
# Navigate to the new working tree
cd .worktrees/backport-1.9
# Create a new branch
git switch --create backport-296-to-1.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 d4c36be30c62762bb1b9224c1963b723494260f9
# Push it to GitHub
git push --set-upstream origin backport-296-to-1.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.9
```

Then, create a pull request where the `base` branch is `1.9` and the `compare`/`head` branch is `backport-296-to-1.9`.
* assign whenever `v !== zero(eltype(sparsecollection))`
* test `-0.0`
* allow `reinterpret`
* test `reinterpret`
@dkarrasch, did this make it into 1.9? I'm still seeing the bug it is supposed to fix:

```julia
(@v1.9) pkg> activate --temp
  Activating new project at `/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_FokowT`

(jl_FokowT) pkg> st
Status `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_FokowT/Project.toml` (empty project)

julia> versioninfo()
Julia Version 1.9.0-rc2
Commit 72aec423c2a (2023-04-01 10:41 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

julia> using SparseArrays

julia> sort!(sprand(100, .1))
ERROR: `reinterpret` on sparse arrays is discontinued.
Try reinterpreting the value itself instead.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] reinterpret(#unused#::Type, A::SparseVector{Float64, Int64})
   @ SparseArrays ~/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/SparseArrays/src/abstractsparse.jl:79
...
```
It should be on the latest backport PR in Julia.
Make `mysparsearray[index] = v` store an entry whenever `v !== zero(eltype(sparsecollection))`, and support `reinterpret`. Similar to @StefanKarpinski's suggestion here.
Fixes #289
Fixes #294
Fixes #304
Fixes #305
Fixes JuliaLang/julia#48079
The primary motivation for this PR is to fix the bug where someone expects all `AbstractArray`s to be `reinterpret`able (there is a `::AbstractArray` method defined in Base) and someone else tries passing their method a sparse array and runs into an error.