
Dense instead of sparse matrix returned during differentiation #1537

Open
vboussange opened this issue Nov 3, 2024 · 4 comments
Labels
ChainRules adjoint -> rrule, and further integration CUDA All things GPU

Comments

@vboussange

vboussange commented Nov 3, 2024

Hi there,
I found an inconsistent behaviour when differentiating a function that takes an AbstractCuSparseArray versus the same function applied to an AbstractSparseArray:

using CUDA
using Zygote
using SparseArrays

logpos(a) = a > 0 ? log(a) : zero(a)
l(A) = sum(logpos.(A))

A = sprandn(Float32, 10, 10, 0.4)

dA = gradient(l, A)[1] # returns a sparse array, as expected

Acu = sparse(CuArray(A))
dAcu = gradient(l, Acu)[1] # returns a dense CuArray

Maybe someone here has an idea of where this could come from?

@vboussange vboussange changed the title CuArray instead of SparseCuArray returned during differentiation Dense instead of sparse matrix returned during differentiation Nov 3, 2024
@ToucheSir
Member

I suspect the issue is with missing rule(s) in ChainRules since Zygote has almost nothing in the way of machinery for diffing sparse arrays. What happens if you set CUDA.allowscalar(false) before running the MWE?

@ToucheSir ToucheSir added the CUDA All things GPU label Nov 5, 2024
@vboussange
Author

What happens if you set CUDA.allowscalar(false) before running the MWE?

No error thrown, same dense array returned.

@ToucheSir
Member

Ok, I dug into this a bit more and have a reduced MWE:

julia> Zygote.unbroadcast(A, collect(A)) |> summary
"10×10 SparseMatrixCSC{Float32, Int64} with 41 stored entries"

julia> Zygote.unbroadcast(Acu, cu(collect(Acu))) |> summary
"10×10 CuArray{Float32, 2, CUDA.DeviceMemory}"

unbroadcast is an internal function used by the broadcast rules, and the divergence between CPU and GPU happens here:

_project(x, x̄) # ProjectTo handles reshape, offsets, structured matrices, row vectors

The gist is that the actual gradient computed from the broadcast is dense. However, the CPU path knows how to "project" that back into a sparse gradient, so the end result is sparse. This is done via the projection machinery in ChainRulesCore.jl, which exists for CPU sparse matrices but not GPU ones.
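As an illustration (a CPU-only sketch, assuming ChainRulesCore and SparseArrays are available), the projection step that the CPU path gets for free can be reproduced directly:

```julia
using ChainRulesCore, SparseArrays

A = sprandn(Float32, 10, 10, 0.4)
proj = ProjectTo(A)              # records the sparsity pattern of A

dense_grad = ones(Float32, 10, 10)
sparse_grad = proj(dense_grad)   # dense gradient projected back onto A's pattern

sparse_grad isa SparseMatrixCSC  # true on the CPU; no such projector is defined
                                 # for CuSparseMatrixCSC, hence the dense result
```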

So ultimately, I'd say there are two ways to look at this. One is that returning the sparse matrix isn't saving much and arguably uses more compute. Re-sparsifying the dense gradients yourself afterwards might work well enough. The other interpretation is that this is a missing projection rule in ChainRulesCore, in which case this may be worth a feature request.
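For the first option, a workaround sketch (assuming the `l` and `Acu` variables from the MWE above, and that `sparse` on a dense CuArray is available via CUDA.jl's CUSPARSE support, as used in the original post):

```julia
using CUDA, Zygote

dAcu_dense = gradient(l, Acu)[1]   # dense CuArray, as reported
dAcu_sparse = sparse(dAcu_dense)   # re-sparsify on the GPU

# Caveat: unlike ProjectTo, this keeps any nonzero gradient entries that fall
# outside Acu's original sparsity pattern, and drops stored entries whose
# gradient happens to be exactly zero.
```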

@ToucheSir ToucheSir added the ChainRules adjoint -> rrule, and further integration label Nov 11, 2024
@vboussange
Author

Thanks for digging into this! I may request a feature to ChainRulesCore then. Cheers!
