-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: AbstractSparseMatrixCSC interface #33054
Conversation
To use algorithms defined in SparseArrays, a matrix `A` of type `AbstractSparseMatrixCSC` | ||
must satisfy the following constraints. | ||
|
||
### Matrix `A` and all vectors constituting `A` have one-based indexing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think one-based indexing is necessary, in principle. But I think most of the code in SparseArrays.jl requires it so I think it's better to specify it.
|
||
```julia | ||
@assert nnz(A) <= length(rowvals(A)) | ||
@assert nnz(A) <= length(nonzeros(A)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment (below) says 0 <= length(nonzeros(A))
but I suppose it was a typo? I'm using nnz(A) <= length(nonzeros(A))
instead (above).
# Assumes that 0 <= length(nzval) < typemax(Ti) |
|
||
```julia | ||
@assert size(A, 1) <= typemax(Ti) | ||
@assert nnz(A) <= typemax(Ti) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is something I don't understand in #31724. It was mentioned that both m = size(A, 1)
and n = size(A, 2)
must be representable by Ti
:
Lines 88 to 90 in fb9cd34
* `SparseMatrixCSC(m,n,colptr,rowval,nzval)` perform consistency checks for arguments: | |
`colptr` must be properly populated and lengths of `colptr`, `rowval`, and `nzval` | |
must be compatible with `m`, `n`, and `eltype(colptr)`. |
I understand that size(A, 1) <= typemax(Ti)
has to hold because we use rowvals(A)
for indexing to the rows [1]. However, size(A, 2)
only appears as the indexing into getcolptr(S)
so why does Ti
matter?
Also regarding the current comment:
julia/stdlib/SparseArrays/src/sparsematrix.jl
Lines 8 to 9 in 19a2b2e
# Assumes that nnz <= length(rowval) < typemax(Ti) | |
# Assumes that 0 <= length(nzval) < typemax(Ti) |
- It says
nnz(A) < typemax(Ti)
. But isn'tnnz(A) == typemax(Ti)
OK? - I suppose
length(rowvals(A)) <= typemax(Ti)
andlength(nonzeros(A)) <= typemax(Ti)
don't have to hold because the tail of those vectors can just be ignored?
[1] But technically size(A, 1) > typemax(Ti)
sounds OK as long as the structural nonzeros appear only in the upper 1:typemax(Ti)
rows?
`AbstractSparseMatrixCSC` subtype `TS` must provide the following methods: | ||
|
||
* [`size(::TS)`](@ref size) | ||
* [`getcolptr(::TS) :: AbstractVector{<:Ti}`](@ref getcolptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getcolptr
is probably going to be renamed to colptrs
#33041
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #25118 @simonbyrne suggested coloffsets
which I think is a cleaner name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
|
||
* [`size(::TS)`](@ref size) | ||
* [`getcolptr(::TS) :: AbstractVector{<:Ti}`](@ref getcolptr) | ||
* [`rowvals(::TS) :: AbstractVector{<:Ti}`](@ref rowvals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #25118 @simonbyrne suggested rowindices
. It is a better name but rowvals
is already a documented public API so I don't know if we want to change this at this point. But this may be a good opportunity to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again agree. But documented APIs should not be changed. We can do this for 2.0, or whenever we can version stdlib / or move sparse out of stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, I think we can define rowvals
to be an alias to rowindices
like this:
rowindices(S::SparseMatrixCSC) = getfield(S, :rowval)
rowvals(x) = rowindices(x)
and then use rowindices
internally everywhere.
In the documentation, we can mention that the use of rowvals
is highly discouraged and will be deprecated in the next major release.
This means that for end-users rowvals
keep working as-is but for package authors of custom array types they are forced to use rowindices
from the get-go.
* [`size(::TS)`](@ref size) | ||
* [`getcolptr(::TS) :: AbstractVector{<:Ti}`](@ref getcolptr) | ||
* [`rowvals(::TS) :: AbstractVector{<:Ti}`](@ref rowvals) | ||
* [`nonzeros(::TS) :: AbstractVector{<:Tv}`](@ref nonzeros) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #25118 @simonbyrne suggested values
. Indeed, it's a bit strange that nonzeros(A)
can contain zeros.
Unfortunately, values(::SparseMatrixCSC)
is already a valid method which is defined to be an identity function by the fallback. But maybe it could be considered a "minor change" if we define appropriate key(::AbstractSparseMatrixCSC)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a reference, #22907 is the PR where keys
, values
and pairs
are introduced for arrays. There seems to be no discussion on the treatment of structured and sparse arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it would be breaking for functions like findmax
implemented in #22907.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think values
is a bit generic. Perhaps storedvals
? But still can't change the documented API. Note it for future change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storedvals
sounds like a good idea to me. If we go this way, we can rename nonzeroinds
for SparseVector
to storedindices
as well (ref #33050 (comment)).
I am not sure how to handle the API changes. For already exported APIs, we cannot delete them - and adding new ones that duplicate things is perhaps not right. So, maybe the thing to do is hold off on API cleanup until we have an opportunity to do so. |
Would be good to hear from @KristofferC if there are other suggestions. |
For those following along, please note that there is no change to the sparse matrix field names being proposed in any of these PRs. @tkf The clear thing we should do here is only provide the new APIs. For now, I think for consistency purposes, let's mirror the accessor names to be the same as the field names (except for m and n, where we clearly want something better). Having a mix of old names and new names seems more confusing to me. |
@ViralBShah How about waiting a bit to see until something happens in #25118? We can keep things conservative here if no consensus emerges in that issue (say) before 1.4 feature freeze. |
Yes, we can wait for some consensus to emerge there. Is it possible to mark this as a draft pr? |
It looks like that's impossible https://github.community/t5/How-to-use-Git-and-GitHub/Feature-Request-Switch-from-ready-to-draft-in-pull-requests/td-p/19107 |
Cross referencing https://github.com/JuliaLang/julia/issues/31330#issuecomment-473575718 :). |
We have moved the SparseArrays stdlib to an external repository. Please open this PR against that repository: https://github.com/JuliaLang/SparseArrays.jl Thank you! |
This PR should finalize the changes to make SparseArrays.jl extensible (for previous PRs, see: #30173, #32953, #33039, etc.). Now that #33039 is (hopefully) about to get merged I'd like to know what is the best way to formalize the interface for
AbstractSparseMatrixCSC
. I created a first draft based on the comments:julia/stdlib/SparseArrays/src/sparsematrix.jl
Lines 3 to 9 in 19a2b2e
To make it easy to review the documentation, I put the docstring in a markdown file so that GitHub renders it in a (somewhat) readable form. I'll move it to
AbstractSparseMatrixCSC
's docstring before this PR is merged. You can read the latest rendered version here: https://github.com/tkf/julia/blob/RFC-AbstractSparseMatrixCSC/stdlib/SparseArrays/docs/src/AbstractSparseMatrixCSC.mdThere are a few things that are not clear. Especially discussions related to #31724. I'll make inline comments about them.
TODOs before merge:
getcolptr
to something (coloffsets
?) cf Rename: getcolptr -> colptrs #33041.AbstractSparseMatrixCSC
.AbstractSparseMatrixCSC