getindex(A::SparseMatrixCSC, ...) creates dense column vector: out of memory #12860
Comments
Please see #11324 - support for sparse vectors is planned for 0.5. |
The problem is not only for sparse vectors. It's for all sparse matrices. In principle you should not convert a sparse array into a dense one. We as users take our data types seriously and do not like unexpected conversions. |
@cschwarzbach I'm not sure if you are new to participating in open source communities, but the tone in which this paragraph was written is not necessary. It would have sufficed to simply say "I believe dense slices shouldn't be produced from sparse matrices", and did not require you explaining how careful you are when selecting a data structure or how much difficulty this implementation weakness in a free, open-source software caused you - it can be read as insulting to people who have done the work that exists so far that you are otherwise benefiting from.
@eldadHaber: @jiahao understands that very well, he was referring to that issue because ideally slicing a column from a sparse matrix would give you a sparse vector. Same comment goes for your tone as for @cschwarzbach - there doesn't really exist such a clear user-developer distinction here, you are not paying for a product, but participating in the development of a pre-release open source software project. |
No intention to be insulting and I apologize if it seems so. We certainly understand the effort that is invested here. So please, can we get back to the point. |
I think what @jiahao and @IainNZ meant was that the point is well taken. We're just waiting for sparse vectors support to be merged into Base to switch |
Just a reminder, it's best if the people who most care about an issue implement the fix themselves; you guys obviously care a lot about this. (That doesn't mean that some hero won't step up to the plate and do your work for you, but you can never count on when/if that will happen.) Want to give it a try? It might not be hard. See https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md |
@cschwarzbach Can you suggest a heuristic that will work for you? |
Cc: @tanmaykm |
@IainNZ I'm sorry if I hurt anybody's feelings; I had no intention to do so. I've deleted the offending section in my post. I appreciate the hard work that the Julia community has done to pull this amazing project together. |
If you can submit your fix, that would be better.
|
Of the three algorithms for Of course, one could come up with a more elaborate heuristic which considers dense vectors up to a certain size acceptable. However, this seems to me very problem and hardware dependent. I'm afraid of opening Pandora's box. |
I also suggest sticking to the sparse algorithm. I would add in the help file that if the size of the index set is large then the user should densify her array before getting the index. This will put the choice in the hands of the user. |
I would really like to tweak the heuristics, and not slow down things for people who benefit from this. Clearly, we just need to put a ceiling on the largest dense vector we create, for which we can figure out a good threshold. |
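A minimal sketch of what such a ceiling could look like, written for current Julia rather than 0.4. The function name, the algorithm symbols, and the threshold value are all hypothetical and are not taken from Base. The sketch relies only on the statement in this issue that the second and third algorithms use a dense cache of length size(A,1) while the first does not.

```julia
# Hypothetical threshold on the length of the dense cache; a real value
# would have to come from benchmarking, not from this sketch.
const DENSE_CACHE_CEILING = 10^7

# Given the algorithm the existing heuristic would prefer, fall back to the
# cache-free variant when the dense cache would exceed the ceiling.
# The symbols :bsearch_A, :bsearch_I and :linear are illustrative labels only.
function cap_algorithm_choice(preferred::Symbol, nrows::Integer;
                              ceiling::Integer = DENSE_CACHE_CEILING)
    uses_dense_cache = preferred in (:bsearch_I, :linear)
    return (uses_dense_cache && nrows > ceiling) ? :bsearch_A : preferred
end
```

With this shape, small and moderately sized matrices keep whatever the current heuristic picks, and only matrices with more rows than the ceiling are redirected.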
The relevant performance tests are here: https://github.com/JuliaLang/julia/blob/349a4e197728d010d107472d44ba9ccece0876f7/test/perf/sparse/getindex.jl It would be great to add the above test case to it (well, maybe with a smaller size). At least my sparse matrices (and I suspect many others) are pretty square. Then allocating a dense vector of |
A[I,J] for a sparse matrix A and index vectors I and J is implemented in sparse/sparsematrix.jl. The computational core consists of three different algorithms, which are implemented in Base.SparseMatrix.getindex_I_sorted_bsearch_A, Base.SparseMatrix.getindex_I_sorted_bsearch_I and Base.SparseMatrix.getindex_I_sorted_linear. A heuristic is employed to choose the best algorithm for a given problem size. The second and third algorithms internally use a dense vector of size size(A,1) as a cache. For moderately sized matrices, the memory footprint is not significant. For very large sparse matrices, e.g. with size(A,1) > 10^9, the memory footprint becomes significant. At the same time, the heuristic that is used to pick the best algorithm fails. See the following code example.
The generated screen output looks as follows:
This is with a very recent Julia 0.4:
Note that the heuristic choice (getindex_I_sorted(A,I,J), equivalent to A[I,J]) allocates 1 GB memory and has the longest runtime of the three algorithms.
I think that the conversion of even just a column of a sparse matrix to a dense vector should be avoided.
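To illustrate what a lookup without a dense cache can look like, here is a small sketch for a single column. It is not the Base implementation. The helper name sparse_column_lookup is hypothetical, and the code targets current Julia (the SparseArrays standard library and SparseVector), not the 0.4 version discussed here. For each requested row it binary-searches the stored row indices of the column, so the extra memory scales with the number of matches rather than with size(A,1).

```julia
using SparseArrays

# Hypothetical helper (not part of Base): compute A[I, j] as a sparse vector
# without allocating a dense vector of length size(A, 1).
function sparse_column_lookup(A::SparseMatrixCSC{Tv}, I::AbstractVector{<:Integer},
                              j::Integer) where {Tv}
    rows = rowvals(A)              # stored row indices, sorted within each column
    vals = nonzeros(A)             # stored values, aligned with rows
    r = nzrange(A, j)              # positions of column j's stored entries
    col_rows = view(rows, r)       # row indices of column j only
    outidx = Int[]
    outval = Tv[]
    for (k, i) in enumerate(I)
        # Binary search for row i among the stored rows of column j.
        p = searchsortedfirst(col_rows, i)
        if p <= length(col_rows) && col_rows[p] == i
            push!(outidx, k)       # position in the result follows position in I
            push!(outval, vals[r[p]])
        end
    end
    return sparsevec(outidx, outval, length(I))
end

# Usage: a 6x2 matrix with three stored entries.
A = sparse([1, 3, 5], [1, 1, 2], [10.0, 30.0, 50.0], 6, 2)
v = sparse_column_lookup(A, [1, 2, 3], 1)
```

The cost is one binary search per requested row, which can be slower than a cached linear scan when I covers most of the column. That trade-off is what the heuristic has to weigh.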