-
-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Row Major Arrays #84
Comments
Several existing algorithms follow LAPACK and allow you to pass For sparse matrices, this is an interesting approach, and perhaps the only viable one for making both |
CC @madeleineudell. |
I know that a flexible data layout would require a lot of work to get all operations efficient. But maybe one can use Julia fallbacks in the first iteration for row-major data. |
The |
For sparse matrices, however, I see no conceptual reason to not support a |
Yes a transposed array type would solve this although it is still somewhat inconvenient as one has to fill the array in a transposed way (My application has row major order matrices stored on disk). But having CSR in addition to CSC is the same as having column major + row major arrays. |
Having both CSR and CSC for sparse matrices are desirable. However, for dense matrices, having two kinds of layouts would complicate things a lot: If we want to add the support of row-major arrays, it is much more preferable to add a |
The I am wondering if I am the first to run into issues with row major data and Julia arrays. Numpy defaults to row major ordering. |
Isn't it mostly a matter about habit? If you are more used to row major you'd prefer to write your loop efficient for row major. Fortran is columns major and considered okay for numerical programming. |
I think
|
@andreasnoackjensen: Technically this is not a problem. But my implementation of the Kaczmarz algorithm now operates on columns instead of rows. This makes the code less readable. Its not the biggest issue ever but whenever I see that Julia has a corner where it is not completely awesome I feel reponsible to report an issue on this :-) |
When this discussion is expanded to sparse arrays as well as dense ones, it becomes clear that there are algorithms for which one storage format is much faster than another. It's important to be able to support the needs of advanced users who care about these details while making sure that defaults are strong enough that casual users don't get confused. For example, scipy supports 7 different sparse matrix formats, with very clear documentation on the strengths and weaknesses of each. So as a programmer, I can decide to eg build a matrix using coo format and convert it to csr for efficient A_b, or csc for efficient b'_A. But all these matrix types are clearly named as such, and actually raise warnings if you, say, try to set indices in a csr matrix. |
Not really. For dense arrays, memory locations are arranged in a predictable manner---calculating the address is a matter of arithmetic. For sparse arrays, it's completely different: there isn't any way to predict where something is stored without looking up the indexing. You're likely to get 2 cache misses just to find out where something is stored, and a third to fetch it. |
I wholeheartedly agree to support multiple sparse matrix formats. I personally have long been waiting for CSR in my own applications. Having said that, the need of multiple layouts for dense arrays is not as obvious. |
I guess having more options is not always a good thing even though a more efficient solution exists. Left turns are inefficient in right-hand traffic but it is minor inefficiency compared to allowing mix of left and right-hand traffic. The good news are that you have an efficient implementation for the transposed problem. Now you only have to write the version for the non-transposed problem:-) The efficiency/simplicity tradeoff seems different for sparse, as pointed out by others. The only problem is that not even CSC is covered fully yet so there is a lot of work to be done. |
I like the metaphor of mix of left and right-hand traffic. |
@timholy Well, you can emulate CSR in the same way as CSC as emulating a row major array using a column major array: Keep the transposition in mind when working with these matrices. Maybe I have overestimated my problem and it is to specific. I had thought that the column/row major issue would be more common. |
That's true; instead of calling them
|
I like the idea of the |
@timholy: exactly. And quite similar one can shape up a strided array type that has special cases where one has row-major or column major data. I have written myself a c++ template library that does exactly that and it worked out quite well with an interface very similar to the numpy ndarray. So there is no principle problem with that approach. But its clear that it increases the complexity of all array stuff a lot. Maybe that I seem to be the first to run into the row-major ordering problem is a sign that the current design is the right one. |
At least for sparse matrices, I personally feel that this increases the codebase complexity a bit too much. I thought about this when I started writing the sparse implementation, and thought that it is better to have one complete and well-performing implementation of a sparse matrix data structure, before experimenting with new data structures. Making all the CSC code generic for CSR, and efficiently doing operations on combinations of CSC and CSR seems like a lot of work. We certainly want COO format, but we might as well do that as part of the sparse N-d array implementation. |
I am closing this one, since we generally will do more sparse storage formats as packages. |
This isn't strictly for sparse, but I think it could be subsumed under potential solutions to #42. |
I have already asked this on julia-users (https://groups.google.com/forum/?fromgroups=#!searchin/julia-users/Row$20Major/julia-users/FUU7C6Pvx3Q/Q6iUkUQM8xAJ) but maybe Github is the better place for this.
There are situations where row-major arrays are more suitable. In particular I am thinking about Kaczmarz algorithm which operates on matrix rows and this is simply more efficient on row-major arrays.
Solutions:
I know that this is core stuff of Julia and understand any objections in changing things here. Still, Julia seems to be not that flexible in this regards as Numpy and it would be great if we could get a clean solution for this.
The text was updated successfully, but these errors were encountered: