Do we want fixed-size arrays? #5857
Comments
I should also point out that one of the historical impediments to the popularity of ImmutableArrays is the long package-loading time:
Now that precompilation has arrived, this is no longer an impediment, so we may not need any kind of CC @twadleigh |
+1 for better functionality for fixed-size arrays. I'm not quite sure what kind of implementation you are proposing? I, for one, would welcome language support for fixed-size, mutable arrays (or at least vectors, from which it should be reasonable to build fixed-size array support). I seem to recall that this was up for discussion at some point, but I can't remember when. |
A candidate implementation was in that post. (Missing lots of functions, of course.) I agree that mutability is very nice, so rather than using a |
👍 I would say that the performance improvement is worth it, especially for applications with lots of 2D and 3D geometry transformations, where mutability seems desirable too. |
Actually, the idea of having it simply contain an array has a certain appeal: it would make it much easier to interop with existing arrays, and give us easy interop with C:
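(The code that followed in the original comment is not preserved here; a minimal sketch of the idea, in modern syntax and with `FixedArray` as a hypothetical name, might look like this:)

```julia
# Sketch only: a size-parameterized wrapper around an ordinary Array, so the
# underlying buffer can be handed straight to C / LAPACK via F.data.
struct FixedArray{T,N,SZ} <: AbstractArray{T,N}
    data::Array{T,N}
end

FixedArray(A::Array{T,N}) where {T,N} = FixedArray{T,N,size(A)}(A)

Base.size(F::FixedArray{T,N,SZ}) where {T,N,SZ} = SZ
Base.getindex(F::FixedArray, I...) = F.data[I...]
Base.setindex!(F::FixedArray, v, I...) = (F.data[I...] = v)
```

Because the storage is a plain Array, mixing with regular arrays and passing pointers to C stays cheap, while the size type parameter is what enables size-specific dispatch.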
Functions for which we want to be able to mix regular arrays and |
👍 One of the secret features of the generic linear algebra functions that @andreasnoackjensen and I have been silently injecting into And yes, that sounds like a fantastic GSoC project. |
@timholy if FixedArray contains an Array or NTuple like that, the data is not contiguous and FixedArray could not be packed in a struct. So C-interop would still be a bit hampered. |
@ihnorton, true if you're thinking about them as a field of a struct, but I was thinking more about passing them to LAPACK when needed (e.g., for factorizations). |
Fixed-size arrays are definitely useful. Since these arrays are usually used in applications with high performance requirements (geometric computing, etc.), computation over them should be implemented to be as fast as possible. However, IMO, such computation should be implemented based on SIMD instead of cartesian indexing. Here is a snippet of C++ code for 2x2 matrix multiplication:

```cpp
// suppose SSE vector a holds a 2x2 float matrix A, and b holds B
__m128 t1 = _mm_movelh_ps(a, a);    // t1 = [a11, a21, a11, a21]
__m128 t2 = _mm_moveldup_ps(b);     // t2 = [b11, b11, b12, b12]
__m128 t3 = _mm_movehl_ps(a, a);    // t3 = [a12, a22, a12, a22]
__m128 t4 = _mm_movehdup_ps(b);     // t4 = [b21, b21, b22, b22]
t1 = _mm_mul_ps(t1, t2);            // t1 = [a11 * b11, a21 * b11, a11 * b12, a21 * b12]
t3 = _mm_mul_ps(t3, t4);            // t3 = [a12 * b21, a22 * b21, a12 * b22, a22 * b22]
__m128 r = _mm_add_ps(t1, t3);      // the result
```

Small matrix algebra can usually be computed within a small number of CPU cycles, so wasting even a couple of cycles would lead to a considerable performance penalty. Also note that BLAS & LAPACK are aimed at matrices of sufficiently large size (e.g. 32 x 32 or above) due to their overhead; using BLAS & LAPACK on small fixed-size matrices would be much slower than a naive for-loop. I have long been considering writing a package for small matrices; it's just that the necessary SIMD facilities are not yet supported. |
This issue is related to #2489. Also note that I have a C++ library developed several years ago, which contains highly optimized SIMD code for small matrix algebra (e.g. hand-crafted kernels for small matrix multiplication). I'd be happy to translate all these to Julia whenever SIMD lands. |
Sounds reasonable to me. I presume it's not the case that LLVM can generate such instructions automatically? |
I think it wouldn't be too hard to dispatch fixed-size matrix operations to different underlying implementations depending on the size; just as the Eigen lib (AFAIK) does. |
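For illustration, a minimal sketch of that kind of size-based dispatch (hypothetical names, modern syntax, not from the thread): a generic fallback plus a hand-written method selected purely by the size parameters.

```julia
# Illustrative only: Eigen-style dispatch on the size encoded in the type.
struct SizedMat{M,N,T}
    data::Matrix{T}
end

# Generic fallback for any size:
multiply(A::SizedMat{M,K,T}, B::SizedMat{K,N,T}) where {M,K,N,T} =
    SizedMat{M,N,T}(A.data * B.data)

# Specialized 2x2 kernel, chosen automatically when both operands are 2x2:
function multiply(A::SizedMat{2,2,T}, B::SizedMat{2,2,T}) where {T}
    a, b = A.data, B.data
    c11 = a[1,1]*b[1,1] + a[1,2]*b[2,1]
    c21 = a[2,1]*b[1,1] + a[2,2]*b[2,1]
    c12 = a[1,1]*b[1,2] + a[1,2]*b[2,2]
    c22 = a[2,1]*b[1,2] + a[2,2]*b[2,2]
    SizedMat{2,2,T}([c11 c12; c21 c22])
end
```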
I don't think LLVM is smart enough to do that. It is good to have a matrix type declared as: |
I think your |
@cdsousa, this implementation does dispatch on different sizes, because of the size parameters. If we use an array instead of a tuple in the underlying representation, we don't need that.

@lindahua, your "tiled" idea seems like a good one. |
@timholy: oh, it's true! (I'd failed to read the macro foo part) That is great 😃 |
@lindahua, I just checked |
What jumps out at me here is #2489 --- our standard matrix multiply performs horribly on small matrices. That code needs a bath. I think eventually tuples will use native data layouts, at which point they will become immutable fixed-size vectors. Immutability is the way to go here, especially since this feature is aimed at smallish arrays, and we could use an immutable array type anyway. |
I agree with @JeffBezanson that for small matrices, it is desirable to have immutable arrays. To me, the (potential) benefit of immutability outweighs the need to allow modifying individual elements of a small array. (In practice, small matrices/vectors are usually used as a whole). I think @timholy's definition is a good proposal (except that I would use |
Re #2489, in my hands the performance gap was even larger (20x instead of 4x). Not sure if it's because of heap vs stack, or bounds-checking vs not, or 2x3 vs 2x2, or changes in Julia since then, or me messing something up... I'm fine with either immutability or mutability. I guess my bottom line is this: ever since we allowed a tuple of integers to be a template parameter, we've been in a position to create a general (And to further clarify, here I'm trying to confine myself to "professor" mode and simply sketch a potential project, possibly for a GSoC student; I don't have time or intention to actually pursue this myself in the foreseeable future.) |
Forgot to respond to this:
Nope. Don't know enough about SIMD to do that in a respectable fashion. |
My simd branch is targeting loops that can be broken into SIMD chunks, so it might not buy anything for fixed-size arrays. I tried it on an 8-iteration loop (with AVX enabled) and the unroller unrolled the loop first. Maybe SLP_Vectorizer can deal with the unrolled loop, or maybe I need to run the loop vectorizer before the unroller. |
@ArchRobison: I am wondering whether the vectorizer can automatically convert the following statements into a single SIMD add:

```julia
# let x and y be a four-tuple of Float64
z = (x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4])
```

And whether it may convert the following statements into a single SIMD load:

```julia
# let a be an ordinary array
t = (a[i], a[i+1], a[i+2], a[i+3])
```

We don't have to explicitly export low-level SIMD machinery if the vectorizer can deal with tuple operations in a smart way. |
What wasn't clear to me until now is that |
I gave it a whirl and the first thing I noticed is that a tuple access generates a bounds check that seems gratuitous. For example, run:
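(The snippet that was run is not preserved in this extraction; a hypothetical reconstruction of the kind of test being described:)

```julia
# Hypothetical reconstruction, not the original code: constant tuple subscripts,
# no type declarations, inspected for bounds-check branches in the emitted IR.
addtup(x, y) = (x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4])

code_llvm(addtup, (NTuple{4,Float64}, NTuple{4,Float64}))
```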
Using Julia built from today's master source, the LLVM code looks like it has a bounds check for most of the subscripts, even though I thought lowering to LLVM took pains to avoid them. Am I missing something, or should I file an issue about the gratuitous branches? |
Could more intensive passes be selectively enabled during the precompilation step? |
@ArchRobison: Functions with no type declarations don't get specialized on tuple arguments (see #4090). If I use |
Thanks for the explanation and link. I tried SLPVectorizer on the |
It turns out that For example, look at the transcript at https://gist.github.com/ArchRobison/9145036, which is using The problem in
I'm inclined to look into improving |
My thoughts on the matter are to use
on which to define the arithmetic operations, so there might still be use for something like the immutable arrays package (or we might just put it in base). The reason for this distinction is that I'm not sure people would expect tuples to behave like vectors, but I haven't yet seen a case where it's an actual problem. |
Note that up at the top of this issue is a candidate implementation of these ideas, including an algorithm for matrix-multiply that exploits a tuple representation. I haven't benchmarked it, but I very much doubt it's currently as fast as ImmutableArrays. (But it's a lot faster than standard Arrays, for small matrices.) |
Ah yes, I'd forgotten about that. That's certainly along the lines of what I had in mind. |
I am not entirely sure what I would prefer on this point. There were situations where I wanted to do arithmetic on tuples (e.g. on the size tuple of an array). |
I'd lean towards fewer types – i.e. just making the usual vector arithmetic work on NTuples. An interesting corner case is what to do about say |
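A minimal sketch of what "the usual vector arithmetic on NTuples" could look like (illustrative only; defining these methods outside Base would be type piracy, so in practice they would live in Base itself):

```julia
import Base: +, -, *

# Elementwise arithmetic directly on homogeneous tuples (sketch only).
+(x::NTuple{N,T}, y::NTuple{N,T}) where {N,T<:Number} = map(+, x, y)
-(x::NTuple{N,T}, y::NTuple{N,T}) where {N,T<:Number} = map(-, x, y)
*(a::Number, x::NTuple{N,T}) where {N,T<:Number} = map(v -> a * v, x)

(1.0, 2.0, 3.0) + (4.0, 5.0, 6.0)   # => (5.0, 7.0, 9.0)
```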
It seems like the specialization rules for tuples also need to change if we want to make NTuples behave like fixed-size vectors. |
By coincidence, I was trying to create a vectorized 4x4 matrix multiplication using PR #6271, and almost got it to vectorize. (I think with some more work on LLVM I can make it vectorize.) Here's what I was trying:
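(The code from this comment is not preserved here; a plausible sketch of the kind of kernel being described, assuming the `@simd` macro from #6271, plain `Array` storage, and a hypothetical function name:)

```julia
# Sketch only, not the original code: in-place 4x4 multiply written so the
# innermost loop is a candidate for vectorization. Note the destructive update
# of c, which an immutable matrix type would not allow.
function mul4x4!(c::Matrix{Float64}, a::Matrix{Float64}, b::Matrix{Float64})
    fill!(c, 0.0)
    @inbounds for j in 1:4, k in 1:4
        bkj = b[k, j]
        @simd for i in 1:4
            c[i, j] += a[i, k] * bkj
        end
    end
    return c
end
```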
A small matrix class (e.g. MNMatrix{M,N,T}) might make this sort of thing more practical. Note the destructive update of c. An immutable matrix wouldn't work. |
I'm fine with defining |
I also think that fewer types are better, especially as one otherwise has to decide when to use an NTuple and when to use the fixed-size vector. My only concern is what to do with matrices. Is the syntax
still open? |
I would expect |
If you want to represent a matrix using a tuple, you're going to have to wrap it (even if just with template parameters as above) to encode the |
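A rough sketch of such a wrapper (modern syntax, hypothetical names): the data is a flat homogeneous tuple and the shape lives only in the type parameters.

```julia
# Illustrative only: a tuple-backed matrix whose shape is encoded in the type.
struct TupleMatrix{M,N,T,L}
    data::NTuple{L,T}   # column-major; L == M*N
end

Base.size(::TupleMatrix{M,N}) where {M,N} = (M, N)
Base.getindex(A::TupleMatrix{M,N}, i::Int, j::Int) where {M,N} = A.data[(j - 1) * M + i]

A = TupleMatrix{2,3,Float64,6}((1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
A[2, 3]   # => 6.0
```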
Hah, also by coincidence, just last night I was playing with implementing element-wise comparisons for tuples. I, too, was frustrated with the |
It's kind of funny that C++ has fixed-size (stack-allocated) multi-dimensional arrays but no dynamic equivalent, while Julia has it the other way around. |
To answer the original question of this issue: yes, I think we should have fixed-size arrays in base. Some thoughts:
It seems that one central issue is to make the NTuple memory layout tight. Is there actually any possibility to implement a FixedArray by other means? Or does this have to wait until that is done? @loladiro |
Yes, it is good to keep in mind that in some sense an |
I've stumbled upon this issue once again, and @tknopp's comment on C++ made me think: what if FixedArray were a new intrinsic/built-in just like Array or tuples (and mutable)? Once again I thought about the awesome meta-programming-to-the-limits of the C++ Eigen library. The static/dynamic-ness of Eigen matrix sizes (and arrays) is defined by a template parameter: an integer indicating the size (or several, for higher dimensions), which can take a special value indicating that the size is dynamic. I think that's something easily implementable in Julia, although probably not necessary. |
@cdsousa Actually the ImmutableArrays package uses a similar approach to Eigen by making the dimension a type parameter. If @Keno gets the tight memory layout for tuples implemented, they would be kind of a built-in type. I don't think any C implementation can help here. This needs to be implemented at the LLVM level so that the right code is emitted. |
I don't see the ImmutableArrays package doing it (here?). Maybe you meant the FixedArray proposed above? Anyway, I was talking about the size type parameter, which can take a special value to mean that the size is dynamic (in C++ it's actually a special integer value, since template parameters have to be compile-time constants). In Julia that would be even easier to implement, since type parameters are not restricted that way and the dynamic value could be a singleton type. Something like
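(the snippet that followed is not preserved; a sketch of the idea with hypothetical names could be:)

```julia
# Sketch of an Eigen-style "Dynamic" size marker.
struct Dynamic end   # singleton type used as a size parameter meaning "runtime-sized"

struct Mat{M,N,T}
    data::Matrix{T}
end

# Mat{3,3,Float64}              -> fully static 3x3
# Mat{3,Dynamic,Float64}        -> rows static, columns dynamic
# Mat{Dynamic,Dynamic,Float64}  -> fully dynamic, like a plain Matrix

nrows(A::Mat{M}) where {M} = M                  # known from the type
nrows(A::Mat{Dynamic})     = size(A.data, 1)    # looked up at runtime
```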
However, that is probably not necessary in Julia. What I think would be nice to have in Julia is built-in fixed-size, mutable arrays, for small but also for large sizes (just like the common Array). It seems to me that the drawbacks of using tuples are that they are immutable and that, since their elements can have different types, the internal code is (though I'm not sure about this) more complex.
Just ignore me if what I'm saying is just plain wrong 😄 |
Isn't this exactly what the current |
Ah yes, I confused it a bit. ImmutableArrays just generates the appropriate immutable types via a macro. But I am actually with you: if it is easier to implement a fixed-size array at the C level (exploiting that all entry types are the same), it would be great to have that. At the Julia level, using an immutable is the only way to achieve a tight memory layout. And I am not so sure how important the mutability of fixed-size arrays is in practice. |
@timholy: Yes the |
There seems to be a fair amount of interest in these, so let's move the discussion over to #7568. |
There is also the apparently-confusingly-named package FlexibleArrays:

```julia
using FlexibleArrays

# A (10x10) fixed-size array
typealias Arr2d_10x10 FlexArray(1:10, 1:10)
a2 = Arr2d_10x10{Float64}(:,:)
```

-erik |
One of the fun things you can do with Cartesian is easily generate unrolled matrix multiplication algorithms. This might be particularly useful for a fixed-size array type (see also ImmutableArrays; the major difference is that ImmutableArrays generates types called `Matrix2x3` etc., whereas here the size is encoded as a parameter). Here's a short proof-of-concept demo (which takes many shortcuts in the interest of brevity):
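(The original demo code is not preserved in this extraction. The following is a sketch in the same spirit, written with a modern `@generated` function instead of the Cartesian-based macros of the original, and with hypothetical names:)

```julia
# Sketch only, not the original demo: a tuple-backed matrix whose size is a
# type parameter, plus a matrix multiply that is fully unrolled at compile time.
struct FixedMatrix{M,N,T,L}
    data::NTuple{L,T}   # column-major storage; L == M*N
end

Base.getindex(A::FixedMatrix{M,N}, i::Int, j::Int) where {M,N} = A.data[(j - 1) * M + i]

@generated function unrolled_mul(A::FixedMatrix{M,K,T}, B::FixedMatrix{K,N,T}) where {M,K,N,T}
    entries = Expr(:tuple)
    for j in 1:N, i in 1:M
        # one fully unrolled dot product per output entry -- no runtime loops
        push!(entries.args, Expr(:call, :+, [:(A[$i, $k] * B[$k, $j]) for k in 1:K]...))
    end
    :(FixedMatrix{$M,$N,$T,$(M * N)}($entries))
end

A = FixedMatrix{2,2,Float64,4}((1.0, 2.0, 3.0, 4.0))   # [1 3; 2 4]
B = FixedMatrix{2,2,Float64,4}((5.0, 6.0, 7.0, 8.0))   # [5 7; 6 8]
unrolled_mul(A, B).data                                # => (23.0, 34.0, 31.0, 46.0)
```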
and the results in a session:
Because of the use of globals, this 20-fold improvement may even underestimate the advantage this would confer.
Do we want this? If so, I recommend it as a GSoC project. It would actually be quite a lot of work, especially to get these working in conjunction with more traditional array types---the number of required methods is not small and indeed somewhat scary.
For the record, here's what the generated code looks like for the case of 2x2 multiplication (look Ma, no loops!):
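(The original listing is not preserved; for the 2x2 case, the sketch above would emit an expression of roughly this shape, with every entry a straight-line sum of products and no loops:)

```julia
FixedMatrix{2,2,Float64,4}((A[1,1]*B[1,1] + A[1,2]*B[2,1],
                            A[2,1]*B[1,1] + A[2,2]*B[2,1],
                            A[1,1]*B[1,2] + A[1,2]*B[2,2],
                            A[2,1]*B[1,2] + A[2,2]*B[2,2]))
```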