-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Matrix implementation more SIMD friendly #60
Conversation
I'd like to call your attention to line 2719. I think that a 3x3 matrix that scales by a scalar values should scale in all 3 dimensions, but for some reason the old code only affected x and y! Does anybody think the old way was correct or that this change will break anyone relying on the old, weird behavior? The fact that I can change this and the test suite neither fails now nor before, I think shows that our unit tests are probably not covering anywhere near the majority of the functionality of this library. |
This is backporting some changes we made in OSL (with customized versions of Imath headers that we can thankfully no longer need if we push the fixes back to the Imath project). This was all pointed out by people from Intel working on OSL, so I'm deferring to their judgment that this is the best solution. * Avoid memset or memcpy to initialize or copy a matrix, intead use element-by-element copies. In a scalar context they generate the same code (with the optimizer on, at least), but when those operations are done inside a loop that you hope will be auto-vectorized, the casting and function calling involved in using memset/memcpy will confuse the vectorizer and you'll end up with inferior code, sometimes even not vectorizing the whole loop. * Replace a bunch of instances of accessing vector elements as [0], [1], [2] with .x, .y, .z. This is because our particular implementation of `operator[]` involves a pointer cast that confuses the auto-vectorizers. For best vector performance (not just of individual operations but also when doing ostensibly scalar operations inside loops that could and should be vectorized), it's important to say v.x rather than v[0]. * Expand `Matrix3::operator*(Matrix3)` to unroll the loops and also avoid needless initialization of elements to 0 only to immediately write over the value in the first iteration of the loop. Signed-off-by: Larry Gritz <[email protected]>
Shortly after the initial comment, I pushed a few updates to this PR. I just updated the description of this PR to more fully explain all of the changes therein. |
These changes are long overdue. I recognize a lot of code I wrote that resulted from profiling I did fifteen years ago, probably taking into account characteristics of MSVC6 and gcc 2.
mea culpa! If you change this routine, an old prototype at LucasArts will stop working if someone ports it to the new Imath (I joke; the code assuredly is in the cosmic bit bucket by now). This was part of an initial build out of Imath, AT LEAST fifteen years ago, when I was building a 2d GUI library with Imath for LucasArts and had the mad idea that maybe 3x3 matrices should be special cased for affine transforms, the way we do for 4x4. That idea didn't carry through, and apparently a routine that I didn't write a test for fell through the cracks. |
At the TSC meeting, consensus was to keep the fixes on that 3x3 matrix scale, the old way was weird. I've pushed an update that alters the comment to be more clear about what's going on and when it changed. All the other changes seemed fine with everybody. |
Signed-off-by: Larry Gritz <[email protected]>
This is backporting some changes we made in OSL (with customized
versions of Imath headers that we can thankfully no longer need if we
push the fixes back to the Imath project). This was all pointed out by
people from Intel working on OSL, so I'm deferring to their judgment
that this is the best solution.
Avoid memset or memcpy to initialize or copy a matrix, intead use
element-by-element copies. In a scalar context they generate the
same code (with the optimizer on, at least), but when those
operations are done inside a loop that you hope will be
auto-vectorized, the casting and function calling involved in using
memset/memcpy will confuse the vectorizer and you'll end up with
inferior code, sometimes even not vectorizing the whole loop.
Replace a bunch of instances of accessing vector elements as [0],
[1], [2] with .x, .y, .z. This is because our particular
implementation of
operator[]
involves a pointer cast that confusesthe auto-vectorizers. For best vector performance (not just of
individual operations but also when doing ostensibly scalar
operations inside loops that could and should be vectorized), it's
important to say v.x rather than v[0].
Expand
Matrix3::operator*(Matrix3)
to unroll the loops and alsoavoid needless initialization of elements to 0 only to immediately
write over the value in the first iteration of the loop.
Signed-off-by: Larry Gritz [email protected]