Make Matrix implementation more SIMD friendly #60

lgritz · 2020-11-13T23:59:38Z

This is backporting some changes we made in OSL (with customized
versions of Imath headers that we can thankfully no longer need if we
push the fixes back to the Imath project). This was all pointed out by
people from Intel working on OSL, so I'm deferring to their judgment
that this is the best solution.

Avoid memset or memcpy to initialize or copy a matrix, intead use
element-by-element copies. In a scalar context they generate the
same code (with the optimizer on, at least), but when those
operations are done inside a loop that you hope will be
auto-vectorized, the casting and function calling involved in using
memset/memcpy will confuse the vectorizer and you'll end up with
inferior code, sometimes even not vectorizing the whole loop.
Replace a bunch of instances of accessing vector elements as [0],
[1], [2] with .x, .y, .z. This is because our particular
implementation of operator[] involves a pointer cast that confuses
the auto-vectorizers. For best vector performance (not just of
individual operations but also when doing ostensibly scalar
operations inside loops that could and should be vectorized), it's
important to say v.x rather than v[0].
Expand Matrix3::operator*(Matrix3) to unroll the loops and also
avoid needless initialization of elements to 0 only to immediately
write over the value in the first iteration of the loop.

Signed-off-by: Larry Gritz [email protected]

lgritz · 2020-11-14T00:02:31Z

I'd like to call your attention to line 2719. I think that a 3x3 matrix that scales by a scalar values should scale in all 3 dimensions, but for some reason the old code only affected x and y! Does anybody think the old way was correct or that this change will break anyone relying on the old, weird behavior?

The fact that I can change this and the test suite neither fails now nor before, I think shows that our unit tests are probably not covering anywhere near the majority of the functionality of this library.

This is backporting some changes we made in OSL (with customized versions of Imath headers that we can thankfully no longer need if we push the fixes back to the Imath project). This was all pointed out by people from Intel working on OSL, so I'm deferring to their judgment that this is the best solution. * Avoid memset or memcpy to initialize or copy a matrix, intead use element-by-element copies. In a scalar context they generate the same code (with the optimizer on, at least), but when those operations are done inside a loop that you hope will be auto-vectorized, the casting and function calling involved in using memset/memcpy will confuse the vectorizer and you'll end up with inferior code, sometimes even not vectorizing the whole loop. * Replace a bunch of instances of accessing vector elements as [0], [1], [2] with .x, .y, .z. This is because our particular implementation of `operator[]` involves a pointer cast that confuses the auto-vectorizers. For best vector performance (not just of individual operations but also when doing ostensibly scalar operations inside loops that could and should be vectorized), it's important to say v.x rather than v[0]. * Expand `Matrix3::operator*(Matrix3)` to unroll the loops and also avoid needless initialization of elements to 0 only to immediately write over the value in the first iteration of the loop. Signed-off-by: Larry Gritz <[email protected]>

lgritz · 2020-11-16T02:52:48Z

Shortly after the initial comment, I pushed a few updates to this PR. I just updated the description of this PR to more fully explain all of the changes therein.

meshula · 2020-11-16T02:53:58Z

These changes are long overdue. I recognize a lot of code I wrote that resulted from profiling I did fifteen years ago, probably taking into account characteristics of MSVC6 and gcc 2.

I'd like to call your attention to line 2719. I think that a 3x3 matrix that scales by a scalar values should scale in all 3 dimensions, but for some reason the old code only affected x and y! Does anybody think the old way was correct or that this change will break anyone relying on the old, weird behavior?

mea culpa! If you change this routine, an old prototype at LucasArts will stop working if someone ports it to the new Imath (I joke; the code assuredly is in the cosmic bit bucket by now). This was part of an initial build out of Imath, AT LEAST fifteen years ago, when I was building a 2d GUI library with Imath for LucasArts and had the mad idea that maybe 3x3 matrices should be special cased for affine transforms, the way we do for 4x4. That idea didn't carry through, and apparently a routine that I didn't write a test for fell through the cracks.

lgritz · 2020-11-20T01:21:11Z

At the TSC meeting, consensus was to keep the fixes on that 3x3 matrix scale, the old way was weird. I've pushed an update that alters the comment to be more clear about what's going on and when it changed. All the other changes seemed fine with everybody.

Signed-off-by: Larry Gritz <[email protected]>

lgritz force-pushed the lg-simd branch from 31accd2 to dc9126a Compare November 15, 2020 00:02

meshula approved these changes Nov 16, 2020

View reviewed changes

Adjust comments

81bfb93

Signed-off-by: Larry Gritz <[email protected]>

lgritz force-pushed the lg-simd branch from 2914909 to 81bfb93 Compare November 20, 2020 01:21

Merge branch 'master' into lg-simd

77ae3a9

lgritz merged commit 4e82cbf into AcademySoftwareFoundation:master Dec 5, 2020

lgritz deleted the lg-simd branch December 5, 2020 23:06

cary-ilm added the v3.0.0-beta label Mar 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Matrix implementation more SIMD friendly #60

Make Matrix implementation more SIMD friendly #60

lgritz commented Nov 13, 2020 •

edited

Loading

lgritz commented Nov 14, 2020

lgritz commented Nov 16, 2020

meshula commented Nov 16, 2020

lgritz commented Nov 20, 2020

Make Matrix implementation more SIMD friendly #60

Make Matrix implementation more SIMD friendly #60

Conversation

lgritz commented Nov 13, 2020 • edited Loading

lgritz commented Nov 14, 2020

lgritz commented Nov 16, 2020

meshula commented Nov 16, 2020

lgritz commented Nov 20, 2020

lgritz commented Nov 13, 2020 •

edited

Loading