Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Matrix implementation more SIMD friendly #60

Merged
merged 3 commits into from
Dec 5, 2020

Conversation

lgritz
Copy link
Contributor

@lgritz lgritz commented Nov 13, 2020

This is backporting some changes we made in OSL (with customized
versions of Imath headers that we can thankfully no longer need if we
push the fixes back to the Imath project). This was all pointed out by
people from Intel working on OSL, so I'm deferring to their judgment
that this is the best solution.

  • Avoid memset or memcpy to initialize or copy a matrix, intead use
    element-by-element copies. In a scalar context they generate the
    same code (with the optimizer on, at least), but when those
    operations are done inside a loop that you hope will be
    auto-vectorized, the casting and function calling involved in using
    memset/memcpy will confuse the vectorizer and you'll end up with
    inferior code, sometimes even not vectorizing the whole loop.

  • Replace a bunch of instances of accessing vector elements as [0],
    [1], [2] with .x, .y, .z. This is because our particular
    implementation of operator[] involves a pointer cast that confuses
    the auto-vectorizers. For best vector performance (not just of
    individual operations but also when doing ostensibly scalar
    operations inside loops that could and should be vectorized), it's
    important to say v.x rather than v[0].

  • Expand Matrix3::operator*(Matrix3) to unroll the loops and also
    avoid needless initialization of elements to 0 only to immediately
    write over the value in the first iteration of the loop.

Signed-off-by: Larry Gritz [email protected]

@lgritz
Copy link
Contributor Author

lgritz commented Nov 14, 2020

I'd like to call your attention to line 2719. I think that a 3x3 matrix that scales by a scalar values should scale in all 3 dimensions, but for some reason the old code only affected x and y! Does anybody think the old way was correct or that this change will break anyone relying on the old, weird behavior?

The fact that I can change this and the test suite neither fails now nor before, I think shows that our unit tests are probably not covering anywhere near the majority of the functionality of this library.

This is backporting some changes we made in OSL (with customized
versions of Imath headers that we can thankfully no longer need if we
push the fixes back to the Imath project). This was all pointed out by
people from Intel working on OSL, so I'm deferring to their judgment
that this is the best solution.

* Avoid memset or memcpy to initialize or copy a matrix, intead use
  element-by-element copies. In a scalar context they generate the
  same code (with the optimizer on, at least), but when those
  operations are done inside a loop that you hope will be
  auto-vectorized, the casting and function calling involved in using
  memset/memcpy will confuse the vectorizer and you'll end up with
  inferior code, sometimes even not vectorizing the whole loop.

* Replace a bunch of instances of accessing vector elements as [0],
  [1], [2] with .x, .y, .z. This is because our particular
  implementation of `operator[]` involves a pointer cast that confuses
  the auto-vectorizers. For best vector performance (not just of
  individual operations but also when doing ostensibly scalar
  operations inside loops that could and should be vectorized), it's
  important to say v.x rather than v[0].

* Expand `Matrix3::operator*(Matrix3)` to unroll the loops and also
  avoid needless initialization of elements to 0 only to immediately
  write over the value in the first iteration of the loop.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz
Copy link
Contributor Author

lgritz commented Nov 16, 2020

Shortly after the initial comment, I pushed a few updates to this PR. I just updated the description of this PR to more fully explain all of the changes therein.

@meshula
Copy link
Contributor

meshula commented Nov 16, 2020

These changes are long overdue. I recognize a lot of code I wrote that resulted from profiling I did fifteen years ago, probably taking into account characteristics of MSVC6 and gcc 2.

I'd like to call your attention to line 2719. I think that a 3x3 matrix that scales by a scalar values should scale in all 3 dimensions, but for some reason the old code only affected x and y! Does anybody think the old way was correct or that this change will break anyone relying on the old, weird behavior?

mea culpa! If you change this routine, an old prototype at LucasArts will stop working if someone ports it to the new Imath (I joke; the code assuredly is in the cosmic bit bucket by now). This was part of an initial build out of Imath, AT LEAST fifteen years ago, when I was building a 2d GUI library with Imath for LucasArts and had the mad idea that maybe 3x3 matrices should be special cased for affine transforms, the way we do for 4x4. That idea didn't carry through, and apparently a routine that I didn't write a test for fell through the cracks.

@lgritz
Copy link
Contributor Author

lgritz commented Nov 20, 2020

At the TSC meeting, consensus was to keep the fixes on that 3x3 matrix scale, the old way was weird. I've pushed an update that alters the comment to be more clear about what's going on and when it changed. All the other changes seemed fine with everybody.

Signed-off-by: Larry Gritz <[email protected]>
@lgritz lgritz merged commit 4e82cbf into AcademySoftwareFoundation:master Dec 5, 2020
@lgritz lgritz deleted the lg-simd branch December 5, 2020 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants