Make Matrix implementation more SIMD friendly
This backports some changes we made in OSL, which carried customized versions of the Imath headers that we will thankfully no longer need once the fixes are pushed back to the Imath project.

It may be tempting to use memset or memcpy rather than doing certain element-by-element copies. Indeed, in a scalar context they generate the same code (with the optimizer on, at least). But when those operations happen inside a loop that you hope will be auto-vectorized, the casting and function calls involved in memset/memcpy can confuse the vectorizer, and you end up with inferior code, sometimes failing to vectorize the loop at all.

This was all pointed out by people from Intel working on OSL, so I'm deferring to their judgment that this is the best solution.

Signed-off-by: Larry Gritz <[email protected]>
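
For illustration, here is a minimal sketch of the two copy styles being contrasted. The `Mat44` struct and the function names are hypothetical stand-ins, not the actual Imath code, and whether a given compiler vectorizes either loop depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical 4x4 matrix type, for illustration only (not the Imath class).
struct Mat44 {
    float x[4][4];
};

// Copy via memcpy. Fine in scalar code, but the library call and the
// pointer conversions may get in the way when an enclosing loop is a
// candidate for auto-vectorization.
inline void copy_memcpy(Mat44& dst, const Mat44& src) {
    std::memcpy(dst.x, src.x, sizeof(dst.x));
}

// Copy element by element: plain assignments with no casts or calls,
// which leaves the vectorizer simple loads and stores to work with.
inline void copy_elementwise(Mat44& dst, const Mat44& src) {
    dst.x[0][0] = src.x[0][0]; dst.x[0][1] = src.x[0][1];
    dst.x[0][2] = src.x[0][2]; dst.x[0][3] = src.x[0][3];
    dst.x[1][0] = src.x[1][0]; dst.x[1][1] = src.x[1][1];
    dst.x[1][2] = src.x[1][2]; dst.x[1][3] = src.x[1][3];
    dst.x[2][0] = src.x[2][0]; dst.x[2][1] = src.x[2][1];
    dst.x[2][2] = src.x[2][2]; dst.x[2][3] = src.x[2][3];
    dst.x[3][0] = src.x[3][0]; dst.x[3][1] = src.x[3][1];
    dst.x[3][2] = src.x[3][2]; dst.x[3][3] = src.x[3][3];
}

// The kind of loop one hopes the compiler will auto-vectorize.
inline void copy_all(Mat44* dst, const Mat44* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        copy_elementwise(dst[i], src[i]);
}
```

Both functions produce the same result; the difference, if any, shows up only in the generated code, so it is worth checking the compiler's vectorization report rather than assuming either outcome.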