Added building of ISA specific shared libraries lib_b16_AVX512_oslexec, lib_b8_AVX2_oslexec, lib_b8_AVX512_oslexec, lib_b8_AVX_oslexec to house precompiled OSL library functions that execute over batches of 8 or 16 in SIMD for the ISA.

Compiler flags for OpenMP simd code gen and ISA targets have been added for Intel(r) C++ Compiler (ICC) and Clang (newer versions of GCC, 6+, might be possible, but are untested).

Implement batched llvm code gen for: generic function calls, useparam, compare ops, addition, subtraction, multiplication, division, modulus, assignment, component reference, construct triple, construct color, derivative extraction. Stubbed out all other code gen functions with TBD asserts. Populate OpDescriptors with valid wide versions of the llvm-generating routines.

Added wide_opalgebraic.cpp which uses X-macros (instead of #define like llvm_ops.cpp) to define wide (batched) versions of OSL library functions: sqrt, inversesqrt, floor, ceil, trunc, round, sign, abs, fabs, fmod, and step. The X-macro wrappers follow a pattern of manufacturing a target specific library function name with enough parameter types embedded in its name to uniquely identify it (vs. other versions). The wrapper then declares local Wide<T> or Masked<T> wrappers that convert any void */char * parameters to references to Block<T,WidthT> data blocks of wide SOA data. Next, an explicit OpenMP simd loop iterates over the data lanes and extracts local scalar values from the Wide|Masked wrappers, and the scalar implementation of the library function is inlined using those local scalar values. Finally the result is written back out to the data lane inside the Wide|Masked wrapper. This paradigm allows scalar implementations to be reused inside simd loops and avoids having to use intrinsics or assembly. It also allows the same implementation to be recompiled for different target ISAs and various widths (8|16).
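The wrapper pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the `Block`, `Wide`, and `Masked` types here are simplified stand-ins for the ones named in the message, and `as_block`, `scalar_sqrt`, `example_wide_sqrt`, and `example_wide_sqrt_masked` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified stand-ins for the Block/Wide/Masked types named in
// the commit message; the real OSL definitions are more elaborate.
template <typename T, int WidthT>
struct Block {
    T data[WidthT];  // SOA storage: one value per data lane
};

// Convert an untyped parameter pointer into a reference to a typed block.
template <typename T, int WidthT>
Block<T, WidthT>& as_block(void* ptr) {
    assert(ptr != nullptr);
    return *static_cast<Block<T, WidthT>*>(ptr);
}

template <typename T, int WidthT>
class Wide {
public:
    explicit Wide(void* ptr) : m_block(as_block<T, WidthT>(ptr)) {}
    T get(int lane) const { return m_block.data[lane]; }
    void set(int lane, T v) { m_block.data[lane] = v; }
private:
    Block<T, WidthT>& m_block;
};

template <typename T, int WidthT>
class Masked {
public:
    Masked(void* ptr, unsigned mask)
        : m_block(as_block<T, WidthT>(ptr)), m_mask(mask) {}
    // Only lanes whose mask bit is set are written.
    void set(int lane, T v) {
        if ((m_mask >> lane) & 1u) m_block.data[lane] = v;
    }
private:
    Block<T, WidthT>& m_block;
    unsigned m_mask;
};

// Illustrative scalar implementation that gets inlined into the SIMD loop.
inline float scalar_sqrt(float x) { return x >= 0.0f ? std::sqrt(x) : 0.0f; }

template <int WidthT>
void example_wide_sqrt(void* r_, void* x_) {
    Wide<float, WidthT> r(r_);
    Wide<float, WidthT> x(x_);
    // Without OpenMP SIMD compiler flags the pragma is ignored and the loop
    // still runs correctly, just without a vectorization request.
    #pragma omp simd
    for (int lane = 0; lane < WidthT; ++lane) {
        float xs = x.get(lane);          // extract a local scalar value
        r.set(lane, scalar_sqrt(xs));    // write the result back to the lane
    }
}

template <int WidthT>
void example_wide_sqrt_masked(void* r_, void* x_, unsigned mask) {
    Masked<float, WidthT> r(r_, mask);
    Wide<float, WidthT> x(x_);
    #pragma omp simd
    for (int lane = 0; lane < WidthT; ++lane) {
        float xs = x.get(lane);
        r.set(lane, scalar_sqrt(xs));
    }
}
```

Because the loop body is plain scalar code, the same source can be recompiled with different ISA flags and widths, which is the reuse the message describes.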
The build system will create a copy of each wide_*.cpp under a target and batch size specific name, b(8|16)_(AVX512|AVX2|AVX)_wide*.cpp, and build it with different -D__OSL_TARGET_ISA and -D__OSL_WIDTH values, which in turn will manufacture unique function names.

Sometimes scalar algorithms/functions can be refactored to provide better performance when executing inside a SIMD loop. sfmath.h (SIMD friendly math) houses these alternative math functions, although many improvements have already been moved into OIIO as they benefit (or do no harm to) scalar code gen.

Made ShadingContext remember the ShaderGroup it just optimized. This allows symbol queries without actually jitting or executing a shader.

Improved TestShade to not actually execute the shader during setup_output_images, but to instead explicitly JIT the scalar or batched version of the ShaderGroup (primarily to make sure JIT happens during the "setup" stage vs. lazily later).

Fix TestShade to explicitly set the number of OIIO worker threads to avoid the overhead (and debugging confusion) of OIIO thread pools being created even when "-t 1" was requested.

Modified ShadingSystem to perform group_post_jit_cleanup (delete operations of shader group) only if both scalar and wide JITs have occurred or if RendererServices doesn't support batching. Without this change the operations were being deleted before a batched JIT could occur.

Extended the testsuite framework to look for a file named "BATCHED", which causes another run of the test with TESTSHADE_BATCHED=1.

Added new testsuite tests with BATCHED enabled for passing shaderglobal values, and increased coverage of arithmetic tests with reference images for float, color, point, vector, normal data types along with Dx Dy results.

Added utility macros __OSL_CONCAT, __OSL_CONCAT3, ..., __OSL_CONCAT10 to be able to easily manufacture function names.
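As a rough illustration of how token-pasting macros and -D values can manufacture unique function names, here is a small sketch. All `EX_` names are hypothetical and are not the `__OSL_CONCAT` family from this commit; the defaults for `EX_WIDTH` and `EX_TARGET_ISA` stand in for values that would normally arrive on the compiler command line.

```cpp
#include <cmath>
#include <cstring>

// Hypothetical token-pasting helpers. The extra level of indirection makes
// sure macro arguments are expanded before they are pasted together.
#define EX_CONCAT_INDIRECT(A, B) A##B
#define EX_CONCAT(A, B) EX_CONCAT_INDIRECT(A, B)
#define EX_CONCAT5(A, B, C, D, E) \
    EX_CONCAT(EX_CONCAT(EX_CONCAT(EX_CONCAT(A, B), C), D), E)

#define EX_STRINGIFY_INDIRECT(X) #X
#define EX_STRINGIFY(X) EX_STRINGIFY_INDIRECT(X)

// Assumed to be supplied per build, e.g. -DEX_WIDTH=16 -DEX_TARGET_ISA=AVX512.
#ifndef EX_WIDTH
#define EX_WIDTH 8
#endif
#ifndef EX_TARGET_ISA
#define EX_TARGET_ISA AVX2
#endif

// Builds names of the form b<width>_<isa><suffix>.
#define EX_LIBRARY_FUNC(SUFFIX) \
    EX_CONCAT5(b, EX_WIDTH, _, EX_TARGET_ISA, SUFFIX)

// With the defaults above this defines a function named b8_AVX2_floor_f.
extern "C" float EX_LIBRARY_FUNC(_floor_f)(float x) { return std::floor(x); }

// The manufactured name as a string, handy for symbol lookup.
static const char* const example_floor_name =
    EX_STRINGIFY(EX_LIBRARY_FUNC(_floor_f));
```

Compiling the same source again with different -D values would yield a differently named symbol, so several target specific builds can coexist without colliding.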
Added macro __OSL_WIDE_PVT to give each target specific library its own namespace, avoiding collisions should multiple libraries be loaded.

Added sfm::negate(const T &x) with optimized implementation.

Disabled some unreferenced function warnings for ICC and removed some unused functions from batched_analysis.cpp.

Updated BatchedBackendLLVM to match behavior of BackendLLVM by configuring its LLVM_Util based on ShadingSystem attributes.

Disable clang format for X-macro based building of initializer arrays to prevent clang format from reordering the #include files.

Fix control flow in factory function TargetLibraryHelper::build to not trigger assert unnecessarily.

Limit list of OSL library functions in builtindecl_wide_xmacro to just those we have implemented so far, because all functions listed must exist in the target specific library for it to successfully be loaded and resolved.

Added LLVM_Util::op_zero_if(llvm::Value *cond, llvm::Value *v) which allows its implementation to work around an LLVM issue where expensive instructions to produce the value (div, sqrt, etc) are duplicated (once with a mask, once without).

Fix bug in ShadingSystem::supports_batch_execution_at where jit_fma was being accidentally negated, causing the rest of the logic to fail.

Implement ShadingSystem::BatchedExecutor<WidthT>::jit_group

Signed-off-by: Alex M. Wells <[email protected]>
1 parent 5c95ece, commit aacf4b9
Showing 267 changed files with 3,266 additions and 69 deletions.