Added building of ISA specific shared libraries lib_b16_AVX512_oslexec, lib_b8_AVX2_oslexec, lib_b8_AVX512_oslexec, lib_b8_AVX_oslexec to house precompiled OSL library functions that execute over batches of 8 or 16 in SIMD for the ISA. Compiler flags for OpenMP simd code gen and ISA targets have been added for Intel(r) C++ Compiler (ICC) and Clang (newer versions of GCC 6+ might be possible, but untested).

Implement batched llvm code gen for:  generic function calls, useparam, compare ops,  addition, subtraction, multiplication, division, modulus, assignment, component reference, construct triple, construct color, derivative extraction.  Stubbed out all other code gen functions with TBD asserts.

Populate OpDescriptors with valid wide version of llvm-generating routine

Added wide_opalgebraic.cpp which uses X-macros (instead of #define like llvm_ops.cpp) to define wide (batched) versions of OSL library functions: sqrt, inversesqrt, floor, ceil, trunc, round, sign, abs, fabs, fmod, and step.
The X-macro wrappers follow a pattern. First they manufacture a target specific library function name with enough parameter types embedded in its name to uniquely identify it (vs. other versions). Then they declare local Wide<T> or Masked<T> wrappers that convert any void */char * parameters to references to Block<T,WidthT> data blocks of wide SOA data. Then an explicit OpenMP simd loop iterates over the data lanes and extracts local scalar values from the Wide|Masked wrappers, and the scalar implementation of the library function is inlined using those local scalar values. Finally the result is written back out to the data lane inside the Wide|Masked wrapper.
This paradigm allows scalar implementations to be reused inside simd loops and avoids having to use intrinsics or assembly. It also allows the same implementation to be recompiled for different target ISAs and various widths (8|16). The build system will copy each wide_*.cpp to a target and batch size specific named b(8|16)_(AVX512|AVX2|AVX)_wide*.cpp and build it with different -D__OSL_TARGET_ISA and -D__OSL_WIDTH values, which in turn will manufacture unique function names.
Sometimes scalar algorithms/functions can be refactored to provide better performance when executing inside a SIMD loop. sfmath.h (SIMD friendly math) houses these alternative math functions, although many improvements have already been moved into OIIO as they benefit (or do no harm to) scalar code gen.
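
The wrapper pattern described above can be sketched roughly as follows. This is a minimal illustration only, not the code in wide_opalgebraic.cpp: ExampleBlock, example_step and example_wide_step are hypothetical stand-ins for Block<T,WidthT>, the scalar library function, and the manufactured wide function, and the integer lane mask is an assumption about how masking might be passed.

```cpp
#include <cassert>

// Hypothetical stand-in for a block of wide SOA data: one value per lane.
template<typename T, int WidthT>
struct ExampleBlock {
    T lanes[WidthT];
};

// Scalar implementation that is reused unchanged inside the SIMD loop.
inline float example_step(float edge, float x)
{
    return x < edge ? 0.0f : 1.0f;
}

// Wide wrapper: takes type-erased pointers (as JIT-generated code would pass),
// converts them to references to data blocks, then loops over the lanes.
template<int WidthT>
void example_wide_step(void* r_, const void* edge_, const void* x_,
                       unsigned int mask)
{
    assert(r_ && edge_ && x_);
    auto& r          = *static_cast<ExampleBlock<float, WidthT>*>(r_);
    const auto& edge = *static_cast<const ExampleBlock<float, WidthT>*>(edge_);
    const auto& x    = *static_cast<const ExampleBlock<float, WidthT>*>(x_);

    // The pragma only takes effect when OpenMP simd support is enabled in the
    // compiler; otherwise it is ignored and the loop runs as plain scalar code.
    #pragma omp simd
    for (int lane = 0; lane < WidthT; ++lane) {
        // Extract local scalar values for this lane.
        const float e = edge.lanes[lane];
        const float v = x.lanes[lane];
        // Inline the scalar implementation.
        const float result = example_step(e, v);
        // Write back only to lanes that are active in the mask.
        if (mask & (1u << lane))
            r.lanes[lane] = result;
    }
}
```

Calling example_wide_step<8> or example_wide_step<16> reuses the same scalar body for both widths, which is the property the commit message relies on when it recompiles one source file per ISA and width.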

Made ShadingContext remember the ShaderGroup it just optimized.  This allows symbol queries without actually jitting or executing a shader.

Improved TestShade to not actually execute the shader during setup_output_images, but to instead explicitly JIT scalar or batched version of the ShaderGroup (primarily to make sure JIT happens during the "setup" stage vs. lazily later).
Fix TestShade to explicitly set the number of OIIO worker threads to avoid overhead (and debugging confusion) of OIIO thread pools being created even when "-t 1" was requested.

Modified ShadingSystem to perform group_post_jit_cleanup (delete operations of shader group) only if both scalar and wide JITs have occurred or if RendererServices doesn't support batching.  Without this change the operations were being deleted before a batched JIT could occur.

Extended testsuite framework to look for a file named "BATCHED" which causes another run of the test with TESTSHADE_BATCHED=1
Added new testsuite tests with BATCHED enabled for passing shaderglobal values, and increased coverage of arithmetic tests with reference images for float, color, point, vector, normal data types along with Dx Dy results.
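
The marker-file rule that the testsuite framework applies (see the testing.cmake hunk in this commit) can be restated as a small predicate. The real check is written in CMake; the sketch below only mirrors its boolean structure, and ExampleTestDir and example_wants_batched_run are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of one test directory, as the CMake macro sees it.
struct ExampleTestDir {
    std::string name;                 // test name, e.g. "arithmetic"
    bool has_batched_marker;          // a file named BATCHED exists
    bool has_nobatched_marker;        // a file named NOBATCHED exists
    bool has_nobatched_fixme_marker;  // a file named NOBATCHED-FIXME exists
};

// Returns true when an extra "<name>.batched.opt" run should be added.
// test_all_batched models the TESTSUITE_BATCHED environment variable.
inline bool example_wants_batched_run(const ExampleTestDir& t,
                                      bool test_all_batched)
{
    const bool opted_in = t.has_batched_marker || test_all_batched
                          || t.name.find("batched") != std::string::npos;
    const bool opted_out = t.has_nobatched_marker
                           || t.has_nobatched_fixme_marker;
    return opted_in && !opted_out;
}
```

Note that an opt-out marker wins even when TESTSUITE_BATCHED requests batched runs for every test.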

Added utility macros  __OSL_CONCAT,  __OSL_CONCAT3, ...,  __OSL_CONCAT10 to be able to easily manufacture function names.
Added macro __OSL_WIDE_PVT to give each target specific library its own namespace avoiding collisions should multiple libraries be loaded.
Added sfm::negate(const T &x) with optimized implementation.

Disabled some unreferenced functions warnings for ICC and removed some unused functions from batched_analysis.cpp
Updated BatchedBackendLLVM to match behavior of BackendLLVM by configuring its LLVM_Util based on ShadingSystem attributes.

Disable clang format for X macro based building of initializer arrays to prevent clang format from reordering the #include files.
Fix control flow in factory function TargetLibraryHelper::build to not trigger assert unnecessarily.

Limit list of OSL library functions in builtindecl_wide_xmacro to just those we have implemented so far because all functions listed must exist in the target specific library for it to successfully be loaded and resolved.

Added LLVM_Util::op_zero_if(llvm::Value *cond, llvm::Value *v) which allows its implementation to work around an LLVM issue where expensive instructions to produce the value (div, sqrt, etc) are duplicated (once with a mask, once without).

Fix bug in ShadingSystem::supports_batch_execution_at where jit_fma was being accidentally negated, causing the rest of the logic to fail.
Implement ShadingSystem::BatchedExecutor<WidthT>::jit_group

Signed-off-by: Alex M. Wells <[email protected]>
AlexMWells committed Mar 12, 2021
1 parent 5c95ece commit aacf4b9
Showing 267 changed files with 3,266 additions and 69 deletions.
16 changes: 15 additions & 1 deletion src/cmake/testing.cmake
@@ -82,6 +82,7 @@ macro ( TESTSUITE )
set (_ats_LABEL "broken")
endif ()
set (test_all_optix $ENV{TESTSUITE_OPTIX})
set (test_all_batched $ENV{TESTSUITE_BATCHED})
# Add the tests if all is well.
set (ALL_TEST_LIST "")
set (_testsuite "${CMAKE_SOURCE_DIR}/testsuite")
@@ -129,6 +130,18 @@ macro ( TESTSUITE )
add_one_testsuite ("${_testname}.optix.opt" "${_testsrcdir}"
ENV TESTSHADE_OPT=2 TESTSHADE_OPTIX=1 )
endif ()

# When building for Batched support, also run it in Batched mode
# if there is a BATCHED marker file in the directory.
# If an environment variable $TESTSUITE_BATCHED is nonzero, then
# run all tests in Batched mode, even if there's no BATCHED marker.
if ((EXISTS "${_testsrcdir}/BATCHED" OR test_all_batched OR _testname MATCHES "batched")
AND NOT EXISTS "${_testsrcdir}/NOBATCHED"
AND NOT EXISTS "${_testsrcdir}/NOBATCHED-FIXME")
# optimized for right now
add_one_testsuite ("${_testname}.batched.opt" "${_testsrcdir}"
ENV TESTSHADE_OPT=2 TESTSHADE_BATCHED=1 )
endif ()
endforeach ()
if (VERBOSE)
message (STATUS "Added tests: ${ALL_TEST_LIST}")
@@ -203,7 +216,8 @@ macro (osl_add_all_tests)
render-background render-bumptest
render-cornell render-furnace-diffuse
render-microfacet render-oren-nayar render-veachmis render-ward
select shortcircuit spline splineinverse splineinverse-ident
select shaderglobals shortcircuit
spline splineinverse splineinverse-ident
spline-boundarybug spline-derivbug
string
struct struct-array struct-array-mixture
16 changes: 16 additions & 0 deletions src/include/OSL/oslversion.h.in
@@ -86,4 +86,20 @@ namespace @PROJ_NAME@ = @PROJ_NAMESPACE_V@;

#define OSL_SHADER_INSTALL_DIR "@OSL_SHADER_INSTALL_DIR@"

// Macro helpers to build function names based on other macros
#define __OSL_CONCAT_INDIRECT(A, B) A ## B
#define __OSL_CONCAT(A, B) __OSL_CONCAT_INDIRECT(A,B)
#define __OSL_CONCAT3(A, B, C) __OSL_CONCAT(__OSL_CONCAT(A,B),C)
#define __OSL_CONCAT4(A, B, C, D) __OSL_CONCAT(__OSL_CONCAT3(A,B,C),D)
#define __OSL_CONCAT5(A, B, C, D, E) __OSL_CONCAT(__OSL_CONCAT4(A,B,C,D),E)
#define __OSL_CONCAT6(A, B, C, D, E, F) __OSL_CONCAT(__OSL_CONCAT5(A,B,C,D,E),F)
#define __OSL_CONCAT7(A, B, C, D, E, F, G) __OSL_CONCAT(__OSL_CONCAT6(A,B,C,D,E,F),G)
#define __OSL_CONCAT8(A, B, C, D, E, F, G, H) __OSL_CONCAT(__OSL_CONCAT7(A,B,C,D,E,F,G),H)
#define __OSL_CONCAT9(A, B, C, D, E, F, G, H, I) __OSL_CONCAT(__OSL_CONCAT8(A,B,C,D,E,F,G,H),I)
#define __OSL_CONCAT10(A, B, C, D, E, F, G, H, I, J) __OSL_CONCAT(__OSL_CONCAT9(A,B,C,D,E,F,G,H,I),J)

#if defined(__OSL_TARGET_ISA) && defined(__OSL_WIDTH)
#define __OSL_WIDE_PVT __OSL_CONCAT5(b,__OSL_WIDTH,_,__OSL_TARGET_ISA,_pvt)
#endif

#endif /* OSLVERSION_H */
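
As a rough illustration of how these helpers manufacture names, the sketch below copies the first few __OSL_CONCAT macros from the hunk above and expands __OSL_WIDE_PVT for one width and ISA. In the real build the width and ISA arrive as -D compiler flags; here they are defined in the file as stand-ins. EXAMPLE_STR, example_wide_namespace_name and example_wide_lanes are hypothetical helpers added only to make the expansion visible.

```cpp
#include <cassert>
#include <string>

// Copied from the oslversion.h.in hunk above.
#define __OSL_CONCAT_INDIRECT(A, B) A ## B
#define __OSL_CONCAT(A, B) __OSL_CONCAT_INDIRECT(A,B)
#define __OSL_CONCAT3(A, B, C) __OSL_CONCAT(__OSL_CONCAT(A,B),C)
#define __OSL_CONCAT4(A, B, C, D) __OSL_CONCAT(__OSL_CONCAT3(A,B,C),D)
#define __OSL_CONCAT5(A, B, C, D, E) __OSL_CONCAT(__OSL_CONCAT4(A,B,C,D),E)

// Stand-ins for -D__OSL_WIDTH=8 -D__OSL_TARGET_ISA=AVX2 on the command line.
#define __OSL_WIDTH 8
#define __OSL_TARGET_ISA AVX2

#define __OSL_WIDE_PVT __OSL_CONCAT5(b,__OSL_WIDTH,_,__OSL_TARGET_ISA,_pvt)

// Hypothetical two-level stringify helper, so the argument is macro-expanded
// before it is turned into a string literal.
#define EXAMPLE_STR_INDIRECT(X) #X
#define EXAMPLE_STR(X) EXAMPLE_STR_INDIRECT(X)

// Each target specific library places its symbols in its own namespace.
namespace __OSL_WIDE_PVT {
inline int lanes() { return __OSL_WIDTH; }
}  // namespace __OSL_WIDE_PVT

// Reports the namespace name that the macro expanded to.
inline std::string example_wide_namespace_name()
{
    return EXAMPLE_STR(__OSL_WIDE_PVT);
}

// Calls into the manufactured namespace through the macro.
inline int example_wide_lanes()
{
    return __OSL_WIDE_PVT::lanes();
}
```

The extra __OSL_CONCAT_INDIRECT level matters: pasting with ## directly would join the macro names themselves rather than their values, so __OSL_WIDTH would not be replaced by 8.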
18 changes: 18 additions & 0 deletions src/include/OSL/sfmath.h
@@ -86,6 +86,24 @@ namespace sfm
using std::isinf;
#endif

template<typename T>
OSL_FORCEINLINE OSL_HOSTDEVICE T
negate(const T &x) {
#if OSL_FAST_MATH
// Compilers typically use a constant bit mask to perform negation,
// and reading a constant involves accessing its memory location.
// Alternatively the compiler can create a 0 value in a register
// in constant time without involving the memory subsystem.
// So we can subtract from 0 to effectively negate a value.
// Handling of +0.0f and -0.0f might differ from IEEE here.
// But in graphics practice we don't see a problem with codes
// using this approach, and we see a measurable 10% (+|-5%) performance gain
return T(0) - x;
#else
return -x;
#endif
}

OSL_FORCEINLINE OSL_HOSTDEVICE Dual2<float>
absf (const Dual2<float> &x)
{
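
The signed-zero caveat noted in the negate comment can be made concrete with a small sketch. It assumes default IEEE floating-point semantics (no -ffast-math or similar flags); example_negate_fast, example_negate_ieee and example_sign_differs are hypothetical names that mirror the two branches of sfm::negate.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the OSL_FAST_MATH branch: subtract from zero.
template<typename T>
inline T example_negate_fast(const T& x)
{
    return T(0) - x;
}

// Mirrors the default branch: unary minus, which flips the sign bit.
template<typename T>
inline T example_negate_ieee(const T& x)
{
    return -x;
}

// True when the two approaches produce results with different sign bits.
inline bool example_sign_differs(float x)
{
    return std::signbit(example_negate_fast(x))
           != std::signbit(example_negate_ieee(x));
}
```

Under these assumptions the two forms agree for ordinary nonzero inputs and differ only for +0.0f, where subtracting from zero yields +0.0f while unary minus yields -0.0f.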
236 changes: 236 additions & 0 deletions src/liboslexec/CMakeLists.txt
@@ -2,6 +2,157 @@
# SPDX-License-Identifier: BSD-3-Clause
# https://github.com/AcademySoftwareFoundation/OpenShadingLanguage

# NOTE: for any additions/deletions from liboslexec_target_srcs,
# please update wide_target_combine_text_and_rodata.ld
set ( liboslexec_target_srcs
wide/wide_opalgebraic
)

set ( liboslexec_override_limits
)

# Strategy is to make a copy of each cpp in liboslexec_target_srcs for
# each target batch width and ISA combination, applying compiler flags to define
# -D__OSL_WIDTH=[4|8|16]
# -D__OSL_TARGET_ISA=[AVX512|AVX2|AVX|SSE4_2]
# and compiler flags to choose correct target ISA for your compiler
# You may then add/remove the desired ${TARGET_SOURCES_B[4|8|16]_[AVX512|AVX2|AVX|SSE4_2]}
# to your add_library call
foreach(target_src ${liboslexec_target_srcs})

set(DST_B16_AVX512 "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b16_AVX512.cpp")
set(DST_B16_AVX512_NOFMA "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b16_AVX512_noFMA.cpp")
set(DST_B8_AVX512 "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b8_AVX512.cpp")
set(DST_B8_AVX512_NOFMA "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b8_AVX512_noFMA.cpp")
set(DST_B8_AVX2 "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b8_AVX2.cpp")
set(DST_B8_AVX2_NOFMA "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b8_AVX2_noFMA.cpp")
set(DST_B8_AVX "${CMAKE_CURRENT_BINARY_DIR}/${target_src}_b8_AVX.cpp")

set(SRC "${CMAKE_CURRENT_SOURCE_DIR}/${target_src}.cpp")

CONFIGURE_FILE("${SRC}" ${DST_B16_AVX512} COPYONLY)
CONFIGURE_FILE("${SRC}" ${DST_B16_AVX512_NOFMA} COPYONLY)
CONFIGURE_FILE(${SRC} ${DST_B8_AVX512} COPYONLY)
CONFIGURE_FILE(${SRC} ${DST_B8_AVX512_NOFMA} COPYONLY)
CONFIGURE_FILE(${SRC} ${DST_B8_AVX2} COPYONLY)
CONFIGURE_FILE(${SRC} ${DST_B8_AVX2_NOFMA} COPYONLY)
CONFIGURE_FILE(${SRC} ${DST_B8_AVX} COPYONLY)

list(APPEND TARGET_SOURCES_B16_AVX512 "${DST_B16_AVX512}")
list(APPEND TARGET_SOURCES_B16_AVX512_NOFMA "${DST_B16_AVX512_NOFMA}")
list(APPEND TARGET_SOURCES_B8_AVX512 "${DST_B8_AVX512}")
list(APPEND TARGET_SOURCES_B8_AVX512_NOFMA "${DST_B8_AVX512_NOFMA}")
list(APPEND TARGET_SOURCES_B8_AVX2 "${DST_B8_AVX2}")
list(APPEND TARGET_SOURCES_B8_AVX2_NOFMA "${DST_B8_AVX2_NOFMA}")
list(APPEND TARGET_SOURCES_B8_AVX "${DST_B8_AVX}")

set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}" "${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}" "${DST_B4_SSE4_2}"
APPEND PROPERTY COMPILE_OPTIONS
"-I${CMAKE_CURRENT_SOURCE_DIR}/wide"
"-I${CMAKE_CURRENT_SOURCE_DIR}/../liboslnoise/wide"
)

set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_WIDTH=16")
set_property(SOURCE "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}" "${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_WIDTH=8")

set_property(SOURCE "${DST_B16_AVX512}" "${DST_B8_AVX512}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_TARGET_ISA=AVX512")
set_property(SOURCE "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_TARGET_ISA=AVX512_noFMA")
set_property(SOURCE "${DST_B8_AVX2}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_TARGET_ISA=AVX2")
set_property(SOURCE "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_TARGET_ISA=AVX2_noFMA")
set_property(SOURCE "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_DEFINITIONS "__OSL_TARGET_ISA=AVX")

if (CMAKE_COMPILER_IS_INTEL)
if (MSVC)
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "/QxCORE-AVX512" "/Qopt-zmm-usage:high")
set_property(SOURCE "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "/QxCORE-AVX512" "/Qopt-zmm-usage:low")
set_property(SOURCE "${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "/QxCORE-AVX2")
set_property(SOURCE "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "/QxAVX")

set_property(SOURCE "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512_NOFMA}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "/Qfma-")

if (${target_src} IN_LIST liboslexec_override_limits)
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "/Qoverride-limits")
endif()
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "/Qprec-div" "/Qqopt-report:5")
else ()
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-xCORE-AVX512" "-qopt-zmm-usage=high")
set_property(SOURCE "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-xCORE-AVX512" "-qopt-zmm-usage=low")
set_property(SOURCE "${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-xCORE-AVX2")
set_property(SOURCE "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-xAVX")

set_property(SOURCE "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512_NOFMA}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-no-fma")

if (${target_src} IN_LIST liboslexec_override_limits)
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-qoverride-limits")
endif()
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-prec-div" "-qopt-report=5")
endif ()
endif ()

if (CMAKE_COMPILER_IS_CLANG OR CMAKE_COMPILER_IS_APPLECLANG)
# REQUIRES CLANG 7+ to successfully vectorize with OpenMP simd
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-march=skylake-avx512")
if (CLANG_VERSION_STRING VERSION_GREATER 7.0)
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-mprefer-vector-width=512")
set_property(SOURCE "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-mprefer-vector-width=256")
endif ()
set_property(SOURCE "${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-march=core-avx2")
set_property(SOURCE "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-march=corei7-avx")

set_property(SOURCE "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512_NOFMA}" "${DST_B8_AVX2_NOFMA}"
APPEND PROPERTY COMPILE_OPTIONS "-mno-fma")

# large SIMD function loops will exceed llvm's -inline-threshold default of 225.
# remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis]
# choose to increase that limit via compiler flags vs.
# workaround with __attribute__((flatten))
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-mllvm" "-inline-threshold=100000")

# For loops with small loop bodies, clang was unrolling the loop before
# #pragma omp simd
# was processed. Thus there was no loop left for the pragma to apply to
# To work around this, for these wide SIMD libraries, we will disable
# clang's loop unrolling. We really don't want to do this and wish to remove
# it as soon as clang addresses this issue.
set_property(SOURCE "${DST_B16_AVX512}" "${DST_B16_AVX512_NOFMA}" "${DST_B8_AVX512}" "${DST_B8_AVX512_NOFMA}"
"${DST_B8_AVX2}" "${DST_B8_AVX2_NOFMA}" "${DST_B8_AVX}"
APPEND PROPERTY COMPILE_OPTIONS "-fno-unroll-loops")
endif ()

endforeach(target_src)

set (local_lib oslexec)
set (lib_src
shadingsys.cpp closure.cpp
@@ -185,6 +336,83 @@ else ()
set (lib_src ${lib_src} llvm_ops.cpp)
endif ()

# Always build target specific libraries for batched execution as shared.
# For each target specific shared library being built,
# we need to let a couple source files know so they can build
# function mappings for those targets
set_property(SOURCE "batched_llvm_instance.cpp" "shadingsys.cpp"
APPEND PROPERTY COMPILE_DEFINITIONS
"__OSL_SUPPORTS_B16_AVX512"
"__OSL_SUPPORTS_B8_AVX512"
"__OSL_SUPPORTS_B8_AVX2"
"__OSL_SUPPORTS_B8_AVX"
#"__OSL_SUPPORTS_B16_AVX512_NOFMA"
#"__OSL_SUPPORTS_B8_AVX512_NOFMA"
#"__OSL_SUPPORTS_B8_AVX2_NOFMA"
)


set (_b16_AVX512_oslexec_lib _b16_AVX512_oslexec)
set (_b8_AVX512_oslexec_lib _b8_AVX512_oslexec)
set (_b8_AVX2_oslexec_lib _b8_AVX2_oslexec)
set (_b8_AVX_oslexec_lib _b8_AVX_oslexec)
set (_b16_AVX512_noFMA_oslexec_lib _b16_AVX512_noFMA_oslexec)
set (_b8_AVX512_noFMA_oslexec_lib _b8_AVX512_noFMA_oslexec)
set (_b8_AVX2_noFMA_oslexec_lib _b8_AVX2_noFMA_oslexec)

add_library ( ${_b16_AVX512_oslexec_lib} SHARED "${TARGET_SOURCES_B16_AVX512}" )
add_library ( ${_b8_AVX512_oslexec_lib} SHARED "${TARGET_SOURCES_B8_AVX512}" )
add_library ( ${_b8_AVX2_oslexec_lib} SHARED "${TARGET_SOURCES_B8_AVX2}" )
add_library ( ${_b8_AVX_oslexec_lib} SHARED "${TARGET_SOURCES_B8_AVX}" )
#add_library ( ${_b16_AVX512_noFMA_oslexec_lib} SHARED "${TARGET_SOURCES_B16_AVX512_NOFMA}" )
#add_library ( ${_b8_AVX512_noFMA_oslexec_lib} SHARED "${TARGET_SOURCES_B8_AVX512_NOFMA}" )
#add_library ( ${_b8_AVX2_noFMA_oslexec_lib} SHARED "${TARGET_SOURCES_B8_AVX2_NOFMA}" )

set ( liboslexec_target_libs
${_b16_AVX512_oslexec_lib}
${_b8_AVX512_oslexec_lib}
${_b8_AVX2_oslexec_lib}
${_b8_AVX_oslexec_lib}
# ${_b16_AVX512_noFMA_oslexec_lib}
# ${_b8_AVX512_noFMA_oslexec_lib}
# ${_b8_AVX2_noFMA_oslexec_lib}
)

foreach(target_lib ${liboslexec_target_libs})
target_include_directories (${target_lib}
PUBLIC
${CMAKE_INSTALL_FULL_INCLUDEDIR}
${ILMBASE_INCLUDES}
)
target_compile_definitions (${target_lib}
PRIVATE
OSL_EXPORTS
CUDA_TARGET_ARCH="${CUDA_TARGET_ARCH}"
OSL_CUDA_VERSION="${CUDA_VERSION}"
OSL_OPTIX_VERSION="${OPTIX_VERSION}"
)
target_link_libraries (${target_lib}
PUBLIC
OpenImageIO::OpenImageIO
# For OpenEXR/Imath 3.x:
$<$<TARGET_EXISTS:Imath::Imath>:Imath::Imath>
$<$<TARGET_EXISTS:Imath::Half>:Imath::Half>
# For OpenEXR >= 2.4/2.5 with reliable exported targets
$<$<TARGET_EXISTS:IlmBase::Imath>:IlmBase::Imath>
$<$<TARGET_EXISTS:IlmBase::Half>:IlmBase::Half>
$<$<TARGET_EXISTS:IlmBase::IlmThread>:IlmBase::IlmThread>
$<$<TARGET_EXISTS:IlmBase::Iex>:IlmBase::Iex>
# For OpenEXR <= 2.3:
${ILMBASE_LIBRARIES}
PRIVATE
pugixml::pugixml
${ZLIB_LIBRARIES}
${Boost_LIBRARIES} ${CMAKE_DL_LIBS}
${CLANG_LIBRARIES}
${LLVM_LIBRARIES} ${LLVM_LDFLAGS} ${LLVM_SYSTEM_LIBRARIES}
)
endforeach(target_lib)

add_library (${local_lib} ${lib_src})
target_include_directories (${local_lib}
PUBLIC
@@ -255,6 +483,14 @@ endif ()

# add_dependencies (${local_lib} "${CMAKE_SOURCE_DIR}/src/build-scripts/hidesymbols.map")

install_targets ( _b16_AVX512_oslexec )
install_targets ( _b8_AVX512_oslexec )
install_targets ( _b8_AVX2_oslexec )
install_targets ( _b8_AVX_oslexec )
#install_targets ( _b16_AVX512_noFMA_oslexec )
#install_targets ( _b8_AVX512_noFMA_oslexec )
#install_targets ( _b8_AVX2_noFMA_oslexec )

install_targets (${local_lib})

# Unit tests