Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Update docs to reflect PI removal. #15057

Merged
merged 20 commits into from
Nov 26, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,11 @@ sycl/include/sycl/detail/ur.hpp @intel/unified-runtime-reviewers
sycl/source/detail/posix_ur.cpp @intel/unified-runtime-reviewers
sycl/source/detail/ur.cpp @intel/unified-runtime-reviewers
sycl/source/detail/windows_ur.cpp @intel/unified-runtime-reviewers
sycl/test-e2e/Plugin/ @intel/unified-runtime-reviewers
sycl/test-e2e/Adapters/ @intel/unified-runtime-reviewers

# Win Proxy Loader
sycl/pi_win_proxy_loader @intel/llvm-reviewers-runtime
sycl/test-e2e/Plugin/dll-detach-order.cpp @intel/llvm-reviewers-runtime
sycl/ur_win_proxy_loader @intel/llvm-reviewers-runtime
sycl/test-e2e/Adapters/dll-detach-order.cpp @intel/llvm-reviewers-runtime

# CUDA specific runtime implementations
sycl/include/sycl/ext/oneapi/experimental/cuda/ @intel/llvm-reviewers-cuda
Expand Down
1 change: 0 additions & 1 deletion .github/workflows/sycl-linux-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,6 @@ jobs:
--cmake-opt=-DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
--cmake-opt="-DLLVM_INSTALL_UTILS=ON" \
--cmake-opt="-DNATIVECPU_USE_OCK=Off" \
--cmake-opt="-DSYCL_PI_TESTS=OFF" \
--cmake-opt="-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=SPIRV"
- name: Compile
id: build
Expand Down
1 change: 0 additions & 1 deletion .github/workflows/sycl-macos-build-and-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,5 @@ jobs:
--cmake-opt=-DCMAKE_C_COMPILER_LAUNCHER=ccache \
--cmake-opt=-DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
--cmake-opt="-DLLVM_INSTALL_UTILS=ON" \
--cmake-opt="-DSYCL_PI_TESTS=OFF"
- name: Compile
run: cmake --build $GITHUB_WORKSPACE/build --target deploy-sycl-toolchain
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ To contribute:
- [The seven rules of a great Git commit message](https://cbea.ms/git-commit)
are recommended read and follow.
- To a reasonable extent, title tags can be used to signify the component
changed, e.g.: `[PI]`, `[CUDA]`, `[Doc]`.
changed, e.g.: `[UR]`, `[CUDA]`, `[Doc]`.
- Create a pull request (PR) for your changes following
[Creating a pull request instructions](https://help.github.com/articles/creating-a-pull-request/).
- Make sure PR has a good description explaining all of the changes made,
Expand Down
10 changes: 6 additions & 4 deletions buildbot/configure.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ def do_configure(args):
if sys.platform != "darwin":
sycl_enabled_backends.append("level_zero")

# lld is needed on Windows or for the HIP plugin on AMD
# lld is needed on Windows or for the HIP adapter on AMD
if platform.system() == "Windows" or (args.hip and args.hip_platform == "AMD"):
llvm_enable_projects += ";lld"

Expand Down Expand Up @@ -152,8 +152,8 @@ def do_configure(args):
libclc_targets_to_build += libclc_nvidia_target_names
libclc_gen_remangled_variants = "ON"

if args.enable_plugin:
sycl_enabled_backends += args.enable_plugin
if args.enable_backends:
sycl_enabled_backends += args.enable_backends

if args.disable_preview_lib:
sycl_preview_lib = "OFF"
Expand Down Expand Up @@ -372,7 +372,9 @@ def main():
parser.add_argument(
"--ci-defaults", action="store_true", help="Enable default CI parameters"
)
parser.add_argument("--enable-plugin", action="append", help="Enable SYCL plugin")
parser.add_argument(
"--enable-backends", action="append", help="Enable SYCL backend"
)
parser.add_argument(
"--disable-preview-lib",
action="store_true",
Expand Down
64 changes: 0 additions & 64 deletions sycl/cmake/modules/AddSYCL.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -33,67 +33,3 @@ function(add_sycl_library LIB_NAME TYPE)
# TODO remove add_common_options
add_common_options(${LIB_NAME})
endfunction()

# current ur adapter dependency managing is a bit hacky, we should try to copy
# this as closely as possible instead
function(add_sycl_plugin PLUGIN_NAME)
cmake_parse_arguments("ARG"
""
""
"SOURCES;INCLUDE_DIRS;LIBRARIES;HEADER"
${ARGN}
)

add_sycl_library("pi_${PLUGIN_NAME}" SHARED
LINKER_SCRIPT "${PROJECT_SOURCE_DIR}/plugins/ld-version-script.txt"
SOURCES ${ARG_SOURCES}
INCLUDE_DIRS
${ARG_INCLUDE_DIRS}
${sycl_inc_dir}
LIBRARIES
${ARG_LIBRARIES}
OpenCL-Headers
)

# All SYCL plugins use UR sources.
# Disable errors from warnings and apply other workarounds while building the UR.
if(WIN32)
target_compile_options("pi_${PLUGIN_NAME}" PRIVATE /WX- /UUNICODE /DUSE_Z7=ON)
else()
target_compile_options("pi_${PLUGIN_NAME}" PRIVATE -Wno-error)
endif()

# Install feature test header
if (NOT "${ARG_HEADER}" STREQUAL "")
get_filename_component(HEADER_NAME ${ARG_HEADER} NAME)
configure_file(
${ARG_HEADER}
${SYCL_INCLUDE_BUILD_DIR}/sycl/detail/plugins/${PLUGIN_NAME}/${HEADER_NAME}
COPYONLY)

install(FILES ${ARG_HEADER}
DESTINATION ${SYCL_INCLUDE_DIR}/sycl/detail/plugins/${PLUGIN_NAME}
COMPONENT pi_${PLUGIN_NAME})
endif()

install(TARGETS pi_${PLUGIN_NAME}
LIBRARY DESTINATION "lib${LLVM_LIBDIR_SUFFIX}" COMPONENT pi_${PLUGIN_NAME}
RUNTIME DESTINATION "bin" COMPONENT pi_${PLUGIN_NAME})

set (manifest_file
${CMAKE_CURRENT_BINARY_DIR}/install_manifest_pi_${PLUGIN_NAME}.txt)
add_custom_command(OUTPUT ${manifest_file}
COMMAND "${CMAKE_COMMAND}"
"-DCMAKE_INSTALL_COMPONENT=pi_${PLUGIN_NAME}"
-P "${CMAKE_BINARY_DIR}/cmake_install.cmake"
COMMENT "Deploying component pi_${PLUGIN_NAME}"
USES_TERMINAL
)
add_custom_target(install-sycl-plugin-${PLUGIN_NAME}
DEPENDS
${manifest_file} pi_${PLUGIN_NAME}
)

set_property(GLOBAL APPEND PROPERTY SYCL_TOOLCHAIN_INSTALL_COMPONENTS
pi_${PLUGIN_NAME})
endfunction()
25 changes: 12 additions & 13 deletions sycl/doc/EnvironmentVariables.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ compiler and runtime.
| `SYCL_ENABLE_DEFAULT_CONTEXTS` | '1' or '0' | Enable ('1') or disable ('0') creation of default platform contexts in SYCL runtime. The default context for each platform contains all devices in the platform. Refer to [Platform Default Contexts](extensions/supported/sycl_ext_oneapi_default_context.asciidoc) extension to learn more. Enabled by default on Linux and disabled on Windows. |
| `SYCL_RT_WARNING_LEVEL` | Positive integer | The higher warning level is used the more warnings and performance hints the runtime library may print. Default value is '0', which means no warning/hint messages from the runtime library are allowed. The value '1' enables performance warnings from device runtime/codegen. The values greater than 1 are reserved for future use. |
| `SYCL_USM_HOSTPTR_IMPORT` | Integer | Enable by specifying non-zero value. Buffers created with a host pointer will result in host data promotion to USM, improving data transfer performance. To use this feature, also set SYCL_HOST_UNIFIED_MEMORY=1. |
| `SYCL_EAGER_INIT` | Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs PI plugins to do the same. Default is "0". |
| `SYCL_EAGER_INIT` | Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs UR adapters to do the same. Default is "0". |
| `SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE` | See [below](#sycl_reduction_preferred_workgroup_size) | Controls the preferred work-group size of reductions. |
| `SYCL_ENABLE_FUSION_CACHING` | '1' or '0' | Enable ('1') or disable ('0') caching of JIT compilations for kernel fusion. Caching avoids repeatedly running the JIT compilation pipeline if the same sequence of kernels is fused multiple times. Default value is '1'. |
| `SYCL_JIT_AMDGCN_PTX_KERNELS` | '1' or '0' | Enable ('1') or disable ('0') JIT compilation of kernels. Only supported for Nvidia and AMD backends. Note, that it is required to have a valid binary for the desired backend (AMD or CUDA), that was compiled with `-fsycl-embed-ir` in order to use JIT-ing. When JIT-ing is enabled SYCL runtime will try to cache and reuse JIT-compiled kernels, furthermore if a kernel uses specialization constants the compiler will attempt to materialize the values in place, turning them to de-facto compile time constants. Default is '0'. |
Expand Down Expand Up @@ -152,23 +152,23 @@ For a description of parallel for range rounding in DPC++ see
| | | `MinRangeX`: The minimum X dimension of the range such that range rounding is activated (Default 1024) |


## Controlling DPC++ Level Zero Plugin
## Controlling DPC++ Level Zero Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_ENABLE_PCI` (Deprecated) | Integer | When set to 1, enables obtaining the GPU PCI address when using the Level Zero backend. The default is 1. This option is kept for compatibility reasons and is immediately deprecated. |
| `SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR` | Any(\*) | Disable USM allocator in Level Zero plugin (each memory request will go directly to Level Zero runtime) |
| `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
| `SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR` | Any(\*) | Disable USM allocator in Level Zero adapter (each memory request will go directly to Level Zero runtime) |
| `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero adapter. |

`(*) Note: Any means this environment variable is effective when set to any non-null value.`

## Controlling DPC++ CUDA Plugin
## Controlling DPC++ CUDA Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` | Integer | Specifies the maximum size of a local memory allocation in bytes. If the value exceeds the device's capabilities then a `sycl::runtime_error` is thrown. In order for the full error message to be printed, `SYCL_RT_WARNING_LEVEL=2` must be set. The default value for `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` is determined by the hardware. |

## Controlling DPC++ HIP Plugin
## Controlling DPC++ HIP Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
Expand Down Expand Up @@ -231,7 +231,6 @@ variables in production code.</span>
| after_addHostAcc | print graph after addHostAccessor method |
| always | print graph before and after each of the above methods |


### `SYCL_UR_TRACE` Options

`SYCL_UR_TRACE` accepts a bit-mask, so individual tracing types can be enabled.
Expand All @@ -246,7 +245,7 @@ Supported tracing levels are in the table below
| -1 | Enable all levels of tracing |


## Debugging variables for Level Zero Plugin
## Debugging variables for Level Zero Adapter

:warning: **Warning:** <span style="color:red">the environment variables
described below are used for development and debugging of DPC++ compiler
Expand All @@ -255,15 +254,15 @@ variables in production code.</span>

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE` | Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero plugin. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE` | Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero adapter. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` | [EnableBuffers][;[MaxPoolSize][;[host\|device\|shared:][MaxPoolableSize][,[Capacity][,SlabMinSize]]]...] | EnableBuffers enables pooling for SYCL buffers, default 1, set to 0 to disable. MaxPoolSize is the maximum size of the pool, by default there is no size limit. MemType is host, device, shared or read_only_shared. Other parameters are values specified as positive integers with optional K, M or G suffix. MaxPoolableSize is the maximum allocation size that may be pooled, default 0 for shared, 2MB for host, 4MB for device and read_only_shared. Capacity is the number of allocations in each size range freed by the program but retained in the pool for reallocation, default 4. Size ranges follow this pattern: 64, 96, 128, 192, and so on, i.e., powers of 2, with one range in between. SlabMinSize is the minimum allocation size, 64KB for host and device, 2MB for shared and read_only_shared. Example: SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=1;32M;host:1M,4,64K;device:1M,4,64K;shared:0,0,2M|
| `SYCL_PI_LEVEL_ZERO_BATCH_SIZE` | Integer | Sets a preferred number of compute commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_COPY_BATCH_SIZE` | Integer | Sets a preferred number of copy commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero adapter to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
| `SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE` | Integer | It can be set to an integer (>=0) in which case all compute commands will be submitted to the command-queue with the given index in the compute command group. If it is instead set to a negative value then all available compute engines may be used. The default value is "0" |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero adapter for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero adapter would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero adapter would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
| `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` | Integer | When set to a positive value enables use of Level Zero immediate commandlists, which means there is no batching and all commands are immediately submitted for execution. When set to 1, unique immediate commandlists are created for each SYCL queue. When set to 2, unique immediate commandlists are created per host thread per SYCL queue. Default is 1 on Intel® Data Center GPU Max Series running Linux and 0 elsewhere. |
| `SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS` | Integer | When set to a positive value enables use of multiple Level Zero commandlists when submitting barriers. Default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL` | Integer | When set to a positive value enables use of a copy engine for memory fill operations. Default is 0. |
Expand All @@ -275,7 +274,7 @@ variables in production code.</span>
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit component is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |

## Debugging variables for CUDA Plugin
## Debugging variables for CUDA Adapter

:warning: **Warning:** <span style="color:red">the environment variables
described below are used for development and debugging of DPC++ compiler
Expand Down
Loading
Loading