[SYCL][CUDA] accessor_api_image CTS test is failing #2119

againull · 2020-07-15T07:36:46Z

Build compiler
git clone https://github.com/intel/llvm
Hash: b00fb7c

Includes: #1990, #1977

python /localdisk2/ws/againull/sycl/llvm/buildbot/configure.py --cuda -o /localdisk2/ws/againull/sycl/build
python /localdisk2/ws/againull/sycl/llvm/buildbot/compile.py -o /localdisk2/ws/againull/sycl/build

Build accessor CTS tests
git clone https://github.com/KhronosGroup/SYCL-CTS.git
Hash: 9cbe1a719b25c269ef78a2ee08f2e5ed12a1cc6d

Applied: KhronosGroup/SYCL-CTS#52

cmake -G Ninja -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DINTEL_SYCL_ROOT=/localdisk2/ws/againull/sycl/build -DINTEL_SYCL_TRIPLE=nvptx64-nvidia-cuda-sycldevice -DSYCL_IMPLEMENTATION=Intel_SYCL -DSYCL_CTS_ENABLE_OPENCL_INTEROP_TESTS=Off -DSYCL_CTS_ENABLE_DOUBLE_TESTS=On -DSYCL_CTS_ENABLE_HALF_TESTS=On -DINTEL_SYCL_FLAGS="-Xsycl-target-backend;--cuda-gpu-arch=sm_50" -DOpenCL_INCLUDE_DIR=/localdisk2/ws/againull/sycl/build/include/sycl -DOpenCL_LIBRARY=/localdisk2/ws/againull/sycl/build/lib/libOpenCL.so ..

ninja test_accessor -j 12

Run accessor_api_image CTS test
=> ./bin/test_accessor -p nvidia -d opencl_gpu --test accessor_api_image
--- accessor_api_image
. accessor<vec<int32_t, 4>, 1, mode{1024}, target{2017}>
. Checking get_range

PI CUDA ERROR:
Value: 500
Name: CUDA_ERROR_NOT_FOUND
Description: named symbol not found
Function: build_program
Source Location: /iusers/againull/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:468

PI CUDA ERROR:
Value: 400
Name: CUDA_ERROR_INVALID_HANDLE
Description: invalid resource handle
Function: cuda_piProgramRelease
Source Location: /iusers/againull/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:2807

. sycl exception caught
. what - The program was built for 1 devices
Build program log for 'GeForce GTX 1060 6GB':
-999 (Unknown OpenCL error code)
. line: 63
. a SYCL exception was caught: The program was built for 1 devices
Build program log for 'GeForce GTX 1060 6GB':
-999 (Unknown OpenCL error code)

fail

--- accessor_api_image_fp16
. Device does not support half precision floating point operations

pass

. Passed 1/14 tests (7%)

The text was updated successfully, but these errors were encountered:

bader · 2020-07-22T09:02:38Z

@againull, @pvchupin, I think we figured out that this was caused by the regression in the driver. Can we close this one?

againull · 2020-07-22T16:36:00Z

@bader
Yes. Test passes with 435.21 nvidia driver when KhronosGroup/SYCL-CTS#52 is applied. Do you know when this PR it is going to be merged?

pvchupin · 2021-03-23T22:25:53Z

It looks like issue is still reproducible on latest driver 450.102.04.
Let's reopen it, at least for the tracking purpose.

pgorlani · 2022-01-27T09:55:13Z

The 500 CUDA_ERROR_NOT_FOUND (that turns in a 801 CUDA_ERROR_NOT_SUPPORTED for CUDA toolkit 11.3 and above) is caused by the suq.depth PTX instruction in 3d sampled readings at

llvm/libclc/ptx-nvidiacl/libspirv/images/image.cl

Line 151 in 90c8f05

int __nvvm_suq_depth(long) __asm("llvm.nvvm.suq.depth");

llvm/libclc/ptx-nvidiacl/libspirv/images/image.cl

Line 158 in 90c8f05

int __nvvm_suq_depth_3i(read_only image3d_t) __asm("llvm.nvvm.suq.depth");

NVIDIA tells that this error is expected since this PTX instructions work just in case they are used within the 'OpenCL driver'.
Unfortunately, they added that there are no ways to use this instruction with CUDA. They promised to update the PTX documentation in order to make it clear. If suq.depth is present in the fatbin, it produces the aforementioned error even if it is not actually executed. For this reason, it has been removed with #5378.

Another error emerged when the one above is factored out, a 700 CUDA_ERROR_ILLEGAL_ADDRESS. Which appeared in sampled readings with linear filtering. This has been fixed with #5204.

Unfortunately, the previous two errors are just the tip of the iceberg. Passing the accessor_api_image_core test implies the support of (u)int 8bit channels and their related conversion functions, which are completely missing right now: (u)int accessors can read (u)int_{32,16,8}b channels. Further details can be found in Section 6.12.14 and 8.3 of the OpenCL 1.2 Specification.

In order to let this test pass, the image support has been marked as experimental and deactivated by default with #5204.

In summary:

the only type supported are (u)int (32bit), float and half,
8bit channels and related conversion functions are missing,
writings and non-sampled readings are supported,
1d/2d sampled readings are supported,
3d sampled readings are not functioning due to suq.depth.

pvchupin · 2022-01-28T00:09:32Z

@pgorlani, thanks a lot for nice and detailed summary!

bader · 2022-08-01T18:18:02Z

Recent version of test_all (and test_accessor_legacy) aborts on NVIDIA GPU.

pi_die: PI CUDA kernels only support images with channel types int32, uint32, float, and half.
terminate called without an active exception
-------------------------------------------------------------------------------
accessor_api_image_core
-------------------------------------------------------------------------------
SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35
...............................................................................

SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

Can we change pi_die to an exception? pi_die aborts the execution of the whole test suite, whereas exception will just fail a single test.

pgorlani · 2022-08-03T13:28:58Z

Recent version of test_all (and test_accessor_legacy) aborts on NVIDIA GPU.

pi_die: PI CUDA kernels only support images with channel types int32, uint32, float, and half.
terminate called without an active exception
-------------------------------------------------------------------------------
accessor_api_image_core
-------------------------------------------------------------------------------
SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35
...............................................................................

SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

Can we change pi_die to an exception? pi_die aborts the execution of the whole test suite, whereas exception will just fail a single test.

Hi @bader, here is #6521, it should fit our needs.

…6521) This patch makes `cuda_piextKernelSetArgMemObj` setting an error message instead of `std::terminate` in case of the image format is not supported. This error message is encapsulated in an exception thrown by the RT. This allows to continue the SYCL-CTS execution in case of tests using unsupported channel types, see #2119 (comment).

* [SYCL][ABI-BREAK] Remove sycl::program class (#6666) According to the SYCL 2020 spec, section D.1. What has changed from SYCL 1.2.1 to SYCL 2020: > The program class has been removed and replaced with a new > class `kernel_bundle`, which provides similar functionality in a > type-safe and thread-safe way. Removing of `program_impl` class will be done with a separate commit since it's not an ABI-breaking change and some performance analysis should be done in scope of that removal. Tests depending on `sycl::program` were removed in intel/llvm-test-suite#1187 * [flang] Support lowering of intrinsic module procedure C_F_POINTER As Fortran 2018 18.2.3.3, the intrinsic module procedure C_F_POINTER(CPTR, FPTR [, SHAPE]) associates a data pointer with the target of a C pointer and specify its shape. CPTR shall be a scalar of type C_PTR, and its value is the C address or the result of a reference to C_LOC. FPTR is one pointer, either scalar or array. SHAPE is a rank-one integer array, and it shall be present if and only if FPTR is an array. C_PTR is the derived type with only one component of integer 64, and the integer 64 component value is the address. Build the right "source" fir::ExtendedValue based on the address and shape, and use associateMutableBox to associate the pointer with the target of the C pointer. Refactor the getting the address of C_PTR to reuse the code. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D132303 * [LoongArch] Fix annotations not matching predicates. NFC. * [RISCV] Improve vector fceil/ffloor lowering by changing FRM. This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the VFCVT. Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from RVV intrinsics as I could. A followup patch will use this approach for FROUND too. Still need to fix the cost model. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D133238 * [NFC] Remove invisible character in Diagnostic message and tests * [X86] Support SAE for VCVTPS2PH from intrinsic. For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph: https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be update soon. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D132641 * [RegisterCoalescer] Fix crash on early clobbered subreg operands. The issue was with processing two subregs of the same reg are used in the same instruction (e.g. inline asm): "def early-clobber" and other just "def". Register coalescer ran in bad recursion if the early clobbered subreg is second in the following sequence of COPYs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127136 * [OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging support for OpenMP programs. This is 5th of 6 patches started from https://reviews.llvm.org/D100181 This plugin code, when loaded in gdb, adds a few commands like ompd icv, ompd bt, ompd parallel. These commands create an interface for GDB to read the OpenMP runtime through libompd. Reviewed By: @dreachem Differential Revision: https://reviews.llvm.org/D100185 * [lld-macho] Simplify linker optimization hint processing This commit removes the `relocTargets` vector, and instead makes the code reconstruct the referent addresses from the relocated instructions. This will allow us to move `applyOptimizationHints` from `ConcatInputSection::writeTo` to a separate pass that parses and applies LOHs in one step, on a per-file basis. This will improve performance, as parsing is currently done serially in `ObjFile::parse`. I opted to remove the sanity check that ensures that all relocations within a LOH point to the same symbol. This completely eliminates the need to search through relocations. It is my understanding that mismatched relocation targets should not be present in valid object files, so it's unlikely that the removal will lead to mislinks. Differential Revision: https://reviews.llvm.org/D133274 * [NFC] Add test of sized deallocation for coroutines [dcl.fct.def.coroutine]p12 says: > If both a usual deallocation function with only a pointer parameter > and a usual deallocation function with both a pointer parameter and a > size parameter are found, then the selected deallocation function > shall be the one with two parameters. However, the sized deallocation function is disabled by default for ABI reasons. This leads the sentence never get tested and covered. This commit tries to add a test for it * [NFC] [Coroutines] Add tests for looking up deallocation According to [dcl.fct.def.coroutine]p12, the program should be ill-formed if the promise_type contains operator delete but none of them are available. But this behavior was not tested before. This commit adds the tests for it. * [CMake][MLGO] Fix cmake for MLGO The if-statement should check whehter TFLITE is on or not rather than if the variable is specified. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132902 * [OpenMP] Mark -fopenmp-implicit-rpath as NoArgumentUnused This matches the behavior for all the other -fopenmp options, as well as -frtlib-add-rpath. For context, Fedora passes this flag by default in case OpenMP is used, and this results in a warning if it (usually) isn't, which causes build failures for some programs with unnecessarily strict build systems (like Ruby). Differential Revision: https://reviews.llvm.org/D133316 * [MemorySSA][NFC] Simplify if condition Differential Revision: https://reviews.llvm.org/D133332 * [lldb] Enable the insertion of "pending callbacks" to MainLoops from other threads This will be used as a replacement for selecting over a pipe fd, which does not work on windows. The posix implementation still uses a pipe under the hood, while the windows version uses windows event handles. The idea is that, instead of writing to a pipe, one just inserts a callback, which does whatever you wanted to do after the bytes come out the read end of the pipe. Differential Revision: https://reviews.llvm.org/D131160 * [CostModel][X86] Add CostKinds handling for fcmp ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually) SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage. * [OpenCL] Remove argument names from atomic/fence builtins This simplifies completeness comparisons against OpenCLBuiltins.td and also makes the header no longer "claim" the argument name identifiers. Continues the direction set out in D119560. * [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path The main difference is that this preserves intermediate rounding steps, which the other route doesn't. This aligns bfloat16 more with half floats, which use this path on most targets. I didn't understand what the difference was between these softening approaches when I first added bfloat lowerings, would be nice if we only had one of them. Based on @pengfei 's D131502 Differential Revision: https://reviews.llvm.org/D133207 * Apply clang-tidy fixes for readability-identifier-naming in OptimizeSharedMemory.cpp (NFC) * Apply clang-tidy fixes for readability-identifier-naming in OpenMPDialect.cpp (NFC) * [clang-format] [doc] Fix example of wrapping class definitions Example of BraceWrapping AfterClass is wrong Differential Revision: https://reviews.llvm.org/D133087 * [clang-format] Change heuristic for locating lambda template arguments Previously, the heuristic was simply to look for template argument- specific keywords, such as typename, class, template and auto that are preceded by a left angle bracket <. This changes the heuristic to instead look for a left angle bracket < preceded by a right square bracket ], since according to the C++ grammar, the template arguments must *directly* follow the introducer. (This sort of check might just end up being *too* aggressive) This patch also adds a bunch more token annotator tests for lambdas, specifically for some of the stranger forms of lambdas now allowed as of C++20 or soon-to-be-allowed as part of C++23. Fixes https://github.com/llvm/llvm-project/issues/57093 This does NOT resolve the FIXME regarding explicit template lists, but perhaps it gets closer Differential Revision: https://reviews.llvm.org/D132295 * [MLIR] Switch lit tests to %mlir_lib_dir and %mlir_src_dir replacements. The old replacements will be removed soon: - `%linalg_test_lib_dir` - `%cuda_wrapper_library_dir` - `%spirv_wrapper_library_dir` - `%vulkan_wrapper_library_dir` - `%mlir_runner_utils_dir` - `%mlir_integration_test_dir` Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D133270 * [ARM] Constant pools need 4-byte alignment if we only have tADR When the only ADR instruction we have is the 16-bit thumb one then all constant pool entries need to be 4-byte aligned, as tADR has an offset that's a multiple of 4. It looks like previously there happened to be no situations in which we encountered a constant pool entry with alignment less than 4, so failing to do this didn't cause any problems, but the expansion of cttz to a table added by D128911 does use a constant pool with alignment 1, so we now need to handle it correctly. Differential Revision: https://reviews.llvm.org/D133199 * [AMDGPU][MC][GFX11][NFC] Update assembler tests for MIMG instructions Differential Revision: https://reviews.llvm.org/D133322 * [MLIR] Fix for commit 0f2ec35 Fix incorrectly formatted python file. * [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates These require special handling to account for their expansion in lowering. I'm trying very hard not to have to add predicate specific costs - but it might be inevitable..... * [lldb] [Core] Split read thread support into ThreadedCommunication Split the read thread support from Communication into a dedicated ThreadedCommunication subclass. The read thread support is used only by a subset of Communication consumers, and it adds a lot of complexity to the base class. Furthermore, having a dedicated subclass makes it clear whether a particular consumer needs to account for the possibility of read thread being running or not. The modules currently calling `StartReadThread()` are updated to use `ThreadedCommunication`. The remaining modules use the simplified `Communication` class. `SBCommunication` is changed to use `ThreadedCommunication` in order to avoid changing the public API. `CommunicationKDP` is updated in order to (hopefully) compile with the new code. However, I do not have a Darwin box to test it, so I've limited the changes to the bare minimum. `GDBRemoteCommunication` is updated to become a `Broadcaster` directly. Since it does not inherit from `ThreadedCommunication`, its event support no longer collides with the one used for read thread and can be implemented cleanly. The support for `eBroadcastBitReadThreadDidExit` is removed from the code -- since the read thread was not used, this event was never reported. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.llvm.org/D133251 * [gn build] Port 9823d42557eb * [mlir] Add materializeOpFoldResults to turn OpFoldResult array into values. Differential Revision: https://reviews.llvm.org/D133346 * [lldb] Go back to process-directed signals in MainLoopTest.cpp Thread-directed signals are not caught by kqueue (used on Mac). This reverts half of D133181. * [OpenMP] Add lit test for metadirective device arch inspired from sollve This lit test is added based upon the tests present in the tests/5.0/metadirective directory of the SOLLVE repo https://github.com/SOLLVE/sollve_vv Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D131763 * [InstCombine] add tests for icmp-of-trunc; NFC * [InstCombine] reduce code duplication; NFC * [InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true" This fold was added with 5e9522c311dd, but over-specified. We can assume that an undef element is an odd number: https://alive2.llvm.org/ce/z/djQmWU * Update the clang and clang-tools-extra code owners files This also converts the Clang code owners file from a flat text file to an RST file that is linked in to the rest of our documentation. The RFC for this can be found at: https://discourse.llvm.org/t/rfc-proposed-changes-to-clangs-code-ownership/ Differential Revision: https://reviews.llvm.org/D132550 * [GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef. Differential Revision: https://reviews.llvm.org/D133309 * [flang] Accept assumed shape arrays as SHAPE in C_F_POINTER C_F_POINTER was added in https://reviews.llvm.org/D132303, but the code assumed that SHAPE would always be an explicit shape with compile time constant rank. It can actually be an assumed shape, or an explicit shape with non compile time constant rank. Get the rank from FPTR pointer instead. Differential Revision: https://reviews.llvm.org/D133347 * [SCCP] convert signed div/rem to unsigned for non-negative operands This extends the transform added with D81756 to handle div/rem opcodes. For example: https://alive2.llvm.org/ce/z/cX6za6 This replicates part of what CVP already does, but the motivating example from issue #57472 demonstrates a phase ordering problem - we convert branches to select before CVP runs and miss the transform. Differential Revision: https://reviews.llvm.org/D133198 * [CostModel][X86] Add CostKinds test coverage for ctpop intrinsics * [CostModel][X86] Add CostKinds test coverage for cttz intrinsics * [CostModel][X86] Add CostKinds test coverage for ctlz intrinsics * Fix Clang Sphinx docs build The CodeOwners.rst file needs to live in the same directory as the rest of the documentation. This copies the file to the correct place when making a Sphinx build but continues to leave the .rst file at the root directory where it's easier for developers to find. This also ensures that local doc builds using `make html` work as expected. * [Metadata] Introduce MD_pcsections Introduces MD_pcsections metadata kind. See added documentation for more details. Subsequent patches enable propagating PC sections metadata through code generation to the AsmPrinter. RFC: https://discourse.llvm.org/t/rfc-pc-keyed-metadata-at-runtime/64191 Reviewed By: dvyukov, vitalybuka Differential Revision: https://reviews.llvm.org/D130875 * [MachineInstr] Allow setting PCSections in ExtraInfo Provide MachineInstr::setPCSection(), to propagate relevant metadata through the backend. Use ExtraInfo to store the metadata. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130876 * [Object] Refactor code for extracting offload binaries We currently extract offload binaries inside of the linker wrapper. Other tools may wish to do the same extraction operation. This patch simply factors out this handling into the `OffloadBinary.h` interface. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D132689 * [OffloadPackager] Add ability to extract images from other file types A previous patch added support for extracting images from offloading binaries. Users may wish to extract these files from the file types they are most commonly emebedded in, such as an ELF or bitcode. This can be difficult for the user to do manually, as these could be stored in different section names potentially. This patch addsp support for extracting these file types. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D132607 * [SYCL][CUDA] Fix get_native interop for device (#6649) This patch fixes: https://github.com/intel/llvm/issues/6635 In https://github.com/intel/llvm/pull/6483, the implementation of `get_native` for device for the CUDA plugin was mistakenly moved to the experimental interface header, and so it was no longer available for the regular interface, causing build issues. For the CUDA plugin there is currently two interfaces for the CUDA interop, the "legacy" one which is used by projects such as oneMKL and oneDNN, and the "experimental" one, defined in the `sycl/ext/oneapi/experimental/backend/cuda.hpp` header which implements the interop as described in the CUDA backend specification proposed here: https://github.com/KhronosGroup/SYCL-Docs/pull/197 * [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076 * [SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo For information infrequently attached to SDNodes, it is useful to provide a way to add this information out-of-line. This is already done for call-site specific information. Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding additional information not necessarily related to call sites only. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130880 * [SelectionDAG] Properly copy ExtraInfo on RAUW During SelectionDAG legalization SDNodes with associated extra info may be replaced with a new SDNode. Preserve associated extra info on ReplaceAllUsesWith and remove entries in DeallocateNode. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D130881 * Add parantheses to silence warning. * [AArch64] Additional tests for sinking splats to muls. NFC * Fix "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" compilation on Windows * Fix AMDGPU test failures due to "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" * [tsan] Replace mem intrinsics with calls to interceptors After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm intrinsics with calls to glibc functions. However this approach is fragile, as slight changes in pipeline can return llvm intrinsics back. In particular InstCombine can do that. Msan/Asan already declare own version of these memory functions for the similar purpose. KCSAN, or anything that uses something else than compiler-rt, needs to implement this callbacks. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D133268 * Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" * [gn build] port 5dbc7cf7cac44 * [bazel] port 5dbc7cf7cac44 * Revert "[lldb][bindings] Fix module_access handling of regex" This reverts commit 75f05fccbbdd91393bdc7b6183b9dd2b1e859f8e. This commit broke the windows lldb bot: https://lab.llvm.org/buildbot/#/builders/83/builds/23284 * Fix OpenMP Opt for target without a parallel region. Remove ctx redeclaration. Format code. Remove parallel check. Modify tests. Clean-up code. Fix another test. Move code to helper functions. Format file. Minor fixes. * [InstCombine] add tests for add of select with 0 and negate arms; NFC * [InstCombine] add/move tests for add with select operands that simplify; NFC * [InstCombine] remove dead code for add (select cond, (sub), 0); NFC This pattern is handled more generally in SimplifySelectsFeedingBinaryOp(). Tests to confirm that added to the add.ll test file in the previous commit. * Add docs for Mach-O lld I wasn't able to find any docs for Mach-O in `lld/docs`, so here's an attempt at adding basic docs. One of my goals here is to make it easy for users who are unfamiliar with linkers to successfully use lld. Reviewed By: #lld-macho, int3 Differential Revision: https://reviews.llvm.org/D132893 * [CostModel][X86] Add CostKinds handling for ctpop ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually) Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples. * [clang] fix profiling of template arguments of template and declaration kind Template arguments of template and declaration kind were being profiled only by their canonical properties, which would cause incorrect uniquing of constrained AutoTypes, leading to a crash in some cases. This exposed some places in CheckTemplateArgumentList where non-canonical arguments where being pushed into the resulting converted list. We also throw in some asserts to catch early and explain the crashes. Note that the fix for the 'declaration' kind is untestable at this point, because there should be no cases right now in the AST where we try to unique a non-canonical converted template argument. This fixes GH55567. Signed-off-by: Matheus Izvekov <[email protected]> Differential Revision: https://reviews.llvm.org/D133072 * [RISCV] Improve vector fround lowering by changing FRM. This is a follow up to D133238 which did this for ceil/floor. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D133335 * [mlir][sparse] codegen for sparse alloc Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D133241 * Revert "[tsan] Replace mem intrinsics with calls to interceptors" Breaks http://45.33.8.238/macm1/43944/step_4.txt https://lab.llvm.org/buildbot/#/builders/70/builds/26926 This reverts commit 77654a65a373da9c4829de821e7b393ea811ee40. * [mlir][sparse] Refactoring: renaming StorageNewOp to StorageOp To address comment in https://reviews.llvm.org/D133241 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D133363 * [ConstraintElimination] Replace pair with named struct (NFC). This slightly improves the readability and allows further extensions in follow-ups. * [libc++] Avoid instantiating type_trait classes Use `using` aliases to avoid instantiating lots of types Reviewed By: ldionne, #libc Spies: libcxx-commits, miyuki Differential Revision: https://reviews.llvm.org/D132785 * [AArch64] Add an option to reserve physical registers from RA This patch adds an option --reserve-regs-for-regalloc, so we can reserve a list of physical registers. These registers will not be used by register allocator, but can still be used as ABI requests such as passing arguments to function call. Its main purpose is simulating high register pressure by reserving many physical registers. So it will be much easier to test and debug register allocation changes. Differential Revision: https://reviews.llvm.org/D132717 * Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands" This reverts commit fe1f3cfc2669aca387a45c8ce615b45c1db50fc6. It looks like this commit breaks building llvm-test-suite. To reproduce, run `opt -passes=ipsccp` on the IR below. @g = internal global i32 256, align 4 define void @test() { entry: %0 = load i32, ptr @g, align 4 %div = sdiv i32 %0, undef ret void } * [clang] Implement setting crash_diagnostics_dir through env variable This implements setting the equivalent of `-fcrash-diagnostics-dir` through the environment variable `CLANG_CRASH_DIAGNOSTICS_DIR`. If present, the flag still takes precedence. This helps integration with test frameworks and pipelines. With this feature, we change the libcxx bootstrapping build pipeline to produce clang crash reproducers as artifacts. Signed-off-by: Matheus Izvekov <[email protected]> Differential Revision: https://reviews.llvm.org/D133082 * [mlir][spirv] Add base classes for vendor ops This is the first patch in the series to rename vendor ops from `spv.NameVENDOR` to `spv.VENDOR.Name`. The goal is to make the SPIR-V dialect more internally consistent. Issue: https://github.com/llvm/llvm-project/issues/56863 * [mlir][spirv] Refactor vendor op definitions Use dedicated vendor op classes/categories. This is so that we can later change the mnemonics of all vendor ops by changing the base class: `SPV_VendorOp`. Issue: https://github.com/llvm/llvm-project/issues/56863 * [mlir][spirv] Change vendor op mnemonics to `spv.VENDOR.name` Make vendor ops more consistent with the naming scheme within the SPIR-V dialect. Issue: https://github.com/llvm/llvm-project/issues/56863 Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D133247 * [mlir] Make bit enum operators constexpr This allows using the | operator on the values of enum attributes in complie-time constants. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D133159 * [lldb] Fix CommunicationKDP following D133251 Add `m_bytes` and `m_bytes_mutex` to `CommunicationKDP`, following refactoring in D133251. Differential Revision: https://reviews.llvm.org/D133365 * [ADT] Remove is_splat `is_splat` is superseded by `all_equal` and marked as deprecated. See the discussion thread for more details: https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D132336 * [OpenMP] Fix `omp_get_wtime` function being marked incorrectly as readonly OpenMP has a list of of optimistic attributes that can be attached to known runtime functions to aid some analysis. The `omp_get_wtime` function incorrectly used the `readonly` attribute. This is not correct at the `omp_get_wtime` function changes values depending on some external state. This is more correctly modeled with `inaccessiblememonly` meaning that the value does not depend on anything within the module, but can not be removes as it depends on external state. Fixes #57578 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D133360 * [SYCL] Fix accessor's CTAD for g++ host compiler (#6673) * [SYCL] Replace hardcoded namespaces with attribute (#6674) Namespaces were hardcoded and used in compiler to check for various SYCL types including accessors, spec_constants, etc. This patch implements an attribute to uniquely identify the types instead. Attribute argument is an Identifier which denotes each type. E.g. __attribute__((sycl_type(accessor)) is used to mark accessor class. The attribute has been implemented as with an accepted list of arguments via EnumArg. The attribute definition should be updated to support any new types. The attribute takes 1 argument. Fixes: https://github.com/intel/llvm/issues/5186 Signed-off-by: Elizabeth Andrews <[email protected]> * [MC] Emit Stackmaps before debug info This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment. The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections. This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op. Differential Revision: https://reviews.llvm.org/D132708 * [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag The changes in D130020 removed all support for the old method of compiling OpenMP offloading programs. This means that `-fopenmp-new-driver` has no effect and `-fno-openmp-new-driver` does not work. This patch removes the use and documentation of this flag. Note that the `--offload-new-driver` flag still exists for using the new driver optionally with CUDA and HIP. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D133367 * [SYCL][ABI-Break] Remove deprecated OptionalDevice input type of make_queue and remove deprecated make_queue ABIs (#6628) We introduced a tentative OptionalDevice input type of make_queue API to support the SYCL 2020 API as well as the legacy API which did not require Device to be passed as a parameter. This PR intends to remove this tentative input type. Extended: Now this PR also removed deprecated make_queue ABIs and make the make_queue to take a pointer to pi_device, which becomes back to optional. Signed-off-by: Byoungro So <[email protected]> * [SYCL] Emit an error on attempt to compile in less than C++17 mode (#6678) * [CI][NFC] Rename workflow job (#6572) Just pure renaming of "resolve test matrix" to "generate test matrix" Follow up on a comment made at https://github.com/intel/llvm/pull/6528 * [CI][NFC] Make lint for commits in PR change only (#6722) Make lint for commits in PR change only not taking into account sycl branch HEAD. Should fix issues like in https://github.com/intel/llvm/pull/6705 where lint reported errors for files not affected in PR. To overcome current PRs stuck because of this they need to use ignore-lint tag until PR will be based on devops directory changes made here. * [CI][NFC] Fixed GFX driver update PR (#6723) Now generate test matrix correctly use github.event.pull_request.head.sha that is required for pull_request_target. * [SYCL] Improve sycl-post-link performance with -split=kernel (#6689) Right now we are computing a new callgraph in every call to extractCallGraph. extractCallGraph is called every time we do a module split, so for -split=kernel, that would be once per kernel. For modules with many kernels, this can take a very long time. We only need to compute this once because the input IR doesn't seem to change between splits. This improves performance of sycl-post-link from ~45min to ~7min for an example with 13k kernels Signed-off-by: Sarnie, Nick <[email protected]> * [SYCL] Deprecate SYCL 1.2.1 device selectors (#6599) Now that we have added the SYCL 2020 callable device selectors, we need to prepare for the removal of the older SYCL 1.2.1 `device_selector` class. The first step is to add the deprecation message to the 1.2.1 style device selectors, which this PR does. It also removes the usage of those from our own codebase so as to not trip on our own messages in the future. Signed-off-by: Chris Perkins <[email protected]> Signed-off-by: Larsen, Steffen <[email protected]> Co-authored-by: Larsen, Steffen <[email protected]> * [SYCL][CUDA][HIP] Add support for intel extension free-memory (#6709) The support is added for NVIDIA and AMD devices. Co-authored-by: Abhishek Bagusetty <[email protected]> * [SYCL][Windows] Improve windows sycl.lib linking (#6699) This patch does two things, first it makes `-fsycl` ignore `-nostdlib` when linking the SYCL library. This is necessary because for Clang on Windows CMake will generate link commands using `-nostdlib` and explicitly list the system libraries, but of course it doesn't do it for SYCL, so we currently end up never linking the SYCL library when this is used. Ignoring `-nostdlib` for `-fsycl` on Windows seems like a reasonnable solution for this as this is also what is done for the OpenMP runtime libraries. See the CMake module: * https://github.com/Kitware/CMake/blob/aa2de7cd2a04699744a224ab84e0ca483559c5d3/Modules/Platform/Windows-Clang.cmake#L79 In addition this patch also adds a linker parameter to help clang find the `sycl.lib` file without requiring users to tweak their environments to link against it. * [SYCL][Windows] Enable building lld by default on Windows (#6701) The Windows-Clang CMake module uses `lld-link` by default, so having lld built on windows is helpful for people wanting to build SYCL applications using CMake. Related to: https://github.com/intel/llvm/issues/6026 * [SYCL] Allow buitin_assume_aligned to be called from device code. (#6705) Prior to this PR, if there is a direct call to __builtin_assume_aligned from within the device code, we would get the following error: `error: SYCL kernel cannot call a variadic function` This PR allows `__builtin_assume_aligned` to be invoked from within device code. The reason for doing so is, for example, if we load a float* pointer from a data structure, the compiler has no way of knowing the underlying alignment of the data the pointer points to, and therefore it cannot combine the load/stores and one ends up with many unnecessary load/store instructions plus lots of unnecessary pointer arithmetic plus increased register pressure. We see this all over the place in our kernels. With `__builtin_assume_aligned` we can tell the compiler to safely assume a certain alignment, therefore implicitly forcing coalescing. * [SYCL][NFC] Remove unnecessary NULL check for 'Fn' (#6726) Klocwork thinks that it is possible for 'Fn' to be a nullptr due to this check but there's already an assert at the beginning of CodeGenFunction::GenerateCode to avoid that possibility. * [SYCL] Fix device comparison in removeDuplicateDevices (#6730) Kernel bundles attempt to remove duplicate devices from a passed device list. This is done through the `removeDuplicateDevices` function which creates a set through comparing devices by their native handles. However, the `getNative` member function on `device_impl` used to get these handles will retain the native devices if the backend is OpenCL. For root devices this will not have an effect, but on sub-devices this can potentially lead to a leak of the devices. As a fix this commit compares the PI devices rather than the native handles. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Remove cl namespace deprecation warning (#6735) According to the SYCL 2020 specification, CL/sycl.hpp should supply the cl prefix namespace for backwards compatibility, but it does not specify that the namespace is deprecated. This commit removes the deprecation message. Signed-off-by: Larsen, Steffen <[email protected]> * [GHA] Uplift Linux GPU RT version to 22.35.24055 (#6704) Scheduled drivers uplift Co-authored-by: GitHub Actions <[email protected]> * [BuildBot] Uplift GPU RT version for Linux CI Process (#6697) Uplift GPU RT version for Linux to 22.35.24055 Signed-off-by: bb-sycl <[email protected]> * Update test to match -fvisibility=arg option requirements (#7098) * [SYCL][ESIMD] Introduce predicates for lsc_block_store/load (#6688) * Update for LLVM Optional API changes Update for LLVM commit b5f8d42efe3e ("[ADT] Deprecate Optional::{hasValue,getValue} (NFC)", 2022-08-07). This is a mechanical replacement of `hasValue` to `has_value` and `getValue` to `value`. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/ffeb4df * Restore getArgIndex in OCLTypeToSPIRV (#1567) * Update for LLVM Optional API changes Update for LLVM commit b5f8d42efe3e ("[ADT] Deprecate Optional::{hasValue,getValue} (NFC)", 2022-08-07). This is a mechanical replacement of `hasValue` to `has_value` and `getValue` to `value`. * Restore argument tracing in OCLTypeToSPIRV Commit 4a9c78ee ("Prepare SPIRVWriter for type conversion without opaque pointers. (#1499)", 2022-06-20) removed `getArgIndex` in favor of passing `Idx`, but this leads to incorrect adaptation of argument types if sampler arguments are not in the same position between different functions. We might be able to drop `adaptArgumentsBySamplerUse` entirely, as we don't represent samplers as i32 anymore; but for now just fix the regression. Fixes https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/1562 Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/87f8a58 * Move to C++17 LLVM has switched to C++17 with commit b1356504e63a ("[LLVM] Update C++ standard to 17", 2022-08-06). Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/75d16c2 * .clang-tidy: temporarily disable misc-const-correctness This is a relatively new check added to clang-tidy by 46ae26e7eb70 ("[clang-tidy] implement new check 'misc-const-correctness' to add 'const' to unmodified variables", 2022-07-24). Currently the code base doesn't follow the practice of declaring variables `const` where possible, which makes the check quite noisy, so disable it for now. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/4dd494e * Drop the JointMatrixINTEL struct-renaming pass when opaque pointers are enabled. (#1570) The frontend is being changed to lower the struct name to the correct LLVM name directly, obviating the need for this check. See https://github.com/intel/llvm/pull/6535 for this change. This marks the removal of the final call to the deprecated method Type::getPointerElementType, although there remains some code that is not fully working with opaque pointers enabled. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e2bb479 * Use TypedPointerType in lieu of PointerIndirectPair in mangleBuiltin. (#1568) Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/42cf770 * Remove 2 unused SPCV_ macros Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/160f013 * Upgrade to Ubuntu 20.04 The Ubuntu 18.04 image is marked deprecated [1], so move to a newer image. [1] https://github.com/actions/runner-images Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/20a1fd7 * Fix a -Wmaybe-uninitialized warning `MDWrapper::get` may not fill its argument in error cases, so initialize `Arg` to an all-ones value. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/2c247b7 * Dot product bugfix to include more floating point types (#1578) Switched the visitCallDot check to use isFloatingPointTy for scalar floating point operands. Bugfix for previous change regarding integer dot product. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/71e01b5 * Mass add -emit-opaque-pointers for tests that don't require changes. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/1ea77d2 * Add support for toolchain compilation with LLVM_LINK_LLVM_DYLIB option (#1543) Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/40fd741 * Update SPV_INTEL_vector_compute to rev 15 This adds NamedBarrierCountINTEL Execution Mode, see more in https://github.com/intel/llvm/pull/1612 Co-authored-by: nrudenko <[email protected]> Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/21c7a30 * Remove ExecutionModeNamedBarrierCountINTEL from spirv_internal (#1604) It was actually upstreamed to SPIR-V Headers already. Signed-off-by: Sidorov, Dmitry <[email protected]> Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/3335c25 * Mass add -emit-opaque-pointers for tests that require some changes. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/6e15642 * Implement SPIR-V support for max_reinvocation_delay attribute (#1594) The attribute generates the llvm.loop.intel.max_reinvocation_delay.count metadata in LLVM IR. There is one positive integer literal operand denoting the maximum number of cycles allowed between loop invocations. Spec: KhronosGroup/SPIRV-Registry#163 Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/d4ec010 * Fix type scavenger for variable arguments and multiple-uses-of-types cases. (#1606) Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/401d124 * Add support for split barriers extension SPV_INTEL_split_barrier (#1424) This PR adds support for split barriers and the SPV_INTEL_split_barrier extension. The related SPIR-V extension spec can be found here: * https://github.com/KhronosGroup/SPIRV-Registry/pull/136 The related OpenCL C extension spec can be found here: * https://github.com/KhronosGroup/OpenCL-Docs/pull/765 Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e3cd296 * Reinstate deprecated-declarations warning as error (#1609) This reverts commit 55d1de820841e6d9d1c6ca0cd534323d69a1cbf1. Now that the last use of the deprecated `getPointerElementType` has been removed, treat uses of deprecated functions as errors again. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/6811488 * Rework the demangler to support recovering more pointer element types. (#1556) Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/b412ae5 * Translate function pointer from global variable as pointer, not as declaration (#1608) This patch helps to avoid invalid SPV generation. When global variable contains a pointer to a function, translator tries to translate it as declaration. Then it translates this function the second time when going through the function list. This leads to double translation of the same function and to the usage of the same IDs in SPIR-V file. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/2b4ce42 * Add SPV_INTEL_masked_gather_scatter extension (#1580) This extension allows TypeVector to have a Physical Pointer Type Component Type and introduces gather/scatter instructions. It will be useful for explicitly vectorized kernels. Spec: https://github.com/intel/llvm/pull/6613 Signed-off-by: Sidorov, Dmitry <[email protected] Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/49b08e8 * [SYCL][Doc] Remove stale SPIR-V extension spec (#6741) This SPIR-V extension was promoted to a KHR, and that specification is on the SPIR-V registry. The DPC++ compiler uses the KHR version (not this INTEL version), so this old specification can be removed. * [SYCL][CUDA] Make `piextKernelSetArgMemObj` setting an error message (#6521) This patch makes `cuda_piextKernelSetArgMemObj` setting an error message instead of `std::terminate` in case of the image format is not supported. This error message is encapsulated in an exception thrown by the RT. This allows to continue the SYCL-CTS execution in case of tests using unsupported channel types, see https://github.com/intel/llvm/issues/2119#issuecomment-1201548912. * [SYCL][Doc] Add sycl complex to complex algorithms extension (#6717) This PR extends the complex algorithms extension to support `sycl::ext::oneapi::complex` and `marray<sycl::ext::oneapi::complex>`. Additionally it adds the `multiplies` operator as a valid binary operation for complex values when reducing and scanning across work items. This PR has a dependency upon #6550. * [SYCL][NFC] Add SYCLPropagateAspectsUsage pass (#6670) Added a pass which is a part of optional kernel features design: it uses information provided by FE & Headers about aspects used in device code to propagate it through the call graph to mark all kernels and functions with list of aspects they use. Co-authored-by: Maksim Sabianin <[email protected]> * [NFC] Fix the expected assert message in LIT test gather_scatter_rgba.cpp Signed-off-by: Vyacheslav N Klochkov <[email protected]> * [SYCL] Make intel specific device info descriptors namespace qualified (#6639) Conforming to SYCL 2020 specification section 6.3.1 and 4.6.4.2, to make extension information descriptors templated and within the correct namespace. -Also moved deprecated info descriptors for device into separate file -Changed namespace of the recently added [device memory extension](https://github.com/intel/llvm/pull/6604) to ext::intel::info::device Signed-off-by: Rauf, Rana <[email protected]> Co-authored-by: Steffen Larsen <[email protected]> * [SYCL] Fix USM free for descendent devices (#6733) Now that use of descendent devices of context members is supported, we cannot rely on always choosing the only device in single-device contexts. Remove this branch to always fetch the device from the platform. * [SYCL] Test must pass -opaque-pointers explicitly Not yet the default for sycl. * Add opaque pointers switch to llvm/test/Verifier/dllstorage.ll * Remove clang/test/Driver/openmp-sycl-interop.c test form testing. The test is specific to behaviors involving SYCL and OpenMP offloading. Since OpenMP offload has moved to the new offloading model the test is not relevant anymore. * [SYCL] Pass /Zc:__cplusplus in -fsycl-host-compiler-options in some tests (#6751) By default MSVC reports 199711L as the standard being used and needs that option to report C++ version properly. This fixes current post-commit failures on the tests modified. * Delete obsolete Clang::Driver tests * Revert "Delete obsolete Clang::Driver tests" This reverts commit 84be9c2cb06b5c98225ae0da80e32aa161483187. * The XFAIL tests are specific to the old OpenMP offloading model, which was removed. * Disable sycl-libspirv for amdgcn-amd-amdhsa target in clang/test/Driver/sycl-amdgcn-sqrt.cpp * [SYCL][ABI-Break][NFC] Remove unusued members (#6731) This commit removes the following: - The unused member function `kernel_bundle::join_impl` overload. - The unused member variable `Offset` from `stream`. - The unused member function `buffer_impl::constructorNotification` overload. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Fix memory leak in program link (#6641) This PR fixes the memory leak caused by missing the call to program release. Signed-off-by: Byoungro So <[email protected]> * [SYCL] Remove mentions of host device from in-tree LIT tests (#6683) * [ESIMD] Fix invoke_simd calls case with pointer passed to it (#6696) The helper function created during translation of invoke_simd must accept a pointer to a function, not a reference to a pointer to a function. That additional level of indirection is automatically resolved by compiler for invoke_simd, but needs to be manually resolved/adjusted for the helper function. Signed-off-by: Vyacheslav N Klochkov <[email protected]> * [SYCL] Remove CG/handler extended members mechanism (#6759) Now that the extended members have been promoted to proper fields of CG/handler classes, the extended member mechanism can be removed until it's needed again. * Suppress some clang-tidy errors Split the .clang-tidy check lists out over multiple lines to improve readability. Suppress `misc-non-private-member-variables-in-classes` as the code currently contains many instances that fail this check. Drop `constexpr` from `LoopControlLoopCountINTELMask` after clang started diagnosing this with b36453530418 ("[Clang] Diagnose ill-formed constant expression when setting ...", 2022-07-28). Removing `constexpr` is just a workaround, the long term fix would be to upstream the new enum value to `spirv.hpp`. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c5b29f2 * Update LLVM version from 15 to 16 Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/7aa1fd2 * [SYCL] Assign code owners for invoke_simd.hpp and simd.hpp. (#6746) * [SYCL] Fix memory leak (queue_impl) due to #5901 (#6707) Cross dependency event_impl vs queue_impl prevents objects release. Event_impl now has only weak pointer to queue. Signed-off-by: Tikhomirova, Kseniya <[email protected]> * Fix the Clang sphinx build bot; NFC This should address the failure introduced in: https://lab.llvm.org/buildbot/#/builders/92/builds/32377 * [SYCL] Move accessor_impl to source directory (#6698) Moving impl part of host accessor implementation to avoid exposing implementation details in the headers. This allows for more changes in accessor without breaking ABI. Also updated the gdb xmethods since it was relying on the impl details which are not available for gdb(unless libsycl.so is built with debug symbols) anymore. Instead of accessing members of impl directly gdb printers now accessing helper methods. To prevent compiler discarding these methods there are dummy references which are active when NDEBUG is not defined. * [SYCL] Silence -Wctad-maybe-unsupported warning This might be an FE bug. Will follow up separately. Workaround should be fine to unblock pulldown. * [sycl-bisect] Add missing exit (#6771) This script has been missing one call to exit, which causes control to fall through and run successful tests a second time when using --command without --command-allow-bisect-codes. This didn't change the results of the script, but it did make it slower if the test command takes a while to run. * [ESIMD] Change LSC API to improve template argument type deduction (#6764) * [CI] Fix dependabot alert (#6725) * [SYCL] More changes to silence -Wctad-maybe-unsupported Mutex changes could be reverted if/when this change lands: https://reviews.llvm.org/D133425 * [SYCL][ABI-Break] Merge DeviceBinaryImage and RTDeviceBinaryImage (#6768) To avoid future issues with ABI for DeviceBinaryImage, this commit makes the following changes: * Moves RTDeviceBinaryImage to a source header. * Merges DeviceBinaryImage into RTDeviceBinaryImage. * Promotes the common property ranges from DeviceBinaryImage into new members. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Fix get_pointer_device for cases with descendent devices (#6719) Looking through context members alone when searching for a specific device isn't enough anymore since now descendent devices of context members can be used within that context as well. Change the logic to look for the device in the cache instead. * [Matrix][SYCL] Add use argument for joint_matrix and add another feat… (#5835) …ure macro for it * [SYCL] Remove unused argument from getDeviceImpl (#6780) * [SYCL][Windows] Fix debug build in non cl mode (#6721) This fixes building SYCL programs in Debug mode with `Windows-Clang.cmake`. The issue is that the code was using `OPT__SLASH_MDd` to select `sycld.lib` but under the `!C.getDriver.IsCLMode()` condition this flag will never be set, `OPT_g_Flag` should be used instead (`-g` rather than `/MDd`). Note that using the regular `clang` command line to manually build with `-g` still doesn't work as it will link against `msvcrt` rather than `msvcrtd` and will miss required defines for debug builds on Windows (`_DEBUG`). This is correctly done by the CMake module or simply when using `clang-cl`. * [SYCL][DOC] Add extension for FPGA kernel interface properties (#5715) ## SYCL extension contains the following new kernel properties - `streaming_interface<...>` - `register_map_interface<...>` The first two properties take enum arguments that provide the compiler information about whether the logic downstream to the kernel will back-pressure the kernel or not. - `pipelined<N>` Takes an integer, non-zero values specify minimum cycles between kernel invocations, and 0 specifies that pipelining should be disabled. Co-authored-by: GarveyJoe <[email protected]> * [SYCL] Fix llvm.used removal when used with opaque pointers. (#6773) The code priorily assumed that all functions when used in @llvm.used would be wrapped within a bitcast <fnptr type> to i8*; with opaque pointers, the values would be functions directly, causing a crash since functions don't have any operands. * [SYCL] Add MAJOR_VERSION to the name of the sycl library on Win (#6745) * [SYCL] Silence -Wctad-maybe-unsupported for check-sycl Add deduction guide to kernel_bundle * [SYCL] Add --host-target flag and remove --arm flag (#6620) This patch adds a new flag to `configure.py` to allow changing the host target to build, this makes it easier to build on different hosts such as ARM or PowerPC. This patch also removes the `--arm` flag as it is now redundant, `--host-target=ARM` or `--host-target=AArch64` should be used instead. This is slightly different than the original `--arm` flag as it only allows to enable one of the ARM platforms. But I'm not sure what the use case was for enabling both, as only one will be the host platform for a given build. And compilation time was given as a reason to drop the X86 architecture originally so only enabling the correct ARM architecture should also help with that. Co-authored-by: Alexey Bader <[email protected]> * [NFC] Update sycl-force-target test to play nicely for 32-bit host (#6785) Update the target for the test to use 64-bit to allow for the test to emit the expected device target when unbundling. * [SYCL] Use copy engine for memory read/write operations (#6783) In the present state of the L0 plugin, MemBufferWrite and MemBufferRead operations are being executed using compute engines. This patch changes this behavior for the operations to be executed using copy engines. It is expected to improve performance. Signed-off-by: Arvind Sudarsanam <[email protected]> * [SYCL][FPGA] Support Intel FPGA simulator device selector (#6715) Create **fpga_simulator_selector** to be used to select the FPGA simulator device. This assumes that the user has compiled their program with -Xssimulation (as well as -fintelfpga) to prepare for the simulator. An object of class fpga_simulator_selector must be constructed early in the execution of the host program in order to make the simulator available at runtime. The current implementation has the effect that if an object of class sycl::ext::intel::fpga_simulator_selector is defined, FPGA hardware devices selected using sycl::ext::intel::fpga_selector will select simulator devices. This will be documented in the release notes and the specification. We expect this behaviour to be eliminated in the future. * [SYCL][CUDA] Add support of CUDA XPTI tracing (#6373) Fork of https://github.com/intel/llvm/pull/5797 This patch rebases and finalizes (similarly to https://github.com/intel/llvm/pull/6023) the draft in https://github.com/intel/llvm/pull/5797, which already contained the most important commits thanks to @alexbatashev. The most relevant additions of this patch were done in CMake files, in particular - switch from `FindCUDA` CMake (deprecated) module to the `FindCUDAToolkit` one in order to find cupti library by means of `CUDA_cupti_LIBRARY`. This is advisable because on some systems `FindCUDA` fails to find `CUDA_cupti_LIBRARY`. This is also the case of the CI, see the [log](https://github.com/intel/llvm/runs/7115612243?check_suite_focus=true) in case of `FindCUDA` is used. - find `generated_cuda_meta.h` for generating the CUDA printer definitions, since the location of this header file seems to vary depending on the system, in case of this file is not found a warning is printed and no errors are thrown. Co-authored-by: Alexander Batashev <[email protected]> * [SYCL] Add diagnostic test for global_variable_allowed attribute (#6777) Add cases for this attribute independent of device_global * [SYCL][ABI-Break] Implement property interface for local_accessor & usm_allocator (#6737) * [SYCL][Windows] Fix DataMovement test (#6790) Using `-g` with the regular `clang` command line is not supported on Windows. On Windows `clang-cl` and `/Mdd` should be used instead. However it doesn't seem like this test is testing anything to do debug info and I couldn't find any reason for having it in the history, so removing `-g` is the simplest solution to make the test work on both Linux and Windows. This should fix the post-commit issue that showed up in: https://github.com/intel/llvm/pull/6721 * [SYCL][ABI-Break] Add SYCL 2020 kernel_device_specific::max_sub_group_size (#6782) SYCL 2020 promotes the info::kernel_device_specific::max_sub_group_size query on kernels, but removes the additional argument. This commit adds an overload with no additional argument and deprecates the old variant. * [SYCL][NFC] Add another option for the lambda size mismatch message (#6794) * [SYCL] Introduce a fully-mocked PI plugin for unit tests (#6684) The idea behind this PR is to introduce an infrastructure which allows to write unit-tests, which are not dependent on a presence of actual backends (like OpenCL, L0, etc.) or devices (like CPU, GPU, etc.). Motivation for the patch: host device is going to be removed and we have a number of in-tree LIT tests, which can't pass if there are no devices available, so they are likely to be removed/disabled in short-term (#6683), unless we move them into llvm-test-suite or unit-tests with new infrastructure suggested here. Signed-off-by: Larsen, Steffen <[email protected]> Co-authored-by: Sachkov, Alexey <[email protected]> * [SYCL][ESIMD][EMU] Enable fp64/double type support (#6796) * For PI_DEVICE_INFO_EXTENSIONS, 'piDeviceGetInfo' returns 'cl_khr_fp64' as the type can be supported as native 'double' type * Fixes current pre-commit issue with unexpected pass on SYCL :: spec_const/spec_const_double.cpp * [SYCL] Fix macro definition conflicting with MSVC (#6798) The _CONCAT macro is defined in MSVC headers so to avoid redefinition warnings this commit changes the naming in the PI mock plugin from _CONCAT to _PI_MOCK_PLUGIN_CONCAT. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Update DPCPP library major version (#6801) * [SYCL] Add assert for device_global without device_image_scope (#6791) device_global is currently not fully supported but for backends that support it, they should be usable on device only when the device_image_scope property is present. This commit adds a temporary static assert to avoid prevent the use of device_global without device_image_scope until proper initialization has been implemented. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Fix unittests on MSVC (#6805) This commit makes two changes: * Fixes a cast of incompatible size in the PI mock plugin that caused a warning on MSVC. * Changes the definition of the captureless lambda function in the RedefineAPI PiMock unittest to avoid MSVC considering it equal to the function defined by the PI mock plugin. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL][Doc] Closing the ABI breaking changes window (#6800) Next one is expected to be not earlier than June 2023. * [SYCL] Fix deprecation warning for headers (#6808) Using pragmas to emit warnings didn't work because SYCL headers are considered to be system headers and any warnings in them are suppressed. Use "#warning" instead. Unfortunately, MSVC doesn't support it (although it's part of C23/C++23 and they'll have to add support eventually), so we need some #if guards. Also, #warning cannot be put inside a macro definition, thus we have to have some code duplication. Luckily, entire headers deprecations aren't as often and we can be a little bit verbose. * [SYCL][DOC] Extension to add hints to stop batching and start executing (#6465) Signed-off-by: James Brodman <[email protected]> Co-authored-by: Greg Lueck <[email protected]> Co-authored-by: John Pennycook <[email protected]> * [SYCL] Make host device inaccessible through SYCL API (#6685) This commit removes the host device from the device list and as such the host device will no longer be available in user code. The following changes are a result of this: * Device filters using 'host' as either backend or device type will cause a warning at runtime. Since there is no host device selectable for these filters, the resulting device list will not contain a host device. * is_host() on SYCL objects has been deprecated. Any use of them internally on a host device should cause an assertion to fail. * host_selector deprecation message has been changed to better reflect that there is no alternative. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Fix typos in xmethods script (#6814) * [SYCL][NFC] Fix aspects detection of AllocaInst when opaque pointers are enabled (#6767) * [NFC][Sema] Minor code quality change in SemaSYCL.cpp (#6827) Signed-off-by: Elizabeth Andrews <[email protected]> * [SYCL] Remove device-dependent tests from in-tree LIT (#6829) This commit removes the currently failing in-tree LIT tests that were previously dependent on a device being present. The removed tests are either moved to the test-suite or were already there. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL][NFC] Fix unused argument warning in host_selector (#6831) Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL] Make the mock plugin report cl_khr_il_program as supported (#6823) To make build-log tests not depend on the binary type, this commit makes the mock plugin report that the cl_khr_il_program is supported. Together with adding PI_DEVICE_INFO_NAME to mock_piDeviceGetInfo, the corresponding redefinition from the build-log tests can be removed. Signed-off-by: Larsen, Steffen <[email protected]> * [SYCL][NFC] Reintroduce missing Windows symbols (#6830) https://github.com/intel/llvm/pull/6685 unintentionally caused the windows library to lose symbols for handler::verifyKernelInvoc and handler::is_host. This patch introduces these as unused members to avoid the patch being ABI-breaking. Signed-off-by: Larsen, Steffen <[email protected]> * Fix available memory reporting for Arc devices (#6825) current PI Level Zero implementation assumed device memory is only HBM and hence returned 0 for Alchemist GPUs. * [CI] Enable independent build of libclc (#6833) * Update CMakeLists to build libclc based on project presence in cmake command * Add remangled versions build in --ci-defaults * Fix issue on possible libclc targets strings concat * [SYCL][DOCS] Remove references to host device in top-level docs (#6836) This commit removes references to the host device from the Getting Started Guide and the FAQ documents. Signed-off-by: Larsen, Steffen <[email protected]> * [CI] Enable CUDA SYCL CTS tests (#6439) Signed-off-by: Yin Yang <[email protected]> Co-authored-by: Alexey Bader <[email protected]> * [ESIMD] Do simd<T, N>* to <N x T> arg/ret type conv when possible. (#6835) Today, `simd<T, N> foo(simd<T, N> x)` is codegenerated by clang as `void foo(simd<T, N>* sret(simd<T, N>) %res, simd<T, N>* %x)` for the SPIRV target (unless `__regcall` is specified), which is then converted to `void foo(<N x T>* sret(<N x T>) %res, <N x T>* %x)` in the LowerESIMDVecArg, then to `<N x T> foo(<N x T> %x)` in the VC BE. With the opaque pointers this becomes impossible, and the optimization must happen in the "ESIMD FE". This patch implements it. It also changes `lowerEsimdConstructs` in sycl-post-link.cpp to use new pass manager to avoid the old pass manager-related boiler plate code in new ESIMD transformations. Signed-off-by: Konstantin S Bobrovsky <[email protected]> * [SYCL][L0] Use compute engine for me…

rodburns · 2023-06-23T10:02:04Z

This seems to be addressed by the change to an error report rather than exit/die so closing.

* Don't wrap kernels that are not being called in the module This patch is a result of a reflection about previously merged PR KhronosGroup/SPIRV-LLVM-Translator#1149 "add an entry point wrapper around functions (llvm pass)" and is enspired by various reported translator, clang (OpenCL) and Intel GPU drivers issues (see KhronosGroup/SPIRV-LLVM-Translator#2029 for reference). While SPIR-V spec states: === *OpName* --//--. This has nosemantic impact and can safely be removed from a module. === yet having EntryPoint function and a function that shares the name via OpName might be confusing by both (old) drivers and programmers, who read the SPIR-V file. This patch prevents generation of the wrapper function when it's not necessary to generate it aka if a kernel function is not called by other kernel. We can do better in other cases as well, for example I have experiments of renaming a wrapped function adding a previx, so it could be distinguished from the actual kernel/entry point, but for now it doesn't pass validation for E2E OpenCL tests. Signed-off-by: Sidorov, Dmitry <[email protected]> * prevent a copy Signed-off-by: Sidorov, Dmitry <[email protected]> This patch is a result of a reflection about previously merged PR #1149 "add an entry point wrapper around functions (llvm pass)" and is enspired by various reported translator, clang (OpenCL) and Intel GPU drivers issues (see While SPIR-V spec states: OpName --//--. This has nosemantic impact and can safely be removed from a module. yet having EntryPoint function and a function that shares the name via OpName might be confusing by both not-up-to-date drivers and programmers, who read the SPIR-V file. This patch prevents generation of the wrapper function when it's not necessary to generate it aka if a kernel function is not called by other kernel. Signed-off-by: Sidorov, Dmitry <[email protected]> Original commit: KhronosGroup/SPIRV-LLVM-Translator@46285e4

bader added the cuda CUDA back-end label Jul 15, 2020

againull closed this as completed Jul 22, 2020

pvchupin reopened this Mar 23, 2021

bader added bug Something isn't working CTS Impacts Khronos SYCL CTS labels Jun 5, 2021

AerialMantis added the runtime Runtime library related issue label Aug 23, 2021

pgorlani mentioned this issue Aug 3, 2022

[SYCL][CUDA] Make piextKernelSetArgMemObj setting an error message #6521

Merged

rodburns closed this as completed Jun 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

againull commented Jul 15, 2020 •

edited

Loading

bader commented Jul 22, 2020

againull commented Jul 22, 2020

pvchupin commented Mar 23, 2021

pgorlani commented Jan 27, 2022

pvchupin commented Jan 28, 2022

bader commented Aug 1, 2022

pgorlani commented Aug 3, 2022 •

edited

Loading

rodburns commented Jun 23, 2023

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

Comments

againull commented Jul 15, 2020 • edited Loading

bader commented Jul 22, 2020

againull commented Jul 22, 2020

pvchupin commented Mar 23, 2021

pgorlani commented Jan 27, 2022

pvchupin commented Jan 28, 2022

bader commented Aug 1, 2022

pgorlani commented Aug 3, 2022 • edited Loading

rodburns commented Jun 23, 2023

againull commented Jul 15, 2020 •

edited

Loading

pgorlani commented Aug 3, 2022 •

edited

Loading