Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

Closed
againull opened this issue Jul 15, 2020 · 8 comments
Closed

[SYCL][CUDA] accessor_api_image CTS test is failing #2119

againull opened this issue Jul 15, 2020 · 8 comments
Labels
bug Something isn't working CTS Impacts Khronos SYCL CTS cuda CUDA back-end runtime Runtime library related issue

Comments

@againull
Copy link
Contributor

againull commented Jul 15, 2020

Build compiler
git clone https://github.com/intel/llvm
Hash: b00fb7c

Includes: #1990, #1977

python /localdisk2/ws/againull/sycl/llvm/buildbot/configure.py --cuda -o /localdisk2/ws/againull/sycl/build
python /localdisk2/ws/againull/sycl/llvm/buildbot/compile.py -o /localdisk2/ws/againull/sycl/build

Build accessor CTS tests
git clone https://github.com/KhronosGroup/SYCL-CTS.git
Hash: 9cbe1a719b25c269ef78a2ee08f2e5ed12a1cc6d

Applied: KhronosGroup/SYCL-CTS#52

cmake -G Ninja -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DINTEL_SYCL_ROOT=/localdisk2/ws/againull/sycl/build -DINTEL_SYCL_TRIPLE=nvptx64-nvidia-cuda-sycldevice -DSYCL_IMPLEMENTATION=Intel_SYCL -DSYCL_CTS_ENABLE_OPENCL_INTEROP_TESTS=Off -DSYCL_CTS_ENABLE_DOUBLE_TESTS=On -DSYCL_CTS_ENABLE_HALF_TESTS=On -DINTEL_SYCL_FLAGS="-Xsycl-target-backend;--cuda-gpu-arch=sm_50" -DOpenCL_INCLUDE_DIR=/localdisk2/ws/againull/sycl/build/include/sycl -DOpenCL_LIBRARY=/localdisk2/ws/againull/sycl/build/lib/libOpenCL.so ..

ninja test_accessor -j 12

Run accessor_api_image CTS test
=> ./bin/test_accessor -p nvidia -d opencl_gpu --test accessor_api_image
--- accessor_api_image
. accessor<vec<int32_t, 4>, 1, mode{1024}, target{2017}>
. Checking get_range

PI CUDA ERROR:
Value: 500
Name: CUDA_ERROR_NOT_FOUND
Description: named symbol not found
Function: build_program
Source Location: /iusers/againull/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:468

PI CUDA ERROR:
Value: 400
Name: CUDA_ERROR_INVALID_HANDLE
Description: invalid resource handle
Function: cuda_piProgramRelease
Source Location: /iusers/againull/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:2807

. sycl exception caught
. what - The program was built for 1 devices
Build program log for 'GeForce GTX 1060 6GB':
-999 (Unknown OpenCL error code)
. line: 63
. a SYCL exception was caught: The program was built for 1 devices
Build program log for 'GeForce GTX 1060 6GB':
-999 (Unknown OpenCL error code)

  • fail

--- accessor_api_image_fp16
. Device does not support half precision floating point operations

  • pass

. Passed 1/14 tests (7%)

@bader bader added the cuda CUDA back-end label Jul 15, 2020
@bader
Copy link
Contributor

bader commented Jul 22, 2020

@againull, @pvchupin, I think we figured out that this was caused by the regression in the driver. Can we close this one?

@againull
Copy link
Contributor Author

@bader
Yes. Test passes with 435.21 nvidia driver when KhronosGroup/SYCL-CTS#52 is applied. Do you know when this PR it is going to be merged?

@pvchupin
Copy link
Contributor

It looks like issue is still reproducible on latest driver 450.102.04.
Let's reopen it, at least for the tracking purpose.

@pvchupin pvchupin reopened this Mar 23, 2021
@bader bader added bug Something isn't working CTS Impacts Khronos SYCL CTS labels Jun 5, 2021
@AerialMantis AerialMantis added the runtime Runtime library related issue label Aug 23, 2021
@pgorlani
Copy link
Contributor

The 500 CUDA_ERROR_NOT_FOUND (that turns in a 801 CUDA_ERROR_NOT_SUPPORTED for CUDA toolkit 11.3 and above) is caused by the suq.depth PTX instruction in 3d sampled readings at

int __nvvm_suq_depth(long) __asm("llvm.nvvm.suq.depth");

int __nvvm_suq_depth_3i(read_only image3d_t) __asm("llvm.nvvm.suq.depth");

NVIDIA tells that this error is expected since this PTX instructions work just in case they are used within the 'OpenCL driver'.
Unfortunately, they added that there are no ways to use this instruction with CUDA. They promised to update the PTX documentation in order to make it clear. If suq.depth is present in the fatbin, it produces the aforementioned error even if it is not actually executed. For this reason, it has been removed with #5378.

Another error emerged when the one above is factored out, a 700 CUDA_ERROR_ILLEGAL_ADDRESS. Which appeared in sampled readings with linear filtering. This has been fixed with #5204.

Unfortunately, the previous two errors are just the tip of the iceberg. Passing the accessor_api_image_core test implies the support of (u)int 8bit channels and their related conversion functions, which are completely missing right now: (u)int accessors can read (u)int_{32,16,8}b channels. Further details can be found in Section 6.12.14 and 8.3 of the OpenCL 1.2 Specification.

In order to let this test pass, the image support has been marked as experimental and deactivated by default with #5204.

In summary:

  • the only type supported are (u)int (32bit), float and half,
  • 8bit channels and related conversion functions are missing,
  • writings and non-sampled readings are supported,
  • 1d/2d sampled readings are supported,
  • 3d sampled readings are not functioning due to suq.depth.

@pvchupin
Copy link
Contributor

@pgorlani, thanks a lot for nice and detailed summary!

@bader
Copy link
Contributor

bader commented Aug 1, 2022

Recent version of test_all (and test_accessor_legacy) aborts on NVIDIA GPU.

pi_die: PI CUDA kernels only support images with channel types int32, uint32, float, and half.
terminate called without an active exception
-------------------------------------------------------------------------------
accessor_api_image_core
-------------------------------------------------------------------------------
SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35
...............................................................................

SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

Can we change pi_die to an exception? pi_die aborts the execution of the whole test suite, whereas exception will just fail a single test.

@pgorlani
Copy link
Contributor

pgorlani commented Aug 3, 2022

Recent version of test_all (and test_accessor_legacy) aborts on NVIDIA GPU.

pi_die: PI CUDA kernels only support images with channel types int32, uint32, float, and half.
terminate called without an active exception
-------------------------------------------------------------------------------
accessor_api_image_core
-------------------------------------------------------------------------------
SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35
...............................................................................

SYCL-CTS/tests/accessor_legacy/../common/../../util/proxy.h:35: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

Can we change pi_die to an exception? pi_die aborts the execution of the whole test suite, whereas exception will just fail a single test.

Hi @bader, here is #6521, it should fit our needs.

bader pushed a commit that referenced this issue Sep 9, 2022
…6521)

This patch makes `cuda_piextKernelSetArgMemObj` setting an error message
instead of `std::terminate` in case of the image format is not
supported. This error message is encapsulated in an exception thrown by
the RT.

This allows to continue the SYCL-CTS execution in case of tests using
unsupported channel types, see
#2119 (comment).
YuriPlyakhin pushed a commit to oneapi-src/SYCLomatic that referenced this issue Sep 30, 2022
* [SYCL][ABI-BREAK] Remove sycl::program class (#6666)

According to the SYCL 2020 spec, section
D.1. What has changed from SYCL 1.2.1 to SYCL 2020:

> The program class has been removed and replaced with a new
> class `kernel_bundle`, which provides similar functionality in a
> type-safe and thread-safe way.

Removing of `program_impl` class will be done with a separate commit
since it's not an ABI-breaking change and some performance analysis
should be done in scope of that removal.

Tests depending on `sycl::program` were removed in intel/llvm-test-suite#1187

* [flang] Support lowering of intrinsic module procedure C_F_POINTER

As Fortran 2018 18.2.3.3, the intrinsic module procedure
C_F_POINTER(CPTR, FPTR [, SHAPE]) associates a data pointer with the
target of a C pointer and specify its shape. CPTR shall be a scalar of
type C_PTR, and its value is the C address or the result of a reference
to C_LOC. FPTR is one pointer, either scalar or array. SHAPE is a
rank-one integer array, and it shall be present if and only if FPTR is
an array.

C_PTR is the derived type with only one component of integer 64, and the
integer 64 component value is the address. Build the right "source"
fir::ExtendedValue based on the address and shape, and use
associateMutableBox to associate the pointer with the target of the C
pointer.

Refactor the getting the address of C_PTR to reuse the code.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D132303

* [LoongArch] Fix annotations not matching predicates. NFC.

* [RISCV] Improve vector fceil/ffloor lowering by changing FRM.

This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238

* [NFC] Remove invisible character in Diagnostic message and tests

* [X86] Support SAE for VCVTPS2PH from intrinsic.

For now, clang and gcc both failed to generate sae version from _mm512_cvt_roundps_ph:
https://godbolt.org/z/oh7eTGY5z. Intrinsic guide description is also wrong, which will be
update soon.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D132641

* [RegisterCoalescer] Fix crash on early clobbered subreg operands.

The issue was with processing two subregs of the same reg are used in the same
instruction (e.g. inline asm): "def early-clobber" and other just "def".
Register coalescer ran in bad recursion if the early clobbered subreg is second
in the following sequence of COPYs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127136

* [OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging
support for OpenMP programs.

This is 5th of 6 patches started from https://reviews.llvm.org/D100181
This plugin code, when loaded in gdb, adds a few commands like
ompd icv, ompd bt, ompd parallel.
These commands create an interface for GDB to read the OpenMP
runtime through libompd.

Reviewed By: @dreachem
Differential Revision: https://reviews.llvm.org/D100185

* [lld-macho] Simplify linker optimization hint processing

This commit removes the `relocTargets` vector, and instead makes the
code reconstruct the referent addresses from the relocated instructions.
This will allow us to move `applyOptimizationHints` from
`ConcatInputSection::writeTo` to a separate pass that parses and applies
LOHs in one step, on a per-file basis. This will improve performance, as
parsing is currently done serially in `ObjFile::parse`.

I opted to remove the sanity check that ensures that all relocations
within a LOH point to the same symbol. This completely eliminates the
need to search through relocations. It is my understanding that
mismatched relocation targets should not be present in valid object
files, so it's unlikely that the removal will lead to mislinks.

Differential Revision: https://reviews.llvm.org/D133274

* [NFC] Add test of sized deallocation for coroutines

[dcl.fct.def.coroutine]p12 says:

> If both a usual deallocation function with only a pointer parameter
> and a usual deallocation function with both a pointer parameter and a
> size parameter are found, then the selected deallocation function
> shall be the one with two parameters.

However, the sized deallocation function is disabled by default for ABI
reasons. This leads the sentence never get tested and covered. This
commit tries to add a test for it

* [NFC] [Coroutines] Add tests for looking up deallocation

According to [dcl.fct.def.coroutine]p12, the program should be
ill-formed if the promise_type contains operator delete but none of them
are available. But this behavior was not tested before. This commit adds
the tests for it.

* [CMake][MLGO] Fix cmake for MLGO

The if-statement should check whehter TFLITE is on or not rather than if the variable is specified.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132902

* [OpenMP] Mark -fopenmp-implicit-rpath as NoArgumentUnused

This matches the behavior for all the other -fopenmp options,
as well as -frtlib-add-rpath.

For context, Fedora passes this flag by default in case OpenMP is
used, and this results in a warning if it (usually) isn't, which
causes build failures for some programs with unnecessarily strict
build systems (like Ruby).

Differential Revision: https://reviews.llvm.org/D133316

* [MemorySSA][NFC] Simplify if condition

Differential Revision: https://reviews.llvm.org/D133332

* [lldb] Enable the insertion of "pending callbacks" to MainLoops from other threads

This will be used as a replacement for selecting over a pipe fd, which
does not work on windows. The posix implementation still uses a pipe
under the hood, while the windows version uses windows event handles.

The idea is that, instead of writing to a pipe, one just inserts a
callback, which does whatever you wanted to do after the bytes come out
the read end of the pipe.

Differential Revision: https://reviews.llvm.org/D131160

* [CostModel][X86] Add CostKinds handling for fcmp ops

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

SSE numbers are still too low for FCMP_ONE/FCMP_UEQ cases which expand to a more complex sequence than the existing 'ExtraCost' system can manage.

* [OpenCL] Remove argument names from atomic/fence builtins

This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the argument name identifiers.

Continues the direction set out in D119560.

* [SelectionDAG] Rewrite bfloat16 softening to use the "half promotion" path

The main difference is that this preserves intermediate rounding steps,
which the other route doesn't. This aligns bfloat16 more with half
floats, which use this path on most targets.

I didn't understand what the difference was between these softening
approaches when I first added bfloat lowerings, would be nice if we only
had one of them.

Based on @pengfei 's D131502

Differential Revision: https://reviews.llvm.org/D133207

* Apply clang-tidy fixes for readability-identifier-naming in OptimizeSharedMemory.cpp (NFC)

* Apply clang-tidy fixes for readability-identifier-naming in OpenMPDialect.cpp (NFC)

* [clang-format] [doc] Fix example of wrapping class definitions

Example of

BraceWrapping
  AfterClass
is wrong

Differential Revision: https://reviews.llvm.org/D133087

* [clang-format] Change heuristic for locating lambda template arguments

Previously, the heuristic was simply to look for template argument-
specific keywords, such as typename, class, template and auto
that are preceded by a left angle bracket <.

This changes the heuristic to instead look for a left angle bracket <
preceded by a right square bracket ], since according to the C++
grammar, the template arguments must *directly* follow the introducer.
(This sort of check might just end up being *too* aggressive)

This patch also adds a bunch more token annotator tests for lambdas,
specifically for some of the stranger forms of lambdas now allowed as
of C++20 or soon-to-be-allowed as part of C++23.

Fixes https://github.com/llvm/llvm-project/issues/57093

This does NOT resolve the FIXME regarding explicit template lists, but
perhaps it gets closer

Differential Revision: https://reviews.llvm.org/D132295

* [MLIR] Switch lit tests to %mlir_lib_dir and %mlir_src_dir replacements.

The old replacements will be removed soon:
- `%linalg_test_lib_dir`
- `%cuda_wrapper_library_dir`
- `%spirv_wrapper_library_dir`
- `%vulkan_wrapper_library_dir`
- `%mlir_runner_utils_dir`
- `%mlir_integration_test_dir`

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D133270

* [ARM] Constant pools need 4-byte alignment if we only have tADR

When the only ADR instruction we have is the 16-bit thumb one then all
constant pool entries need to be 4-byte aligned, as tADR has an offset
that's a multiple of 4.

It looks like previously there happened to be no situations in which
we encountered a constant pool entry with alignment less than 4, so
failing to do this didn't cause any problems, but the expansion of
cttz to a table added by D128911 does use a constant pool with
alignment 1, so we now need to handle it correctly.

Differential Revision: https://reviews.llvm.org/D133199

* [AMDGPU][MC][GFX11][NFC] Update assembler tests for MIMG instructions

Differential Revision: https://reviews.llvm.org/D133322

* [MLIR] Fix for commit 0f2ec35

Fix incorrectly formatted python file.

* [CostModel][X86] Add CostKinds handling for SSE FCMP_ONE/FCMP_UEQ predicates

These require special handling to account for their expansion in lowering.

I'm trying very hard not to have to add predicate specific costs - but it might be inevitable.....

* [lldb] [Core] Split read thread support into ThreadedCommunication

Split the read thread support from Communication into a dedicated
ThreadedCommunication subclass.  The read thread support is used only
by a subset of Communication consumers, and it adds a lot of complexity
to the base class.  Furthermore, having a dedicated subclass makes it
clear whether a particular consumer needs to account for the possibility
of read thread being running or not.

The modules currently calling `StartReadThread()` are updated to use
`ThreadedCommunication`.  The remaining modules use the simplified
`Communication` class.

`SBCommunication` is changed to use `ThreadedCommunication` in order
to avoid changing the public API.

`CommunicationKDP` is updated in order to (hopefully) compile with
the new code.  However, I do not have a Darwin box to test it, so I've
limited the changes to the bare minimum.

`GDBRemoteCommunication` is updated to become a `Broadcaster` directly.
Since it does not inherit from `ThreadedCommunication`, its event
support no longer collides with the one used for read thread and can
be implemented cleanly.  The support for
`eBroadcastBitReadThreadDidExit` is removed from the code -- since
the read thread was not used, this event was never reported.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D133251

* [gn build] Port 9823d42557eb

* [mlir] Add materializeOpFoldResults to turn OpFoldResult array into values.

Differential Revision: https://reviews.llvm.org/D133346

* [lldb] Go back to process-directed signals in MainLoopTest.cpp

Thread-directed signals are not caught by kqueue (used on Mac). This
reverts half of D133181.

* [OpenMP] Add lit test for metadirective device arch inspired
from sollve

This lit test is added based upon the tests present in the
tests/5.0/metadirective directory of the SOLLVE repo
https://github.com/SOLLVE/sollve_vv

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D131763

* [InstCombine] add tests for icmp-of-trunc; NFC

* [InstCombine] reduce code duplication; NFC

* [InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true"

This fold was added with 5e9522c311dd, but over-specified.
We can assume that an undef element is an odd number:
https://alive2.llvm.org/ce/z/djQmWU

* Update the clang and clang-tools-extra code owners files

This also converts the Clang code owners file from a flat text file to
an RST file that is linked in to the rest of our documentation.

The RFC for this can be found at:
https://discourse.llvm.org/t/rfc-proposed-changes-to-clangs-code-ownership/

Differential Revision: https://reviews.llvm.org/D132550

* [GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef.

Differential Revision: https://reviews.llvm.org/D133309

* [flang] Accept assumed shape arrays as SHAPE in C_F_POINTER

C_F_POINTER was added in https://reviews.llvm.org/D132303, but the code
assumed that SHAPE would always be an explicit shape with compile time
constant rank. It can actually be an assumed shape, or an explicit shape
with non compile time constant rank. Get the rank from FPTR pointer
instead.

Differential Revision: https://reviews.llvm.org/D133347

* [SCCP] convert signed div/rem to unsigned for non-negative operands

This extends the transform added with D81756 to handle div/rem opcodes.
For example:
https://alive2.llvm.org/ce/z/cX6za6

This replicates part of what CVP already does, but the motivating example
from issue #57472 demonstrates a phase ordering problem - we convert
branches to select before CVP runs and miss the transform.

Differential Revision: https://reviews.llvm.org/D133198

* [CostModel][X86] Add CostKinds test coverage for ctpop intrinsics

* [CostModel][X86] Add CostKinds test coverage for cttz intrinsics

* [CostModel][X86] Add CostKinds test coverage for ctlz intrinsics

* Fix Clang Sphinx docs build

The CodeOwners.rst file needs to live in the same directory as the rest
of the documentation. This copies the file to the correct place when
making a Sphinx build but continues to leave the .rst file at the root
directory where it's easier for developers to find. This also ensures
that local doc builds using `make html` work as expected.

* [Metadata] Introduce MD_pcsections

Introduces MD_pcsections metadata kind. See added documentation for
more details.

Subsequent patches enable propagating PC sections metadata through code
generation to the AsmPrinter.

RFC: https://discourse.llvm.org/t/rfc-pc-keyed-metadata-at-runtime/64191

Reviewed By: dvyukov, vitalybuka

Differential Revision: https://reviews.llvm.org/D130875

* [MachineInstr] Allow setting PCSections in ExtraInfo

Provide MachineInstr::setPCSection(), to propagate relevant metadata
through the backend. Use ExtraInfo to store the metadata.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130876

* [Object] Refactor code for extracting offload binaries

We currently extract offload binaries inside of the linker wrapper.
Other tools may wish to do the same extraction operation. This patch
simply factors out this handling into the `OffloadBinary.h` interface.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D132689

* [OffloadPackager] Add ability to extract images from other file types

A previous patch added support for extracting images from offloading
binaries. Users may wish to extract these files from the file types they
are most commonly emebedded in, such as an ELF or bitcode. This can be
difficult for the user to do manually, as these could be stored in
different section names potentially. This patch addsp support for
extracting these file types.

Reviewed By: saiislam

Differential Revision: https://reviews.llvm.org/D132607

* [SYCL][CUDA] Fix get_native interop for device (#6649)

This patch fixes: https://github.com/intel/llvm/issues/6635

In https://github.com/intel/llvm/pull/6483, the implementation of `get_native` for device for the CUDA plugin was mistakenly moved to the experimental interface header, and so it was no longer available for the regular interface, causing build issues.

For the CUDA plugin there is currently two interfaces for the CUDA interop, the "legacy" one which is used by projects such as oneMKL and oneDNN, and the "experimental" one, defined in the `sycl/ext/oneapi/experimental/backend/cuda.hpp` header  which implements the interop as described in the CUDA backend specification proposed here: https://github.com/KhronosGroup/SYCL-Docs/pull/197

* [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64

This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076

* [SelectionDAG] Rename CallSiteDbgInfo to NodeExtraInfo

For information infrequently attached to SDNodes, it is useful to
provide a way to add this information out-of-line. This is already done
for call-site specific information.

Rename CallSiteDbgInfo to NodeExtraInfo in preparation of adding
additional information not necessarily related to call sites only.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130880

* [SelectionDAG] Properly copy ExtraInfo on RAUW

During SelectionDAG legalization SDNodes with associated extra info may
be replaced with a new SDNode. Preserve associated extra info on
ReplaceAllUsesWith and remove entries in DeallocateNode.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130881

* Add parantheses to silence warning.

* [AArch64] Additional tests for sinking splats to muls. NFC

* Fix "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64" compilation on Windows

* Fix AMDGPU test failures due to "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64"

* [tsan] Replace mem intrinsics with calls to interceptors

After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However this approach is
fragile, as slight changes in pipeline can return llvm intrinsics back.
In particular InstCombine can do that.

Msan/Asan already declare own version of these memory
functions for the similar purpose.

KCSAN, or anything that uses something else than compiler-rt, needs to
implement this callbacks.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D133268

* Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64"

* [gn build] port 5dbc7cf7cac44

* [bazel] port 5dbc7cf7cac44

* Revert "[lldb][bindings] Fix module_access handling of regex"

This reverts commit 75f05fccbbdd91393bdc7b6183b9dd2b1e859f8e.

This commit broke the windows lldb bot: https://lab.llvm.org/buildbot/#/builders/83/builds/23284

* Fix OpenMP Opt for target without a parallel region.

Remove ctx redeclaration.

Format code.

Remove parallel check. Modify tests. Clean-up code.

Fix another test.

Move code to helper functions.

Format file.

Minor fixes.

* [InstCombine] add tests for add of select with 0 and negate arms; NFC

* [InstCombine] add/move tests for add with select operands that simplify; NFC

* [InstCombine] remove dead code for add (select cond, (sub), 0); NFC

This pattern is handled more generally in SimplifySelectsFeedingBinaryOp().
Tests to confirm that added to the add.ll test file in the previous commit.

* Add docs for Mach-O lld

I wasn't able to find any docs for Mach-O in `lld/docs`, so here's an attempt at adding basic docs. One of my goals here is to make it easy for users who are unfamiliar with linkers to successfully use lld.

Reviewed By: #lld-macho, int3

Differential Revision: https://reviews.llvm.org/D132893

* [CostModel][X86] Add CostKinds handling for ctpop ops

This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (although it still struggles with avx512 predicate numbers which had to be done manually)

Some of the pre-AVX values still aren't great - atom/slm worst case numbers for ctpop expansion really affect these (especially throughput/latency), so we need to clean them up in a more consistent way - its a pity we don't have models for more older cpus (merom/nehalem etc.) as other examples.

* [clang] fix profiling of template arguments of template and declaration kind

Template arguments of template and declaration kind were being profiled
only by their canonical properties, which would cause incorrect
uniquing of constrained AutoTypes, leading to a crash in some cases.

This exposed some places in CheckTemplateArgumentList where non-canonical
arguments where being pushed into the resulting converted list.

We also throw in some asserts to catch early and explain the crashes.

Note that the fix for the 'declaration' kind is untestable at this point,
because there should be no cases right now in the AST where we try
to unique a non-canonical converted template argument.

This fixes GH55567.

Signed-off-by: Matheus Izvekov <[email protected]>

Differential Revision: https://reviews.llvm.org/D133072

* [RISCV] Improve vector fround lowering by changing FRM.

This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335

* [mlir][sparse] codegen for sparse alloc

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D133241

* Revert "[tsan] Replace mem intrinsics with calls to interceptors"

Breaks
http://45.33.8.238/macm1/43944/step_4.txt
https://lab.llvm.org/buildbot/#/builders/70/builds/26926

This reverts commit 77654a65a373da9c4829de821e7b393ea811ee40.

* [mlir][sparse] Refactoring: renaming StorageNewOp to StorageOp

To address comment in https://reviews.llvm.org/D133241

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D133363

* [ConstraintElimination] Replace pair with named struct (NFC).

This slightly improves the readability and allows further extensions in
follow-ups.

* [libc++] Avoid instantiating type_trait classes

Use `using` aliases to avoid instantiating lots of types

Reviewed By: ldionne, #libc

Spies: libcxx-commits, miyuki

Differential Revision: https://reviews.llvm.org/D132785

* [AArch64] Add an option to reserve physical registers from RA

This patch adds an option --reserve-regs-for-regalloc, so we can reserve a list
of physical registers. These registers will not be used by register allocator,
but can still be used as ABI requests such as passing arguments to function
call.

Its main purpose is simulating high register pressure by reserving many physical
registers. So it will be much easier to test and debug register allocation
changes.

Differential Revision: https://reviews.llvm.org/D132717

* Revert "[SCCP] convert signed div/rem to unsigned for non-negative operands"

This reverts commit fe1f3cfc2669aca387a45c8ce615b45c1db50fc6.

It looks like this commit breaks building llvm-test-suite.

To reproduce, run `opt -passes=ipsccp` on the IR below.

    @g = internal global i32 256, align 4

    define void @test() {
    entry:
      %0 = load i32, ptr @g, align 4
      %div = sdiv i32 %0, undef
      ret void
    }

* [clang] Implement setting crash_diagnostics_dir through env variable

This implements setting the equivalent of `-fcrash-diagnostics-dir`
through the environment variable `CLANG_CRASH_DIAGNOSTICS_DIR`.
If present, the flag still takes precedence.

This helps integration with test frameworks and pipelines.

With this feature, we change the libcxx bootstrapping build
pipeline to produce clang crash reproducers as artifacts.

Signed-off-by: Matheus Izvekov <[email protected]>

Differential Revision: https://reviews.llvm.org/D133082

* [mlir][spirv] Add base classes for vendor ops

This is the first patch in the series to rename vendor ops from
`spv.NameVENDOR` to `spv.VENDOR.Name`. The goal is to make the SPIR-V
dialect more internally consistent.

Issue: https://github.com/llvm/llvm-project/issues/56863

* [mlir][spirv] Refactor vendor op definitions

Use dedicated vendor op classes/categories. This is so that we can later
change the mnemonics of all vendor ops by changing the base class: `SPV_VendorOp`.

Issue: https://github.com/llvm/llvm-project/issues/56863

* [mlir][spirv] Change vendor op mnemonics to `spv.VENDOR.name`

Make vendor ops more consistent with the naming scheme within the SPIR-V
dialect.

Issue: https://github.com/llvm/llvm-project/issues/56863

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D133247

* [mlir] Make bit enum operators constexpr

This allows using the | operator on the values of enum attributes
in complie-time constants.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D133159

* [lldb] Fix CommunicationKDP following D133251

Add `m_bytes` and `m_bytes_mutex` to `CommunicationKDP`, following refactoring
in D133251.

Differential Revision: https://reviews.llvm.org/D133365

* [ADT] Remove is_splat

`is_splat` is superseded by `all_equal` and marked as deprecated.
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132336

* [OpenMP] Fix `omp_get_wtime` function being marked incorrectly as readonly

OpenMP has a list of of optimistic attributes that can be attached to
known runtime functions to aid some analysis. The `omp_get_wtime`
function incorrectly used the `readonly` attribute. This is not correct
at the `omp_get_wtime` function changes values depending on some
external state. This is more correctly modeled with
`inaccessiblememonly` meaning that the value does not depend on anything
within the module, but can not be removes as it depends on external
state.

Fixes #57578

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D133360

* [SYCL] Fix accessor's CTAD for g++ host compiler (#6673)

* [SYCL] Replace hardcoded namespaces with attribute (#6674)

Namespaces were hardcoded and used in compiler to
check for various SYCL types including accessors,
spec_constants, etc. This patch implements an
attribute to uniquely identify the types instead.
Attribute argument is an Identifier which denotes
 each type.

E.g. __attribute__((sycl_type(accessor)) is used 
to mark accessor class.

The attribute has been implemented as with an
accepted list of arguments via EnumArg. The attribute
definition should be updated to support any new types.

The attribute takes 1 argument.

Fixes: https://github.com/intel/llvm/issues/5186

Signed-off-by: Elizabeth Andrews <[email protected]>

* [MC] Emit Stackmaps before debug info

This patch is essentially an alternative to https://reviews.llvm.org/D75836 and was mentioned by @lhames in a comment.

The gist of the issue is that Mach-O has restrictions on which kind of sections are allowed after debug info has been emitted, which is also properly asserted within LLVM. Problem is that stack maps are currently emitted as one of the last sections in each target-specific AsmPrinter so far, which would cause the assertion to trigger. The current approach of special casing for the `__LLVM_STACKMAPS` section is not viable either, as downstream users can overwrite the stackmap format using plugins, which may want to use different sections.

This patch fixes the issue by emitting the stack map earlier, right before debug info is emitted. The way this is implemented is by taking the choice when to emit the StackMap away from the target AsmPrinter and doing so in the base class. The only disadvantage of this approach is that the `StackMaps` member is now part of the base class, even for targets that do not support them. This is functionaly not a problem however, as emitting an empty `StackMaps` is a no-op.

Differential Revision: https://reviews.llvm.org/D132708

* [OpenMP] Remove use of removed '-f[no-]openmp-new-driver' flag

The changes in D130020 removed all support for the old method of
compiling OpenMP offloading programs. This means that
`-fopenmp-new-driver` has no effect and `-fno-openmp-new-driver` does
not work. This patch removes the use and documentation of this flag.
Note that the `--offload-new-driver` flag still exists for using the new
driver optionally with CUDA and HIP.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D133367

* [SYCL][ABI-Break] Remove deprecated OptionalDevice input type of make_queue and remove deprecated make_queue ABIs (#6628)

We introduced a tentative OptionalDevice input type of make_queue API to support the SYCL 2020 API as well as the legacy API which did not require Device to be passed as a parameter.
This PR intends to remove this tentative input type.
Extended: Now this PR also removed deprecated make_queue ABIs and make the make_queue to take a pointer to pi_device, which becomes back to optional.

Signed-off-by: Byoungro So <[email protected]>

* [SYCL] Emit an error on attempt to compile in less than C++17 mode (#6678)

* [CI][NFC] Rename workflow job (#6572)

Just pure renaming of "resolve test matrix" to "generate test matrix"
Follow up on a comment made at https://github.com/intel/llvm/pull/6528

* [CI][NFC] Make lint for commits in PR change only  (#6722)

Make lint for commits in PR change only not taking into account sycl
branch HEAD. Should fix issues like in
https://github.com/intel/llvm/pull/6705 where lint reported errors for
files not affected in PR. To overcome current PRs stuck because of this
they need to use ignore-lint tag until PR will be based on devops
directory changes made here.

* [CI][NFC] Fixed GFX driver update PR (#6723)

Now generate test matrix correctly use
github.event.pull_request.head.sha that is required for
pull_request_target.

* [SYCL] Improve sycl-post-link performance with -split=kernel (#6689)

Right now we are computing a new callgraph in every call to
extractCallGraph. extractCallGraph is called every time we do a module
split, so for -split=kernel, that would be once per kernel. For modules
with many kernels, this can take a very long time. We only need to
compute this once because the input IR doesn't seem to change between
splits.

This improves performance of sycl-post-link from ~45min to ~7min for an
example with 13k kernels

Signed-off-by: Sarnie, Nick <[email protected]>

* [SYCL] Deprecate SYCL 1.2.1 device selectors (#6599)

Now that we have added the SYCL 2020 callable device selectors, we need
to prepare for the removal of the older SYCL 1.2.1 `device_selector`
class. The first step is to add the deprecation message to the 1.2.1
style device selectors, which this PR does. It also removes the usage of
those from our own codebase so as to not trip on our own messages in the
future.

Signed-off-by: Chris Perkins <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Co-authored-by: Larsen, Steffen <[email protected]>

* [SYCL][CUDA][HIP] Add support for intel extension free-memory (#6709)

The support is added for NVIDIA and AMD devices.

Co-authored-by: Abhishek Bagusetty <[email protected]>

* [SYCL][Windows] Improve windows sycl.lib linking (#6699)

This patch does two things, first it makes `-fsycl` ignore `-nostdlib`
when linking the SYCL library. This is necessary because for Clang on
Windows CMake will generate link commands using `-nostdlib` and
explicitly list the system libraries, but of course it doesn't do it for
SYCL, so we currently end up never linking the SYCL library when this is
used.

Ignoring `-nostdlib` for `-fsycl` on Windows seems like a reasonnable
solution for this as this is also what is done for the OpenMP runtime
libraries.

See the CMake module:
*
https://github.com/Kitware/CMake/blob/aa2de7cd2a04699744a224ab84e0ca483559c5d3/Modules/Platform/Windows-Clang.cmake#L79

In addition this patch also adds a linker parameter to help clang find
the `sycl.lib` file without requiring users to tweak their environments
to link against it.

* [SYCL][Windows] Enable building lld by default on Windows (#6701)

The Windows-Clang CMake module uses `lld-link` by default, so
having lld built on windows is helpful for people wanting to build SYCL
applications using CMake.

Related to: https://github.com/intel/llvm/issues/6026

* [SYCL] Allow buitin_assume_aligned to be called from device code. (#6705)

Prior to this PR, if there is a direct call to __builtin_assume_aligned
from within the device code, we would get the following error:
`error: SYCL kernel cannot call a variadic function`

This PR allows `__builtin_assume_aligned` to be invoked from within
device code.
The reason for doing so is, for example, if we load a float* pointer
from a data structure, the compiler has no way of knowing the underlying
alignment of the data the pointer points to, and therefore it cannot
combine the load/stores and one ends up with many unnecessary load/store
instructions plus lots of unnecessary pointer arithmetic plus increased
register pressure. We see this all over the place in our kernels. With
`__builtin_assume_aligned` we can tell the compiler to safely assume a
certain alignment, therefore implicitly forcing coalescing.

* [SYCL][NFC] Remove unnecessary NULL check for 'Fn' (#6726)

Klocwork thinks that it is possible for 'Fn' to be a nullptr due to this
check but there's already an assert at the beginning of 
CodeGenFunction::GenerateCode to avoid that possibility.

* [SYCL] Fix device comparison in removeDuplicateDevices (#6730)

Kernel bundles attempt to remove duplicate devices from a passed device
list. This is done through the `removeDuplicateDevices` function which
creates a set through comparing devices by their native handles.
However, the `getNative` member function on `device_impl` used to get
these handles will retain the native devices if the backend is OpenCL.
For root devices this will not have an effect, but on sub-devices this
can potentially lead to a leak of the devices. As a fix this commit
compares the PI devices rather than the native handles.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Remove cl namespace deprecation warning (#6735)

According to the SYCL 2020 specification, CL/sycl.hpp should supply the
cl prefix namespace for backwards compatibility, but it does not specify
that the namespace is deprecated. This commit removes the deprecation
message.

Signed-off-by: Larsen, Steffen <[email protected]>

* [GHA] Uplift Linux GPU RT version to 22.35.24055 (#6704)

Scheduled drivers uplift

Co-authored-by: GitHub Actions <[email protected]>

* [BuildBot] Uplift GPU RT version for Linux CI Process (#6697)

Uplift GPU RT version for Linux to 22.35.24055

Signed-off-by: bb-sycl <[email protected]>

* Update test to match -fvisibility=arg option requirements (#7098)

* [SYCL][ESIMD] Introduce predicates for lsc_block_store/load (#6688)

* Update for LLVM Optional API changes

Update for LLVM commit b5f8d42efe3e ("[ADT] Deprecate
Optional::{hasValue,getValue} (NFC)", 2022-08-07).

This is a mechanical replacement of `hasValue` to `has_value` and
`getValue` to `value`.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/ffeb4df

* Restore getArgIndex in OCLTypeToSPIRV (#1567)

* Update for LLVM Optional API changes

Update for LLVM commit b5f8d42efe3e ("[ADT] Deprecate
Optional::{hasValue,getValue} (NFC)", 2022-08-07).

This is a mechanical replacement of `hasValue` to `has_value` and
`getValue` to `value`.

* Restore argument tracing in OCLTypeToSPIRV

Commit 4a9c78ee ("Prepare SPIRVWriter for type conversion without
opaque pointers. (#1499)", 2022-06-20) removed `getArgIndex` in favor
of passing `Idx`, but this leads to incorrect adaptation of argument
types if sampler arguments are not in the same position between
different functions.

We might be able to drop `adaptArgumentsBySamplerUse` entirely, as we
don't represent samplers as i32 anymore; but for now just fix the
regression.

Fixes https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/1562

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/87f8a58

* Move to C++17

LLVM has switched to C++17 with commit b1356504e63a ("[LLVM] Update
C++ standard to 17", 2022-08-06).

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/75d16c2

* .clang-tidy: temporarily disable misc-const-correctness

This is a relatively new check added to clang-tidy by 46ae26e7eb70
("[clang-tidy] implement new check 'misc-const-correctness' to add
'const' to unmodified variables", 2022-07-24).  Currently the code
base doesn't follow the practice of declaring variables `const`
where possible, which makes the check quite noisy, so disable it for
now.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/4dd494e

* Drop the JointMatrixINTEL struct-renaming pass when opaque pointers are enabled. (#1570)

The frontend is being changed to lower the struct name to the correct LLVM name
directly, obviating the need for this check. See
https://github.com/intel/llvm/pull/6535 for this change.

This marks the removal of the final call to the deprecated method
Type::getPointerElementType, although there remains some code that is not fully
working with opaque pointers enabled.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e2bb479

* Use TypedPointerType in lieu of PointerIndirectPair in mangleBuiltin. (#1568)

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/42cf770

* Remove 2 unused SPCV_ macros

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/160f013

* Upgrade to Ubuntu 20.04

The Ubuntu 18.04 image is marked deprecated [1], so move to a newer
image.

[1] https://github.com/actions/runner-images

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/20a1fd7

* Fix a -Wmaybe-uninitialized warning

`MDWrapper::get` may not fill its argument in error cases, so
initialize `Arg` to an all-ones value.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/2c247b7

* Dot product bugfix to include more floating point types (#1578)

Switched the visitCallDot check to use isFloatingPointTy for scalar
floating point operands. Bugfix for previous change regarding
integer dot product.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/71e01b5

* Mass add -emit-opaque-pointers for tests that don't require changes.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/1ea77d2

* Add support for toolchain compilation with LLVM_LINK_LLVM_DYLIB option (#1543)

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/40fd741

* Update SPV_INTEL_vector_compute to rev 15

This adds NamedBarrierCountINTEL Execution Mode,
see more in https://github.com/intel/llvm/pull/1612

Co-authored-by: nrudenko <[email protected]>

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/21c7a30

* Remove ExecutionModeNamedBarrierCountINTEL from spirv_internal (#1604)

It was actually upstreamed to SPIR-V Headers already.

Signed-off-by: Sidorov, Dmitry <[email protected]>

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/3335c25

* Mass add -emit-opaque-pointers for tests that require some changes.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/6e15642

* Implement SPIR-V support for max_reinvocation_delay attribute (#1594)

The attribute generates the llvm.loop.intel.max_reinvocation_delay.count metadata in LLVM IR.
There is one positive integer literal operand denoting the maximum number of cycles allowed
between loop invocations.

Spec: KhronosGroup/SPIRV-Registry#163

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/d4ec010

* Fix type scavenger for variable arguments and multiple-uses-of-types cases. (#1606)

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/401d124

* Add support for split barriers extension SPV_INTEL_split_barrier (#1424)

This PR adds support for split barriers and the SPV_INTEL_split_barrier extension.

The related SPIR-V extension spec can be found here:

* https://github.com/KhronosGroup/SPIRV-Registry/pull/136

The related OpenCL C extension spec can be found here:

* https://github.com/KhronosGroup/OpenCL-Docs/pull/765

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/e3cd296

* Reinstate deprecated-declarations warning as error (#1609)

This reverts commit 55d1de820841e6d9d1c6ca0cd534323d69a1cbf1.

Now that the last use of the deprecated `getPointerElementType` has
been removed, treat uses of deprecated functions as errors again.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/6811488

* Rework the demangler to support recovering more pointer element types. (#1556)

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/b412ae5

* Translate function pointer from global variable as pointer, not as declaration (#1608)

This patch helps to avoid invalid SPV generation.

When global variable contains a pointer to a function, translator tries to
translate it as declaration. Then it translates this function the second time
when going through the function list. This leads to double translation
of the same function and to the usage of the same IDs in SPIR-V file.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/2b4ce42

* Add SPV_INTEL_masked_gather_scatter extension (#1580)

This extension allows TypeVector to have a Physical Pointer Type
Component Type and introduces gather/scatter instructions.
It will be useful for explicitly vectorized kernels.

Spec: https://github.com/intel/llvm/pull/6613

Signed-off-by: Sidorov, Dmitry <[email protected]

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/49b08e8

* [SYCL][Doc] Remove stale SPIR-V extension spec (#6741)

This SPIR-V extension was promoted to a KHR, and that specification is
on the SPIR-V registry. The DPC++ compiler uses the KHR version (not
this INTEL version), so this old specification can be removed.

* [SYCL][CUDA] Make `piextKernelSetArgMemObj` setting an error message (#6521)

This patch makes `cuda_piextKernelSetArgMemObj` setting an error message
instead of `std::terminate` in case of the image format is not
supported. This error message is encapsulated in an exception thrown by
the RT.

This allows to continue the SYCL-CTS execution in case of tests using
unsupported channel types, see
https://github.com/intel/llvm/issues/2119#issuecomment-1201548912.

* [SYCL][Doc] Add sycl complex to complex algorithms extension (#6717)

This PR extends the complex algorithms extension to support
`sycl::ext::oneapi::complex` and `marray<sycl::ext::oneapi::complex>`.
Additionally it adds the `multiplies` operator as a valid binary
operation for complex values when reducing and scanning across work
items. This PR has a dependency upon #6550.

* [SYCL][NFC] Add SYCLPropagateAspectsUsage pass (#6670)

Added a pass which is a part of optional kernel features design: it uses
information provided by FE & Headers about aspects used in device code
to propagate it through the call graph to mark all kernels and functions
with list of aspects they use.

Co-authored-by: Maksim Sabianin <[email protected]>

* [NFC] Fix the expected assert message in LIT test gather_scatter_rgba.cpp

Signed-off-by: Vyacheslav N Klochkov <[email protected]>

* [SYCL] Make intel specific device info descriptors namespace qualified (#6639)

Conforming to SYCL 2020 specification section 6.3.1 and 4.6.4.2, to make
extension information descriptors templated and within the correct
namespace.

-Also moved deprecated info descriptors for device into separate file   

-Changed namespace of the recently added [device memory
extension](https://github.com/intel/llvm/pull/6604) to
ext::intel::info::device

Signed-off-by: Rauf, Rana <[email protected]>
Co-authored-by: Steffen Larsen <[email protected]>

* [SYCL] Fix USM free for descendent devices (#6733)

Now that use of descendent devices of context members is supported, we
cannot rely on always choosing the only device in single-device
contexts. Remove this branch to always fetch the device from the
platform.

* [SYCL] Test must pass -opaque-pointers explicitly

Not yet the default for sycl.

* Add opaque pointers switch to llvm/test/Verifier/dllstorage.ll

* Remove clang/test/Driver/openmp-sycl-interop.c test form testing.
The test is specific to behaviors involving SYCL and OpenMP offloading.
Since OpenMP offload has moved to the new offloading model the test is
not relevant anymore.

* [SYCL] Pass /Zc:__cplusplus in -fsycl-host-compiler-options in some tests (#6751)

By default MSVC reports 199711L as the standard being used and needs
that option to report C++ version properly. This fixes current post-commit
failures on the tests modified.

* Delete obsolete Clang::Driver tests

* Revert "Delete obsolete Clang::Driver tests"

This reverts commit 84be9c2cb06b5c98225ae0da80e32aa161483187.

* The XFAIL tests are specific to the old OpenMP offloading model, which was removed.

* Disable sycl-libspirv for amdgcn-amd-amdhsa target in clang/test/Driver/sycl-amdgcn-sqrt.cpp

* [SYCL][ABI-Break][NFC] Remove unusued members (#6731)

This commit removes the following:

- The unused member function `kernel_bundle::join_impl` overload.
- The unused member variable `Offset` from `stream`.
- The unused member function `buffer_impl::constructorNotification`
overload.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Fix memory leak in program link (#6641)

This PR fixes the memory leak caused by missing the call to program
release.

Signed-off-by: Byoungro So <[email protected]>

* [SYCL] Remove mentions of host device from in-tree LIT tests (#6683)

* [ESIMD] Fix invoke_simd calls case with pointer passed to it (#6696)

The helper function created during translation of invoke_simd must
accept a pointer to a function, not a reference to a pointer to a function.
That additional level of indirection is automatically resolved by compiler
for invoke_simd, but needs to be manually resolved/adjusted for the helper
function.

Signed-off-by: Vyacheslav N Klochkov <[email protected]>

* [SYCL] Remove CG/handler extended members mechanism (#6759)

Now that the extended members have been promoted to proper fields of
CG/handler classes, the extended member mechanism can be removed until
it's needed again.

* Suppress some clang-tidy errors

Split the .clang-tidy check lists out over multiple lines to improve
readability.

Suppress `misc-non-private-member-variables-in-classes` as the code
currently contains many instances that fail this check.

Drop `constexpr` from `LoopControlLoopCountINTELMask` after clang
started diagnosing this with b36453530418 ("[Clang] Diagnose
ill-formed constant expression when setting ...", 2022-07-28).
Removing `constexpr` is just a workaround, the long term fix would be
to upstream the new enum value to `spirv.hpp`.

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c5b29f2

* Update LLVM version from 15 to 16

Original commit:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/7aa1fd2

* [SYCL] Assign code owners for invoke_simd.hpp and simd.hpp. (#6746)

* [SYCL] Fix memory leak (queue_impl)  due to #5901 (#6707)

Cross dependency event_impl vs queue_impl prevents objects release.
Event_impl now has only weak pointer to queue.

Signed-off-by: Tikhomirova, Kseniya <[email protected]>

* Fix the Clang sphinx build bot; NFC

This should address the failure introduced in:
https://lab.llvm.org/buildbot/#/builders/92/builds/32377

* [SYCL] Move accessor_impl to source directory (#6698)

Moving impl part of host accessor implementation to avoid exposing
implementation details in the headers. This allows for more changes
in accessor without breaking ABI.

Also updated the gdb xmethods since it was relying on the
impl details which are not available for gdb(unless libsycl.so is
built with debug symbols) anymore. Instead of accessing members of
impl directly gdb printers now accessing helper methods. To prevent
compiler discarding these methods there are dummy references which
are active when NDEBUG is not defined.

* [SYCL] Silence -Wctad-maybe-unsupported warning

This might be an FE bug. Will follow up separately. Workaround should be
fine to unblock pulldown.

* [sycl-bisect] Add missing exit (#6771)

This script has been missing one call to exit, which causes control to
fall through and run successful tests a second time when using --command
without --command-allow-bisect-codes. This didn't change the results of
the script, but it did make it slower if the test command takes a while
to run.

* [ESIMD] Change LSC API to improve template argument type deduction (#6764)

* [CI] Fix dependabot alert (#6725)

* [SYCL] More changes to silence -Wctad-maybe-unsupported

Mutex changes could be reverted if/when this change lands:
https://reviews.llvm.org/D133425

* [SYCL][ABI-Break] Merge DeviceBinaryImage and RTDeviceBinaryImage (#6768)

To avoid future issues with ABI for DeviceBinaryImage, this commit makes
the following changes:
* Moves RTDeviceBinaryImage to a source header.
* Merges DeviceBinaryImage into RTDeviceBinaryImage.
* Promotes the common property ranges from DeviceBinaryImage into new
members.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Fix get_pointer_device for cases with descendent devices (#6719)

Looking through context members alone when searching for a specific
device isn't enough anymore since now descendent devices of context
members can be used within that context as well. Change the logic to
look for the device in the cache instead.

* [Matrix][SYCL] Add use argument for joint_matrix and add another feat… (#5835)

…ure macro for it

* [SYCL] Remove unused argument from getDeviceImpl (#6780)

* [SYCL][Windows] Fix debug build in non cl mode  (#6721)

This fixes building SYCL programs in Debug mode with
`Windows-Clang.cmake`.

The issue is that the code was using `OPT__SLASH_MDd` to select
`sycld.lib` but under the  `!C.getDriver.IsCLMode()` condition this flag
will never be set, `OPT_g_Flag` should be used instead (`-g` rather than
`/MDd`).

Note that using the regular `clang` command line to manually build with
`-g` still doesn't work as it will link against `msvcrt` rather than
`msvcrtd` and will miss required defines for debug builds on Windows
(`_DEBUG`). This is correctly done by the CMake module or simply when
using `clang-cl`.

* [SYCL][DOC] Add extension for FPGA kernel interface properties (#5715)

## SYCL extension contains the following new kernel properties
- `streaming_interface<...>`
- `register_map_interface<...>`
The first two properties take enum arguments that provide the compiler
information about whether the logic downstream to the kernel will
back-pressure the kernel or not.
- `pipelined<N>`
Takes an integer, non-zero values specify minimum cycles between kernel
invocations, and 0 specifies that pipelining should be disabled.

Co-authored-by: GarveyJoe <[email protected]>

* [SYCL] Fix llvm.used removal when used with opaque pointers. (#6773)

The code priorily assumed that all functions when used in @llvm.used
would be wrapped within a bitcast <fnptr type> to i8*; with opaque
pointers, the values would be functions directly, causing a crash since
functions don't have any operands.

* [SYCL] Add MAJOR_VERSION to the name of the sycl library on Win (#6745)

* [SYCL] Silence -Wctad-maybe-unsupported for check-sycl

Add deduction guide to kernel_bundle

* [SYCL] Add --host-target flag and remove --arm flag (#6620)

This patch adds a new flag to `configure.py` to allow changing the host
target to build, this makes it easier to build on different hosts such
as ARM or PowerPC.

This patch also removes the `--arm` flag as it is now redundant,
`--host-target=ARM` or `--host-target=AArch64` should be used instead.

This is slightly different than the original `--arm` flag as it only
allows to enable one of the ARM platforms. But I'm not sure what the use
case was for enabling both, as only one will be the host platform for a
given build. And compilation time was given as a reason to drop the X86
architecture originally so only enabling the correct ARM architecture
should also help with that.

Co-authored-by: Alexey Bader <[email protected]>

* [NFC] Update sycl-force-target test to play nicely for 32-bit host (#6785)

Update the target for the test to use 64-bit to allow for the test to
emit the expected device target when unbundling.

* [SYCL] Use copy engine for memory read/write operations (#6783)

In the present state of the L0 plugin, MemBufferWrite and MemBufferRead
operations are being executed using compute engines. This patch changes
this behavior for the operations to be executed using copy engines.
It is expected to improve performance.

Signed-off-by: Arvind Sudarsanam <[email protected]>

* [SYCL][FPGA] Support Intel FPGA simulator device selector (#6715)

Create **fpga_simulator_selector** to be used to select the FPGA
simulator device. This assumes that the user has compiled their program
with -Xssimulation (as well as -fintelfpga) to prepare for the
simulator.

An object of class fpga_simulator_selector must be constructed early in
the execution of the host program in order to make the simulator
available at runtime.

The current implementation has the effect that if an object of class
sycl::ext::intel::fpga_simulator_selector is defined, FPGA hardware
devices selected using sycl::ext::intel::fpga_selector will select
simulator devices. This will be documented in the release notes and the
specification. We expect this behaviour to be eliminated in the future.

* [SYCL][CUDA] Add support of CUDA XPTI tracing (#6373)

Fork of https://github.com/intel/llvm/pull/5797

This patch rebases and finalizes (similarly to
https://github.com/intel/llvm/pull/6023) the draft in
https://github.com/intel/llvm/pull/5797, which already contained the
most important commits thanks to @alexbatashev.

The most relevant additions of this patch were done in CMake files, in
particular
 
- switch from `FindCUDA` CMake (deprecated) module to the
`FindCUDAToolkit` one in order to find cupti library by means of
`CUDA_cupti_LIBRARY`. This is advisable because on some systems
`FindCUDA` fails to find `CUDA_cupti_LIBRARY`. This is also the case of
the CI, see the
[log](https://github.com/intel/llvm/runs/7115612243?check_suite_focus=true)
in case of `FindCUDA` is used.
- find `generated_cuda_meta.h` for generating the CUDA printer
definitions, since the location of this header file seems to vary
depending on the system, in case of this file is not found a warning is
printed and no errors are thrown.

Co-authored-by: Alexander Batashev <[email protected]>

* [SYCL] Add diagnostic test for global_variable_allowed attribute (#6777)

Add cases for this attribute independent of device_global

* [SYCL][ABI-Break] Implement property interface for local_accessor & usm_allocator (#6737)

* [SYCL][Windows] Fix DataMovement test (#6790)

Using `-g` with the regular `clang` command line is not supported on
Windows.

On Windows `clang-cl` and `/Mdd` should be used instead.

However it doesn't seem like this test is testing anything to do debug
info and I couldn't find any reason for having it in the history, so
removing `-g` is the simplest solution to make the test work on both
Linux and Windows.

This should fix the post-commit issue that showed up in:
https://github.com/intel/llvm/pull/6721

* [SYCL][ABI-Break] Add SYCL 2020 kernel_device_specific::max_sub_group_size (#6782)

SYCL 2020 promotes the info::kernel_device_specific::max_sub_group_size
query on kernels, but removes the additional argument. This commit adds
an overload with no additional argument and deprecates the old variant.

* [SYCL][NFC] Add another option for the lambda size mismatch message (#6794)

* [SYCL] Introduce a fully-mocked PI plugin for unit tests (#6684)

The idea behind this PR is to introduce an infrastructure which allows
to write unit-tests, which are not dependent on a presence of actual
backends (like OpenCL, L0, etc.) or devices (like CPU, GPU, etc.).

Motivation for the patch: host device is going to be removed and we have
a number of in-tree LIT tests, which can't pass if there are no devices
available, so they are likely to be removed/disabled in short-term
(#6683), unless we move them into llvm-test-suite or unit-tests with new
infrastructure suggested here.

Signed-off-by: Larsen, Steffen <[email protected]>
Co-authored-by: Sachkov, Alexey <[email protected]>

* [SYCL][ESIMD][EMU] Enable fp64/double type support (#6796)

* For PI_DEVICE_INFO_EXTENSIONS, 'piDeviceGetInfo' returns 'cl_khr_fp64'
as the type can be supported as native 'double' type
* Fixes current pre-commit issue with unexpected pass on SYCL :: spec_const/spec_const_double.cpp

* [SYCL] Fix macro definition conflicting with MSVC (#6798)

The _CONCAT macro is defined in MSVC headers so to avoid redefinition
warnings this commit changes the naming in the PI mock plugin from
_CONCAT to _PI_MOCK_PLUGIN_CONCAT.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Update DPCPP library major version (#6801)

* [SYCL] Add assert for device_global without device_image_scope (#6791)

device_global is currently not fully supported but for backends that
support it, they should be usable on device only when the
device_image_scope property is present. This commit adds a temporary
static assert to avoid prevent the use of device_global without
device_image_scope until proper initialization has been implemented.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Fix unittests on MSVC (#6805)

This commit makes two changes:
* Fixes a cast of incompatible size in the PI mock plugin that caused a
warning on MSVC.
* Changes the definition of the captureless lambda function in the
RedefineAPI PiMock unittest to avoid MSVC considering it equal to the
function defined by the PI mock plugin.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL][Doc] Closing the ABI breaking changes window (#6800)

Next one is expected to be not earlier than June 2023.

* [SYCL] Fix deprecation warning for headers (#6808)

Using pragmas to emit warnings didn't work because SYCL headers are
considered to be system headers and any warnings in them are suppressed.

Use "#warning" instead. Unfortunately, MSVC doesn't support it (although
it's part of C23/C++23 and they'll have to add support eventually), so
we need some #if guards. Also, #warning cannot be put inside a macro
definition, thus we have to have some code duplication. Luckily, entire
headers deprecations aren't as often and we can be a little bit verbose.

* [SYCL][DOC] Extension to add hints to stop batching and start executing (#6465)

Signed-off-by: James Brodman <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Co-authored-by: John Pennycook <[email protected]>

* [SYCL] Make host device inaccessible through SYCL API (#6685)

This commit removes the host device from the device list and as such the
host device will no longer be available in user code. The following
changes are a result of this:
* Device filters using 'host' as either backend or device type will
cause a warning at runtime. Since there is no host device selectable for
these filters, the resulting device list will not contain a host
   device.
* is_host() on SYCL objects has been deprecated. Any use of them
internally on a host device should cause an assertion to fail.
* host_selector deprecation message has been changed to better reflect
that there is no alternative.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Fix typos in xmethods script (#6814)

* [SYCL][NFC] Fix aspects detection of AllocaInst when opaque pointers are enabled (#6767)

* [NFC][Sema] Minor code quality change in SemaSYCL.cpp (#6827)

Signed-off-by: Elizabeth Andrews <[email protected]>

* [SYCL] Remove device-dependent tests from in-tree LIT (#6829)

This commit removes the currently failing in-tree LIT tests that were
previously dependent on a device being present. The removed tests are
either moved to the test-suite or were already there.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL][NFC] Fix unused argument warning in host_selector (#6831)

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL] Make the mock plugin report cl_khr_il_program as supported (#6823)

To make build-log tests not depend on the binary type, this commit makes
the mock plugin report that the cl_khr_il_program is supported. Together
with adding PI_DEVICE_INFO_NAME to mock_piDeviceGetInfo, the
corresponding redefinition from the build-log tests can be removed.

Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL][NFC] Reintroduce missing Windows symbols (#6830)

https://github.com/intel/llvm/pull/6685 unintentionally caused the
windows library to lose symbols for handler::verifyKernelInvoc and
handler::is_host. This patch introduces these as unused members to avoid
the patch being ABI-breaking.

Signed-off-by: Larsen, Steffen <[email protected]>

* Fix available memory reporting for Arc devices (#6825)

current PI Level Zero implementation assumed device memory is only HBM
and hence returned 0 for Alchemist GPUs.

* [CI] Enable independent build of libclc (#6833)

* Update CMakeLists to build libclc based on project presence in cmake
command
* Add remangled versions build in --ci-defaults
* Fix issue on possible libclc targets strings concat

* [SYCL][DOCS] Remove references to host device in top-level docs (#6836)

This commit removes references to the host device from the Getting
Started Guide and the FAQ documents.

Signed-off-by: Larsen, Steffen <[email protected]>

* [CI] Enable CUDA SYCL CTS tests (#6439)

Signed-off-by: Yin Yang <[email protected]>
Co-authored-by: Alexey Bader <[email protected]>

* [ESIMD] Do simd<T, N>* to <N x T> arg/ret type conv when possible. (#6835)

Today, `simd<T, N> foo(simd<T, N> x)` is codegenerated by clang as
   `void foo(simd<T, N>* sret(simd<T, N>) %res, simd<T, N>* %x)`
for the SPIRV target (unless `__regcall` is specified), which is then
converted to
  `void foo(<N x T>* sret(<N x T>) %res, <N x T>* %x)`
in the LowerESIMDVecArg, then to
  `<N x T> foo(<N x T> %x)`
in the VC BE.

With the opaque pointers this becomes impossible, and the optimization
must happen in the "ESIMD FE". This patch implements it.

It also changes `lowerEsimdConstructs` in sycl-post-link.cpp to use new
pass manager to avoid the old pass manager-related boiler plate code in
new ESIMD transformations.

Signed-off-by: Konstantin S Bobrovsky <[email protected]>

* [SYCL][L0] Use compute engine for me…
@rodburns
Copy link

This seems to be addressed by the change to an error report rather than exit/die so closing.

jsji pushed a commit that referenced this issue Aug 11, 2023
* Don't wrap kernels that are not being called in the module

This patch is a result of a reflection about previously merged PR
KhronosGroup/SPIRV-LLVM-Translator#1149
"add an entry point wrapper around functions (llvm pass)"
and is enspired by various reported translator, clang (OpenCL) and Intel
GPU drivers issues (see
KhronosGroup/SPIRV-LLVM-Translator#2029 for
reference).

While SPIR-V spec states:
===
*OpName*
--//--. This has nosemantic impact and can safely be removed from a module.
===
yet having EntryPoint function and a function that shares the name via
OpName might be confusing by both (old) drivers and programmers, who
read the SPIR-V file.

This patch prevents generation of the wrapper function when it's not
necessary to generate it aka if a kernel function is not called by other kernel.

We can do better in other cases as well, for example I have experiments
of renaming a wrapped function adding a previx, so it could be
distinguished from the actual kernel/entry point, but for now it doesn't
pass validation for E2E OpenCL tests.

Signed-off-by: Sidorov, Dmitry <[email protected]>

* prevent a copy

Signed-off-by: Sidorov, Dmitry <[email protected]>

This patch is a result of a reflection about previously merged PR #1149 "add an entry point wrapper around functions (llvm pass)" and is enspired by various reported translator, clang (OpenCL) and Intel GPU drivers issues (see

While SPIR-V spec states:
OpName
--//--. This has nosemantic impact and can safely be removed from a module.
yet having EntryPoint function and a function that shares the name via OpName might be confusing by both not-up-to-date drivers and programmers, who read the SPIR-V file.

This patch prevents generation of the wrapper function when it's not necessary to generate it aka if a kernel function is not called by other kernel.

Signed-off-by: Sidorov, Dmitry <[email protected]>

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@46285e4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CTS Impacts Khronos SYCL CTS cuda CUDA back-end runtime Runtime library related issue
Projects
None yet
Development

No branches or pull requests

6 participants