[SYCL][CUDA] Initial CUDA backend support (#1091)
* [SYCL][LIBCLC] Additional libclc builtins to support SYCL work

Adds builtins to libclc to support the CUDA backend for SYCL.

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] CMake and lit support for SYCL CUDA backend

Defines the CMake and lit variables used for SYCL CUDA backend
development and testing.

Contributors
Alexander Johnston <[email protected]>
Bjoern Knafla <[email protected]>
Ruyman Reyes <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Local Accessor Support for CUDA

Provides the LocalAccessorToSharedMemory compiler pass required
for supporting SYCL local accessors in CUDA.

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Change __spirv_BuiltIn.. to functions

Changes the following builtins to functions

__spirv_BuiltInGlobalSize
__spirv_BuiltInWorkgroupSize
__spirv_BuiltInNumWorkgroups
__spirv_BuiltInLocalInvocationId
__spirv_BuiltInWorkgroupId
__spirv_BuiltInGlobalOffset

Contributors
David Wood <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Add SYCL CUDA support to clang driver

Adds CUDA support for SYCL compilation in the clang driver.

Contributors
Alexander Johnston <[email protected]>
David Wood <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Initial Implementation of the CUDA backend

Contributors
Alan Forbes <[email protected]>
Alexander Johnston <[email protected]>
Bjoern Knafla <[email protected]>
Daniel Soutar <[email protected]>
David Wood <[email protected]>
Kumudha Narasimhan <[email protected]>
Mehdi Goli <[email protected]>
Przemek Malon <[email protected]>
Ruyman Reyes <[email protected]>
Stuart Adams <[email protected]>
Svetlozar Georgiev <[email protected]>
Steffen Larsen <[email protected]>
Victor Lomuller <[email protected]>

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Update libclc install rules

Have libclc install clc-* and libspirv-* to lib and share

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage

Synchronise the CUDA backend with the general SYCL changes from #974.

Signed-off-by: Andrea Bocci <[email protected]>

* Added missing flags for device-side builtins

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Removing unnecessary tool from the tree

Acked-by: Victor Lomuller <[email protected]>
Signed-off-by: Ruyman <[email protected]>

* [SYCL][PI] Fix kernel group info parameter conversion

Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL][CUDA] Refactor __SYCL_INLINE macro

Synchronise the CUDA backend with the general SYCL changes from #1121.

Signed-off-by: Andrea Bocci <[email protected]>

* [SYCL] Have default_selector consider SYCL_BE

Have the default_selector consider the env var SYCL_BE when rating
device scores to make choosing a backend easier.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Select GlobalPlugin based on SYCL_BE

Rather than choose the last found plugin as GlobalPlugin, select
it depending on the SYCL_BE env var.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Improve default device selection checks

Better checks for CUDA and OpenCL devices to match with SYCL_BE in the
default device selection, based on the platform version info.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Formatting update for device_selector.cpp

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Changed CUDA unit tests to call through plugin

Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl

To ensure that the check-sycl targets test OpenCL devices, pass
SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which
passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which
device is tested by check-sycl.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Remove PI_CUDA specific details from clang

Removes PI_CUDA specific code paths and tests from clang, opting to
always enable them.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Further fixes to CUDA device selection

Fix platform string comparison for CUDA platform detection.
Fix device info platform query so that it uses the device's plugin,
rather than the GlobalPlugin.

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Code style and cleanup to CUDA support

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Enable asserts in all buildbot builds

Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Minor test and build configuration

Fix minor test and build configuration issues introduced in the
development of the CUDA backend.

Signed-off-by: Alexander Johnston <[email protected]>

Co-authored-by: Andrea Bocci <[email protected]>
Co-authored-by: Ruyman <[email protected]>
Co-authored-by: Steffen Larsen <[email protected]>
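
The SYCL_BE items above (default_selector scoring and GlobalPlugin selection) can be illustrated with a minimal sketch. This is not the runtime's actual C++ implementation: the function names, the tuple layout, the bonus value, and the fallback to PI_OPENCL when SYCL_BE is unset are all assumptions made for illustration. Only the variable name SYCL_BE and the values PI_OPENCL and PI_CUDA come from the commit message.

```python
import os

# Backend names as they appear in the commit message.
KNOWN_BACKENDS = ("PI_OPENCL", "PI_CUDA")


def preferred_backend(env=None, default="PI_OPENCL"):
    """Return the backend requested via SYCL_BE.

    Falling back to `default` for unset or unknown values is an
    assumption of this sketch, not documented behaviour.
    """
    env = os.environ if env is None else env
    value = env.get("SYCL_BE", default)
    return value if value in KNOWN_BACKENDS else default


def rate_device(device_backend, base_score, env=None, bonus=50):
    """Raise a device's score when its backend matches SYCL_BE.

    `bonus` is a hypothetical weight chosen for this example.
    """
    if device_backend == preferred_backend(env):
        return base_score + bonus
    return base_score


def select_device(devices, env=None):
    """Pick the highest-rated device from (name, backend, base_score) tuples."""
    return max(devices, key=lambda d: rate_device(d[1], d[2], env))


if __name__ == "__main__":
    devices = [("cpu", "PI_OPENCL", 100), ("gpu", "PI_CUDA", 100)]
    print(select_device(devices, {"SYCL_BE": "PI_CUDA"}))
```

With two otherwise equally rated devices, the environment variable alone decides which one is chosen. That is also why the check-sycl item pins SYCL_BE=PI_OPENCL: it keeps the tested device deterministic.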
4 people authored Feb 24, 2020
1 parent a0c0e33 commit 7a9a425
Showing 820 changed files with 20,902 additions and 3,437 deletions.
58 changes: 39 additions & 19 deletions buildbot/configure.py
@@ -11,30 +11,49 @@ def do_configure(args):
sycl_dir = os.path.join(args.src_dir, "sycl")
spirv_dir = os.path.join(args.src_dir, "llvm-spirv")
ocl_header_dir = os.path.join(args.obj_dir, "OpenCL-Headers")
icd_loader_lib = ''
icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build")
llvm_targets_to_build = 'X86'
llvm_enable_projects = 'clang;llvm-spirv;sycl;opencl-aot'
libclc_targets_to_build = ''
sycl_build_pi_cuda = 'OFF'
llvm_enable_assertions = 'ON'

if platform.system() == 'Linux':
icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build", "libOpenCL.so")
icd_loader_lib = os.path.join(icd_loader_lib, "libOpenCL.so")
else:
icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build", "OpenCL.lib")
icd_loader_lib = os.path.join(icd_loader_lib, "OpenCL.lib")

if args.cuda:
llvm_targets_to_build += ';NVPTX'
llvm_enable_projects += ';libclc'
libclc_targets_to_build = 'nvptx64--;nvptx64--nvidiacl'
sycl_build_pi_cuda = 'ON'

if args.assertions:
llvm_enable_assertions = 'ON'

install_dir = os.path.join(args.obj_dir, "install")

cmake_cmd = ["cmake",
"-G", "Ninja",
"-DCMAKE_BUILD_TYPE={}".format(args.build_type),
"-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv;opencl-aot",
"-DLLVM_EXTERNAL_SYCL_SOURCE_DIR={}".format(sycl_dir),
"-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR={}".format(spirv_dir),
"-DLLVM_ENABLE_PROJECTS=clang;sycl;llvm-spirv;opencl-aot",
"-DOpenCL_INCLUDE_DIR={}".format(ocl_header_dir),
"-DOpenCL_LIBRARY={}".format(icd_loader_lib),
"-DLLVM_BUILD_TOOLS=ON",
"-DSYCL_ENABLE_WERROR=ON",
"-DLLVM_ENABLE_ASSERTIONS=ON",
"-DCMAKE_INSTALL_PREFIX={}".format(install_dir),
"-DSYCL_INCLUDE_TESTS=ON", # Explicitly include all kinds of SYCL tests.
llvm_dir]
cmake_cmd = [
"cmake",
"-G", "Ninja",
"-DCMAKE_BUILD_TYPE={}".format(args.build_type),
"-DLLVM_ENABLE_ASSERTIONS={}".format(llvm_enable_assertions),
"-DLLVM_TARGETS_TO_BUILD={}".format(llvm_targets_to_build),
"-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv;opencl-aot",
"-DLLVM_EXTERNAL_SYCL_SOURCE_DIR={}".format(sycl_dir),
"-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR={}".format(spirv_dir),
"-DLLVM_ENABLE_PROJECTS={}".format(llvm_enable_projects),
"-DLIBCLC_TARGETS_TO_BUILD={}".format(libclc_targets_to_build),
"-DOpenCL_INCLUDE_DIR={}".format(ocl_header_dir),
"-DOpenCL_LIBRARY={}".format(icd_loader_lib),
"-DSYCL_BUILD_PI_CUDA={}".format(sycl_build_pi_cuda),
"-DLLVM_BUILD_TOOLS=ON",
"-DSYCL_ENABLE_WERROR=ON",
"-DCMAKE_INSTALL_PREFIX={}".format(install_dir),
"-DSYCL_INCLUDE_TESTS=ON", # Explicitly include all kinds of SYCL tests.
llvm_dir
]

print(cmake_cmd)

@@ -63,6 +82,8 @@ def main():
parser.add_argument("-o", "--obj-dir", metavar="OBJ_DIR", required=True, help="build directory")
parser.add_argument("-t", "--build-type",
metavar="BUILD_TYPE", required=True, help="build type, debug or release")
parser.add_argument("--cuda", action='store_true', help="switch from OpenCL to CUDA")
parser.add_argument("--assertions", action='store_true', help="build with assertions")

args = parser.parse_args()

@@ -74,4 +95,3 @@ def main():
ret = main()
exit_code = 0 if ret else 1
sys.exit(exit_code)
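
As a reading aid for the configure.py change above, the effect of the new `--cuda` switch on the CMake options can be sketched in isolation. The helper names `cuda_cmake_options` and `as_cmake_flags` are hypothetical and do not exist in the script. The option names and values are copied from the diff.

```python
def cuda_cmake_options(cuda):
    """Return the CMake options that the diff derives from args.cuda."""
    targets = "X86"
    projects = "clang;llvm-spirv;sycl;opencl-aot"
    libclc_targets = ""
    build_pi_cuda = "OFF"

    if cuda:
        # Same adjustments as the `if args.cuda:` branch in the diff.
        targets += ";NVPTX"
        projects += ";libclc"
        libclc_targets = "nvptx64--;nvptx64--nvidiacl"
        build_pi_cuda = "ON"

    return {
        "LLVM_TARGETS_TO_BUILD": targets,
        "LLVM_ENABLE_PROJECTS": projects,
        "LIBCLC_TARGETS_TO_BUILD": libclc_targets,
        "SYCL_BUILD_PI_CUDA": build_pi_cuda,
    }


def as_cmake_flags(options):
    """Format an options dict as -DNAME=VALUE arguments."""
    return ["-D{}={}".format(name, value) for name, value in options.items()]


if __name__ == "__main__":
    for flag in as_cmake_flags(cuda_cmake_options(cuda=True)):
        print(flag)
```

Without `--cuda` the defaults apply: only the X86 target is built, libclc is not added to the project list, and SYCL_BUILD_PI_CUDA stays OFF.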

3 changes: 3 additions & 0 deletions clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -64,6 +64,9 @@ def warn_drv_unknown_cuda_version: Warning<
"Unknown CUDA version %0. Assuming the latest supported version %1">,
InGroup<CudaUnknownVersion>;
def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
def err_drv_no_sycl_libspirv : Error<
"cannot find `libspirv-nvptx64--nvidiacl.bc`. Provide path to libspirv library via "
"-fsycl-libspirv-path, or pass -fno-sycl-libspirv to build without linking with libspirv.">;
def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
def err_drv_invalid_thread_model_for_target : Error<
"invalid thread model '%0' in '%1' for this target">;
2 changes: 1 addition & 1 deletion clang/include/clang/Basic/DiagnosticIDs.h
@@ -28,7 +28,7 @@ namespace clang {
// Size of each of the diagnostic categories.
enum {
DIAG_SIZE_COMMON = 300,
DIAG_SIZE_DRIVER = 250, // 200 -> 250 for SYCL related diagnostics
DIAG_SIZE_DRIVER = 210,
DIAG_SIZE_FRONTEND = 150,
DIAG_SIZE_SERIALIZATION = 120,
DIAG_SIZE_LEX = 400,
3 changes: 3 additions & 0 deletions clang/include/clang/Driver/Options.td
@@ -1872,6 +1872,9 @@ def fsycl_help_EQ : Joined<["-"], "fsycl-help=">,
def fsycl_help : Flag<["-"], "fsycl-help">, Alias<fsycl_help_EQ>,
Flags<[DriverOption, CoreOption]>, AliasArgs<["all"]>, HelpText<"Emit help information "
"from all of the offline compilation tools">;
def fsycl_libspirv_path_EQ : Joined<["-"], "fsycl-libspirv-path=">,
Flags<[CC1Option, CoreOption]>, HelpText<"Path to libspirv library">;
def fno_sycl_libspirv : Flag<["-"], "fno-sycl-libspirv">, HelpText<"Disable check for libspirv">;
def fsyntax_only : Flag<["-"], "fsyntax-only">,
Flags<[DriverOption,CoreOption,CC1Option]>, Group<Action_Group>;
def ftabstop_EQ : Joined<["-"], "ftabstop=">, Group<f_Group>;
3 changes: 2 additions & 1 deletion clang/lib/Basic/Targets/NVPTX.cpp
@@ -57,7 +57,8 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
.Default(32);
}

TLSSupported = false;
// FIXME: Needed for compiling SYCL to PTX.
TLSSupported = Triple.getEnvironment() == llvm::Triple::SYCLDevice;
VLASupported = false;
AddrSpaceMap = &NVPTXAddrSpaceMap;
UseAddrSpaceMapMangling = true;
6 changes: 6 additions & 0 deletions clang/lib/Basic/Targets/NVPTX.h
@@ -141,6 +141,12 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
Opts.support("cl_khr_global_int32_extended_atomics");
Opts.support("cl_khr_local_int32_base_atomics");
Opts.support("cl_khr_local_int32_extended_atomics");
// PTX actually supports 64 bits operations even if the Nvidia OpenCL
// runtime does not report support for it.
// This is required for libclc to compile 64 bits atomic functions.
// FIXME: maybe we should have a way to control this ?
Opts.support("cl_khr_int64_base_atomics");
Opts.support("cl_khr_int64_extended_atomics");
}

/// \returns If a target requires an address within a target specific address
3 changes: 0 additions & 3 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -842,9 +842,6 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
PerFunctionPasses.add(
createTargetTransformInfoWrapperPass(getTargetIRAnalysis()));

if (LangOpts.SYCLIsDevice)
PerFunctionPasses.add(createSYCLLowerWGScopePass());

CreatePasses(PerModulePasses, PerFunctionPasses);

legacy::PassManager CodeGenPasses;
6 changes: 6 additions & 0 deletions clang/lib/CodeGen/CGCall.cpp
@@ -755,6 +755,12 @@ CodeGenTypes::arrangeLLVMFunctionInfo(CanQualType resultType,
return *FI;

unsigned CC = ClangCallConvToLLVMCallConv(info.getCC());
// This is required so SYCL kernels are successfully processed by tools from CUDA. Kernels
// with a `spir_kernel` calling convention are ignored otherwise.
if (CC == llvm::CallingConv::SPIR_KERNEL && CGM.getTriple().isNVPTX() &&
getContext().getLangOpts().SYCLIsDevice) {
CC = llvm::CallingConv::C;
}

// Construct the function info. We co-allocate the ArgInfos.
FI = CGFunctionInfo::create(CC, instanceMethod, chainCall, info,
13 changes: 13 additions & 0 deletions clang/lib/CodeGen/CodeGenAction.cpp
@@ -10,6 +10,7 @@
#include "CodeGenModule.h"
#include "CoverageMappingGen.h"
#include "MacroPPCallbacks.h"
#include "SYCLLowerIR/LowerWGScope.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclCXX.h"
@@ -33,6 +34,7 @@
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/LLVMRemarkStreamer.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"
@@ -326,6 +328,17 @@ namespace clang {
CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
Ctx.setDiagnosticsHotnessRequested(true);

// The parallel_for_work_group legalization pass can emit calls to
// builtins function. Definitions of those builtins can be provided in
// LinkModule. We force the pass to legalize the code before the link
// happens.
if (LangOpts.SYCLIsDevice) {
PrettyStackTraceString CrashInfo("Pre-linking SYCL passes");
legacy::PassManager PreLinkingSyclPasses;
PreLinkingSyclPasses.add(createSYCLLowerWGScopePass());
PreLinkingSyclPasses.run(*getModule());
}

// Link each LinkModule into our module.
if (LinkInModules())
return;
2 changes: 2 additions & 0 deletions clang/lib/CodeGen/CodeGenModule.cpp
@@ -240,6 +240,8 @@ void CodeGenModule::createSYCLRuntime() {
switch (getTriple().getArch()) {
case llvm::Triple::spir:
case llvm::Triple::spir64:
case llvm::Triple::nvptx:
case llvm::Triple::nvptx64:
SYCLRuntime.reset(new CGSYCLRuntime(*this));
break;
default: