[SYCL] Always let the backend choose the binary #1587

hiaselhans · 2020-04-25T09:10:17Z

There is still a small issue: in nvptx binaries, the entries are not distributed.
When compiling with -fsycl-targets=nvptx64-nvidia-opencl-sycldevice,spir64-unknown-opencl-sycldevice, program_manager only provides the spir64 binary, which has the proper kernel name in its entries. That is why the length of binaries is 1, so SYCL continues directly with the wrong binary, and the error thrown at a later point is misleading:

terminate called after throwing an instance of 'cl::sycl::feature_not_supported'
  what():  Online compilation is not supported in this context -59 (CL_INVALID_OPERATION)

The overhead of always having the backend check for a valid binary is not too big in my eyes, and the error then becomes:

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  OpenCL API failed. OpenCL API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)

While the issue with entry names remains (I will look into it, but I'm not sure where to start), I guess this also happens in a case where binaries are completely missing for the selected device backend.

When there is only one binary available the backend should still choose the binary to avoid misleading cl_error_codes

Signed-off-by: hiaselhans <[email protected]>

hiaselhans · 2020-04-25T09:19:45Z

see #1588 for the related binary-entries-table issue

Ruyk · 2020-04-27T09:36:59Z

Just to understand this, the intention of the patch is to provide a better error message? or are you going to add more later on?

hiaselhans · 2020-04-27T09:44:06Z

the only intention of this PR is to prevent sending wrong binaries to the backend.

the conditional exception allowed to enter a path that should not be allowed (continuing with a non-matching backend/binary combination).
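
To make that path concrete, here is a minimal sketch of the behaviour in question. It does not use the real PI API: `Image`, `backendSelectBinary`, `pickOld`, and `pickNew` are hypothetical stand-ins, and the target strings are illustrative only. The old path skips the backend check when exactly one image exists; the new path always asks the backend.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a device image; only the target string matters.
struct Image {
  std::string Target;
};

// Hypothetical stand-in for the backend's selection call: returns the index
// of the first image whose target matches the device, or -1 if none matches.
int backendSelectBinary(const std::string &DeviceTarget,
                        const std::vector<Image> &Images) {
  for (std::size_t I = 0; I < Images.size(); ++I)
    if (Images[I].Target == DeviceTarget)
      return static_cast<int>(I);
  return -1;
}

// Old behaviour (sketch): a single image is used without asking the backend,
// so a non-matching binary can slip through and fail later.
int pickOld(const std::string &DeviceTarget, const std::vector<Image> &Images) {
  if (Images.size() == 1)
    return 0;
  return backendSelectBinary(DeviceTarget, Images);
}

// New behaviour (sketch): the backend always validates the choice.
int pickNew(const std::string &DeviceTarget, const std::vector<Image> &Images) {
  return backendSelectBinary(DeviceTarget, Images);
}
```

With a single spir64 image and a device that wants a different target, `pickOld` still returns index 0, while `pickNew` reports that nothing matches, which is where a clearer error can be raised.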

kbobrovs

LGTM

kbobrovs · 2020-04-28T04:02:48Z

i will look into it but not sure where to start

The sycl-post-link tool generates the symbol table; use the -symbols switch.
As PTX does not support spec constants, post-link should generate a 2-column table: the first column lists binaries, the second the corresponding symbol files. See the Driver.cpp/appendLinkDependences code after the picture illustrating the action graph. To support the symbol table, post-link should generate TY_Tempfiletable in the PTX case too. Use file-table-tform to process the filenames in the table, similar to the non-PTX path.

hiaselhans · 2020-04-28T12:13:52Z

So, with this PR there is a failing test: kernel_from_file

>>> ProgramManager::getDeviceImage(-1, "0", 0x89d880)
available device images:
  ++++++ Kernel set: 0
  --- Image 0xd934c0
    Version  : 1
    Kind     : 4
    Format   : 2
    Target   : <unknown>
    Bin size : 13532
    Compile options : 
    Link options    : 
    Entries  : 
    Properties [0-0]:
    OSModuleHandle=-2
    DYNAMICALLY CREATED

program_manager fails to select the image because the target is set to unknown via DynRTDeviceBinaryImage.

for now DynRTDeviceBinaryImage is only used within the UseSpvEnv so we could hardcode the target to spir64?

kbobrovs · 2020-04-28T17:34:45Z

for now DynRTDeviceBinaryImage is only used within the UseSpvEnv so we could hardcode the target to spir64?

The SPIRV format should have been determined by Bin->Format = pi::getBinaryImageFormat(Bin->BinaryStart, DataSize); in the constructor. I think this should be investigated/fixed rather than hardcoding SPIRV.

kbobrovs · 2020-04-28T17:42:53Z

OK, the problem is in target, not in the format. So I'd suggest then to initialize Bin->target with SPIRV64 depending on Bin->format.

hiaselhans · 2020-04-28T17:46:20Z

OK, the problem is in target, not in the format. So I'd suggest then to initialize Bin->target with SPIRV64 depending on Bin->format.

yep, i just found that. so i will set target with a switch statement checking format:

pi.h:

static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_NONE = 0;
// specific to a device
static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_NATIVE = 1;
// portable binary types go next
// SPIR-V
static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_SPIRV = 2;
// LLVM bitcode
static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_LLVMIR_BITCODE = 3;

kbobrovs · 2020-04-28T17:55:45Z

yes, sounds good

kbobrovs

LGTM

hiaselhans · 2020-04-28T20:05:37Z

Did that, it still feels a little hard-coded but it's a bit of an edge case anyways.

thx @kbobrovs !

Ruyk

Just a few comments. I haven't tried with the CUDA backend; does this patch solve the problem?

Ruyk · 2020-04-29T08:49:51Z

sycl/source/detail/program_manager/program_manager.cpp

    std::cerr << ">>> ProgramManager::getDeviceImage(" << M << ", \"" << KSId
              << "\", " << getRawSyclObjImpl(Context) << ")\n";
+
+    std::cerr << "available device images:\n";


Could help when filtering the output...

Suggested change

-    std::cerr << "available device images:\n";
+    std::cerr << "ProgramManager: Available device images:\n";

hiaselhans · 2020-04-29

Actually, I just moved this part up because it might be unreachable at its old place.

If I were to improve those messages, I would query for PI_TRACE at runtime instead. What do you think?

Ruyk · 2020-05-01

That would be even better.

hiaselhans · 2020-05-01

@kbobrovs should I replace all DbgProgMgr checks with pi_trace in ProgramManager?

kbobrovs · 2020-05-12

@hiaselhans, sorry for the delay. Yes, definitely makes sense. Can be done as a separate PR.

Ruyk · 2020-04-29T08:51:07Z

sycl/source/detail/program_manager/program_manager.cpp

+    break;
+  default:
+    Bin->DeviceTargetSpec = PI_DEVICE_BINARY_TARGET_UNKNOWN;
+  }


Does this work with the CUDA backend when there are multiple binaries?

hiaselhans · 2020-04-29

I think you can only load one DynRTDeviceBinaryImage at a time using the SYCL_USE_KERNEL_SPV environment variable. This just sets the device target to spir64 in case it is a SPIR-V file, so the OpenCL backend can recognize it.

hiaselhans · 2020-04-29T09:21:19Z

Just a few comments. I haven't tried with the CUDA backend; does this patch solve the problem?

It doesn't yet solve the issue quoted, but it does help when someone compiles only nvptx64 binaries and uses an opencl device and vice-versa. The if-clause prevented the backend from checking the image type before executing it.

Ruyk · 2020-04-29T09:27:43Z

#1543 rejects OpenCL CUDA platform on DPC++, so should help as well.

hiaselhans · 2020-04-29T09:45:00Z

from looking at #1543 i do like the fact that there's now always getBackend() == backend::opencl
makes a lot of sense to me! :)

however there's still this: isDeviceBinaryTypeSupported

llvm/sycl/source/detail/program_manager/program_manager.cpp

Line 259 in ec0846c

static bool isDeviceBinaryTypeSupported(const context &C,

so i wonder if we could have a PI call piDeviceSupported(pi_device *device, bool *support) and reject all those opencl 1.2 devices altogether?

Ruyk · 2020-05-01T09:38:22Z

I don't think we should ban OpenCL 1.2 devices, since lots of them could work (especially if they expose cl_khr_il_program). This is checked below in

llvm/sycl/source/detail/program_manager/program_manager.cpp

Lines 279 to 290 in ec0846c

    
      for (const device &D : Devices) {
        // We need cl_khr_il_program extension to be present
        // and we can call clCreateProgramWithILKHR using the extension
        vector_class<string_class> Extensions =
            D.get_info<info::device::extensions>();
        if (Extensions.end() ==
            std::find(Extensions.begin(), Extensions.end(), "cl_khr_il_program"))
          return false;
      }
      return true;
    }

It is also, IMHO, not the job of the PI API to decide whether a device is valid for a plugin. One reason is that the pi_device object may come from a different plugin than the one whose function you are calling, and the two are not necessarily compatible (the PI API gives no guarantee that PI types are compatible at all across implementations).
Since a pi_device is returned from querying a pi_platform, by definition the pi_device should be valid.

In terms of CUDA, note the PiDeviceBinaryType check:

llvm/sycl/source/detail/program_manager/program_manager.cpp

Lines 262 to 263 in ec0846c

    
    if (Format != PI_DEVICE_BINARY_TYPE_SPIRV)
      return true;

Currently, a PTX binary appears as NATIVE, so this function itself is not used:

llvm/sycl/source/detail/program_manager/program_manager.cpp

Lines 335 to 338 in ec0846c

    
    // assert(Format != PI_DEVICE_BINARY_TYPE_NONE && "Image format not set");
    if (!isDeviceBinaryTypeSupported(Context, Format))
      throw feature_not_supported(

The code is loaded as a binary blob, like any other native binary format:

llvm/sycl/source/detail/program_manager/program_manager.cpp

Lines 345 to 348 in ec0846c

    
    Format == PI_DEVICE_BINARY_TYPE_SPIRV
        ? createSpirvProgram(Ctx, RawImg.BinaryStart, ImgSize)
        : createBinaryProgram(Ctx, RawImg.BinaryStart, ImgSize);

FYI, definition of Binary types:

llvm/sycl/include/CL/sycl/detail/pi.h

Lines 579 to 587 in ec0846c

    
    // format is not determined
    static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_NONE = 0;
    // specific to a device
    static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_NATIVE = 1;
    // portable binary types go next
    // SPIR-V
    static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_SPIRV = 2;
    // LLVM bitcode
    static constexpr pi_device_binary_type PI_DEVICE_BINARY_TYPE_LLVMIR_BITCODE = 3;
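
Putting the quoted fragments together, the check can be sketched in self-contained form as below. The names `DeviceInfo` and `binaryTypeSupported` are hypothetical, the format constants are local stand-ins, extensions are modelled as plain strings, and any further conditions in the real function are left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

constexpr int FORMAT_NATIVE = 1; // stand-in for PI_DEVICE_BINARY_TYPE_NATIVE
constexpr int FORMAT_SPIRV = 2;  // stand-in for PI_DEVICE_BINARY_TYPE_SPIRV

// Hypothetical device model: just the list of reported extension names.
struct DeviceInfo {
  std::vector<std::string> Extensions;
};

// Sketch of the decision: non-SPIR-V formats are accepted without further
// checks; SPIR-V requires cl_khr_il_program on every device in the context.
bool binaryTypeSupported(int Format, const std::vector<DeviceInfo> &Devices) {
  if (Format != FORMAT_SPIRV)
    return true;
  for (const DeviceInfo &D : Devices) {
    if (std::find(D.Extensions.begin(), D.Extensions.end(),
                  "cl_khr_il_program") == D.Extensions.end())
      return false;
  }
  return true;
}
```

Under this model a native (for example PTX) image passes the check regardless of the device's extensions, which is why the SPIR-V branch never comes into play for it.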

hiaselhans · 2020-05-01T10:32:55Z

thx @Ruyk

OK, I see. I was wrong in that isDeviceBinaryTypeSupported returns true on OpenCL >= 2.1 rather than returning false on older versions. Wrong assumption on my side, but what I meant was whether we could use that logic to filter out unsupported OpenCL devices.

So let me rephrase my point:

The isDeviceBinaryTypeSupported function only checks, for Format == PI_DEVICE_BINARY_TYPE_SPIRV, whether a device supports SPIR-V.
It is the OpenCL backend that delivers these pi_devices, so why shouldn't it also be responsible for checking whether that device is a "good" OpenCL device? It could also be a bool isSupported or similar stored with pi_device, which can be checked in device_selector.

Ruyk · 2020-05-01T13:56:08Z

It is the OpenCL backend that delivers these pi_devices, so why shouldn't it also be responsible for checking whether that device is a "good" OpenCL device? It could also be a bool isSupported or similar stored with pi_device, which can be checked in device_selector.

Any OpenCL device is a "good" OpenCL device; it is only the SYCL RT that may or may not be able to use a certain OpenCL (or PI) device. The PI plugin should provide devices and platforms to the SYCL RT; decisions on what is valid belong in the SYCL RT layers.

hiaselhans · 2020-05-01T18:12:41Z

I think this discussion should be in #1543 because none of it is actually addressed here, in this PR.

In other words: if all devices are "good" devices, why do we need special treatment for the CUDA OpenCL device? Why not reuse the logic from isDeviceBinaryTypeSupported? And as that logic is quite OpenCL-specific, why not move it to the OpenCL backend?

Forgive my naive questions, but it just seems so much more straightforward to me...

bader · 2020-05-09T13:12:24Z

@hiaselhans, sorry for the delay.
It looks like there is a conflict with c22e34b. Could you resolve it, please?

hiaselhans · 2020-05-10T08:19:48Z

@bader done :)

bader · 2020-05-12T09:46:59Z

@intel/llvm-reviewers-runtime, ping.

Always let the backend choose the binary

bee1066

When there is only one binary available the backend should still choose the binary to avoid misleading cl_error_codes

Signed-off-by: hiaselhans <[email protected]>

hiaselhans requested a review from kbobrovs as a code owner April 25, 2020 09:10

Ruyk self-requested a review April 27, 2020 09:37

kbobrovs previously approved these changes Apr 28, 2020

set image type to spir64 for DynRTDeviceBinaryImage

b7d6c64

Signed-off-by: hiaselhans <[email protected]>

hiaselhans dismissed kbobrovs’s stale review via b7d6c64 April 28, 2020 13:00

fix clang-format

86eb1e9

Signed-off-by: hiaselhans <[email protected]>

hiaselhans requested a review from kbobrovs April 28, 2020 17:28

set BinaryImage's DeviceTargetSpec based on Format

a18ddcd

Signed-off-by: hiaselhans <[email protected]>

kbobrovs previously approved these changes Apr 28, 2020

hiaselhans dismissed kbobrovs’s stale review via 5675eb9 April 28, 2020 20:10

fix clang-format

4ef3f91

Signed-off-by: hiaselhans <[email protected]>

hiaselhans force-pushed the always_select_binary branch from 5675eb9 to 4ef3f91 Compare April 28, 2020 20:11

another clang-format fix

3627035

Signed-off-by: hiaselhans <[email protected]>

hiaselhans requested a review from kbobrovs April 28, 2020 20:18

kbobrovs previously approved these changes Apr 28, 2020

Ruyk reviewed Apr 29, 2020

hiaselhans changed the title ~~Always let the backend choose the binary~~ [SYCL] Always let the backend choose the binary May 1, 2020

Merge branch 'sycl' into always_select_binary

b3ebb95

hiaselhans dismissed kbobrovs’s stale review via b3ebb95 May 10, 2020 08:17

hiaselhans requested a review from a team as a code owner May 10, 2020 08:17

hiaselhans requested a review from vladimirlaz May 10, 2020 08:17

bader approved these changes May 10, 2020

bader requested a review from kbobrovs May 10, 2020 09:16

kbobrovs approved these changes May 12, 2020

Ruyk approved these changes May 12, 2020

v-klochkov approved these changes May 12, 2020

bader merged commit 6233c68 into intel:sycl May 12, 2020
