[SYCL] Always let the backend choose the binary #1587
When there is only one binary available, the backend should still choose the binary, to avoid misleading cl_error_codes.
Signed-off-by: hiaselhans <[email protected]>
See #1588 for the related binary-entries-table issue.
Just to understand this: is the intention of the patch to provide a better error message, or are you going to add more later on?
The only intention of this PR is to prevent sending wrong binaries to the backend. The conditional exception made it possible to enter a path that should not be allowed (continuing with a non-matching backend/binary combination).
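The intent described above can be illustrated with a minimal sketch. This is not the code from this PR: `Image`, `backendSelect`, and `chooseImage` are hypothetical stand-ins, and the real runtime delegates the choice to the PI plugin. The sketch only shows that the selection hook runs even when a single image is available, so a lone non-matching binary is rejected up front instead of failing later with an unrelated error.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a device image; only the target string matters.
struct Image {
  std::string Target; // e.g. "spir64" or "nvptx64"
};

// Hypothetical stand-in for the backend's selection hook. It reports the
// index of the first image whose target the backend accepts.
bool backendSelect(const std::string &AcceptedTarget,
                   const std::vector<Image> &Images, std::size_t &Selected) {
  for (std::size_t I = 0; I < Images.size(); ++I) {
    if (Images[I].Target == AcceptedTarget) {
      Selected = I;
      return true;
    }
  }
  return false;
}

// Always ask the backend, even when Images.size() == 1, so that a single
// non-matching image is rejected here rather than executed.
const Image &chooseImage(const std::string &AcceptedTarget,
                         const std::vector<Image> &Images) {
  std::size_t Selected = 0;
  if (!backendSelect(AcceptedTarget, Images, Selected))
    throw std::runtime_error("no device image matches the selected backend");
  return Images[Selected];
}
```

With a single `spir64` image, `chooseImage("spir64", ...)` returns it, while `chooseImage("nvptx64", ...)` throws immediately.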
LGTM
The sycl-post-link tool generates the symbol table; use the -symbols switch.
So, with this PR there is a failing test:
program_manager fails to select the image because target is set to unknown via DynRTDeviceBinaryImage for now DynRTDeviceBinaryImage is only used within the
The SPIRV format should have been determined by |
OK, the problem is in target, not in the format. So I'd suggest then to initialize Bin->target with SPIRV64 depending on Bin->format.
yep, i just found that. so i will set target with a switch statement checking format: pi.h:
yes, sounds good
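As a rough illustration of the agreed approach, here is a self-contained sketch of deriving the target from the format with a switch. The names in namespace `sketch` are hypothetical stand-ins rather than the definitions in pi.h; the `"spir64"` string follows the discussion in this thread, and `"<unknown>"` is an assumed placeholder value.

```cpp
#include <cstdint>
#include <string>

namespace sketch {

// Hypothetical stand-ins; the real enum and target strings live in pi.h
// and may differ in name, type, or value.
enum class Format : std::uint8_t { None, Native, Spirv, LlvmBitcode };
const char *const TargetUnknown = "<unknown>"; // assumed placeholder value
const char *const TargetSpirv64 = "spir64";

struct Binary {
  Format Fmt;
  const char *DeviceTargetSpec;
};

// Derive the target from the format so that a SPIR-V image loaded at run
// time can be recognized by a backend that accepts spir64.
inline void initTargetFromFormat(Binary &Bin) {
  switch (Bin.Fmt) {
  case Format::Spirv:
    Bin.DeviceTargetSpec = TargetSpirv64;
    break;
  default:
    Bin.DeviceTargetSpec = TargetUnknown;
  }
}

} // namespace sketch
```

Every format other than SPIR-V falls through to the unknown target, which matches the "edge case" nature of the change discussed here.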
LGTM
Did that. It still feels a little hard-coded, but it's a bit of an edge case anyway. thx @kbobrovs!
Force-pushed from 5675eb9 to 4ef3f91
Just a few comments. I haven't tried this with the CUDA backend; does this patch solve the problem?
std::cerr << ">>> ProgramManager::getDeviceImage(" << M << ", \"" << KSId
          << "\", " << getRawSyclObjImpl(Context) << ")\n";

std::cerr << "available device images:\n";
Could help when filtering the output...
- std::cerr << "available device images:\n";
+ std::cerr << "ProgramManager: Available device images:\n";
Actually, I just moved this part up because it might be unreachable at its old place.
If I were to improve those messages, I would query for PI_TRACE at runtime instead. What do you think?
That would be even better
@kbobrovs should I replace all DbgProgMgr checks with pi_trace in ProgramManager?
@hiaselhans, sorry for the delay. Yes, that definitely makes sense. It can be done as a separate PR.
    break;
  default:
    Bin->DeviceTargetSpec = PI_DEVICE_BINARY_TARGET_UNKNOWN;
  }
Does this work with the CUDA backend when there are multiple binaries?
I think you can only load one DynRTDeviceBinaryImage at a time using the SYCL_USE_KERNEL_SPV env-variable. This just sets the device target to spir64 in case it is a SPIR-V file, so the OpenCL backend can recognize it.
It doesn't yet solve the issue quoted, but it does help when someone compiles only nvptx64 binaries and uses an OpenCL device, and vice versa. The if-clause prevented the backend from checking the image type before executing it.
#1543 rejects the OpenCL CUDA platform on DPC++, so it should help as well.
From looking at #1543, I do like the fact that there's now always
however, there's still this: isDeviceBinaryTypeSupported
so I wonder if we could have a PI call piDeviceSupported(pi_device *device, bool *support) and reject all those OpenCL 1.2 devices altogether?
I don't think we should ban OpenCL 1.2 devices, since lots of them could work (especially if they expose
[llvm/sycl/source/detail/program_manager/program_manager.cpp, lines 279 to 290 in ec0846c]
It is also, IMHO, not a problem of the PI API to decide if a device is valid for a plugin, one of the reasons is because the
In terms of CUDA, note the
[llvm/sycl/source/detail/program_manager/program_manager.cpp, lines 262 to 263 in ec0846c]
Currently, a PTX binary is appearing as NATIVE, so this function itself is not used,
[llvm/sycl/source/detail/program_manager/program_manager.cpp, lines 335 to 338 in ec0846c]
The code is handled as a loaded binary blob, like any other native binary format:
[llvm/sycl/source/detail/program_manager/program_manager.cpp, lines 345 to 348 in ec0846c]
FYI, definition of Binary types:
[llvm/sycl/include/CL/sycl/detail/pi.h, lines 579 to 587 in ec0846c]
thx @Ruyk. OK, I see. I was wrong in a way that So let me rephrase my point: the isDeviceBinaryTypeSupported function checks whether a device supports SPIR-V only when format=PI_DEVICE_BINARY_TYPE_SPIRV.
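The behaviour described in this comment can be sketched as follows. This is a simplified illustration, not the function in program_manager.cpp: `ImageFormat`, `DeviceCaps`, and `isFormatSupported` are hypothetical names, and the single `AcceptsSpirv` flag stands in for whatever capability checks the runtime actually performs.

```cpp
#include <vector>

// Hypothetical stand-ins for the binary format and device capabilities.
enum class ImageFormat { Native, Spirv, LlvmBitcode };

struct DeviceCaps {
  bool AcceptsSpirv; // assumed capability flag, for illustration only
};

// Only SPIR-V images trigger a device check. Every other format is reported
// as supported without looking at the devices at all, which is why a native
// binary for the wrong backend is not caught by this kind of check.
bool isFormatSupported(const std::vector<DeviceCaps> &Devices,
                       ImageFormat Fmt) {
  if (Fmt != ImageFormat::Spirv)
    return true;
  for (const DeviceCaps &D : Devices)
    if (!D.AcceptsSpirv)
      return false;
  return true;
}
```

For a device that cannot consume SPIR-V, the function returns false for `ImageFormat::Spirv` but true for `ImageFormat::Native`, regardless of what the native blob contains.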
Any OpenCL device is a "good" OpenCL device; it is only the SYCL RT that can or cannot use a certain OpenCL (or PI) device. The PI plugin should provide devices and platforms to the SYCL RT; decisions on what is valid belong in the SYCL RT layers.
I think this discussion should be in #1543, because none of it is actually addressed here in this PR. In other words: if all devices are "good" devices, why do we need special treatment for the CUDA OpenCL device? Why not reuse the logic from
Forgive my naive questions, but it just seems so much more straightforward to me...
@hiaselhans, sorry for the delay.
@bader done :)
@intel/llvm-reviewers-runtime, ping.
There is still a small issue: in nvptx binaries, entries are not distributed.
When compiling with
-fsycl-targets=nvptx64-nvidia-opencl-sycldevice,spir64-unknown-opencl-sycldevice
program_manager only provides the spir64 binary, which has the proper kernel name in entries. That's why the length of binaries is 1, SYCL directly continues to execute with the wrong binary, and the error thrown at a later point is misleading:
The overhead of always having the backend check for a valid binary is not too big in my eyes, and the error code becomes that:
While the issue with entry names remains (I will look into it, but I am not sure where to start), I guess this also happened in cases where binaries are completely missing for the selected device backend.