[SYCL][CUDA] Minor fixes required to run BabelStream benchmarks on CUDA #1543
Conversation
@@ -101,7 +101,8 @@ static RT::PiProgram createBinaryProgram(const ContextImplPtr Context,
   Plugin.call<PiApiKind::piPlatformGetInfo>(Platform, PI_PLATFORM_INFO_NAME,
                                             PlatformName.size(), PlatformName.data(), nullptr);
   if (PlatformNameSize > 0u &&
-      std::strncmp(PlatformName.data(), "NVIDIA CUDA", PlatformNameSize) == 0) {
+      std::strncmp(PlatformName.data(), "NVIDIA CUDA BACKEND",
There are 3 occurrences of this literal string; can you turn it into a macro at least?
Shouldn't lines 91-108 become a run-time check that the actively selected backend (Plugin.getBackend()) is CUDA?
There are 3 occurrences of this literal string; can you turn it into a macro at least?
Yes, but I am unsure where to define a macro that is shared between the SYCL RT and the PI CUDA plugin without creating a dependency from the SYCL RT on a header in the CUDA plugin.
Maybe I can just add it to the common pi header as a CUDA extension.
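For illustration, a minimal sketch of what that could look like; the macro name and its exact location in the common PI header are assumptions, not what was actually committed:

// In a header shared by the SYCL RT and the PI CUDA plugin (e.g. the common pi header):
#define __SYCL_PI_CUDA_PLATFORM_NAME "NVIDIA CUDA BACKEND"

// Call sites in the SYCL RT then reuse the macro instead of repeating the literal:
if (PlatformNameSize > 0u &&
    std::strncmp(PlatformName.data(), __SYCL_PI_CUDA_PLATFORM_NAME,
                 PlatformNameSize) == 0) {
  // CUDA-specific binary handling
}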
Shouldn't lines 91-108 become a run-time check that the actively selected backend (Plugin.getBackend()) is CUDA?
Yes, but that should be on a separate patch I think
Yes, but that should be on a separate patch I think
Why? If you do so then you don't need many of the changes you are doing in this PR.
I've changed it to use the is_cuda check from platform. Plugin.getBackend() is not yet on the sycl branch.
getBackend() is in now, please use it instead
sycl/source/device_selector.cpp
     // If using PI_CUDA, don't accept a non-CUDA device
-    if (platformVersion.find("CUDA") == std::string::npos &&
-        backend == "PI_CUDA") {
+    if (HasCudaString && HasOpenCLString && backend == "PI_CUDA") {
This entire code fragment (lines 37-58) is in conflict with #1522, where SYCL_BE is used as the preferred backend, not as the forced one. For example, if the target device type is CPU, which the CUDA PI plugin does not support, and the user specified SYCL_BE=PI_CUDA, then with this code no device will be found and an exception thrown, while with that code it will use the plugin where the CPU is supported, i.e. OpenCL. I think both semantics are useful and we should probably fork SYCL_BE into SYCL_PI_FORCE & SYCL_PI_PREFER.
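To make the two semantics concrete, here is an illustrative stand-in (the enum and function are sketches only; neither SYCL_PI_FORCE nor SYCL_PI_PREFER exists in the runtime):

// Illustrative only: "forced" rejects every other backend outright,
// "preferred" only rejects another backend when the requested one can serve.
enum class backend { opencl, cuda };

bool rejectDevice(backend DeviceBackend, backend Requested, bool Forced,
                  bool RequestedBackendCanServe) {
  if (DeviceBackend == Requested)
    return false;                  // the requested backend is always kept
  if (Forced)
    return true;                   // forced: every other backend is rejected
  return RequestedBackendCanServe; // preferred: fall back only if it cannot serve
}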
The only thing that line 51 does is that, if a platform has both the CUDA string and the OpenCL string, it is rejected if, and only if, the desired backend is PI_CUDA. This prevents the selection of the NVIDIA OpenCL platform when the user wants to use the PI CUDA backend. Everything else should work in the same way. I don't think there is a need to fork the env variables at this point.
I've rebased this on top of the mentioned patches, so it uses the plugin and backend getters.
With the map/unmap fix you should see the following LIT test XPASS (which fails LIT): sycl/test/scheduler/DataMovement.cpp. To make it pass, please remove the
Force-pushed from 1d08181 to 8e8fa6f
Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
Implemented a comparison operator for the plugin class in SYCL RT: two plugins are equal if the pointer to their string is the same. The plugin constructor is marked explicit to avoid accidental implicit conversions. Signed-off-by: Ruyman Reyes <[email protected]>
Force-pushed from 8e8fa6f to 16c2b65
Constructor of the SYCL queue throws an exception if the device passed in is from a different backend than the context that is associated with the queue. Signed-off-by: Ruyman Reyes <[email protected]>
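A rough user-level sketch of the check this commit message describes (illustrative only; it uses the SYCL 2020 get_backend() queries for clarity, while the commit performs the equivalent check inside the runtime's queue constructor):

#include <sycl/sycl.hpp>
#include <stdexcept>

// Throws if the device and context come from different backends,
// mirroring the constraint the commit describes for the queue constructor.
void check_queue_arguments(const sycl::device &Dev, const sycl::context &Ctx) {
  if (Dev.get_backend() != Ctx.get_backend())
    throw std::invalid_argument(
        "Queue device and context belong to different backends");
}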
Force-pushed from 16c2b65 to 195414f
LGTM. Please do not force-push next time; use merge.
Signed-off-by: Ruyman Reyes <[email protected]>
Fixes previous incorrect usage of version. Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
sycl/source/detail/plugin.hpp
/// \ingroup sycl_pi
///
inline bool operator==(const plugin &lhs, const plugin &rhs) {
  return (lhs.getPiPlugin().PluginVersion == rhs.getPiPlugin().PluginVersion);
Should this just check that the MBackend (getBackend) of the plugins is the same?
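A minimal sketch of that suggestion, assuming getBackend() on the plugin class returns the backend enum (as the later commit "Defines two plugins being equal if their backend types are the same" implies):

// Two plugins compare equal when they target the same backend.
inline bool operator==(const plugin &lhs, const plugin &rhs) {
  return lhs.getBackend() == rhs.getBackend();
}

inline bool operator!=(const plugin &lhs, const plugin &rhs) {
  return !(lhs == rhs);
}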
The NVIDIA OpenCL platform is problematic for NVIDIA CUDA backend users and for DPC++ users overall, since it doesn't work straight away and is typically selected as a preference by the OpenCL backend. This patch removes the NVIDIA OpenCL platform from device selection and prevents it from being used in the lit testing. Signed-off-by: Ruyman Reyes <[email protected]>
Defines two plugins as equal if their backend types are the same. Signed-off-by: Ruyman Reyes <[email protected]>
sycl/source/device_selector.cpp
"Type is not the same"); | ||
|
||
// If no preference, assume OpenCL and reject CUDA backend | ||
if (BackendType == backend::cuda && !BackendPref) { |
This always disqualifies CUDA devices when SYCL_BE is not set, right...?
Yes, OpenCL is the preferred path...
sycl/source/device_selector.cpp
} else if (!BackendPref)
  return false;

// If using PI_CUDA, don't accept a non-CUDA device
Couldn't we just do:

if (BackendType != *BackendPref)
  return true;
The CUDA OpenCL platform is ignored in get_devices, as suggested by feedback. Signed-off-by: Ruyman Reyes <[email protected]>
Signed-off-by: Ruyman Reyes <[email protected]>
@Ruyk I tried your patch and I think there hasn't yet been a clear decision on how it's supposed to work.
Just my 2 cents, but I'm opinionated on this and need to yell it. Sorry :)
The NVIDIA OpenCL platform is unusable with DPC++. We put some effort on our side to make it work initially in DPC++ and also in ComputeCpp. The reality is it creates more problems than it solves, especially if you have a proper CUDA backend. The moment the NVIDIA OpenCL platform is available on the system (e.g., via the ICD loader), it becomes the default device selected in most implementations (it's a GPU, so its score is higher than CPU and host). However, the default compilation options won't work: you need to pass the specific triple for the OpenCL PTX variant. Even with that triple passed in, there are still problems on the compiler side, and the code generated is not always correct for SYCL. Our current thinking is that, unless someone puts effort into supporting the NVIDIA OpenCL platform on DPC++, it is better to disable it altogether and simplify using SYCL applications on CUDA as much as we can.
That is still possible. Selecting the PI_CUDA backend doesn't prevent other devices from running on the OpenCL one; it only ensures that selecting the CUDA backend is possible. It's a preferred backend when available. Hopefully, once the CUDA backend is no longer experimental and we have a clearer interface for multiple backends in the next SYCL specification, the environment variable will not be necessary.
There are other OpenCL 1.2 implementations that support DPC++; POCL and ComputeAorta (Codeplay's implementation) are some of them. They can consume SPIR-V and don't create conflicts on the system. When deploying either of the two, they don't pollute the include or library paths in a way that forces them to become the default. I could be convinced to filter for SPIR-V support, but I think that could be problematic for OpenCL platforms that work with AOT compilation, since they may not report SPIR-V and expect only AOT.
No need to be sorry! It's good to have an open discussion about this :-)
Actually you are correct; something went weird in my latest merge. This line https://github.com/intel/llvm/pull/1543/files#diff-8822f406c48f60f8a03ef19ad896145fR207 is not used, and this https://github.com/intel/llvm/pull/1543/files#diff-8822f406c48f60f8a03ef19ad896145fR213 shouldn't be there... I'll fix it.
* Removes NVIDIA OpenCL from the available list of platforms
* CUDA backend is available only if SYCL_BE=PI_CUDA is set
Signed-off-by: Ruyman Reyes <[email protected]>
Force-pushed from b7b7b98 to ad3951c
I've fixed the bad merge; the code is a bit cleaner now.
I'm at
I have no problem with not supporting NVIDIA's OpenCL implementation. I guess it's a mess, but so are other vendors' implementations. amdgpu-pro is stuck with 1.x as well.
From reading the code I was pretty sure that's not the case. Especially with what's after line 220:
I hope so too :)
Right now there's this code:
This seems to raise an exception when the device only supports OpenCL < 2.1. Does it work in its current state with POCL or ComputeAorta? I think, after all, it should be the backend that decides whether a device is supported by it or not. It's just my personal preference on how to treat this env variable, but it's probably a quite fundamental decision that has to be taken at some point...?
Interesting discussion.
I've seen that mechanism, but what would be needed is the opposite, a "SYCL_DEVICE_DISABLELIST" where you can add the NVIDIA OpenCL backend so it's not visible during device selection. The simplest option for end users is to disable the NVIDIA OpenCL platform and prefer the CUDA backend. I think a separate discussion is whether we enable the CUDA backend by default or require the SYCL_BE flag to be passed to enable it.
The patch should still enable OpenCL devices even with SYCL_BE=PI_CUDA, but I'll run some more tests to make sure that is the case. There may be logic elsewhere in the SYCL RT that uses SYCL_BE that I have missed. One of the reasons I lean towards having people "activate" the CUDA backend via an environment variable is to ensure no accidental usage occurs. We have various reports of people using default selection who get "Invalid binary" or other errors because their CUDA backend is selected above other devices in the system. Unless we always build for all binary targets, or there is a clear SYCL API to select backends, users will have to modify their SYCL 1.2.1 code with custom selectors to avoid choosing the CUDA backend by accident (a sketch of such a selector follows after this comment). An example of this problem is the MultipleDevices lit test (https://github.com/intel/llvm/blob/sycl/sycl/test/scheduler/MultipleDevices.cpp). Furthermore, the test uses standard SYCL selectors (llvm/sycl/test/scheduler/MultipleDevices.cpp, lines 95 to 97 at 5e21b71).
Those selectors are perfectly valid selectors, but the GPU selector will select the CUDA backend by default. So that is perfectly valid SYCL code that, once the CUDA backend is enabled by default, fails with an "invalid device type". I see a couple of options moving forward (all of them assuming we ban the NVIDIA OpenCL platform):
I see good and bad points in all of them. I think we need the opinions of a few people (@bader, @keryell, @bjoernknafla, @romanovvlad, @pvchupin, to list a few) before we make a decision on what the default experience for DPC++ (or clang sycl) should be.
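For reference, a sketch of the kind of custom SYCL 1.2.1 selector users currently need to avoid the CUDA backend being picked by accident (the platform-name test is illustrative only, not the runtime's own selection logic):

#include <CL/sycl.hpp>
#include <string>

// Prefers GPUs but never selects a device whose platform name contains "CUDA".
class non_cuda_gpu_selector : public cl::sycl::device_selector {
public:
  int operator()(const cl::sycl::device &dev) const override {
    const std::string platform =
        dev.get_platform().get_info<cl::sycl::info::platform::name>();
    if (platform.find("CUDA") != std::string::npos)
      return -1;                   // negative score: never selected
    return dev.is_gpu() ? 100 : 1; // otherwise prefer GPUs
  }
};

// Usage: cl::sycl::queue q{non_cuda_gpu_selector{}};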
Agree. We need a separate discussion on what the default logic for device selection should be and how it can be configured by the user (e.g. define
@hiaselhans, @smaslov-intel, are you okay to commit this PR and address your questions separately, if they are not addressed already?
@Ruyk, it's better to send multiple fixes in separate PRs. A long discussion for a single item blocks merging of the whole PR, i.e. N - 1 of N fixes.
I think I am the last to actually decide on such a matter. I just wanted to point out that this PR actually DOES change the behaviour by introducing that
In the upstream/sycl branch setting
With an intel opencl device on upstream/sycl:
Before this PR:
After this PR:
So we remove the possibility of having more than one backend active by requiring the user to actively enable the CUDA plugin (which deactivates OpenCL).
const bool HasOpenCL = (platformVersion.find("OpenCL") != std::string::npos);
const bool HasCUDA = (platformVersion.find("CUDA") != std::string::npos);

backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();
backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();
Let's move this line after rejecting the NVIDIA OpenCL implementation.
// Reject the NVIDIA OpenCL implementation
if (DeviceBackend == backend::opencl && HasCUDA && HasOpenCL)
  return true;
backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();

// If no preference, assume OpenCL and reject CUDA
if (DeviceBackend == backend::cuda && !PrefBackend) {
  return true;
} else if (!PrefBackend)
  return false;

// If using PI_OPENCL, reject the CUDA backend
if (DeviceBackend == backend::cuda && *PrefBackend == backend::opencl)
  return true;

return false;
}
Suggested change, replacing:

// If no preference, assume OpenCL and reject CUDA
if (DeviceBackend == backend::cuda && !PrefBackend) {
  return true;
} else if (!PrefBackend)
  return false;

// If using PI_OPENCL, reject the CUDA backend
if (DeviceBackend == backend::cuda && *PrefBackend == backend::opencl)
  return true;

return false;
}

with:

// If no preference, assume OpenCL and reject CUDA
if (DeviceBackend == backend::cuda &&
    (!PrefBackend || *PrefBackend == backend::opencl))
  return true;

return false;
}
Minor fixes to the CUDA backend to get BabelStream (https://github.com/UoB-HPC/BabelStream) running, each one in an individual commit:
- Fixed: default selection for CUDA devices in presence of the NVIDIA OpenCL platform
- Fixed: Missing return event on Unmap
- Throw error in queue constructor when backend is not correct (currently segfaults)
From the PR description, this piece of code seems to be "not required to run BabelStream benchmarks on CUDA" and removing the NVIDIA OpenCL implementation is enough. Right?
If so, @Ruyk, can we move this part to a separate PR to make @hiaselhans happy?
Sure, I'll split out the changes.
"not required to run BabelStream benchmarks on CUDA" and removing the NVIDIA OpenCL implementation is enough. Right?
Actually, BabelStream doesn't use a device selector but a device list, so the NVIDIA OpenCL platform still appears. There are other workarounds I can do, so not a problem.
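To illustrate the difference (a sketch using the standard SYCL 1.2.1 API, not BabelStream's actual code): a selector-based queue goes through device-selection filtering, while an explicit device list reports every device the plugins expose, so a platform filtered only in the selector still shows up in the list.

#include <CL/sycl.hpp>
#include <iostream>

int main() {
  // Selector path: subject to the device-selection scoring/filtering.
  cl::sycl::queue q{cl::sycl::default_selector{}};

  // Device-list path (BabelStream-style enumeration): lists everything
  // the backends report, independent of any selector filtering.
  for (const auto &dev : cl::sycl::device::get_devices())
    std::cout << dev.get_info<cl::sycl::info::device::name>() << "\n";
  return 0;
}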
If so, @Ruyk, can we move this part to a separate PR to make @hiaselhans happy?
thanks @bader 😄
Well, maybe it's me being a dickhead with this.
Line 54 in cf65794:
// If SYCL_BE is set then skip platforms which doesn't have specified
Just saying we're re-implementing things already implemented somewhere else...
I am closing this one and submitting separate ones shortly.