
[SYCL][CUDA] Minor fixes required to run BabelStream benchmarks on CUDA #1543

Closed
Ruyk wants to merge 13 commits into sycl from fix-default-selector-cuda

Conversation

Ruyk
Contributor

@Ruyk Ruyk commented Apr 17, 2020

Minor fixes to the CUDA backend to get BabelStream (https://github.com/UoB-HPC/BabelStream) running, each one in an individual commit:

  • Fixed: default selection for CUDA devices in presence of the NVIDIA OpenCL platform
  • Fixed: Missing return event on Unmap
  • Throw error in queue constructor when backend is not correct (currently segfaults)

@Ruyk Ruyk requested review from kbobrovs, smaslov-intel and a team as code owners April 17, 2020 15:20
@Ruyk Ruyk self-assigned this Apr 17, 2020
@Ruyk Ruyk added the cuda CUDA back-end label Apr 17, 2020
sycl/plugins/cuda/pi_cuda.cpp
@@ -101,7 +101,8 @@ static RT::PiProgram createBinaryProgram(const ContextImplPtr Context,
   Plugin.call<PiApiKind::piPlatformGetInfo>(Platform, PI_PLATFORM_INFO_NAME,
                                             PlatformName.size(), PlatformName.data(), nullptr);
   if (PlatformNameSize > 0u &&
-      std::strncmp(PlatformName.data(), "NVIDIA CUDA", PlatformNameSize) == 0) {
+      std::strncmp(PlatformName.data(), "NVIDIA CUDA BACKEND",
Contributor

There are 3 occurrences of this literal string; can you turn it into a macro at least?

Contributor

Shouldn't lines 91-108 become a run-time check that the actively selected BE (Plugin.getBackend()) is CUDA?

Contributor Author

There are 3 occurrences of this literal string; can you turn it into a macro at least?

Yes, but I am unsure where to define a macro that is shared between the SYCL RT and the PI CUDA plugin without creating a dependency from the SYCL RT on a header in the CUDA plugin.

Maybe I can just add it to the common pi header as a CUDA extension.
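A minimal sketch of that idea, assuming the constant lives in the common PI header (the macro and helper names below are illustrative, not the actual code):

#include <cstddef>
#include <cstring>

// Hypothetical shared definition: lives in the common PI header so both the
// SYCL RT and the PI CUDA plugin can see it without the RT depending on a
// CUDA plugin header. The plugin reports this string as its platform name.
#define __SYCL_PI_CUDA_PLATFORM_NAME "NVIDIA CUDA BACKEND"

// Runtime-side check comparing against the shared constant instead of
// repeating the string literal in three places.
static bool isCudaPlatformName(const char *Name, std::size_t Size) {
  return Size > 0u &&
         std::strncmp(Name, __SYCL_PI_CUDA_PLATFORM_NAME, Size) == 0;
}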

Contributor Author

Shouldn't lines 91-108 become a run-time check that the actively selected BE (Plugin.getBackend()) is CUDA?

Yes, but that should be in a separate patch, I think.

Contributor

Yes, but that should be in a separate patch, I think.

Why? If you do so then you don't need many of the changes you are doing in this PR.

Contributor Author

I've changed it to use the is_cuda check from platform. Plugin.getBackend() is not yet on the sycl branch.

Contributor

getBackend() is in now; please use it instead.

sycl/source/detail/queue_impl.hpp Outdated
  // If using PI_CUDA, don't accept a non-CUDA device
- if (platformVersion.find("CUDA") == std::string::npos &&
-     backend == "PI_CUDA") {
+ if (HasCudaString && HasOpenCLString && backend == "PI_CUDA") {
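For context, those flags are presumably computed from the platform version string, mirroring the HasOpenCL/HasCUDA checks quoted later in this thread:

// Presumed definitions (not shown in this hunk); platformVersion is the
// platform version string already queried by the surrounding selector code.
const bool HasCudaString = (platformVersion.find("CUDA") != std::string::npos);
const bool HasOpenCLString =
    (platformVersion.find("OpenCL") != std::string::npos);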
Contributor

This entire code fragment (lines 37-58) is in conflict with #1522, where SYCL_BE is used as the preferred backend, not as a forced one. For example, if the target device type is CPU, which the CUDA PI plugin does not support, and the user specified SYCL_BE=PI_CUDA, then with this code no device will be found and an exception thrown, while with that code it will use the plugin where the CPU is supported, i.e. OpenCL. I think both semantics are useful and we should probably fork SYCL_BE into SYCL_PI_FORCE & SYCL_PI_PREFER.

Contributor Author

The only thing that line 51 does is that, if a platform has both the CUDA string and the OpenCL string, it is rejected if, and only if, the desired backend is PI_CUDA. This prevents the selection of the NVIDIA OpenCL platform when the user wants to use the PI CUDA backend. Everything else should work in the same way. I don't think there is a need to fork the env. variables at this point.

Contributor

Sorry, I misread the code. It is not rejecting the non-CUDA devices under SYCL_BE=PI_CUDA as I initially thought. It still has to be merged properly with the code coming in #1522 (tagging @againull). Also, it should use platform.getBackend() instead of checking the platform name.

Contributor Author

@Ruyk Ruyk Apr 23, 2020

Should I wait until #1522 and #1490 are merged? I can't use platform.getBackend() until then.
Alternatively, I can use is_cuda until the other PRs are there.

Contributor Author

I've rebased this on top of the mentioned patches, so it uses the plugin getter and the backend.

@bjoernknafla
Contributor

With the map/unmap fix you should see the following LIT test XPASS (which fails LIT): sycl/test/scheduler/DataMovement.cpp

To make it pass, please remove the // XFAIL: cuda and the comment line // TODO: Fix accidential error return when unmapping read-only memory objects. from it.

Implemented a comparison operator for the plugin class in SYCL RT:
Two plugins are equal if the pointer to their string is the same.

plugin constructor marked explicit to avoid accidental implicit
conversions.

Signed-off-by: Ruyman Reyes <[email protected]>
@Ruyk Ruyk force-pushed the fix-default-selector-cuda branch from 8e8fa6f to 16c2b65 Compare April 23, 2020 10:18
Constructor of the SYCL queue throws an exception if the
device passed in is from a different backend than the context
that is associated with the queue.

Signed-off-by: Ruyman Reyes <[email protected]>
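A standalone sketch of the described behaviour (the backend enum and the exception type here are stand-ins; the real check lives inside the queue implementation and throws a SYCL exception):

#include <stdexcept>

// Stand-in for the SYCL backend enumeration, for this sketch only.
enum class backend { host, opencl, cuda };

// Reject a device whose backend differs from the context's backend up front,
// instead of letting a later PI call crash.
void checkQueueBackends(backend ContextBackend, backend DeviceBackend) {
  if (ContextBackend != DeviceBackend)
    throw std::runtime_error(
        "Queue cannot be constructed: the device does not belong to the "
        "same backend as the context.");
}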
@Ruyk Ruyk force-pushed the fix-default-selector-cuda branch from 16c2b65 to 195414f Compare April 23, 2020 11:44
romanovvlad
romanovvlad previously approved these changes Apr 23, 2020
Contributor

@romanovvlad romanovvlad left a comment

LGTM. Please do not force-push next time; use merge.

Fixes previous incorrect usage of version

Signed-off-by: Ruyman Reyes <[email protected]>
/// \ingroup sycl_pi
///
inline bool operator==(const plugin &lhs, const plugin &rhs) {
  return (lhs.getPiPlugin().PluginVersion == rhs.getPiPlugin().PluginVersion);
Contributor

Should this just check that the MBackend (getBackend()) of the plugins is the same?
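For illustration, the comparison the reviewer suggests could look roughly like this (a standalone sketch; the real plugin class and its getBackend() accessor live in the SYCL RT):

// Standalone sketch: compare plugins by backend rather than by the
// PluginVersion string.
enum class backend { host, opencl, cuda };

struct plugin {
  backend MBackend;
  backend getBackend() const { return MBackend; }
};

inline bool operator==(const plugin &lhs, const plugin &rhs) {
  return lhs.getBackend() == rhs.getBackend();
}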

@@ -101,7 +101,8 @@ static RT::PiProgram createBinaryProgram(const ContextImplPtr Context,
   Plugin.call<PiApiKind::piPlatformGetInfo>(Platform, PI_PLATFORM_INFO_NAME,
                                             PlatformName.size(), PlatformName.data(), nullptr);
   if (PlatformNameSize > 0u &&
-      std::strncmp(PlatformName.data(), "NVIDIA CUDA", PlatformNameSize) == 0) {
+      std::strncmp(PlatformName.data(), "NVIDIA CUDA BACKEND",
Contributor

getBackend() is in now; please use it instead.

The NVIDIA OpenCL platform is problematic for NVIDIA CUDA backend users
and for DPC++ users overall, since it doesn't work straight away and is
typically preferred during device selection on the OpenCL backend.

This patch removes the CUDA OpenCL platform from the device selection,
and prevents it from being used in the lit testing.

Signed-off-by: Ruyman Reyes <[email protected]>
Defines two plugins as equal if their backend types are the same.

Signed-off-by: Ruyman Reyes <[email protected]>
"Type is not the same");

// If no preference, assume OpenCL and reject CUDA backend
if (BackendType == backend::cuda && !BackendPref) {
Contributor

This always disqualifies CUDA devices when SYCL_BE is not set, right...?

Contributor Author

Yes, OpenCL is the preferred path...

} else if (!BackendPref)
return false;

// If using PI_CUDA, don't accept a non-CUDA device
Contributor

Couldn't we just do:

if (BackendType != *BackendPref)
    return true;

CUDA OpenCL platform is ignored on get_devices as suggested by feedback

Signed-off-by: Ruyman Reyes <[email protected]>
@hiaselhans
Contributor

@Ruyk I tried your patch and I think there hasn't yet been a clear decision on how it's supposed to work.

  1. I read an argument some time ago by @bader that every device should be shown in case someone just uses SYCL as a library to fetch a device list. That would speak against completely hiding the NVIDIA OpenCL device. Is there a consensus on whether devices should be completely hidden or just not selected?
  2. With your patch it is absolutely necessary to have SYCL_BE=PI_CUDA set in order to run on NVIDIA devices. There is no way for a sane user (developer) of this library to choose a CUDA device in their device_selector at runtime; the only way is the environment variable. Imagine someone having two queues, one using the NVIDIA backend and the other one using an OpenCL device. How would that be possible at all?
  3. We are now introducing extra logic to hide the NVIDIA OpenCL device. In fact, the OpenCL implementation doesn't support OpenCL < 2 and there are plenty of those devices out there. So why don't we treat them the same? Why all the backend-specific logic all over the place? We could have a PI call to the backend asking whether a device is supported or not.

Just my 2 cents, but I'm opinionated on this and need to yell it. Sorry :)

@Ruyk
Contributor Author

Ruyk commented Apr 30, 2020

@Ruyk I tried your patch and I think there hasn't yet been a clear decision on how it's supposed to work.

  1. I read an argument some time ago by @bader that every device should be shown in case someone just uses SYCL as a library to fetch a device list. That would speak against completely hiding the NVIDIA OpenCL device. Is there a consensus on whether devices should be completely hidden or just not selected?

The NVIDIA OpenCL platform is unusable with DPC++. We put some effort on our side into making it work initially in DPC++ and also in ComputeCpp. The reality is it creates more problems than it solves, especially if you have a proper CUDA backend. The moment the NVIDIA OpenCL platform is available on the system (e.g., via the ICD loader), it becomes the default device selected in most implementations (it's a GPU, so its score is higher than CPU and host). However, the default options for compilation won't work: you need to pass the specific triple for the OpenCL PTX variant. Even with that triple passed in, there are still problems on the compiler side and the code generated is not always correct for SYCL.

Our current thinking is that, unless someone puts effort into supporting the NVIDIA OpenCL platform on DPC++, it is better to disable it altogether and simplify using SYCL applications on CUDA as much as we can.

  2. With your patch it is absolutely necessary to have SYCL_BE=PI_CUDA set in order to run on NVIDIA devices. There is no way for a sane user (developer) of this library to choose a CUDA device in their device_selector at runtime; the only way is the environment variable. Imagine someone having two queues, one using the NVIDIA backend and the other one using an OpenCL device. How would that be possible at all?

That is still possible. Selecting the PI_CUDA backend doesn't prevent other devices running on the OpenCL one; it only ensures the selection of the CUDA backend is possible. It's a preferred backend when available.
Note however that there are driver issues that prevent using more than one binary format, so this will still fail at runtime.

Hopefully once the CUDA backend is no longer experimental and we have a clearer interface for multiple backends in the next SYCL specification, the environment variable will not be necessary.

  3. We are now introducing extra logic to hide the NVIDIA OpenCL device. In fact, the OpenCL implementation doesn't support OpenCL < 2 and there are plenty of those devices out there. So why don't we treat them the same? Why all the backend-specific logic all over the place? We could have a PI call to the backend asking whether a device is supported or not.

There are other OpenCL 1.2 implementations that support DPC++; POCL and ComputeAorta (Codeplay's implementation) are some of them. They can consume SPIR-V and don't create conflicts on the system. When deploying either of the two, they don't pollute the include or library paths in a way that forces them to become the default.

I could be convinced to filter for SPIR-V support, but I think that could be problematic for OpenCL platforms that work with AOT compilation, since they may not report SPIR-V and expect only AOT.

Just my 2 cents, but I'm opinionated on this and need to yell it. Sorry :)

No need to be sorry! It's good to have an open discussion about this :-)

@Ruyk
Contributor Author

Ruyk commented Apr 30, 2020

  2. With your patch it is absolutely necessary to have SYCL_BE=PI_CUDA set in order to run on NVIDIA devices. There is no way for a sane user (developer) of this library to choose a CUDA device in their device_selector at runtime; the only way is the environment variable. Imagine someone having two queues, one using the NVIDIA backend and the other one using an OpenCL device. How would that be possible at all?

That is still possible. Selecting the PI_CUDA backend doesn't prevent other devices running on the OpenCL one; it only ensures the selection of the CUDA backend is possible. It's a preferred backend when available.
Note however that there are driver issues that prevent using more than one binary format, so this will still fail at runtime.

Actually you are correct, something went wrong in my latest merge: this line https://github.com/intel/llvm/pull/1543/files#diff-8822f406c48f60f8a03ef19ad896145fR207 is not used, and this https://github.com/intel/llvm/pull/1543/files#diff-8822f406c48f60f8a03ef19ad896145fR213 shouldn't be there...

I'll fix it

* Removes NVIDIA OpenCL from the available list of platforms
* CUDA backend is available only if SYCL_BE=PI_CUDA is set

Signed-off-by: Ruyman Reyes <[email protected]>
@Ruyk Ruyk force-pushed the fix-default-selector-cuda branch from b7b7b98 to ad3951c Compare April 30, 2020 12:23
@Ruyk
Contributor Author

Ruyk commented Apr 30, 2020

I've fixed the bad merge, code is a bit cleaner now.

@hiaselhans
Contributor

I'm at ad3951cb31899cdac307b12ef13114ba64064d71 now. Latest, I think...

The NVIDIA OpenCL platform is unusable with DPC++. We put some effort on our side into making it work initially in DPC++ and also in ComputeCpp. The reality is it creates more problems than it solves, especially if you have a proper CUDA backend. The moment the NVIDIA OpenCL platform is available on the system (e.g., via the ICD loader), it becomes the default device selected in most implementations (it's a GPU, so its score is higher than CPU and host). However, the default options for compilation won't work: you need to pass the specific triple for the OpenCL PTX variant. Even with that triple passed in, there are still problems on the compiler side and the code generated is not always correct for SYCL.

Our current thinking is that, unless someone puts effort into supporting the NVIDIA OpenCL platform on DPC++, it is better to disable it altogether and simplify using SYCL applications on CUDA as much as we can.

I have no problem with not supporting NVIDIA's OpenCL implementation. I guess it's a mess, but so are other vendors' implementations; amdgpu-pro is stuck on 1.x as well.

That is still possible. Selecting the PI_CUDA backend doesn't prevent other devices running on the OpenCL one; it only ensures the selection of the CUDA backend is possible. It's a preferred backend when available.
Note however that there are driver issues that prevent using more than one binary format, so this will still fail at runtime.

From reading the code I was pretty sure that's not the case, especially with what's after line 220:
// If no preference, assume OpenCL and reject CUDA
I still tried:

Hopefully once the CUDA backend is no longer experimental and we have a clearer interface for multiple backends in the next SYCL specification, the environment variable will not be necessary.

I hope so too :)
But with your patch we move one step away from making SYCL_BE optional.
In the sycl branch I can just select the CUDA device and all is fine. I tried with your patch, and without setting the env var the CUDA device is completely hidden from the selector. On another machine with just an Intel i7 and SYCL_BE=PI_CUDA, that OpenCL device is completely hidden too.

There are other OpenCL 1.2 implementations that support DPC++; POCL and ComputeAorta (Codeplay's implementation) are some of them. They can consume SPIR-V and don't create conflicts on the system. When deploying either of the two, they don't pollute the include or library paths in a way that forces them to become the default.

I could be convinced to filter for SPIR-V support, but I think that could be problematic for OpenCL platforms that work with AOT compilation, since they may not report SPIR-V and expect only AOT.

Right now there's this code:

C.get_platform().get_info<info::platform::version>() >= "2.1")

This seems to throw when the device only supports OpenCL < 2.1.
Does it work in its current state with POCL or ComputeAorta?

I think, after all, it should be the backend that decides whether a device is supported by it or not.
In my dreams we would just introduce a PI call bool PiSupportsDevice() and call it in device_selector, then just rate unsupported devices with -1.
The above code could end up in that call in the OpenCL plugin...

It's just my personal preference on how to treat this env variable, but it's probably a quite fundamental decision that has to be taken at some point...?
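A rough sketch of that idea (PiSupportsDevice is not an existing PI entry point; all names below are hypothetical):

// Hypothetical only: each backend plugin answers whether it can actually run
// a given device, and the device selector rates unsupported devices with -1
// so they are never picked, with no backend-specific string matching in the
// runtime.
struct device {        // stand-in for sycl::device in this sketch
  bool pluginCanRunIt; // what the owning plugin would report
};

bool PiSupportsDevice(const device &Dev) { // hypothetical PI call
  return Dev.pluginCanRunIt;
}

int scoreDevice(const device &Dev) {
  return PiSupportsDevice(Dev) ? 1 : -1;
}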

@bader
Contributor

bader commented Apr 30, 2020

Interesting discussion.
One observation: we already have a mechanism to "disable" unsupported devices using "SYCL_DEVICE_ALLOWLIST". If I understand it correctly you can provide a configuration file with some regular expression to filter out unsupported OpenCL implementations. @Ruyk, will this work for you?

@Ruyk
Contributor Author

Ruyk commented May 1, 2020

@bader

Interesting discussion.
One observation: we already have a mechanism to "disable" unsupported devices using "SYCL_DEVICE_ALLOWLIST". If I understand it correctly you can provide a configuration file with some regular expression to filter out unsupported OpenCL implementations. @Ruyk, will this work for you?

I've seen that mechanism, but what would be needed is the opposite, a "SYCL_DEVICE_DISABLELIST" where you can add the NVIDIA OpenCL backend so it's not visible in device selection.
Still, a user wanting to use the CUDA backend would need to always pass the SYCL_DEVICE_DISABLELIST to prevent the OpenCL platform from being used in the default selection. Even when writing a custom device selector, since the user does not have access to the backend types, they need to make sure they select the CUDA backend and not the CUDA OpenCL platform.

The simplest option for end users is to disable the NVIDIA OpenCL platform and prefer the CUDA backend.

I think a separate discussion is whether we enable the CUDA backend by default or require the SYCL_BE flag to enable it.

@hiaselhans

But with your patch we move one step away from making SYCL_BE optional.
In the sycl branch I can just select the CUDA device and all is fine. I tried with your patch, and without setting the env var the CUDA device is completely hidden from the selector. On another machine with just an Intel i7 and SYCL_BE=PI_CUDA, that OpenCL device is completely hidden too.

The patch should still enable OpenCL devices even with SYCL_BE=PI_CUDA, but I'll run some more tests to make sure that's the case. There may be logic elsewhere in the SYCL RT that uses SYCL_BE that I have missed.

One of the reasons I lean towards having people "activate" the CUDA backend via an env variable is to ensure no accidental usage occurs. We have various reports of people using default selection who get "Invalid binary" or other errors because the CUDA backend is selected above other devices in the system. Unless we always build for all binary targets, or there is a clear SYCL API to select backends, users will have to modify their SYCL 1.2.1 codes with custom selectors to avoid choosing the CUDA backend by accident.

An example of this problem is the MultiDevice lit test (https://github.com/intel/llvm/blob/sycl/sycl/test/scheduler/MultipleDevices.cpp).
The test itself is well-written SYCL 1.2.1 code. A function takes two queues, submits work to both of them and does some checks. It even prints "Good computation" at the end for better user experience.
Now, the test will always fail if you have more than one active backend.
The test is built for one target (SPIR or PTX), so only one of the binaries is available, but it requires two devices to run, so once there are two backends, only one will have the right binary.

Furthermore, the test uses standard SYCL selectors:

host_selector hostSelector;
cpu_selector CPUSelector;
gpu_selector GPUSelector;

Those selectors are perfectly valid selectors, but the GPU selector will select the CUDA backend by default.

So that is perfectly valid SYCL code which, once the CUDA backend is enabled by default, fails with an "invalid device type".
If we only enable the CUDA backend when SYCL_BE is there, then that perfectly valid SYCL 1.2.1 code still works.
Obviously, this is a test and we could just modify it to handle the CUDA backend somehow. Then comes the second problem: there is no SYCL API to select a backend. You need to query the device and platform information to ensure you don't pick the CUDA backend.
If we disable the NVIDIA OpenCL CUDA platform then, well, the user can reject any device that has the CUDA BACKEND string in the platform name, as in the sketch below. But then, a valid SYCL code needs to be modified to cater to implementation-specific behavior.
This will get more complicated if other backends are added, because then users must write device selectors that exclude any other incoming backend.
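For illustration, such a selector could look roughly like this (a sketch against the SYCL 1.2.1 selector API; the exact string matched is whatever the CUDA plugin reports as its platform name):

#include <CL/sycl.hpp>
#include <string>

// Sketch of the implementation-specific selector users would be forced to
// write: prefer GPUs, but reject any device whose platform name marks it as
// the CUDA backend.
class non_cuda_gpu_selector : public cl::sycl::device_selector {
public:
  int operator()(const cl::sycl::device &Dev) const override {
    const std::string Platform =
        Dev.get_platform().get_info<cl::sycl::info::platform::name>();
    if (Platform.find("CUDA BACKEND") != std::string::npos)
      return -1; // never select the CUDA backend device
    return Dev.is_gpu() ? 100 : 1;
  }
};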

I see a couple of options moving forward (all of them assuming we ban the NVIDIA OpenCL platform):

  1. We disable the CUDA backend by default and require passing SYCL_BE to enable it (this should not prevent using CUDA and OpenCL devices together; I'll take a look at my patch again). Once SYCL 2020 is there, its mechanism for device selection with backends can be used to deal with this problem properly. On SYCL 1.2.1, the default is OpenCL, and CUDA is an opt-in experience.
  2. DPC++ builds for both SPIR-V and PTX by default, so all binaries are always available. Then, SYCL_BE is only used to prevent using the CUDA backend in some cases. This has effects on binary sizes and compilation times, and probably building and deployment implications (e.g., CUDA dependencies are always needed when building DPC++, as BUILD_PI_CUDA would have to be enabled). There is work to get to this point, since currently the clang driver has several issues with multiple backends. This makes CUDA an opt-out experience. It is unclear how to scale this to other backends that may have their own binary formats.
  3. We enable both CUDA and OpenCL by default, as it is now, and let users deal with the consequences. When a system has a CUDA platform and a default selector, some SYCL applications will fail if they haven't been built with PTX and SPIR-V support. Note then that an application built and tested on a system with SPIR-V devices will suddenly fail if run on a machine that exposes an NVIDIA platform. CUDA is a silently-always-on option.

I see good and bad points on all of them.
To me (3) is a user-experience problem that would lead to more bug reports, but facilitates the usage of the CUDA backend and the experience of advanced users.
(1) is simpler to implement at the moment, but CUDA users may find it difficult to use the CUDA backend.
(2) is the simplest default user experience, but building times can easily double. This has implications for testing times, among others.

I think we need opinions of a few people (@bader , @keryell , @bjoernknafla , @romanovvlad , @pvchupin to list a few) before we make a decision on what the default experience for DPC++ (or clang sycl) should be.

@bader bader requested a review from romanovvlad May 7, 2020 10:19
@bader
Contributor

bader commented May 7, 2020

I think we need opinions of a few people (@bader , @keryell , @bjoernknafla , @romanovvlad , @pvchupin to list a few) before we make a decision on what the default experience for DPC++ (or clang sycl) should be.

Agree. We need a separate discussion on what the default logic for device selection should be and how it can be configured by the user (e.g., define the SYCL_DEVICE_TYPE and SYCL_BE impact).

@hiaselhans, @smaslov-intel, are you okay to commit this PR and address your questions separately, if they are not addressed already?

@Ruyk, it's better to send multiple fixes in separate PRs. A long discussion on a single item blocks merging of the whole PR, i.e. the other N - 1 of N fixes.

@hiaselhans
Contributor

I think I am the last one who should actually decide on such a matter. I just wanted to point out that this PR actually DOES change the behaviour by introducing that isDeviceInvalidForBe function.

In the upstream/sycl branch, setting SYCL_BE already hides all devices other than the selected backend. I guess it's all we ever need for those lit tests mentioned.

With an Intel OpenCL device on upstream/sycl:

> SYCL_BE=PI_CUDA ./sycl
SYCL host device (/1.2): -1
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  No device of requested type available. -1 (CL_DEVICE_NOT_FOUND)
sycl.sh: line 21: 361549 Aborted             (core dumped) SYCL_BE=PI_CUDA $BUILD_PATH/$FILE
> ./sycl
Intel(R) Gen9 HD Graphics NEO (Intel(R) Corporation/OpenCL 2.1 NEO ): 10
gfx804 (Advanced Micro Devices, Inc./OpenCL 1.2 AMD-APP (2906.7)): -1
SYCL host device (/1.2): -1
The results are correct!

Before this PR:

  • SYCL_BE empty -> cuda and opencl backend (and devices) are active
  • SYCL_BE=PI_CUDA -> only cuda backend
  • SYCL_BE=PI_OPENCL -> only opencl backend

After this PR:

  • SYCL_BE empty -> only opencl backend
  • SYCL_BE=PI_CUDA -> only cuda backend
  • SYCL_BE=PI_OPENCL -> only opencl backend

So we remove the possibility of having more than one backend active, by requiring the CUDA plugin to be actively enabled (which deactivates OpenCL).
From those 30 lines of isDeviceInvalidForBe I think that:

  • we already have 80% of the logic in place (activating only one backend with SYCL_BE),
  • 10% of it is requiring to manually activate the CUDA backend at runtime (which I don't like), and another
  • 10% is to disable OpenCL devices with 'OpenCL' and 'CUDA' in their caption (which I also don't like).
    In a way there's no difference between that 'gfx804' device that I have and the "opencl cuda" device. They are both unsupported. One shows up, one doesn't. It's a bit inconsistent and incoherent in my eyes.

const bool HasOpenCL = (platformVersion.find("OpenCL") != std::string::npos);
const bool HasCUDA = (platformVersion.find("CUDA") != std::string::npos);

backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();
Contributor

Suggested change
backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();

Let's move this line after rejecting the NVIDIA OpenCL implementation.

// Reject the NVIDIA OpenCL implementation
if (DeviceBackend == backend::opencl && HasCUDA && HasOpenCL)
return true;

Contributor

@bader bader May 7, 2020

Suggested change
backend *PrefBackend = detail::SYCLConfig<detail::SYCL_BE>::get();

Comment on lines +220 to +231
  // If no preference, assume OpenCL and reject CUDA
  if (DeviceBackend == backend::cuda && !PrefBackend) {
    return true;
  } else if (!PrefBackend)
    return false;

  // If using PI_OPENCL, reject the CUDA backend
  if (DeviceBackend == backend::cuda && *PrefBackend == backend::opencl)
    return true;

  return false;
}
Contributor

Suggested change
-  // If no preference, assume OpenCL and reject CUDA
-  if (DeviceBackend == backend::cuda && !PrefBackend) {
-    return true;
-  } else if (!PrefBackend)
-    return false;
-
-  // If using PI_OPENCL, reject the CUDA backend
-  if (DeviceBackend == backend::cuda && *PrefBackend == backend::opencl)
-    return true;
-
-  return false;
-}
+  // If no preference, assume OpenCL and reject CUDA
+  if (DeviceBackend == backend::cuda &&
+      (!PrefBackend || *PrefBackend == backend::opencl))
+    return true;
+  return false;
+}

Contributor

Minor fixes to the CUDA backend to get BabelStream (https://github.com/UoB-HPC/BabelStream) running, each one in an individual commit:

  • Fixed: default selection for CUDA devices in presence of the NVIDIA OpenCL platform
  • Fixed: Missing return event on Unmap
  • Throw error in queue constructor when backend is not correct (currently segfaults)

From the PR description, this piece of code seems to be "not required to run BabelStream benchmarks on CUDA", and removing the NVIDIA OpenCL implementation is enough. Right?

If so, @Ruyk, can we move this part to a separate PR to make @hiaselhans happy?

Contributor Author

Sure, I'll split out the changes.

Contributor Author

"not required to run BabelStream benchmarks on CUDA" and removing the NVIDIA OpenCL implementation is enough. Right?

Actually, BabelStream doesn't use a device selector but a device list, so the NVIDIA OpenCL platform still appears. There are other workarounds I can do, so not a problem.

Contributor

If so, @Ruyk, can we move this part to a separate PR to make @hiaselhans happy?

thanks @bader 😄
well, maybe it's me being a dickhead with this.

// If SYCL_BE is set then skip platforms which doesn't have specified

just saying we're re-implementing things already implemented somewhere else...

@Ruyk
Contributor Author

Ruyk commented May 8, 2020

I am closing this one and submitting separate ones shortly.

@Ruyk Ruyk closed this May 8, 2020
bb-sycl pushed a commit that referenced this pull request Apr 18, 2023
Move the LLVM components to LINK_COMPONENTS because the DEPENDS list has the same semantics as add_dependencies(). In this
case it doesn't include the LLVM components when calling the linker.

It's an almost complete revert of #1543

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@cf5a5a4
Labels
cuda CUDA back-end
Projects
None yet
Development


6 participants