[SYCL][CUDA][LIT] Extend test for non-OpenCL #1654
Conversation
If that is the only problem with CUDA in this test, then please move those 100 lines of code checking reqd_wg_size into a new test.
The rest of the existing test would stay enabled for CUDA (roughly 575 - 100 = 475 lines after separating the 100 lines that check reqd_wg_size).
It isn't the only OpenCL-specific part of the test:
I will extract the non-OpenCL tests, though it seems there won't be much to extract.
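For readers outside the thread: a minimal sketch of the kind of reqd_wg_size check being discussed, assuming a SYCL 1.2.1-style setup; the kernel name and exact sizes here are illustrative, not the actual test's.

#include <CL/sycl.hpp>
using namespace cl::sycl;

// Illustrative kernel functor that requires a 4x4x4 work-group size.
struct ReqdWGSizeKernel {
  [[cl::reqd_work_group_size(4, 4, 4)]] void operator()(nd_item<3>) const {}
};

int main() {
  queue q;
  try {
    // Launching with an 8x8x8 local size contradicts the attribute, so the
    // runtime (or the backend, via piEnqueueKernelLaunch) should report a
    // synchronous error.
    q.submit([&](handler &cgh) {
      cgh.parallel_for(nd_range<3>(range<3>(16, 16, 16), range<3>(8, 8, 8)),
                       ReqdWGSizeKernel{});
    });
    q.wait_and_throw();
  } catch (const nd_range_error &) {
    // Expected on conforming backends.
  }
  return 0;
}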
Work on this is delayed due to a higher-priority bug. Extracting as many tests as possible also requires me to better understand how SYCL on top of different OpenCL versions (1.2 vs 2.x) differs, so I can create a more generic test for non-OpenCL backends (CUDA).
Force-pushed from 43ab921 to d7d9bbc.
As PI is "inspired" by OpenCL, even non-OpenCL PI backends should behave like OpenCL 1.2 with regard to kernel launch error reporting. I have changed the PR to:
To ease checking the logical (non-formatting) changes to the LIT test, it helps to look at the separate commits.
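Roughly, the OpenCL 1.2 semantics referred to here: a launch whose local size does not evenly divide the global size (non-uniform work groups) must fail up front. Below is a sketch of such validation in PI style; the function name and exact error codes are illustrative, not the actual pi_cuda.cpp code.

// Sketch of OpenCL 1.2-style NDRange validation a PI backend could perform
// in piEnqueueKernelLaunch. Names and error codes are illustrative.
pi_result validateNDRange(pi_uint32 work_dim, const size_t *global_work_size,
                          const size_t *local_work_size,
                          const size_t *max_work_item_sizes) {
  if (!local_work_size)
    return PI_SUCCESS; // the implementation is free to pick the local size
  for (pi_uint32 i = 0; i < work_dim; ++i) {
    if (local_work_size[i] == 0 ||
        local_work_size[i] > max_work_item_sizes[i])
      return PI_INVALID_WORK_ITEM_SIZE;
    // OpenCL 1.2: the local size must evenly divide the global size,
    // i.e. all work groups are uniform.
    if (global_work_size[i] % local_work_size[i] != 0)
      return PI_INVALID_WORK_GROUP_SIZE;
  }
  return PI_SUCCESS;
}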
I am not sure why …
Looks good.
I think the title of this PR should be updated to summarize the new meaning of the patch.
In the buildbot stdout output I am seeing:
Testing Time: 406.93s
Unsupported : 39
Passed : 319
Expectedly Failed: 6
llvm-lit.py: D:\BuildBot\Product_worker_intel\sycl-win-x64-pr\llvm.obj\bin\..\..\llvm.src\llvm\utils\lit\lit\llvm\config.py:345: note: using clang: d:\buildbot\product_worker_intel\sycl-win-x64-pr\llvm.obj\bin\clang.exe
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:77: note: Backend (SYCL_BE): PI_OPENCL
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:131: note: Found available CPU device
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:157: note: Found available GPU device
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:186: warning: Accelerator device not found
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:200: note: Using opencl-aot version which is built as part of the project
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:215: warning: Couldn't find pre-installed AOT device compiler ocloc
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/lit.cfg.py:212: note: Found pre-installed AOT device compiler aoc
llvm-lit.py: D:/BuildBot/Product_worker_intel/sycl-win-x64-pr/llvm.src/sycl/test/Unit/lit.cfg.py:71: note: Backend (SYCL_BE): PI_OPENCL
2 warning(s) in tests
-- Testing: 364 tests, 12 workers --
command timed out: 1200 seconds without output running [b'python3', b'llvm_ci/intel/worker/tools/build.py', b'-n', b'3574', b'-b', b'pull/1654/head', b'-r', b'1654', b'-t', b'check-sycl', b'-p', b'sycl', b'-s', b'lit', b'-P', b'intel/llvm', b'-m', b'sycl-win-x64-pr', b'-e', b'd7d9bbc1c847c79d89347b91ddf98a2598689ee6', b'-U', b'http://ci.llvm.intel.com:8010/#/builders/18/builds/3574'], attempting to kill
program finished with exit code 1
elapsedTime=1882.897134
Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
Does that mean that the first set of tests passed for OpenCL? I guess the following run is for Level0 (as this buildbot run does not seem to test CUDA)? Level0 passed for me on Linux when I tested this. How can I work around this for now? Should I add a …
It seems so. Actually it's quite confusing. I see that on Linux the Level Zero back-end is tested first and the OpenCL back-end second. The Windows buildbot order is reversed.
I think this or …
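For reference, the workaround eventually adopted (see the commit message later in this thread) is a LIT directive in the test header along these lines; the comment wording here is mine, not the committed one.

// UNSUPPORTED: windows
// Level Zero runs of this test time out on the Windows buildbot; skip the
// test on Windows until the timeout is understood.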
Force-pushed from d7d9bbc to e212b19.
I have updated the if-brackets and added …
@bjoernknafla, could you apply the clang-format patch, please?
Force-pushed from e212b19 to 9695efb.
Looks good, but there are conflicts in parallel_for_range.cpp that need resolution.
I'll resolve the conflicts once my keyboard has recovered from the water I spilled all over it...
sycl/plugins/cuda/pi_cuda.cpp (outdated)

// Determine local work sizes that result in uniform work groups.
for (size_t i = 0; i < work_dim; i++) {
  threadsPerBlock[i] =
      std::min(static_cast<int>(maxThreadsPerBlock[i]),
               std::min(static_cast<int>(global_work_size[i]),
                        threadsPerBlock[i]));
}
I realized that if work_dim > 1, the iterations of the loop at L2310 do no useful work for i = 1 and i = 2, because at the beginning of the function threadsPerBlock is initialized as {32, 1, 1}. Also, global_work_size[i] is always > 0.
Thus for i = 1 and i = 2:
tmp1 = min(threadsPerBlock[i] /* == 1 */, global_work_size[i] /* > 0 */); // tmp1 is always set to 1
threadsPerBlock[i] = min(maxThreadsPerBlock[i], tmp1); // always set to 1
Yes. I wrote this as general code and then left it to keep working should the default values change. This pessimizes performance.
I'll remove the unnecessary loop iterations in a way that will make it easy to reintroduce the loop if necessary later - this might look a bit funny (professional term...).
I simplified the code and added a comment about only processing the first dimension due to the default values.
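The simplified shape could look roughly like this; a sketch of the idea under the {32, 1, 1} default described above, not the exact committed code.

// Only dimension 0 needs computing: threadsPerBlock starts as {32, 1, 1},
// so dimensions 1 and 2 already hold their final value of 1. Reintroduce
// the loop over work_dim if those defaults ever change.
threadsPerBlock[0] = std::min(
    static_cast<int>(maxThreadsPerBlock[0]),
    std::min(static_cast<int>(global_work_size[0]), threadsPerBlock[0]));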
Force-pushed from 9695efb to c1bc04a.
Clang format checking failed due to a server not being reachable.
Interesting... it looks like something went wrong with this particular package.
@romanovvlad, ping.
Force-pushed from c1bc04a to 22948c0.
The DPCPP runtime relies on piEnqueueKernelLaunch for NDRange parameter validity checks. Add missing checks to the PI CUDA backend. Signed-off-by: Bjoern Knafla <[email protected]>
Rewrite parallel_for_range.cpp test to work with non-OpenCL PI backends that behave like OpenCL 1.2. Level0 testing times out on Windows, mark `windows` as unsupported. Signed-off-by: Bjoern Knafla <[email protected]>
Signed-off-by: Bjoern Knafla <[email protected]>
Force-pushed from 22948c0 to f8b72c7.
Mark LIT test that uses cl::reqd_work_group_size as unsupported by
CUDA.
Signed-off-by: Bjoern Knafla [email protected]