[SYCL][CUDA] Fix LIT fails on machines without non-NVIDIA OpenCL #1613

bjoernknafla · 2020-04-29T17:34:36Z

We are observing LIT failures on machines that have only the NVIDIA OpenCL, even when running against the SYCL PI CUDA backend.

These commits try to fix the problem but still require testing.

bjoernknafla · 2020-04-29T17:47:05Z

sycl/test/Unit/lit.cfg.py

+
+config.environment['SYCL_BE'] = lit_config.params.get('SYCL_BE', "PI_OPENCL")
+
+lit_config.note("Environment: {}".format(config.environment))


I wonder if there is a better way to print the implicit environment with which LIT runs the unit tests, e.g., one that allows a copy-and-paste approach to setting up the environment locally to investigate failures?
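One possible answer to the copy-and-paste question, as a minimal sketch: render the environment as an `env KEY=value ...` prefix with shell quoting. The helper name `format_env_prefix` and the sample values are hypothetical and not part of LIT or this PR; only the standard-library `shlex.quote` is used.

```python
import shlex


def format_env_prefix(environment):
    """Render a mapping as an `env KEY=value ...` prefix for a POSIX shell."""
    # Sort for stable output; quote values so spaces and special
    # characters survive being pasted into a shell.
    assignments = [
        "{}={}".format(name, shlex.quote(str(value)))
        for name, value in sorted(environment.items())
    ]
    return " ".join(["env"] + assignments)


# Hypothetical sample; inside lit.cfg.py this would be config.environment.
sample = {"SYCL_BE": "PI_CUDA", "SYCL_DEVICE_TYPE": "GPU"}
print(format_env_prefix(sample))
```

The printed prefix could then be passed to `lit_config.note(...)` instead of the raw dictionary, so the note can be pasted in front of a test binary when reproducing a failure by hand.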

That looks like debug info. I would prefer to remove it from the final submission.

~~I am not too happy about this either. I am using it in sycl/test/basic_tests/diagnostics/device-check.cpp (see changes just below).~~

~~Alternative could be to only run the test below for OpenCL (by adding REQUIRES: opencl) as it tests a backend independent code path anyway.~~

~~What do you prefer?~~

I am not awake - completely missed what you tried to say here 🤦

You are right: this is information to enable debugging of a failed test. Otherwise the environment in which the unit tests run is invisible, which makes replicating bugs very hard.

I will replace this with output of the SYCL_BE environment variable, as we also do in the non-unit tests: sycl/test/lit.cfg.py#L76

The user then still gets the info required to re-run unit tests by hand and recreate failures.
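A minimal sketch of that replacement, assuming stand-in objects for the `config` and `lit_config` names that LIT injects into lit.cfg.py. The `Fake*` classes and the `configure_backend` helper are hypothetical and exist only to make the snippet runnable outside LIT; the wording of the note is also an assumption.

```python
class FakeLitConfig:
    """Stand-in for LIT's lit_config: exposes params and note()."""

    def __init__(self, params):
        self.params = params
        self.notes = []

    def note(self, message):
        # Real LIT prints the note; here it is also recorded for inspection.
        self.notes.append(message)
        print("note: " + message)


class FakeConfig:
    """Stand-in for LIT's per-suite config with an environment mapping."""

    def __init__(self):
        self.environment = {}


def configure_backend(config, lit_config):
    # Pass the backend through to the unit tests, defaulting to OpenCL.
    backend = lit_config.params.get('SYCL_BE', "PI_OPENCL")
    config.environment['SYCL_BE'] = backend
    # Report only the backend instead of dumping the whole environment.
    lit_config.note("Backend (SYCL_BE): {}".format(backend))
    return backend


config = FakeConfig()
lit_config = FakeLitConfig({'SYCL_BE': 'PI_CUDA'})
configure_backend(config, lit_config)
```

With real LIT the parameter would typically be supplied on the command line, for example with `--param SYCL_BE=PI_CUDA`.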

Could you clarify which info is missing? If you look into failing tests (e.g. http://ci.llvm.intel.com:8010/#/builders/37/builds/689/steps/15/logs/FAIL__SYCL__reduction_nd_conditional_cpp):
'RUN: at line 4'; env SYCL_DEVICE_TYPE=GPU SYCL_BE=PI_CUDA /localdisk2/sycl_ci/buildbot/worker/Lit_With_Cuda/llvm.obj/tools/sycl/test/reduction/Output/reduction_nd_conditional.cpp.tmp.out

You see the command line with specific environment variables set.

There are two approaches to setting environment variables in use:

explicitly (as shown by your example), as can be seen in the lit.cfg.py file here: sycl/test/lit.cfg.py#L127 - these will show up when a LIT test fails, and

implicitly, by manipulating the environment a LIT test will run in, as seen here: sycl/test/lit.cfg.py#L45 - these are not shown when a LIT test fails.

My current thinking about this is:

Use the implicit environment variables to pass the existing environment through. Failing tests will run in the same environment (ignoring CI/buildbot systems for now) when running them by hand. Otherwise LIT filters out most environment variables: llvm/utils/lit/lit/TestingConfig.py#L24

Use the explicit environment variable approach for settings that are very specific to the test and test setup and that we configure inside of LIT as these will be visible when a test fails (or when running lit with the -a flag) and allow copy-and-paste rerunning.

Sadly the explicit approach does not work when running unit tests, as LIT provides no way (that I know of) to set the environment explicitly when running GTest tests...
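The difference between the two approaches can be sketched as follows. This is an illustration, not LIT's implementation: `expand` is a hypothetical helper that mimics textual substitution, and the substitution value is modeled on the failing command line quoted earlier in this thread.

```python
def expand(run_line, substitutions):
    """Apply (pattern, replacement) pairs in order as plain text replacement."""
    for pattern, replacement in substitutions:
        run_line = run_line.replace(pattern, replacement)
    return run_line


# Explicit: the variables are part of the command text, so a failure log
# shows them and the line can be copied and re-run.
substitutions = [
    ("%GPU_RUN_PLACEHOLDER", "env SYCL_DEVICE_TYPE=GPU SYCL_BE=PI_CUDA"),
]
explicit_command = expand("%GPU_RUN_PLACEHOLDER %t.out", substitutions)

# Implicit: the variables live in a separate mapping handed to the child
# process; the logged command text does not mention them.
implicit_environment = {"SYCL_BE": "PI_CUDA"}
implicit_command = expand("%t.out", substitutions)

print(explicit_command)
print(implicit_command)
```

In this sketch only the explicit command carries `SYCL_BE` in its text, which is why the implicit setting needs a separate note to be visible to someone reproducing a failure.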

sycl/test/basic_tests/get_nonhost_devices.cpp

vladimirlaz · 2020-04-29T19:11:21Z

sycl/test/Unit/lit.cfg.py

+
+config.environment['SYCL_BE'] = lit_config.params.get('SYCL_BE', "PI_OPENCL")
+
+lit_config.note("Environment: {}".format(config.environment))


That looks like debug info. I would prefer to remove it from the final submission.

vladimirlaz

approve for testing

bjoernknafla · 2020-05-06T19:14:02Z

The reduction failures on buildbot/Lit_With_Cuda were fixed and merged 2h ago: #1641

I have pushed the rebased PR to include the fixes for the failing tests.

sycl/test/basic_tests/get_nonhost_devices.cpp

bader · 2020-05-07T14:58:44Z

sycl/test/basic_tests/diagnostics/handler.cpp

+// RUN: env SYCL_DEVICE_TYPE=HOST %t.out | FileCheck %s
+// RUN: %CPU_RUN_PLACEHOLDER %t.out %CPU_CHECK_PLACEHOLDER
+// RUN: %GPU_RUN_PLACEHOLDER %t.out %GPU_CHECK_PLACEHOLDER
+// RUN: %ACC_RUN_PLACEHOLDER %t.out %ACC_CHECK_PLACEHOLDER


Are these changes needed?
#1543 changed only line 1. Isn't it enough?

Is it related to the issue in sycl/test/basic_tests/get_nonhost_devices.cpp? If so, should it be resolved the same way?

My understanding is that the test code triggers the default selector implicitly when it creates the Queue. Therefore the PLACEHOLDER substitutions are used (for RUN and CHECK) to more explicitly control what is tested.

Suggested change

-// RUN: env SYCL_DEVICE_TYPE=HOST %t.out | FileCheck %s
-// RUN: %CPU_RUN_PLACEHOLDER %t.out %CPU_CHECK_PLACEHOLDER
-// RUN: %GPU_RUN_PLACEHOLDER %t.out %GPU_CHECK_PLACEHOLDER
-// RUN: %ACC_RUN_PLACEHOLDER %t.out %ACC_CHECK_PLACEHOLDER
+// RUN: env SYCL_BE=%sycl_be %t.out | FileCheck %s

Should we really run this test on 4 different devices to validate exception handling?

I'd like to keep the existing testing approach and align on the overall testing strategy separately.

+1 The check and exception generation is done on host side (before offloading). There is no reason to run it on every device.

I see the point you both are making. Change with next pull.

bader · 2020-05-07T17:29:17Z

sycl/test/basic_tests/queue.cpp

+// RUN: %CPU_RUN_PLACEHOLDER %t.out
+// RUN: %GPU_RUN_PLACEHOLDER %t.out
+// RUN: %ACC_RUN_PLACEHOLDER %t.out


Suggested change

-// RUN: %CPU_RUN_PLACEHOLDER %t.out
-// RUN: %GPU_RUN_PLACEHOLDER %t.out
-// RUN: %ACC_RUN_PLACEHOLDER %t.out
+// RUN: env SYCL_BE=%sycl_be %t.out

Make the backend used explicit in more LIT tests. These tests failed on machines with only NVIDIA OpenCL available as it is not supported. Signed-off-by: Bjoern Knafla <[email protected]>

Pass the SYCL_BE environment variable to the SYCL-Unit unit tests. Signed-off-by: Bjoern Knafla <[email protected]>

It allows injecting an extra "launcher" prefix into active *_RUN_PLACEHOLDER substitutions and can be used, for example, to execute all the tests under valgrind. However, local experiments with some internal infrastructure showed that, while helpful, it is not enough, so two more minor modifications are made as part of this change:
- Enable recursive substitutions when SYCL_E2E_RUN_LAUNCHER is enabled
- Provide the %e2e_tests_root substitution. It is expected to be used in conjunction with the existing "%s" substitution to get a unique relative path to the current test.

bjoernknafla requested a review from a team as a code owner April 29, 2020 17:34

bjoernknafla requested a review from vladimirlaz April 29, 2020 17:34

vladimirlaz suggested changes Apr 29, 2020

bjoernknafla force-pushed the bjoern/add-more-cuda-support-to-lit branch from 2d1c256 to 5e3199e Compare May 6, 2020 15:26

bjoernknafla requested a review from vladimirlaz May 6, 2020 15:27

vladimirlaz previously approved these changes May 6, 2020

vladimirlaz self-requested a review May 6, 2020 15:52

bader added the cuda CUDA back-end label May 6, 2020

bjoernknafla dismissed vladimirlaz’s stale review via d3e2fb8 May 6, 2020 19:53

bjoernknafla force-pushed the bjoern/add-more-cuda-support-to-lit branch from 5e3199e to d3e2fb8 Compare May 6, 2020 19:53

vladimirlaz reviewed May 7, 2020

bjoernknafla force-pushed the bjoern/add-more-cuda-support-to-lit branch from d3e2fb8 to 1d7f491 Compare May 7, 2020 14:49

bader reviewed May 7, 2020

bader requested a review from vladimirlaz May 7, 2020 15:04

bader reviewed May 7, 2020

bjoernknafla added 2 commits May 13, 2020 17:19

[SYCL][CUDA] Run more LIT tests with CUDA

8064200

[SYCL][CUDA] Pass SYCL_BE to LIT unit tests

a9bf0b6

bjoernknafla force-pushed the bjoern/add-more-cuda-support-to-lit branch from 1d7f491 to a9bf0b6 Compare May 13, 2020 16:54

bader approved these changes May 13, 2020

vladimirlaz approved these changes May 14, 2020

bader merged commit bce2da2 into intel:sycl May 14, 2020

bjoernknafla deleted the bjoern/add-more-cuda-support-to-lit branch May 14, 2020 08:18
