Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable CUDA tests #1542

Closed
erichkeane opened this issue Apr 17, 2020 · 34 comments
Closed

Re-enable CUDA tests #1542

erichkeane opened this issue Apr 17, 2020 · 34 comments
Assignees
Labels
cuda CUDA back-end

Comments

@erichkeane
Copy link
Contributor

In 7a9a425 Alexander disabled quite a few SYCL tests for CUDA with XFAIL:CUDA in order to get the commit in. These all need to be re-enabled. This likely can be handled by a few different people, but Alexander should own re-enabling that which he disabled.

Note that most have a TODO associated with them.

The following test all have XFAIL:cuda in them, so they need to be analyzed and fixed:
usm/queue_wait.cpp
usm/allocator_vector_fail.cpp
usm/math.cpp
usm/mixed2template.cpp
usm/smemll.cpp
usm/depends_on.cpp
usm/memset.cpp
usm/mixed_queue.cpp
usm/dmemll.cpp
usm/memcpy.cpp
usm/allocator_vector.cpp
usm/allocatorll.cpp
usm/hmemll.cpp
usm/memadvise.cpp
usm/badmalloc.cpp
usm/mixed2.cpp
usm/mixed.cpp
device-code-split/aot-gpu.cpp
device-code-split/split-per-kernel.cpp
device-code-split/split-per-source-main.cpp
regression/kernel_name_class.cpp
regression/kernel_unnamed.cpp
separate-compile/sycl-external.cpp
ordered_queue/oq_kernels.cpp
ordered_queue/ordered_buffs.cpp
basic_tests/buffer/subbuffer.cpp
basic_tests/buffer/buffer.cpp
basic_tests/parallel_for_range.cpp
basic_tests/scalar_vec_access.cpp
basic_tests/vec_op.cpp
basic_tests/image.cpp
basic_tests/compare_exchange_strong.cpp
basic_tests/image_array.cpp
basic_tests/stream/stream.cpp
basic_tests/stream/auto_flush.cpp
basic_tests/parallel_for_indexers.cpp
built-ins/vector_common.cpp
built-ins/scalar_geometric.cpp
built-ins/vector_integer.cpp
built-ins/vector_geometric.cpp
built-ins/scalar_relational.cpp
built-ins/vector_math.cpp
built-ins/vector_relational.cpp
built-ins/nan.cpp
built-ins/scalar_common.cpp
built-ins/scalar_integer.cpp
built-ins/scalar_math.cpp
built-ins/printf.cpp
scheduler/ReleaseResourcesTest.cpp
scheduler/DataMovement.cpp

@Ruyk Ruyk self-assigned this Apr 17, 2020
@bader bader added the cuda CUDA back-end label Apr 17, 2020
@bjoernknafla
Copy link
Contributor

A lot of work of the community and us went into fixing LIT testing to work as expected for CUDA - without marking tests as XFAIL it would have taken longer with more breakage slipping through. Without a lot of visible and invisible work of Alexander there would to this time be no reliable and deterministic LIT testing.

Sometimes I mark tests as XFAIL that we expect to pass in the future. This way the tests will automatically show up as unexpected passes when running tests after adding features.

We have a roadmap for many of the CUDA related LIT XFAILs (and UNSUPPORTEDs) and our whole team takes responsibility for these and is actively working on them. Some XFAILS will be fixed soon, some later, some are not part of what we plan to achieve and we will not work on them.

Tests we are aware off and are tracking will have a TODO or comment added to them about the problem.

Tests without any comment typicall fall into the following categories:

  • Added upstream without testing CUDA while we prepared open sourcing.
  • Updated upstream since open sourcing while features got added or changed upstream without updating or running CUDA tests.

@erichkeane
Copy link
Contributor Author

That is fine, we just need a bug/series of bugs to track these. So, this is here to track them.

At least one is disabled because of design implementation details of __builtin_unique_stable_name, which is how this came up. It is disappointing that it was never brought up via bug/etc, since we could have fixed this trivially immediately (I've since fixed it in the upstream, so we're awaiting the pulldown).

We need to analyze the changes that will need to happen to make sure they do not have ABI implications, otherwise they might never be able to be fixed.

@bader
Copy link
Contributor

bader commented Apr 20, 2020

@erichkeane, it looks like @vladimirlaz has some problems with pulldown, so if __builtin_unique_stable_name fix is required soon, it can be cherry-picked manually.

@erichkeane
Copy link
Contributor Author

I don't know if it necessarily IS required soon, just before we hit a release that locks us into ABI. Presumably, we could have these problems with some of the things that CUDA disabled as well, so I just want to make sure we have a plan for HOW to fix these before a release locks us into not being able to make said change.

@keryell
Copy link
Contributor

keryell commented Apr 22, 2020

ABI, ABI... Ahem... Is it a proof of success when we end up into ABI stability discussion? :-)

@erichkeane
Copy link
Contributor Author

ABI, ABI... Ahem... Is it a proof of success when we end up into ABI stability discussion? :-)

Any language worth using has to end up in an ABI discussion eventually. As a person shipping a compiler frontend, ABI compatibility between versions and compilers is a constant headache.

@bjoernknafla
Copy link
Contributor

Fix of tests for basic_tests/buffer/buffer.cpp: #1570

@bjoernknafla
Copy link
Contributor

bjoernknafla commented Apr 22, 2020

@bjoernknafla
Copy link
Contributor

PR by @fwyzard to fix many of the USM XFAILs: #1577

@bjoernknafla
Copy link
Contributor

The following (merged) PR triaged most LIT tests to mark missing features or features that cannot be supported by CUDA as UNSUPPORTED: cuda and added a comment: #1620

  • SYCL 1.2.1 image spec cannot be implemented with HW support in CUDA.
  • Fixed tests (by passing the required compilation target) to work with CUDA.
  • printf is not supported by CUDA
  • Device code splitting is not supported by CUDA.
  • Unnamed lambdas are not supported by CUDA.

@bjoernknafla
Copy link
Contributor

bjoernknafla commented May 7, 2020

scheduler/ReleaseResourcesTest.cpp fails due to an unexpected trace when running with CUDA as CUDA does not support SPIR-V. Marked the test as unsupported: #1847

@bjoernknafla
Copy link
Contributor

That leaves xfails due to missing builtins - and we are actively working on them.

@bjoernknafla
Copy link
Contributor

basic_tests/buffer/subbuffer.cpp this test needs to be fixed to handle sub-buffer alignment requirements correctly before knowing if it works with CUDA.

@bjoernknafla
Copy link
Contributor

bjoernknafla commented May 7, 2020

basic_tests/parallel_for_range.cpp- CUDA does not support the attribute [[cl::reqd_work_group_size]]: #1654

@bjoernknafla
Copy link
Contributor

parallel_for_indexers.cpp is under investigation (fail is related to global offset handling).

@erichkeane
Copy link
Contributor Author

The following (merged) PR triaged most LIT tests to mark missing features or features that cannot be supported by CUDA as UNSUPPORTED: cuda and added a comment: #1620

* SYCL 1.2.1 image spec cannot be implemented with HW support in CUDA.

* Fixed tests (by passing the required compilation target) to work with CUDA.

* printf is not supported by CUDA

* Device code splitting is not supported by CUDA.

* Unnamed lambdas are not supported by CUDA.

Unnamed Lambdas should now be supported for CUDA I changed the text format so that it should now work, if you're still having a problem, please let me know!

@bjoernknafla
Copy link
Contributor

bjoernknafla commented May 7, 2020

Both unnamed lambda tests use USM and require the fix by @fwyzard : #1577 to then pass.

One of them passes but creates an (ignored) error output due to event handling. The fix for this planned on top of #1471

I will track #1577 to re-enable the unnamed lambda-related tests once it is merged.

@erichkeane
Copy link
Contributor Author

I see, thanks! So nothing to do with the mangling anymore. Not sure how the USM matters here (other than perhaps it is used in the test?).

@bjoernknafla
Copy link
Contributor

bjoernknafla commented May 7, 2020

Both tests (sycl/test/usm/pfor_flatten.cpp, sycl/test/ordered_queue/oq_kernels.cpp) trigger USM allocations and the mentioned PR fixes an issue with 0 alignment values to be handled spec-conform.

@erichkeane
Copy link
Contributor Author

Great, thanks for clarifying.

@bjoernknafla
Copy link
Contributor

bjoernknafla commented Jun 9, 2020

PRs adding builtins:

@bjoernknafla
Copy link
Contributor

bjoernknafla commented Jun 9, 2020

usm/queue_wait.cpp - PR #1577
usm/allocator_vector_fail.cpp - PR #1577
usm/math.cpp - PR #1577
usm/mixed2template.cpp - PR #1577
usm/smemll.cpp - PR #1577
usm/depends_on.cpp - PR #1577
usm/memset.cpp - PR #1577
usm/mixed_queue.cpp - PR #1577
usm/dmemll.cpp - PR #1577
usm/memcpy.cpp - PR #1577
usm/allocator_vector.cpp - PR #1577
usm/allocatorll.cpp - PR #1577
usm/hmemll.cpp - PR #1577
usm/memadvise.cpp - PR #1577
usm/badmalloc.cpp - PR #1577
usm/mixed2.cpp - PR #1577
usm/mixed.cpp - PR #1577
device-code-split/aot-gpu.cpp - unsupported device code splitting and SPIR
device-code-split/split-per-kernel.cpp - unsupported device code splitting and SPIR
device-code-split/split-per-source-main.cpp - unsupported device code splitting and SPIR
regression/kernel_name_class.cpp fixed
regression/kernel_unnamed.cpp fixed
separate-compile/sycl-external.cpp fixed
ordered_queue/oq_kernels.cpp test got removed
ordered_queue/ordered_buffs.cpp test got removed
basic_tests/buffer/subbuffer.cpp - PR #1913
basic_tests/buffer/buffer.cpp fixed
basic_tests/parallel_for_range.cpp - PR #1654
basic_tests/scalar_vec_access.cpp - PR #1831
basic_tests/vec_op.cpp - PR #2009
basic_tests/image.cpp - PR #1970
basic_tests/compare_exchange_strong.cpp - PR #2027
basic_tests/image_array.cpp - unsupported images
basic_tests/stream/stream.cpp - PR #1831
basic_tests/stream/auto_flush.cpp fixed
basic_tests/parallel_for_indexers.cpp - PR #1773
built-ins/vector_common.cpp - PR #1990
built-ins/scalar_geometric.cpp - PR #1832
built-ins/vector_integer.cpp - PR #1828
built-ins/vector_geometric.cpp - PR #1832
built-ins/scalar_relational.cpp - PR #1831
built-ins/vector_math.cpp - PR #1990
built-ins/vector_relational.cpp - PR #1831
built-ins/nan.cpp - PR #2024
built-ins/scalar_common.cpp - PR #1990
built-ins/scalar_integer.cpp - PR #1828
built-ins/scalar_math.cpp - PR #1827
built-ins/scalar_math_2.cpp - PR #1990
built-ins/printf.cpp printf unsupported
scheduler/ReleaseResourcesTest.cpp SPIR-V unsupported PR #1847
scheduler/DataMovement.cpp fixed
sycl/test/basic_tests/vec_convert_half.cpp - PR #2024
sycl/test/atomic_ref/max.cpp - PR #2032
sycl/test/atomic_ref/min.cpp - PR #2032

@bader
Copy link
Contributor

bader commented Oct 13, 2020

SYCL :: basic_tests/image/image_write.cpp fails on CUDA device.
#2627 - Log: http://ci.llvm.intel.com:8010/#/builders/37/builds/3715
#2628 - Log: http://ci.llvm.intel.com:8010/#/builders/37/builds/3716
I'm going to disable it and use use issue to track the status. At first glance it looks like some UB triggered by random changes.

@steffenlarsen
Copy link
Contributor

@erichkeane @bader

I don't see any more XFAIL: cuda in intel/llvm. Can we close this?

@bader
Copy link
Contributor

bader commented Apr 21, 2021

+@vladimirlaz, I think we should check https://github.com/intel/llvm-test-suite/tree/intel/SYCL as most of these tests moved from intel/llvm there.

@steffenlarsen
Copy link
Contributor

I think there are a handful of xfails left in there, but would it make sense to open a new ticket on that repo for tracking those?

@vladimirlaz
Copy link
Contributor

Simple grepping showed that there are some:
It looks like tests in Regression and USM are inherited, the other may be new:

llvm-test-suite/build$ grep -r XFAIL ../SYCL/ | grep cuda
../SYCL/Regression/atomic_load.cpp:// XFAIL: cuda
../SYCL/Regression/implicit_offset_debug_info.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-clamp-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-mirror-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/unnormalized-none-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-repeat-nearest.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-mirror-nearest.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-repeat-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/unnormalized-clampedge-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-none-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/normalized-clampedge-linear-float.cpp:// XFAIL: cuda
../SYCL/Sampler/unnormalized-clamp-linear-float.cpp:// XFAIL: cuda
../SYCL/Config/select_device.cpp:// XFAIL: cuda
../SYCL/Basic/reqd_work_group_size.cpp:// XFAIL: cuda
../SYCL/Scheduler/CommandCleanupThreadSafety.cpp:// XFAIL: cuda
../SYCL/Scheduler/MemObjRemapping.cpp:// XFAIL: cuda
../SYCL/KernelAndProgram/build-log.cpp:// XFAIL: cuda
../SYCL/KernelAndProgram/basic-program.cpp:// XFAIL: cuda
../SYCL/KernelAndProgram/kernel-and-program.cpp:// XFAIL: cuda
../SYCL/KernelAndProgram/cache-build-result.cpp:// XFAIL: cuda
../SYCL/KernelAndProgram/get-options.cpp:// XFAIL: cuda
../SYCL/USM/dmem_varied.cpp:// XFAIL: cuda
../SYCL/USM/smem_concurrent.cpp:// XFAIL: cuda
../SYCL/USM/math.cpp:// XFAIL: cuda
../SYCL/USM/pointer_query.cpp:// XFAIL: cuda
../SYCL/USM/smem_varied.cpp:// XFAIL: cuda
../SYCL/USM/prefetch.cpp:// XFAIL: cuda

@steffenlarsen
Copy link
Contributor

To my knowledge, some of them have both XFAIL: cuda and REQUIRES: level-zero, such as SYCL/USM/smem_concurrent.cpp. Am I right in assuming the latter makes lit ignore the former?

@vladimirlaz
Copy link
Contributor

To my knowledge, some of them have both XFAIL: cuda and REQUIRES: level-zero, such as SYCL/USM/smem_concurrent.cpp. Am I right in assuming the latter makes lit ignore the former?

You are right.

@bader
Copy link
Contributor

bader commented Apr 22, 2021

I think there are a handful of xfails left in there, but would it make sense to open a new ticket on that repo for tracking those?

I'm okay with that.
I can even transfer this issue to llvm-test-suite if you are okay with it.

@steffenlarsen
Copy link
Contributor

I can even transfer this issue to llvm-test-suite if you are okay with it.

I think it may be cleaner to just restart it and link. That way people who read it won't have to go through all the previous discussions on fixed issues to get to the relevant information. Thank you though!

If you are okay with that solution, I will go find the relevant XFAILs and list them in a new issue on llvm-test-suite. I'll leave out the ones with REQUIRES: level-zero and make a PR to remove XFAIL: cuda from them.

@bader
Copy link
Contributor

bader commented Apr 22, 2021

Sounds good to me. Thanks!

@steffenlarsen
Copy link
Contributor

I have opened the continuation issue here: intel/llvm-test-suite#249

@AlexeySachkov
Copy link
Contributor

Closing this one, because we have a more up-to-date tracker for those issues here: intel/llvm-test-suite#249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda CUDA back-end
Projects
None yet
Development

No branches or pull requests

8 participants