forward_progress_kernel_param_L0_gpu.cpp fails in post-commit #14692

Open
bader opened this issue Jul 22, 2024 · 5 comments
bader commented Jul 22, 2024

Describe the bug

Log from post-commit results for 8bf7ae3 (non-functional change).

Full log:
logs_26294248843 (1).zip, GitHub Actions Link.

-- Testing: 2110 tests, 24 workers --
FAIL: SYCL :: forward_progress/forward_progress_kernel_param_L0_gpu.cpp (2049 of 2110)
******************** TEST 'SYCL :: forward_progress/forward_progress_kernel_param_L0_gpu.cpp' FAILED ********************
Exit Code: -8

Command Output (stdout):
--
# RUN: at line 2
/__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/forward_progress/forward_progress_kernel_param_L0_gpu.cpp -o /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/forward_progress/forward_progress_kernel_param_L0_gpu.cpp -o /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 3
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# note: command had no output on stdout or stderr
# error: command failed with exit status: -8

--
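A note for reading the log above: a negative exit status most likely means the test binary was killed by a signal rather than returning an ordinary error code. This assumes lit reports statuses using the Python `subprocess` convention (a negative value is the negated signal number); the helper name below is hypothetical. A minimal sketch for decoding such a status on Linux:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a subprocess-style exit status into a readable label.

    Assumption: negative values follow the Python subprocess convention,
    i.e. -N means "terminated by signal N".
    """
    if status < 0:
        # signal.Signals maps the platform's signal number to its name.
        return signal.Signals(-status).name
    return f"exit code {status}"


print(describe_exit_status(-8))
```

Under that assumption, -8 corresponds to SIGFPE on Linux, which would be consistent with an arithmetic fault such as an integer division by zero, though the log alone does not confirm the cause.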

To reproduce

No response

Environment

  • OS: Linux
  • Target device and vendor: Intel GPU
  • DPC++ version: 8bf7ae3
  • Dependencies version:
[opencl:gpu][opencl:0] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO  [24.22.29735.20]
[opencl:cpu][opencl:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900 OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
[opencl:fpga][opencl:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.18.6.0.02_160000]
[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.3.29735]
[native_cpu:cpu][native_cpu:0] SYCL_NATIVE_CPU, SYCL Native CPU 0.1 [0.0.0]

Platforms: 5
Platform [#1]:
    Version  : OpenCL 3.0 
    Name     : Intel(R) OpenCL Graphics
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : OpenCL 3.0 NEO 
        Name              : Intel(R) Arc(TM) A750 Graphics
        Vendor            : Intel(R) Corporation
        Driver            : 24.22.29735.20
        UUID              : 13412816186800030000000
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_srgb ext_intel_device_id ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_private_alloca
        info::device::sub_group_sizes: 8 16 32
        Architecture: intel_gpu_acm_g10
Platform [#2]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#1]:
        Type              : cpu
        Version           : OpenCL 3.0 (Build 0)
        Name              : 12th Gen Intel(R) Core(TM) i9-12900
        Vendor            : Intel(R) Corporation
        Driver            : 2024.18.6.0.02_160000
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_private_alloca
        info::device::sub_group_sizes: 4 8 16 32 64
        Architecture: x86_64
Platform [#3]:
    Version  : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Name     : Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#2]:
        Type              : fpga
        Version           : OpenCL 1.2 
        Name              : Intel(R) FPGA Emulation Device
        Vendor            : Intel(R) Corporation
        Driver            : 2024.18.6.0.02_160000
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : accelerator fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_atomic_host_allocations usm_atomic_shared_allocations ext_oneapi_srgb ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_private_alloca
        info::device::sub_group_sizes: 4 8 16 32 64
        Architecture: unknown
Platform [#4]:
    Version  : 1.3
    Name     : Intel(R) Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : 12.55.8
        Name              : Intel(R) Arc(TM) A750 Graphics
        Vendor            : Intel(R) Corporation
        Driver            : 1.3.29735
        UUID              : 13412816186800030000000
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem
        info::device::sub_group_sizes: 8 16 32
        Architecture: intel_gpu_acm_g10
Platform [#5]:
    Version  : 0.1
    Name     : SYCL_NATIVE_CPU
    Vendor   : tbd
    Devices  : 1
        Device [#0]:
        Type              : cpu
        Version           : 0.1
        Name              : SYCL Native CPU
        Vendor            : Intel(R) Corporation
        Driver            : 0.0.0
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : cpu fp16 fp64 queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64
        info::device::sub_group_sizes: 1
        Architecture: unknown
default_selector()      : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.3.29735]
accelerator_selector()  : fpga, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.18.6.0.02_160000]
cpu_selector()          : cpu, Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900 OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
gpu_selector()          : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.3.29735]
custom_selector(gpu)    : gpu, Intel(R) Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.3.29735]
custom_selector(cpu)    : cpu, Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900 OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
custom_selector(acc)    : fpga, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.18.6.0.02_160000]

Additional context

No response

@bader bader added bug Something isn't working confirmed labels Jul 22, 2024
@lbushi25 lbushi25 self-assigned this Jul 23, 2024
@lbushi25
Contributor

lbushi25 commented Jul 23, 2024

AFAIK, this test is not known to be flaky. It has also passed in the last couple of post-commit runs. Just in case, I'll run it locally with the repo checked out at the commit that saw the post-commit failure.

@lbushi25
Contributor

This is passing locally:

lbushi@scsel-tl-02:~/sycl_workspace/llvm/build/bin$ ./sycl-ls
INFO: Output filtered by ONEAPI_DEVICE_SELECTOR environment variable, which is set to level_zero:gpu.
To see device ids, use the --ignore-device-selectors CLI option.

[level_zero:gpu] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 12.0.0 [1.3.28202]
lbushi@scsel-tl-02:~/sycl_workspace/llvm/build/bin$ llvm-lit -v ../../sycl/test-e2e/forward_progress/forward_progress_kernel_param_L0_gpu.cpp
llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:414: note: Targeted devices: all
INFO: Output filtered by ONEAPI_DEVICE_SELECTOR environment variable, which is set to level_zero:gpu.
To see device ids, use the --ignore-device-selectors CLI option.

llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:632: note: Found pre-installed AOT device compiler ocloc
llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:632: note: Found pre-installed AOT device compiler opencl-aot
llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:733: note: Aspects for level_zero:gpu: ext_oneapi_opportunistic_group, ext_intel_esimd, ext_intel_device_id, atomic64, ext_intel_gpu_slices, ext_oneapi_bindless_images_shared_usm, usm_host_allocations, ext_intel_pci_address, ext_oneapi_private_alloca, ext_intel_gpu_eu_count_per_subslice, usm_shared_allocations, ext_intel_gpu_hw_threads_per_eu, ext_oneapi_limited_graph, online_compiler, queue_profiling, ext_intel_gpu_eu_count, online_linker, usm_device_allocations, gpu, ext_oneapi_virtual_mem, ext_oneapi_mipmap, fp16, ext_intel_device_info_uuid, ext_intel_legacy_image, ext_intel_memory_clock_rate, ext_intel_gpu_subslices_per_slice, ext_oneapi_mipmap_anisotropy, ext_oneapi_ballot_group, ext_oneapi_tangle_group, ext_oneapi_fixed_size_group, ext_intel_gpu_eu_simd_width, ext_oneapi_queue_profiling_tag, ext_oneapi_bindless_images_2d_usm, ext_intel_memory_bus_width, ext_oneapi_bindless_images
llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:745: note: SG sizes for level_zero:gpu: 32, 8, 16
llvm-lit: /nfs/site/home/lbushi/sycl_workspace/llvm/sycl/test-e2e/lit.cfg.py:754: note: Architectures for level_zero:gpu: intel_gpu_tgllp
-- Testing: 1 tests, 1 workers --
PASS: SYCL :: forward_progress/forward_progress_kernel_param_L0_gpu.cpp (1 of 1)

Testing Time: 3.84s

Total Discovered Tests: 1
  Passed: 1 (100.00%)
lbushi@scsel-tl-02:~/sycl_workspace/llvm/build/bin$ git show HEAD
commit 8bf7ae39fbbc2bca93b63c511b3350a0b2da9ab1 (HEAD)
Author: Alexey Bader <[email protected]>
Date:   Mon Jul 22 12:43:22 2024 -0700

    [CODEOWNERS] Fix merge conflict with community change. (#14688)

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 90177ad8d05e..772af752ea60 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -47,12 +47,6 @@ sycl/test-e2e/Plugin/dll-detach-order.cpp @intel/llvm-reviewers-runtime
 sycl/plugins/**/cuda/ @intel/llvm-reviewers-cuda
 sycl/plugins/**/hip/ @intel/llvm-reviewers-cuda

-# Transform Dialect in MLIR.
-/mlir/include/mlir/Dialect/Transform/* @ftynse @nicolasvasilache
-/mlir/lib/Dialect/Transform/* @ftynse @nicolasvasilache
-/mlir/**/*TransformOps* @ftynse @nicolasvasilache
-
-
 # CUDA specific runtime implementations
 sycl/include/sycl/ext/oneapi/experimental/cuda/ @intel/llvm-reviewers-cuda

lbushi@scsel-tl-02:~/sycl_workspace/llvm/build/bin$

@lbushi25
Contributor

Closing as I cannot reproduce and I don't see this failure in recent post-commits.

@sarnex
Contributor

sarnex commented Aug 30, 2024

@lbushi25 I saw this again today here. Could you take another look and check whether it's sporadic? Thanks.

# RUN: at line 2
/__w/llvm/llvm/toolchain/bin//clang++  -Werror  -fsycl -fsycl-targets=spir64  /__w/llvm/llvm/llvm/sycl/test-e2e/forward_progress/forward_progress_kernel_param_L0_gpu.cpp -o /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# executed command: /__w/llvm/llvm/toolchain/bin//clang++ -Werror -fsycl -fsycl-targets=spir64 /__w/llvm/llvm/llvm/sycl/test-e2e/forward_progress/forward_progress_kernel_param_L0_gpu.cpp -o /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# note: command had no output on stdout or stderr
# RUN: at line 3
env ONEAPI_DEVICE_SELECTOR=level_zero:gpu  /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# executed command: env ONEAPI_DEVICE_SELECTOR=level_zero:gpu /__w/llvm/llvm/build-e2e/forward_progress/Output/forward_progress_kernel_param_L0_gpu.cpp.tmp.out
# .---command stderr------------
# | terminate called after throwing an instance of 'sycl::_V1::exception'
# |   what():  UR error
# `-----------------------------
# error: command failed with exit status: -6

--

********************

@sarnex sarnex reopened this Aug 30, 2024
@callumfare
Contributor

Just encountered this failure:
https://github.com/intel/llvm/actions/runs/10718850999/job/29725040213

It passed again on a retry.
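
Since the failure went away on a retry, one way to estimate how sporadic it is would be to run the already-built test binary many times and collect the non-zero exit statuses. A rough sketch, assuming the binary was produced by the RUN line at line 2 of the test; the function name and the binary path in the commented usage are placeholders, not part of the test suite:

```python
import os
import subprocess


def collect_failures(cmd, runs, env=None):
    """Run `cmd` `runs` times and return the list of non-zero return codes.

    A negative return code means the process was terminated by a signal
    (Python subprocess convention).
    """
    failures = []
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(result.returncode)
    return failures


# Hypothetical usage; replace the path with the locally built test binary:
# env = dict(os.environ, ONEAPI_DEVICE_SELECTOR="level_zero:gpu")
# print(collect_failures(["./forward_progress_kernel_param_L0_gpu.cpp.tmp.out"], 200, env))
```

An empty result after a few hundred runs would not prove the test is stable, but a mix of statuses such as -8 and -6 would suggest more than one failure mode on the affected device.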
