SYCL spec example throws an instance of 'sycl::_V1::invalid_parameter_error' #7568

Closed
wangzy0327 opened this issue Nov 29, 2022 · 7 comments

@wangzy0327 commented Nov 29, 2022

Describe the bug
terminate called after throwing an instance of 'sycl::_V1::invalid_parameter_error'
what(): Index out of range -30 (PI_ERROR_INVALID_VALUE)

To Reproduce
Please describe the steps to reproduce the behavior:

This is my example code:

my example code
#include <sycl/sycl.hpp>
#include <iostream>
#include <random>

using namespace sycl;

constexpr size_t N = 10;

class RandomFiller {
 public:
  RandomFiller(accessor<int> ptr)
      : ptr_ { ptr } {
    std::random_device hwRand;
    std::uniform_int_distribution<> r { 1, 100 };
    randomNum_ = r(hwRand);
  }
  void operator()(item<1> item) const { ptr_[item.get_id()] = get_random(); }
  int get_random() const { return randomNum_; }

 private:
  accessor<int> ptr_;
  int randomNum_;
};

void workFunction(buffer<int, 1>& b, queue& q, const range<1> r) {
  q.submit([&](handler& cgh) {
    accessor ptr { b, cgh };
    RandomFiller filler { ptr };

    cgh.parallel_for(r, filler);
  });
}

int main(){
    queue myQueue;
    auto myContext = myQueue.get_context();
    auto myDev = myQueue.get_device();

    range<1> r { N };

    buffer<int, 1> a { r };

    auto exeBundle = get_kernel_bundle<bundle_state::executable>(myContext);
    
    std::vector<kernel_id> execKernelIds = exeBundle.get_kernel_ids();
    std::cout<<"executable kernel size : "<<execKernelIds.size()<<std::endl;
    for(auto kernel_id : execKernelIds){
        std::cout<<"exec kernel id name : "<<kernel_id.get_name()<<std::endl;
        kernel myKernel = exeBundle.get_kernel<bundle_state::executable>(kernel_id);
        std::cout<<"exec kernel id info function_name : "<<myKernel.get_info<info::kernel::function_name>()<<std::endl;
        std::cout<<"exec kernel id info reference count : "<<myKernel.get_info<info::kernel::reference_count>()<<std::endl;
        std::cout<<"exec kernel id info num_args : "<<myKernel.get_info<info::kernel::num_args>()<<std::endl;
        std::cout<<"exec kernel id info attributes : "<<myKernel.get_info<info::kernel::attributes>()<<std::endl;
        std::cout<<std::endl;
    }

    workFunction(a,myQueue,r);

    myQueue.wait();

    host_accessor rand {a,read_only};
    std::cout << std::endl << "Result:" << std::endl;
    for (size_t i = 0; i < N; i++) {
        std::cout<<"random array r["<<i<<"] = "<<r[i]<<std::endl;
        // std::cout<<"B B["<<i<<"] = "<<B[i]<<std::endl;
        // std::cout<<"C C["<<i<<"] = "<<C[i]<<std::endl;
        // Compare the result to the analytic value
        if (r[i] <= 0 || r[i] >= 100) {
            std::cout << "Wrong value " << r[i] << " on element " << i << " " << std::endl;
            exit(-1);
        }
    }
    std::cout << "Good Random distribution !" << std::endl;
    return 0;
}

CMakeLists.txt
cmake_minimum_required(VERSION 2.8.12)

set(DPCPP_HOME "/home/wzy/sycl_workspace")
set(DPCPP_SYCL_HOME "${DPCPP_HOME}/build-cuda-2022-09-debug")
# set(DPCPP_SYCL_HOME "${DPCPP_HOME}/build-cuda-2022-09")

set(CMAKE_C_COMPILER "${DPCPP_SYCL_HOME}/bin/clang")
set(CMAKE_CXX_COMPILER "${DPCPP_SYCL_HOME}/bin/clang++")
set(CMAKE_CXX_STANDARD 17)

project(function-objects)


include_directories("${DPCPP_SYCL_HOME}/include/sycl")
include_directories("${DPCPP_SYCL_HOME}/include")

message(STATUS "dpcpp_home : ${DPCPP_HOME}")
message(STATUS "dpcpp_cuda_sycl_home : ${DPCPP_SYCL_HOME}")


message(STATUS "find library path : ${DPCPP_SYCL_HOME}/lib")
set(CMAKE_BUILD_RPATH "${DPCPP_SYCL_HOME}/lib;${CMAKE_BUILD_RPATH}")
message(STATUS "cmake build rpath : ${CMAKE_BUILD_RPATH}")


set(CMAKE_BUILD_TYPE "Debug")
# set(CMAKE_BUILD_TYPE "Release")
set(CMAKE_CXX_FLAGS "-fsycl -fsycl-targets=nvptx64-nvidia-cuda")
set(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb -std=c++17")
set(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall -std=c++17")


link_directories("${DPCPP_SYCL_HOME}/lib")

aux_source_directory(. DIR_SRCS)
add_executable(function-objects ${DIR_SRCS})
target_include_directories(function-objects PRIVATE "${DPCPP_SYCL_HOME}/include/sycl")
target_include_directories(function-objects PRIVATE "${DPCPP_SYCL_HOME}/include")
target_link_libraries(function-objects PRIVATE sycl)

[screenshot attached in the original issue]

Environment (please complete the following information):

  • OS: Linux
  • Target device and vendor: Nvidia GPU
  • DPC++ version: Clang++ 16 (2022-09)
  • Dependencies version: CUDA 11.2

Additional context

There are some other problems.

1. What does myKernel.get_info<info::kernel::num_args>() mean? Does it refer to the arguments of the parallel_for function object's operator()?
2. Why does the number of kernels from the kernel_bundle not equal the number of kernels submitted to the queue?

@wangzy0327 added the "bug" label Nov 29, 2022
@npmiller (Contributor) commented:

The crash here is because in the verification loop you're using r, which is the sycl::range, when I think you mean to use rand, the host_accessor. Changing that in this part makes the sample work:

    host_accessor rand {a,read_only};
    std::cout << std::endl << "Result:" << std::endl;
    for (size_t i = 0; i < N; i++) {
        std::cout<<"random array rand["<<i<<"] = "<<rand[i]<<std::endl;
        // std::cout<<"B B["<<i<<"] = "<<B[i]<<std::endl;
        // std::cout<<"C C["<<i<<"] = "<<C[i]<<std::endl;
        // Compare the result to the analytic value
        if (rand[i] <= 0 || rand[i] >= 100) {
            std::cout << "Wrong value " << rand[i] << " on element " << i << " " << std::endl;
            exit(-1);
        }
    }

@MrSidims removed the "bug" label Nov 29, 2022
@wangzy0327 (Author) commented:

There are some other problems.

1. What does myKernel.get_info<info::kernel::num_args>() mean? Does it refer to the arguments of the parallel_for function object's operator()?
2. Why does the number of kernels from the kernel_bundle not equal the number of kernels submitted to the queue?

@AlexeySachkov (Contributor) commented:

Hi @wangzy0327,

1. What does myKernel.get_info<info::kernel::num_args>() mean? Does it refer to the arguments of the parallel_for function object's operator()?

Yes, that query returns the number of arguments your kernel has. When a kernel is defined as a named function object, all of its member variables are turned into kernel arguments; see 4.12.4. Rules for parameter passing to kernels:

If the kernel is a named function object, the operator() member function (or other member functions that it calls) may reference member variables inside the same named function object. Any such member variables become parameters to the kernel. If the kernel is a lambda function, any variables captured by the lambda become parameters to the kernel.
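
For illustration, here is a minimal sketch (the names Scale and submit_both are made up for this example) of how members of a named function object and captures of a lambda both become kernel parameters:

#include <sycl/sycl.hpp>

// Minimal sketch: both submissions below end up with the same kernel
// parameters (one accessor plus one int). Member variables of a named
// function object and captured variables of a lambda are both turned into
// kernel arguments. Note that a single accessor may be lowered to more than
// one backend argument, so info::kernel::num_args can exceed the number of
// C++ members/captures.
struct Scale {
  sycl::accessor<int> data_;  // becomes a kernel argument
  int factor_;                // becomes a kernel argument
  void operator()(sycl::item<1> it) const { data_[it.get_id()] *= factor_; }
};

void submit_both(sycl::queue& q, sycl::buffer<int, 1>& buf, int factor) {
  // Named function object: its member variables are the kernel parameters.
  q.submit([&](sycl::handler& cgh) {
    sycl::accessor data{buf, cgh};
    cgh.parallel_for(buf.get_range(), Scale{data, factor});
  });
  // Lambda: the captures 'data' and 'factor' are the kernel parameters.
  q.submit([&](sycl::handler& cgh) {
    sycl::accessor data{buf, cgh};
    cgh.parallel_for(buf.get_range(),
                     [=](sycl::item<1> it) { data[it.get_id()] *= factor; });
  });
}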

2. Why does the number of kernels from the kernel_bundle not equal the number of kernels submitted to the queue?

Could you please clarify what you mean by "queue submit kernel size" here?

@wangzy0327 (Author) commented Nov 29, 2022:

2. Why does the number of kernels from the kernel_bundle not equal the number of kernels submitted to the queue?

For the following example:

function_object.cc

#include <sycl/sycl.hpp>
#include <iostream>
#include <random>

using namespace sycl;

constexpr size_t N = 10;

class RandomFiller {
 public:
  RandomFiller(accessor<int> ptr)
      : ptr_ { ptr } {
    std::random_device hwRand;
    std::uniform_int_distribution<> r { 1, 100 };
    randomNum_ = r(hwRand);
  }
  void operator()(item<1> item) const { ptr_[item.get_id()] = get_random(); }
  int get_random() const { return randomNum_; }

 private:
  accessor<int> ptr_;
  int randomNum_;
};

void workFunction(buffer<int, 1>& b, queue& q, const range<1> r) {
  q.submit([&](handler& cgh) {
    accessor ptr { b, cgh };
    RandomFiller filler { ptr };

    cgh.parallel_for(r, filler);
  });
}

int main(){
    queue myQueue;
    auto myContext = myQueue.get_context();
    auto myDev = myQueue.get_device();

    range<1> r { N };

    buffer<int, 1> a { r };

    auto exeBundle = get_kernel_bundle<bundle_state::executable>(myContext);
    
    std::vector<kernel_id> execKernelIds = exeBundle.get_kernel_ids();
    std::cout<<"executable kernel size : "<<execKernelIds.size()<<std::endl;
    std::cout<<std::endl;
    for(auto kernel_id : execKernelIds){
        std::cout<<"exec kernel id name : "<<kernel_id.get_name()<<std::endl;
        kernel myKernel = exeBundle.get_kernel<bundle_state::executable>(kernel_id);
        std::cout<<"exec kernel id info function_name : "<<myKernel.get_info<info::kernel::function_name>()<<std::endl;
        std::cout<<"exec kernel id info reference count : "<<myKernel.get_info<info::kernel::reference_count>()<<std::endl;
        std::cout<<"exec kernel id info num_args : "<<myKernel.get_info<info::kernel::num_args>()<<std::endl;
        std::cout<<"exec kernel id info attributes : "<<myKernel.get_info<info::kernel::attributes>()<<std::endl;
        std::cout<<std::endl;
    }

    kernel myKernel = exeBundle.get_kernel<bundle_state::executable>(execKernelIds[0]);

    workFunction(a,myQueue,r);
    // myQueue.submit([&](handler& cgh){
    //     accessor ptr = {a,cgh};
    //     cgh.set_args(ptr);
    //     cgh.parallel_for(r,myKernel);
    // });

    myQueue.wait();

    host_accessor rand {a,read_only};
    std::cout << std::endl << "Result:" << std::endl;
    for (size_t i = 0; i < N; i++) {
        std::cout<<"random array rand["<<i<<"] = "<<rand[i]<<std::endl;
        if (rand[i] <= 0 || rand[i] >= 100) {
            std::cout << "Wrong value " << rand[i] << " on element " << i << " " << std::endl;
            exit(-1);
        }
    }
    std::cout << "Good Random distribution !" << std::endl;
    return 0;
}

In the example code, only one kernel function is submitted to the queue, but kernel_bundle get_kernel_ids() returns 2 kernels. In the printed result, there are kernel_id: _ZTS12RandomFiller and kernel_id: _ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1E12RandomFillerEE.

[screenshot attached in the original issue]

@AlexeySachkov (Contributor) commented:

In the example code, only one kernel function is submitted to the queue, but kernel_bundle get_kernel_ids() returns 2 kernels. In the printed result, there are kernel_id: _ZTS12RandomFiller and kernel_id: _ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1E12RandomFillerEE.

The second kernel is implicitly created by our implementation. It is part of the "range rounding" feature (see #2703): when the user passes a range into parallel_for, the implementation is responsible for choosing the work-group size of the kernel. However, if the range contains prime numbers, the implementation would have to use a work-group size of 1, which hurts performance. To improve such cases, we generate a second kernel and launch it with an increased global size, so we are able to select a bigger work-group size and get better performance.
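
Conceptually, the generated kernel works roughly like the sketch below (RoundedRangeWrapper and launch_rounded are illustrative names for this sketch, not the actual DPC++ internals):

#include <sycl/sycl.hpp>

// Sketch of the "range rounding" idea: launch a rounded-up global range and
// wrap the user kernel so the padding work-items do nothing. The real
// implementation also reconstructs the item passed to the user kernel so that
// it reports the original range; this sketch glosses over that detail.
template <typename UserKernel>
struct RoundedRangeWrapper {
  sycl::range<1> userRange_;  // the range the user actually asked for
  UserKernel userKernel_;

  void operator()(sycl::item<1> it) const {
    if (it.get_id(0) < userRange_[0])  // guard against the padding work-items
      userKernel_(it);
  }
};

template <typename UserKernel>
void launch_rounded(sycl::queue& q, sycl::range<1> userRange, UserKernel k) {
  constexpr size_t multiple = 32;  // illustrative rounding factor
  const size_t rounded = ((userRange[0] + multiple - 1) / multiple) * multiple;
  q.submit([&](sycl::handler& cgh) {
    // A larger, nicely divisible global size lets the runtime pick a bigger work-group size.
    cgh.parallel_for(sycl::range<1>{rounded},
                     RoundedRangeWrapper<UserKernel>{userRange, k});
  });
}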

@wangzy0327 (Author) commented:

If I want to invoke the kernel via online compilation, how do I invoke the kernel and set_args in this example? Can you help me?

void parallel_for(range<1> NumWorkItems, kernel Kernel)

@AlexeySachkov (Contributor) commented:

If I want to invoke the kernel via online compilation, how do I invoke the kernel and set_args in this example?

You can find such an example in the spec; see 4.11.15.1. Controlling the timing of online compilation.

Essentially, the only change you need is to call handler::use_kernel_bundle within the queue::submit lambda in order to instruct the runtime to use your kernel bundle (which may already be in the executable state) instead of creating one implicitly on the fly.

You do not need to pass a kernel object to parallel_for for that; your parallel_for should look as usual, and the runtime will do all the necessary steps under the hood automatically.
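
For example, here is a minimal sketch based on workFunction and the RandomFiller class from the reproducer above; the new part is the use_kernel_bundle call (the function name workFunctionWithBundle is made up for this sketch):

void workFunctionWithBundle(sycl::buffer<int, 1>& b, sycl::queue& q, const sycl::range<1> r,
                            const sycl::kernel_bundle<sycl::bundle_state::executable>& exeBundle) {
  q.submit([&](sycl::handler& cgh) {
    // Tell the runtime to use this already-built bundle instead of creating one on the fly.
    cgh.use_kernel_bundle(exeBundle);
    sycl::accessor ptr{b, cgh};
    // The parallel_for is unchanged; no kernel object or set_args is needed.
    cgh.parallel_for(r, RandomFiller{ptr});
  });
}

The bundle passed here must be associated with the same context as the queue and contain the RandomFiller kernel; otherwise the submission will fail.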
