rocThrust support? #952

Open
StarGazerM opened this issue Oct 29, 2024 · 17 comments

Comments

@StarGazerM

StarGazerM commented Oct 29, 2024

Hi:

It's great to see that rocPRIM is supported by chipStar. Does that mean rocThrust can also be supported out of the box?

@pvelesko
Collaborator

It doesn't build out of the box because of compilation issues, but I'll look into it.

@pvelesko
Collaborator

pvelesko commented Oct 30, 2024

Check out the rocThrust branch and download the submodules. I was able to compile most of the rocThrust examples. Let me know what issues you encounter.

# build chipStar rocThrust branch and install
cmake ../ -DCHIP_BUILD_ROCPRIM=ON -DCMAKE_INSTALL_PREFIX=/space/pvelesko/install/HIP/chipStar/test
ninja install

# set your paths
cd chipStar/rocThrust
# cmake configure and install (not yet integrated into chipStar build system)
cmake ../ -DCMAKE_CXX_COMPILER=hipcc  -DCMAKE_INSTALL_PREFIX=/space/pvelesko/install/HIP/chipStar/test/rocthrust  -DBUILD_EXAMPLES=OFF -DCMAKE_C_COMPILER=hipcc

@StarGazerM
Author

Wow, this looks great! I will try it.

@StarGazerM
Author

I tried the build. It seems an error pops up when building the examples, but it is in upstream rocPRIM:

/usr/local/include/rocprim/intrinsics/atomic.hpp:51:16: error: no matching function for call to 'atomicAdd'

@StarGazerM StarGazerM reopened this Oct 30, 2024
@pvelesko
Collaborator

Did you build exactly as I did? This should be fixed in the submodules.

@StarGazerM
Author

I turned on the -DBUILD_EXAMPLES=OFF on thrust build

@pvelesko
Collaborator

A couple of examples fail to build due to a SPIR-V translation error, but the atomicAdd issue was fixed.

The issue is that a previously installed version is getting picked up. That version was built without the atomic fix.

Please remove it and build again; this should build rocPRIM in-tree with the atomic fix.
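
A minimal sketch for spotting such a leftover install, assuming the stale headers sit at `<prefix>/include/rocprim` (as the `/usr/local/include/rocprim` path in the error above suggests). The function name `find_rocprim_headers` is hypothetical and not part of chipStar:

```shell
# List rocPRIM header directories found under the given install prefixes.
# Assumption: an installed copy lives at <prefix>/include/rocprim.
find_rocprim_headers() {
  for prefix in "$@"; do
    if [ -d "$prefix/include/rocprim" ]; then
      printf '%s\n' "$prefix/include/rocprim"
    fi
  done
}

# Check the prefix named in the error message. This only prints paths;
# review the output before removing anything by hand.
find_rocprim_headers /usr/local
```

Any directory it prints is a candidate for the stale copy that the compiler may be finding ahead of the in-tree headers.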

@StarGazerM
Author

StarGazerM commented Oct 31, 2024

I see, I will try it and report back!
Is CHIP-SPV/rocPRIM@4894fd7 the patch you mentioned? (Just curious what actually fixes this.)

@pvelesko
Collaborator

No, the patch is in the chipStar atomics, which get copied into the rocPRIM headers, I believe.

@StarGazerM
Author

Here are the Docker commands I am using. I install everything into /usr/local, but the build still seems to throw a function mismatch on atomicAdd:

RUN git clone -b rocThrust https://github.com/CHIP-SPV/chipStar.git && \
    cd chipStar && git submodule update --init --recursive 
RUN module unload pocl; module load oneapi/2024.1.0 && which icpx  && \
    cd chipStar &&  \
    mkdir build && \
    cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCHIP_BUILD_HIPBLAS=ON -DCHIP_BUILD_ROCPRIM=ON -DCMAKE_INSTALL_PREFIX=/usr/local && \
    sudo make -j$(nproc) && \
    /home/chipStarUser/chipStar/build/samples/0_MatrixMultiply/MatrixMultiply && \
    sudo make install 

RUN module unload pocl && module load oneapi/2024.1.0 &&  module load pocl && \
    cd /home/chipStarUser/chipStar/rocThrust && \
    cmake -Bbuild -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local . && \
    sudo cmake --build build -j$(nproc) && \
    cd build && sudo make install

I saw

extern "C++" inline __device__ unsigned long atomicAdd(unsigned long *address,
in the file, so it seems this should work?

@pvelesko
Collaborator

Ah, try adding -DCHIP_ENABLE_NON_COMPLIANT_DEVICELIB_CODE=ON
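
For reference, a sketch of the chipStar configure flags with this option added. Everything except the new flag is copied from the Docker command earlier in the thread; the helper name `chipstar_cmake_flags` is hypothetical and only there to keep the list in one place:

```shell
# Flags from the chipStar configure step quoted earlier in this thread,
# plus the option suggested here. Adjust the install prefix to your setup.
chipstar_cmake_flags() {
  printf '%s\n' \
    -DCMAKE_BUILD_TYPE=Release \
    -DCHIP_BUILD_HIPBLAS=ON \
    -DCHIP_BUILD_ROCPRIM=ON \
    -DCHIP_ENABLE_NON_COMPLIANT_DEVICELIB_CODE=ON \
    -DCMAKE_INSTALL_PREFIX=/usr/local
}

# Usage, from an empty build directory inside the chipStar checkout:
#   cmake .. $(chipstar_cmake_flags)
```

Reconfiguring from a fresh build directory avoids reusing a CMake cache that was created without the flag.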

@StarGazerM
Author

StarGazerM commented Oct 31, 2024

After enabling this flag, I got a linker error. Does force-enabling it cause an integer-width issue? Could it be related to my hardware? I am using an Intel A770 for testing:

[ 57%] Building CXX object examples/CMakeFiles/example_thrust_scan_by_key.dir/scan_by_key.cu.o
InvalidBitWidth: Invalid bit width in input: 24

I also tested the behavior of the compiled programs (the sort examples). It seems device_vector is not working correctly: it runs into an OOM during the H->D copy, which doesn't make sense to me. I can dig into it further; do you have any clue what the problem could be?

CHIP error [TID 4154] [1730387918.171449236] : hipErrorOutOfMemory (CL_OUT_OF_RESOURCES ) in /home/chipStarUser/chipStar/src/backend/OpenCL/CHIPBackendOpenCL.cc:1767:finish

CHIP error [TID 4154] [1730387918.171550807] : Caught Error: hipErrorOutOfMemory
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  __copy::trivial_device_copy H->D: failed: hipErrorOutOfMemory: hipErrorOutOfMemory
Aborted (core dumped)

@pvelesko
Collaborator

pvelesko commented Oct 31, 2024

[ 57%] Building CXX object examples/CMakeFiles/example_thrust_scan_by_key.dir/scan_by_key.cu.o
InvalidBitWidth: Invalid bit width in input: 24

This is a bug in the LLVM-to-SPIR-V translator. I filed an issue for it today: KhronosGroup/SPIRV-LLVM-Translator#2823
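
To see which targets hit this error, a saved build log can be scanned. The sketch below assumes a serial build log (with a parallel build, the most recent "Building CXX object" line may not belong to the failing target); the function name `list_bitwidth_failures` is hypothetical:

```shell
# Read a build log on stdin and print the object file that was being built
# when each InvalidBitWidth error appeared.
# Assumption: the log comes from a serial build, so the last
# "Building CXX object" line names the target that failed.
list_bitwidth_failures() {
  awk '
    /Building CXX object/ { current = $NF }
    /InvalidBitWidth/ && current != "" { print current }
  '
}

# Usage (build.log is a placeholder for your own saved log):
#   list_bitwidth_failures < build.log
```

The resulting list gives a concrete set of examples to mention when following the upstream translator issue.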

Regarding the test, you didn't specify which sort test you ran.
I ran rocprim.device_merge_sort and only 2 out of 34 tests fail.

26: [  PASSED  ] 32 tests.
26: [  FAILED  ] 2 tests, listed below:
26: [  FAILED  ] RocprimDeviceSortTests/13.SortKey, where TypeParam = DeviceSortParams<double, test_utils::custom_test_type<double>, rocprim::less<double> >
26: [  FAILED  ] RocprimDeviceSortTests/13.SortKeyValue, where TypeParam = DeviceSortParams<double, test_utils::custom_test_type<double>, rocprim::less<double> >

I'll have to look into this

@StarGazerM
Author

I am running example_thrust_sort under rocThrust/examples.
The issue is that device/host copies fail when using device_vector, both for random access on a device_vector and for copies from its internal data.

@he97

he97 commented Dec 18, 2024

Hello everyone, I am currently using chipStar to compile the rocThrust tests. How is your progress? I can only compile to 20% currently. Can you tell me how chipStar currently supports rocThrust? Thank you so much.

@StarGazerM
Author

Hello everyone, I am currently using chipStar to compile the rocThrust tests. How is your progress? I can only compile to 20% currently. Can you tell me how chipStar currently supports rocThrust? Thank you so much.

rocThrust is not working due to an upstream problem in LLVM's SPIR-V backend.

@pvelesko
Collaborator

How is your progress? I can only compile to 20% currently.

20% of samples?

The current status is that there are some SPIR-V issues when compiling certain samples. I'll be getting back to this after New Year's.
