forked from sandialabs/omega_h
Build and Run on Corona
Cameron Smith edited this page Nov 20, 2020 · 17 revisions
git clone [email protected]:SCOREC/omega_h.git
cd omega_h
git checkout clang11_hip
Create envCoronaRocm.sh with the following contents:
module load opt
module load rocm/3.9.0
module unload intel
source envCoronaClang11.sh
mkdir build-omega-clang11
cd !$
cmake ../omega_h \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_HIP=OFF \
-DOmega_h_USE_Kokkos=OFF \
-DOmega_h_USE_MPI=OFF \
-DCMAKE_CXX_COMPILER=/opt/rocm-3.6.0/llvm/bin/clang++ \
-DOmega_h_CXX_WARNINGS=OFF \
-DBUILD_TESTING=ON
make -j4
ctest # all tests should pass
Allocate an mi60 node, then build on it. SLURM will automatically spawn a shell on the allocated node.
Building on a login/front-end node results in runtime errors.
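Since a build made on a login node fails at run time, a small guard run before cmake can catch the mistake early. The following is a minimal sketch, assuming that SLURM exports SLURM_JOB_ID in the shell it spawns for the allocation; in_slurm_allocation is a hypothetical helper name, not part of omega_h or SLURM.

```shell
# Hypothetical helper: succeeds only when SLURM_JOB_ID is set and non-empty.
# Assumption: SLURM exports this variable inside the allocated shell.
in_slurm_allocation() {
  [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_allocation; then
  echo "inside SLURM job ${SLURM_JOB_ID}; OK to configure and build"
else
  echo "no SLURM allocation detected; run salloc first" >&2
fi
```

The check only confirms that a job is active; it does not verify that the node belongs to the mi60 partition.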
salloc -N 1 -p mi60 -t 30
# wait for allocation
source envCoronaClang11.sh
mkdir build-omega-rocm39
cd !$
srcpath=/path/to/omegah/source
hipcc=$(which hipcc)
rocm=${hipcc%%bin/hipcc}
export HIP_PATH=$rocm/hip
export CMAKE_PREFIX_PATH=$rocm:$CMAKE_PREFIX_PATH
cmake $srcpath \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DBUILD_SHARED_LIBS=OFF \
-DOmega_h_USE_HIP=ON \
-DOmega_h_USE_Kokkos=OFF \
-DCMAKE_CXX_FLAGS="--amdgpu-target=gfx906" \
-DHIP_PATH=${rocm}/hip \
-DOmega_h_USE_MPI=OFF \
-DCMAKE_CXX_COMPILER=hipcc \
-DOmega_h_CXX_WARNINGS=OFF \
-DBUILD_TESTING=ON
make
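The ROCm root used for HIP_PATH and CMAKE_PREFIX_PATH above is derived from the location of hipcc by stripping a suffix with shell parameter expansion. A minimal sketch of that step, using a hypothetical install path for illustration only (on the real system the path comes from `which hipcc`):

```shell
# Hypothetical install location, used only for illustration.
hipcc=/opt/rocm-3.9.0/bin/hipcc

# '%%pattern' removes the longest suffix matching the pattern,
# here the literal text "bin/hipcc", leaving the ROCm root
# together with its trailing slash.
rocm=${hipcc%%bin/hipcc}

echo "ROCm root: $rocm"
echo "HIP path:  $rocm/hip"
```

Because the root keeps its trailing slash, `$rocm/hip` contains a double slash; POSIX path resolution treats it the same as a single one.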
Run the tests:
ctest
As of 790805, the run_unit_mesh test fails with the following errors; all other tests pass:
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
/g/g19/smith516/develop/omega_h/src/Omega_h_qr.hpp:32: Vector<max_m> Omega_h::householder_vector(Omega_h::Int, Matrix<max_m, max_n>, Omega_h::Int) [max_m = 72, max_n = 4]: Device-side assertion `norm_x > 0.0' failed.
:0:rocdevice.cpp :2180: 675783340969 us: Device::callbackQueue aborting with status: 0x1016
Aborted (core dumped)
https://github.com/RadeonOpenCompute/ROCm/issues/1212 was resolved with ROCm 3.9.
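To look at the run_unit_mesh failure in isolation, CTest can rerun tests selected by name. A minimal sketch, assuming it is run from the build directory; rerun_test is a hypothetical helper name, while -R and --output-on-failure are standard ctest options.

```shell
# Hypothetical helper: rerun only the tests whose names match the given
# regular expression and print their output if they fail.
rerun_test() {
  ctest -R "$1" --output-on-failure
}

# Usage, from the build directory:
#   rerun_test run_unit_mesh
```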