ThetaGPU

ThetaGPU is the GPU portion of Theta, and is distinct from the KNL nodes that make up most of Theta.

Context

These instructions assume you are setting up a development environment for the gccy3 project, developing DES analysis code in CosmoSIS running on GPUs. They also assume you want to share the installed versions of cosmosis and gpuintegration, as well as all the other dependencies of y3_cluster_cpp.

Checking quota

Each project (e.g. gccy3) has a limited quota available on ThetaGPU. To check a project's quota, run the following from a login node, substituting your project name as needed:

sbank-list-allocations -p gccy3 -r all

This doesn't seem to work from a compute node.

Getting access

To develop the y3_cluster_cpp software using CUDA, you need to be on a ThetaGPU compute node. Because you will not be running MPI programs during development, you only need one GPU. These instructions will get you there.

First log into theta.alcf.anl.gov:

ssh theta.alcf.anl.gov

If you have set up two-factor authentication, this requires using the numeric code generated by the MobilePass+ app on your phone.

Then load the module that makes the Cobalt commands (e.g. qsub) talk to the GPU nodes rather than the KNL nodes:

module load cobalt/cobalt-gpu

Some of the instructions on the ALCF site for ThetaGPU are incorrect. To get access to an interactive GPU node, do this:

qsub -I -q single-gpu -n 1 -t 60 -A gccy3 --attrs=pubnet

Specifying -t 0 is supposed to give you the maximum time allowed, but it fails with a message:

No handlers could be found for logger "Proxy"
<Fault 1001: "Walltime less than the 'single-gpu' queue min walltime of 00:05:00\n">

When the prompt returns, you are on a compute node with 1 GPU allocated.
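
To confirm that the allocation gives you a working GPU, you can query it with nvidia-smi (a quick sanity check, not part of the original instructions; the exact output depends on the installed driver):

nvidia-smi                                               # should list one GPU for a single-gpu allocation
nvidia-smi --query-gpu=name,memory.total --format=csv    # compact summary of the allocated device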

Getting conda installed

We use an environment module that provides conda, and create our own conda environment to hold the necessary packages. Some are Python packages, some are C++, and some use Fortran.

Note that the ucx package used here is specific to running on ThetaGPU; it was recommended during the May 2021 ALCF Software Performance Workshop. Also note that the conda create command below generates a warning that a newer version of conda exists and suggests upgrading. Because this version of conda is provided by an environment module, you cannot update it. All the packages we use will be installed in the environment we create, so this should not be an actual impediment.

module load nvhpc-byo-compiler/21.7
module load conda/2021-09-22
#export CUDA_HOME # because it is set, but not exported
#conda create -p /grand/gccy3/cosmosis-2 -c conda-forge astropy cffi cfitsio click cmake configparser cudatoolkit cxx-compiler cython emcee==2.2.1 fftw fitsio fortran-compiler future gsl jinja2 kombine matplotlib minuit2 mpi4py numba numpy nvcc_linux-64 openblas pybind11 pyccl pycparser pytest pyyaml sacc scikit-learn scipy six ucx pytest-runner
conda create -p /grand/gccy3/cosmosis-ompi -c conda-forge astropy cffi cfitsio click cmake configparser cudatoolkit cython emcee==2.2.1 fftw fitsio future gsl jinja2 kombine matplotlib minuit2 mpi4py numba numpy nvcc_linux-64 openblas pybind11 pycparser pytest pyyaml scipy six ucx pytest-runner
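
After the create command finishes, a quick check (a sketch using standard conda commands, not part of the original instructions) confirms the environment is registered and usable:

conda env list                                   # the /grand/gccy3/cosmosis-ompi prefix should appear
conda activate /grand/gccy3/cosmosis-ompi        # environments created with -p are activated by prefix path
python --version                                 # should report the Python from the new environment
conda deactivate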

Working environment

I set up my working environment under /grand/gccy3. The conda environment is installed under /grand/gccy3/cosmosis-2. The top level for the software installation stack (as opposed to conda itself) is /grand/gccy3/topdir.

Note that this working environment is the one to use after all the underlying products are built; before that, not everything here is available. The sections below on installing the underlying products give separate instructions for the working environment used during those steps.

module load conda/2021-09-22
conda activate base
export CUDA_HOME # because it is set, but not exported
export OMPI_MCA_opal_cuda_support=true
export OMPI_MCA_pml="ucx"
export OMPI_MCA_osc="ucx"
export Y3GCC_DIR=/grand/gccy3/topdir
export Y3_CLUSTER_CPP_DIR=${Y3GCC_DIR}/y3_cluster_cpp
export Y3_CLUSTER_WORK_DIR=${Y3GCC_DIR}/y3_cluster_cpp
export LD_LIBRARY_PATH=${Y3GCC_DIR}/cuba/lib:$LD_LIBRARY_PATH
export http_proxy=http://theta-proxy.tmi.alcf.anl.gov:3128
export https_proxy=https://theta-proxy.tmi.alcf.anl.gov:3128
export HTTP_PROXY=http://theta-proxy.tmi.alcf.anl.gov:3128
export HTTPS_PROXY=https://theta-proxy.tmi.alcf.anl.gov:3128
export PAGANI_DIR=/grand/gccy3/topdir/gpuintegration
cd ${Y3GCC_DIR}/cosmosis
source config/setup-conda-cosmosis /grand/gccy3/cosmosis-2

This should result in a shell in which nvcc picks up the GCC 9.4.0 compiler that is part of the conda environment, rather than the GCC 9.3.0-17ubuntu1~20.04 that comes from the OS. It should also make python refer to Python 3.9.7, rather than the system Python 2.7.18.
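
The following checks (a hedged sketch using standard toolchain commands) show which compilers and Python the shell is actually picking up:

which nvcc gcc python     # check where each tool resolves; gcc and python should come from the conda environment
gcc --version             # expect the conda-provided GCC 9.4.0, not the OS compiler
python --version          # expect Python 3.9.7, not the system Python 2.7.18
nvcc --version            # confirm which CUDA toolkit nvcc is on PATH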

We use a conda environment because it handles binary compatibility at installation time. Installing Python libraries with pip would require more care to ensure that any Fortran or C++ code was compiled with the right compiler and the right switches for binary compatibility.

Building

The following steps build the software. Most of them need to be done only once; the only software we generally modify is y3_cluster_cpp itself. Do the setup below before going through these build steps.

module load conda/2021-06-28
conda activate /grand/gccy3/cosmosis-2
export CUDA_HOME # because it is set, but not exported
export OMPI_MCA_opal_cuda_support=true
export OMPI_MCA_pml="ucx"
export OMPI_MCA_osc="ucx"
export Y3GCC_DIR=/grand/gccy3/topdir
export Y3_CLUSTER_CPP_DIR=${Y3GCC_DIR}/y3_cluster_cpp
export Y3_CLUSTER_WORK_DIR=${Y3GCC_DIR}/y3_cluster_cpp
export LD_LIBRARY_PATH=${Y3GCC_DIR}/cuba/lib:$LD_LIBRARY_PATH
export http_proxy=http://theta-proxy.tmi.alcf.anl.gov:3128
export https_proxy=https://theta-proxy.tmi.alcf.anl.gov:3128
export HTTP_PROXY=http://theta-proxy.tmi.alcf.anl.gov:3128
export HTTPS_PROXY=https://theta-proxy.tmi.alcf.anl.gov:3128
export PAGANI_DIR=/grand/gccy3/topdir/gpuintegration

Clone repositories

Note that we have to use the HTTP protocol; ssh and https do not work from ThetaGPU compute nodes. You may be asked for a username and password for y3_cluster_cpp.

mkdir -p ${Y3GCC_DIR}
cd ${Y3GCC_DIR}
# Clone repositories
git clone http://github.com/marcpaterno/cuba.git
git clone http://bitbucket.org/mpaterno/y3_cluster_cpp.git
git clone http://bitbucket.org/mpaterno/cubacpp.git
git clone http://github.com/marcpaterno/gpuintegration.git
git clone -b better-exceptions http://bitbucket.org/mpaterno/cosmosis.git
cd cosmosis/
git clone -b develop http://bitbucket.org/mpaterno/cosmosis-standard-library.git
cd ${Y3GCC_DIR}
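
If the password prompt becomes a nuisance during repeated clones or pulls, git's built-in credential cache can hold the credentials in memory for a while. This is an optional suggestion using standard git configuration, not part of the original setup:

# Cache HTTP credentials in memory for one hour (optional)
git config --global credential.helper 'cache --timeout=3600'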

Install cluster_toolkit

cluster_toolkit is not available from Conda, so I have to build it and install it myself.

wget https://github.com/marcpaterno/cluster_toolkit/archive/master.tar.gz
tar xf master.tar.gz
cd cluster_toolkit-master/
python3 setup.py install     # This will install into the environment
cd ../
rm -r cluster_toolkit-master/
rm master.tar.gz
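
A quick import check (a sketch; it assumes the package installs under the name cluster_toolkit) confirms that the install landed in the active environment:

python3 -c "import cluster_toolkit; print(cluster_toolkit.__file__)"   # the printed path should be inside the conda environment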

Build CUBA

Note that LD_LIBRARY_PATH above is already set to include the directory into which we will be placing the CUBA dynamic library.

cd ${Y3GCC_DIR}/cuba
# We do not set CC, because it is already set by the environment modules
./makesharedlib.sh
mkdir include
mkdir lib
mv cuba.h include/
mv libcuba.so lib/
cd ${Y3GCC_DIR}
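
As a sanity check (a sketch, assuming the layout created above), verify that the header and shared library ended up where the build and LD_LIBRARY_PATH expect them:

ls ${Y3GCC_DIR}/cuba/include/cuba.h ${Y3GCC_DIR}/cuba/lib/libcuba.so   # both files should exist
ldd ${Y3GCC_DIR}/cuba/lib/libcuba.so                                   # optionally confirm its dependencies resolve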

Build cosmosis

cd ${Y3GCC_DIR}/cosmosis
conda deactivate
source config/setup-conda-cosmosis /grand/gccy3/cosmosis-2
make
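
After the build completes, a minimal smoke test (a sketch; it assumes the cosmosis driver script is on PATH after sourcing setup-conda-cosmosis) is to ask it for its usage message:

cosmosis --help   # should print the CosmoSIS usage message without import errors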

Build gpuintegration

mkdir -p ${PAGANI_DIR}/cudaPagani/build
cd ${PAGANI_DIR}/cudaPagani/build
cmake ../ -DPAGANI_TARGET_ARCH="80-real" -DPAGANI_DIR=${PAGANI_DIR} -DCMAKE_BUILD_TYPE="Release"
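
The cmake invocation above only configures the build; the compile step itself presumably follows. A minimal sketch, assuming the default Makefile generator and staying in the build directory:

make -j   # or: cmake --build . --parallel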

Build y3_cluster_cpp

cd ${Y3_CLUSTER_CPP_DIR}
cmake -DUSE_CUDA=On -DY3GCC_TARGET_ARCH="80-real" -DPAGANI_DIR=${PAGANI_DIR} -DCMAKE_MODULE_PATH="${Y3_CLUSTER_CPP_DIR}/cmake;$Y3GCC_DIR/cubacpp/cmake/modules" -DCUBACPP_DIR=${Y3GCC_DIR}/cubacpp -DCUBA_DIR=${Y3GCC_DIR}/cuba -DCMAKE_BUILD_TYPE=Release .

Sometimes this command produces a warning that PAGANI_DIR was defined but not used. That warning is spurious; PAGANI_DIR is used in multiple CMakeLists.txt files.
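
As with gpuintegration, the cmake command above only configures the project; the build presumably follows. A minimal sketch, assuming the default Makefile generator (the ctest line is optional and assumes tests are registered with CTest):

make -j                       # build the configured targets
ctest --output-on-failure     # optional: run the registered tests, if any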