Home
- The HSA-OpenMP work aims to let OpenMP users target an HSA device with minimal effort. This involves a one-time setup of the HSA platform, building OpenMP applications with GCC (from the hsa branch), and running them on an HSA device.
- NOTE: The initial work supports only some of the constructs of the OpenMP 3.1 spec
- If you are new to HSA, refer to http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ to familiarize yourself
- Detailed material presented at ISCA - http://www.slideshare.net/hsafoundation/isca-2014-heterogeneous-system-architecture-hsa-architecture-and-algorithms-tutorial
This release is intended for use with any hardware configuration that contains a Kaveri APU. The motherboard must support the FM2+ socket, run the latest BIOS version, and have the IOMMU enabled in the BIOS. The following reference hardware configuration was used for testing:
- APU: AMD A10-7850K APU
- Motherboard: ASUS A88X-PRO motherboard (ATX form factor)
- Memory: G.SKILL Ripjaws X Series 16GB (2 x 8GB) 240-Pin DDR3 SDRAM DDR3 2133
- No discrete GPU present in the system
- HSA enabled kernel image available at https://github.com/HSAFoundation/Linux-HSA-Drivers-And-Images-AMD
- OKRA (Offloadable Kernel Runtime API) interface available at https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device. OKRA uses the HSA Runtime API and its implementation from https://github.com/HSAFoundation/HSA-Runtime-AMD, which is based on the 1.0 Provisional specification
The setup has been tested on Ubuntu and openSUSE. To download:
- Ubuntu : 14.04 64-bit edition available at http://www.ubuntu.com/download
- OpenSuSE : Install x86_64 openSUSE 13.1 from http://software.opensuse.org/131/en.
- build-dependency package. On Ubuntu, run "sudo apt-get build-dep gcc" at shell prompt
- build-essential package. On Ubuntu, run "sudo apt-get install build-essential" at shell prompt
- Flex, bison, git, gcc, gcc-c++, make, libelf-dev
- GCC with HSA support available in 'hsa' branch at http://gcc.gnu.org/svn/gcc/branches/hsa/
There is also a README in svn repository with recipe instructions (Ref: https://gcc.gnu.org/viewcvs/gcc/branches/hsa/gcc/README.hsa?view=markup)
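Before starting the build, it can help to confirm that the command-line tools from the list above are actually on the PATH. The following is a minimal sketch and not part of any HSA repository: `missing_tools` is a hypothetical helper name, and the tool list in the usage comment is an assumption derived from the prerequisites (library packages such as libelf-dev are not commands, so they are not covered).

```shell
#!/bin/sh
# Print every command name given as an argument that is not found on PATH.
# missing_tools is a hypothetical helper, not part of any HSA package.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the command-line tools used in this guide.
# missing_tools flex bison git gcc g++ make svn
```

An empty result means every listed tool was found; anything printed still needs to be installed through the distribution's package manager.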
- Until all of the HSA drivers and features are available in the stock Linux kernel and have been picked up by distributions, a special HSA-enabled kernel image is needed
- Please refer to the section "Installing and configuring the kernel" in https://github.com/HSAFoundation/Linux-HSA-Drivers-And-Images-AMD for up-to-date KFD installation instructions
$ cd ~
$ git clone https://github.com/HSAFoundation/Linux-HSA-Drivers-And-Images-AMD.git
From here we can install the new image, set up the HSA KFD (the driver for HSA), and reboot into the new kernel.
The KFD and firmware for Ubuntu are pre-packaged and available in the repository just cloned
$ cd ~/Linux-HSA-Drivers-And-Images-AMD
$ sudo dpkg -i kfd-0.9/ubuntu/*.deb
- If the graphical installer hangs when it is about to switch to a graphics mode, use the text-based installer by passing the textmode=1 boot option from the GRUB menu. If the installer runs in EFI mode, press the e key while "Installation" is selected and add "textmode=1" at the end of the line beginning with linuxefi. In non-EFI GRUB, boot options are typed directly while "Installation" is highlighted, so type "textmode=1" there and press Enter. Alternatively, set the "nomodeset" kernel boot option
- KFD for OpenSuSE has to be explicitly pulled from a different repository
$ zypper addrepo http://download.opensuse.org/repositories/home:/jamborm:/hsa-kfd/openSUSE_13.1_standard/ kfd-kernels
$ zypper install --from kfd-kernels kernel-default kmodule-ordering
You will be asked whether you trust a key with fingerprint D0B75ED5820890CD9F7D54E3740AE7AF302697A8. Please do.
The firmware needs to be downloaded and copied to the location specified below:
$ wget http://people.freedesktop.org/~gabbayo/kfd-v0.9/radeon_ucode.tar.gz
$ tar xzf radeon_ucode.tar.gz
$ cp -iv radeon_ucode/kaveri_*.bin /usr/lib/firmware/radeon/
- If you used the "nomodeset" option when booting the kernel, remove "nomodeset" from /etc/default/grub as described in http://doc.opensuse.org/release-notes/x86_64/openSUSE/13.1/#sec.114.kms
$ cd ~/Linux-HSA-Drivers-And-Images-AMD
$ echo "KERNEL==\"kfd\", MODE=\"0666\"" | sudo tee /etc/udev/rules.d/kfd.rules
$ sudo reboot
- After reboot, 'uname -a' will show something like:
Linux Kaveri-HSA 3.14.11-031460-generic #201409270116 SMP Sat Sep 27 01:17:40 IDT 2014 x86_64 x86_64 x86_64 GNU/Linux
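The udev rule above is meant to make the kfd device node readable and writable by all users (mode 0666). A quick way to confirm that after the reboot is sketched below. `check_node_mode` is a hypothetical helper, the /dev/kfd path is an assumption based on the rule's `KERNEL=="kfd"` match, and `stat -c` is the GNU coreutils form.

```shell
#!/bin/sh
# Report whether a device node exists and has the expected octal mode.
# check_node_mode is a hypothetical helper; the 666 mode comes from the
# udev rule above. stat -c is the GNU coreutils spelling.
check_node_mode() {
    node=$1
    want=$2
    if [ ! -e "$node" ]; then
        echo "missing"
        return 1
    fi
    have=$(stat -c '%a' "$node")
    if [ "$have" = "$want" ]; then
        echo "ok"
    else
        echo "mode is $have, expected $want"
        return 1
    fi
}

# Example (after rebooting into the HSA kernel):
# check_node_mode /dev/kfd 666
```

A "missing" result would suggest the KFD module is not loaded, while a wrong mode would suggest the udev rule was not applied.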
Now we need a runtime for executing HSAIL code. To get the latest runtime:
$ cd ~
$ git clone https://github.com/HSAFoundation/HSA-Runtime-AMD
- OKRA is a runtime library that enables applications to do compute offloads to HSA-enabled GPUs.
- GCC uses the OKRA API to launch kernels on the GPU. The latest OKRA can be downloaded from the HSA Foundation repository
$ cd ~
$ git clone https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device.git
- Pull the GCC sources from the hsa branch. Create the source, build and installation directories under the gcc directory
$ mkdir gcc
$ cd gcc
$ svn co svn://gcc.gnu.org/svn/gcc/branches/hsa src
- Pull the mpc, mpfr and gmp prerequisites required for the GCC build. The script is run from the top of the source tree. If you still face issues building GCC, refer to the exhaustive list of prerequisites at https://gcc.gnu.org/install/prerequisites.html
$ cd src
$ ./contrib/download_prerequisites
- Build GCC.
$ cd ..
$ mkdir build
$ cd build
$ ../src/configure --disable-bootstrap --enable-languages=c,c++,fortran --prefix=$DESTINATION
$ make
- Install GCC. This installs gcc in the $DESTINATION directory (a shell variable holding the installation directory) given as --prefix above
$ make install
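After installation, the freshly built compiler has to be found ahead of the system GCC. The sketch below shows one way to do that for the current shell. `use_gcc_prefix` is a hypothetical helper, and the lib64 directory is an assumption based on the `$GCC_HSA/lib64` entry in the samples' env.sh.

```shell
#!/bin/sh
# Put a GCC installation prefix first on PATH and LD_LIBRARY_PATH.
# use_gcc_prefix is a hypothetical helper; lib64 is an assumption based
# on the GCC_HSA/lib64 entry used by the samples' env.sh.
use_gcc_prefix() {
    prefix=$1
    if [ ! -x "$prefix/bin/gcc" ]; then
        echo "no gcc found under $prefix" >&2
        return 1
    fi
    PATH="$prefix/bin:$PATH"
    # Only add a separating colon when LD_LIBRARY_PATH was already set
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# Example, assuming DESTINATION is the --prefix used at configure time:
# use_gcc_prefix "$DESTINATION" && gcc --version
```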
- Run the kfd_check_installation.sh script, available in the HSA-enabled kernel image repository, to test the HSA setup. If successful, the output will look like:
$ cd ~/Linux-HSA-Drivers-And-Images-AMD
$ ./kfd_check_installation.sh
Kaveri detected:............................Yes
Kaveri type supported:......................Yes
Radeon module is loaded:....................Yes
KFD module is loaded:.......................Yes
AMD IOMMU V2 module is loaded:..............Yes
KFD device exists:..........................Yes
KFD device has correct permissions:.........Yes
Valid GPU ID is detected:...................Yes
Can run HSA.................................YES
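When the check is run from a script rather than read by eye, the output can be scanned for any line that does not end in Yes. This is only a sketch: `all_checks_pass` is a hypothetical helper, and the "name:....Yes" line format is taken from the sample output above, so other versions of the script may print something different.

```shell
#!/bin/sh
# Read "name:......Yes" style lines on stdin; fail if any check is not Yes.
# all_checks_pass is a hypothetical helper; the line format is assumed
# from the sample output shown in this guide.
all_checks_pass() {
    status=0
    while IFS= read -r line; do
        case "$line" in
            *...*)  # only lines that look like a check result
                result=${line##*.}   # text after the last dot
                case "$result" in
                    Yes|YES) ;;
                    *) echo "failed: $line" >&2; status=1 ;;
                esac
                ;;
        esac
    done
    return $status
}

# Example:
# ./kfd_check_installation.sh | all_checks_pass && echo "HSA setup looks fine"
```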
- To sanity check your install, you can run a small Squares test app (binary). First set the right environment variables (the OKRA, HSA-Runtime and libhsakmt paths in LD_LIBRARY_PATH) in env.sh
$ cd ~/Okra-Interface-to-HSA-Device/okra/samples/
$ cat env.sh
#Set the below environment variables appropriately
#This should point to top of the directory obtained after downloading from this github repo: https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device
HSA_OKRA_PATH=$PATH_TO_OKRA/Okra-Interface-to-HSA-Device
#This should point to top of the directory obtained after downloading from this github repo: https://github.com/HSAFoundation/HSA-Runtime-AMD
HSA_RUNTIME_PATH=$PATH_TO_HSA_RUNTIME/HSA-Runtime-AMD
#This should point to the libhsakmt directory obtained after downloading from this github repo: https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD/tree/master/kfd-0.9/libhsakmt
HSA_KMT_PATH=$PATH_TO_HSA_DRIVERS/HSA-Drivers-Linux-AMD/kfd-0.9/libhsakmt
#OKRA has been tested only with 64 bit, so make sure you point to 64-bit binaries of HSA Runtime and KMT libraries
export LD_LIBRARY_PATH=$HSA_OKRA_PATH/okra/dist/bin:$HSA_RUNTIME_PATH/lib/x86_64:$HSA_KMT_PATH/lnx64a:$LD_LIBRARY_PATH
$ source env.sh
$ ./runSquares.sh
using source from Squares.hsail
0->0, 1->1, 2->4, 3->9, 4->16, 5->25, 6->36, 7->49, 8->64, 9->81, 10->100, 11->121, 12->144, 13->169, 14->196, 15->225, 16->256, 17->289, 18->324, 19->361, 20->400, 21->441, 22->484, 23->529, 24->576, 25->625, 26->676, 27->729, 28->784, 29->841, 30->900, 31->961, 32->1024, 33->1089, 34->1156, 35->1225, 36->1296, 37->1369, 38->1444, 39->1521,
PASSED
- Download the samples
$ git clone https://github.com/HSAFoundation/HSA-OpenMP-GCC-AMD.git
- Add OKRA, HSA-Runtime and libhsakmt to $LD_LIBRARY_PATH and point $PATH to the just-built GCC (in 'env.sh')
$ cd HSA-OpenMP-GCC-AMD/samples
$ cat env.sh
#Set the below environment variables appropriately
#This should point to top of the directory obtained after downloading from this github repo: https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device
HSA_OKRA_PATH=$PATH_TO_OKRA/Okra-Interface-to-HSA-Device
#This should point to top of the directory obtained after downloading from this github repo: https://github.com/HSAFoundation/HSA-Runtime-AMD
HSA_RUNTIME_PATH=$PATH_TO_HSA_RUNTIME/HSA-Runtime-AMD
#This should point to the libhsakmt directory obtained after downloading from this github repo: https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD/tree/master/kfd-0.9/libhsakmt
HSA_KMT_PATH=$PATH_TO_HSA_DRIVERS/HSA-Drivers-Linux-AMD/kfd-0.9/libhsakmt
#OKRA has been tested only with 64 bit, so make sure you point to 64-bit binaries of HSA Runtime and KMT libraries
# GCC built from HSA branch - https://gcc.gnu.org/svn/gcc/branches/hsa
export GCC_HSA=<EDIT PATH>
export LD_LIBRARY_PATH=$HSA_LIBRARY_PATH:$GCC_HSA/lib64:$GCC_HSA/lib32:$LD_LIBRARY_PATH
$ source env.sh
- Build and run vectorCopy
$ cd vectorCopy
$ make
$ ./run.sh
Vector Copy - Passed
If you get the error "Unable to load libokra_x86_64.so", it typically means the paths set in 'env.sh' are not quite right.
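One way to narrow down such a loading error is to check which directory on LD_LIBRARY_PATH, if any, actually contains the library. The helper below is a hypothetical sketch (`find_in_path_list` is not part of OKRA or the samples); it only looks for a file with the given name and does not check that the file is a valid 64-bit library.

```shell
#!/bin/sh
# Print the first directory in a colon-separated list that contains the
# given file name. find_in_path_list is a hypothetical helper.
find_in_path_list() {
    name=$1
    list=$2
    old_ifs=$IFS
    IFS=:                      # split the list on colons
    for dir in $list; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: locate the library named in the error message.
# find_in_path_list libokra_x86_64.so "$LD_LIBRARY_PATH" \
#     || echo "libokra_x86_64.so is not on LD_LIBRARY_PATH"
```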
- Build and run matrixMultiply
$ cd ../matrixMultiply
$ make
$ ./run.sh
Matrix multiplication - Passed
- NOTE1: When the program executes, the HSA run time expects the HSA kernel in an object file in the current working directory, with the same name as the input file but with the suffix changed to .o. If you use LTO, if there is no input file (such as when compiling from standard input), or if the input file name does not have a dot in it, the run time expects the HSA ELF sections in a file called hsakernel.o. This is a temporary situation and will be fixed.
- NOTE2: If you also provide the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix, which you can search for optimization notes indicating whether the compiler succeeded in turning OMP loops into kernels stripped of all OMP-generated control flow and suitable for a GPGPU. If it failed for some reason, the note also gives the reason. In the vectorCopy example, it reports success like this:
omp_veccopy.c:13:12: note: Parallel construct will be turned into an HSA kernel
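Searching the dump for that note can be scripted. The sketch below counts the success messages in one dump file. `count_hsa_kernels` is a hypothetical helper, the message text is copied from the note above and may differ between compiler revisions, and the wording of failure notes is not assumed here.

```shell
#!/bin/sh
# Count the lines in a dump file that report a successful kernel conversion.
# count_hsa_kernels is a hypothetical helper; the message text is copied
# from the note shown in this guide.
count_hsa_kernels() {
    grep -c 'will be turned into an HSA kernel' "$1"
}

# Example, assuming the dump files end in .ompexp and sit in the
# current directory:
# for f in *.ompexp; do echo "$f: $(count_hsa_kernels "$f")"; done
```

A count of 0 for a file that contains parallel loops is a hint to read the dump for the note explaining why the conversion failed.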
- Compile time
- GCC transforms user-marked parallel regions in OpenMP programs to BRIG (binary representation of textual HSAIL)
- Embeds BRIG into host code
- Sets up runtime calls to use the generated BRIG and to launch the kernel
- Run time
- Uses the OKRA layer to launch the kernel with the right dimension, grid and group size
The HSA Foundation has tools to assemble (HSAIL to BRIG) and disassemble (BRIG to HSAIL) at https://github.com/HSAFoundation/HSAIL-Tools. Download HSAIL-Tools, follow the README instructions to build it, and use the disassembler to read the BRIG generated by GCC. The last command below generates omp_veccopy.hsail:
$ git clone https://github.com/HSAFoundation/HSAIL-Tools
$ cd HSAIL-Tools/libHSAIL
$ make -j LLVM_CONFIG=llvm-config-3.2
$ ./build_linux/hsailasm -disassemble omp_veccopy.o
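Which object file holds the kernel follows the naming rule described in NOTE1. The sketch below restates that rule as a small function so the file name can be computed from the compiler's input file name. `hsa_kernel_object` is a hypothetical helper, not a GCC or OKRA tool; stripping the directory part is an assumption based on the file being looked up in the current working directory, and the LTO case (always hsakernel.o) is not detected.

```shell
#!/bin/sh
# Print the object file name the run time is expected to look for, given
# the compiler's input file name (may be empty). hsa_kernel_object is a
# hypothetical helper that restates NOTE1.
hsa_kernel_object() {
    base=${1##*/}    # assumption: only the file name matters, not its directory
    case "$base" in
        *.*) printf '%s.o\n' "${base%.*}" ;;   # replace the last suffix with .o
        *)   printf 'hsakernel.o\n' ;;         # no input file or no dot in the name
    esac
}

# Example:
# ./build_linux/hsailasm -disassemble "$(hsa_kernel_object omp_veccopy.c)"
```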
Work on complete support for OpenMP 3.1 targeting HSA is still ongoing. The current limitations are:
- Unsupported OpenMP constructs:
- Non-looping constructs such as "omp section"
- Multiple OMP constructs within OMP parallel
- parallel construct within another parallel construct
- Schedule kinds dynamic, guided and runtime
- collapse with a value greater than 1
- Reductions
- Limited support of OpenMP runtime calls
- NOTE: If you provide the -fdump-tree-ompexp-details option to the compiler, it creates a file with the .ompexp suffix. This file gives the reason why turning OMP loops into kernels failed.
- Reading or writing, inside a kernel, global variables that are declared in host code is not supported yet. GCC emits a warning describing such global variable accesses. Correctness of the program is not guaranteed in such cases.
- There is scope to improve register allocation (and reduce spilling)
- Function calls: All calls in a kernel to functions defined within the same compilation unit get inlined at -O1 and above. Across multiple compilation units, one can use link-time optimization (-flto -flto-partition=none) to inline those functions.