Update embedded SW build for Vitis intro tutorial
rwarmstr committed Feb 25, 2021
1 parent cc81d5b commit ab2f5dc
Showing 9 changed files with 97 additions and 81 deletions.
4 changes: 2 additions & 2 deletions Getting_Started/Vitis/Part3.md
@@ -68,11 +68,11 @@ There are 4 main steps in the source code for this simple example.

* **Step 1:** The OpenCL environment is initialized. In this section, the host detects the attached Xilinx device, loads the FPGA binary (.xclbin file) from file and programs it into the first Xilinx device it found. Then a command queue and the kernel object are created. All Vitis applications will have code very similar to the one in this section.

* **Step 2:** The application creates the three buffers needed to share data with the kernel: one for each input and one for the output. On data-center platforms, it is more efficient to allocate memory aligned on 4k page boundaries. On embedded platforms, it is more efficient to perform contiguous memory allocation. A simple way of achieving either of these is to let the Xilinx Runtime allocate host memory when creating the buffers. This is done by using the `CL_MEM_ALLOC_HOST_PTR` flag when creating the buffers and then mapping the allocated memory to user-space pointers.
* **Step 2:** The application creates the three buffers needed to share data with the kernel: one for each input and one for the output. On data-center platforms, it is more efficient to allocate memory aligned on 4k page boundaries. On embedded platforms, it is more efficient to perform contiguous memory allocation. A simple way of achieving either of these is to let the Xilinx Runtime allocate host memory when creating the buffers. This is done by using the `cl::Buffer` constructor to create the buffers and then mapping the allocated memory to user-space pointers.

```cpp
// Create the buffers and allocate memory
cl::Buffer in1_buf(context, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
cl::Buffer in1_buf(context, CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);

// Map host-side buffer memory to user-space pointers
int *in1 = (int *)q.enqueueMapBuffer(in1_buf, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int) * DATA_SIZE);
39 changes: 18 additions & 21 deletions Getting_Started/Vitis/Part4.md
@@ -42,16 +42,22 @@ There are slight differences when targeting data-center and embedded platforms.
source <VITIS_install_path>/settings64.sh
source <XRT_install_path>/setup.sh
unset LD_LIBRARY_PATH
source $XILINX_VITIS/data/emulation/qemu/unified_qemu_v5_0/environment-setup-aarch64-xilinx-linux
```

* Then make sure the following environment variables are correctly set to point to your ZCU102 platform, rootfs and sysroot directories respectively.

```bash
export PLATFORM_REPO_PATHS=<path to the ZCU102 platform install dir>
export ROOTFS=<path to the ZYNQMP common image directory, containing rootfs>
export SYSROOT=$ROOTFS/sysroots/aarch64-xilinx-linux
```

To set up the cross-compilation environment, source the `environment-setup-aarch64-xilinx-linux` script from the directory where you extracted the SDK.

```bash
source <path to the SDK>/environment-setup-aarch64-xilinx-linux
```

*NOTE: The ZYNQMP common image file can be downloaded from the [Vitis Embedded Platforms](https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-platforms.html) page, and contains the Sysroot, Rootfs, and boot Image for Xilinx Zynq MPSoC devices.*
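
The build steps depend on the variables exported above, plus whatever the SDK setup script exports (such as `CXX`). A minimal sketch of a pre-build check follows. It mirrors the intent of the `ndef` macro in the example Makefiles. The helper name `require_vars` is hypothetical and is not part of Vitis or XRT.

```bash
# Hypothetical helper (not part of Vitis or XRT): report every named
# environment variable that is unset or empty.
require_vars() {
    _rv_missing=0
    for _rv_name in "$@"; do
        # Look up the variable whose name is stored in _rv_name.
        # The names passed in are assumed to be trusted identifiers.
        eval "_rv_value=\${$_rv_name:-}"
        if [ -z "$_rv_value" ]; then
            echo "ERROR: $_rv_name must be set prior to running" >&2
            _rv_missing=1
        fi
    done
    # Returns 0 when every variable is set, 1 otherwise.
    return "$_rv_missing"
}
```

A build script could call `require_vars PLATFORM_REPO_PATHS ROOTFS SYSROOT CXX || exit 1` before invoking the compiler, so that a missing variable is reported by name instead of surfacing later as an obscure compiler or `v++` error.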


@@ -63,15 +69,15 @@ export SYSROOT=$ROOTFS/sysroots/aarch64-xilinx-linux
```bash
cd <Path to the cloned repo>/Getting_Started/Vitis/example/zcu102/sw_emu

aarch64-linux-gnu-g++ -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I${SYSROOT}/usr/include/xrt -L${SYSROOT}/usr/lib -lOpenCL -lpthread -lrt -lstdc++ --sysroot=${SYSROOT}
${CXX} -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I/usr/include/xrt -lOpenCL -lpthread -lrt -lstdc++
v++ -c -t sw_emu --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
v++ -l -t sw_emu --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin
v++ -p -t sw_emu --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir package --package.rootfs ${ROOTFS}/rootfs.ext4 --package.sd_file ${ROOTFS}/Image --package.sd_file xrt.ini --package.sd_file app.exe --package.sd_file vadd.xclbin --package.sd_file run_app.sh
```


Here is a brief explanation of each of these five commands:
1. `aarch64-linux-gnu-g++` compiles the host application using the ARM cross-compiler.
1. `${CXX}` compiles the host application using the ARM cross-compiler. This variable contains the full compiler executable plus flags relevant to cross-compilation, and is set when you source the SDK environment setup script. Note that in a shell the variable must be written as `${CXX}` or `$CXX`; the `$(CXX)` form is Makefile syntax and would be treated as command substitution by bash.
2. `v++ -c` compiles the source code for the vector-add accelerator into a compiled kernel object (.xo file).
3. `v++ -l` links the compiled kernel with the target platform and generates the FPGA binary (.xclbin file).
4. `v++ -p` packages the host executable, the rootfs, the FPGA binary and a few other files and generates a bootable image.
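
The `v++` compile and link commands are identical for every build target apart from the `-t` option. The sketch below makes that explicit by printing (not running) the two commands for a given target. The function name `print_build_cmds` is hypothetical; the command lines themselves are copied from this tutorial.

```bash
# Hypothetical helper: print, without executing, the v++ compile and link
# commands for one build target. Only the -t value differs between targets.
print_build_cmds() {
    _pb_target="$1"
    case "$_pb_target" in
        sw_emu|hw_emu|hw) ;;
        *)
            echo "ERROR: unknown target '$_pb_target' (expected sw_emu, hw_emu or hw)" >&2
            return 1
            ;;
    esac
    _pb_cfg="../../src/zcu102.cfg"
    # Step 2 of the list above: compile the kernel source into a .xo file.
    printf '%s\n' "v++ -c -t $_pb_target --config $_pb_cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo"
    # Step 3 of the list above: link the .xo file into the .xclbin file.
    printf '%s\n' "v++ -l -t $_pb_target --config $_pb_cfg ./vadd.xo -o vadd.xclbin"
}
```

For example, `print_build_cmds hw_emu` prints the same two lines used in the hardware emulation flow, which can be handy for reviewing a build before committing to a long hardware run.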
@@ -100,13 +106,10 @@ data=all:all:all
* This command will launch software emulation, start the Xilinx Quick Emulator (QEMU), and initiate the boot sequence. Once Linux has finished booting, enter the following commands to run the example program:

```bash
mount /dev/mmcblk0p1 /mnt
cd /mnt
cp platform_desc.txt /etc/xocl.txt
cd /media/sd-mmcblk0p1
export XILINX_XRT=/usr
export XILINX_VITIS=/mnt
export XCL_EMULATION_MODE=sw_emu
./app.exe
./app.exe vadd.xclbin
```

* You should see the following messages, indicating that the run completed successfully:
@@ -129,7 +132,7 @@ TEST PASSED
```bash
cd ../hw_emu

aarch64-linux-gnu-g++ -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I${SYSROOT}/usr/include/xrt -L${SYSROOT}/usr/lib -lOpenCL -lpthread -lrt -lstdc++ --sysroot=${SYSROOT}
${CXX} -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I/usr/include/xrt -lOpenCL -lpthread -lrt -lstdc++
v++ -c -t hw_emu --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
v++ -l -t hw_emu --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin
v++ -p -t hw_emu --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir package --package.rootfs ${ROOTFS}/rootfs.ext4 --package.sd_file ${ROOTFS}/Image --package.sd_file xrt.ini --package.sd_file app.exe --package.sd_file vadd.xclbin --package.sd_file run_app.sh
@@ -146,13 +149,10 @@ v++ -p -t hw_emu --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir p
* Once Linux has finished booting, enter the following commands on the QEMU command line to run the example program:

```bash
mount /dev/mmcblk0p1 /mnt
cd /mnt
cp platform_desc.txt /etc/xocl.txt
cd /media/sd-mmcblk0p1
export XILINX_XRT=/usr
export XILINX_VITIS=/mnt
export XCL_EMULATION_MODE=hw_emu
./app.exe
./app.exe vadd.xclbin
```

* You should see messages that say TEST PASSED, indicating that the run completed successfully.
@@ -168,7 +168,7 @@ export XCL_EMULATION_MODE=hw_emu
```bash
cd ../hw

aarch64-linux-gnu-g++ -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I${SYSROOT}/usr/include/xrt -L${SYSROOT}/usr/lib -lOpenCL -lpthread -lrt -lstdc++ --sysroot=${SYSROOT}
${CXX} -Wall -g -std=c++11 ../../src/host.cpp -o app.exe -I/usr/include/xrt -lOpenCL -lpthread -lrt -lstdc++
v++ -c -t hw --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
v++ -l -t hw --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin
v++ -p -t hw --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir package --package.rootfs ${ROOTFS}/rootfs.ext4 --package.sd_file ${ROOTFS}/Image --package.sd_file xrt.ini --package.sd_file app.exe --package.sd_file vadd.xclbin --package.sd_file run_app.sh
@@ -179,12 +179,9 @@ v++ -p -t hw --config ../../src/zcu102.cfg ./vadd.xclbin --package.out_dir packa
* After the build process completes, copy the sd_card directory to an SD card, insert the card into the board, and boot until you see the Linux prompt. Then enter the following commands to run the accelerated application:

```bash
mount /dev/mmcblk0p1 /mnt
cd /mnt
cp platform_desc.txt /etc/xocl.txt
cd /media/sd-mmcblk0p1
export XILINX_XRT=/usr
export XILINX_VITIS=/mnt
./app.exe
./app.exe vadd.xclbin
```

* You will see the same TEST PASSED message indicating that the run completed successfully.
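
The host program in this example reports success in two ways: it prints `TEST PASSED` and it returns `EXIT_SUCCESS`. A scripted run can check both. The following is a minimal sketch; `run_and_check` is a hypothetical helper name and is not shipped with the tutorial.

```bash
# Hypothetical helper: run a command, echo its output, and succeed only if
# the command exited with status 0 AND printed the "TEST PASSED" marker.
run_and_check() {
    # Capture stdout and stderr together; the assignment keeps the
    # exit status of the command that was run.
    _rc_output=$("$@" 2>&1)
    _rc_status=$?
    printf '%s\n' "$_rc_output"
    if [ "$_rc_status" -eq 0 ] && printf '%s\n' "$_rc_output" | grep -q "TEST PASSED"; then
        return 0
    fi
    return 1
}
```

On the target, this could be invoked as `run_and_check ./app.exe vadd.xclbin`, which would let a wrapper script such as `run_app.sh` propagate a pass or fail status to whatever launched it.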
97 changes: 51 additions & 46 deletions Getting_Started/Vitis/example/src/host.cpp
@@ -38,36 +38,36 @@ EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// Forward declaration of utility functions included at the end of this file
std::vector<cl::Device> get_xilinx_devices();
char* read_binary_file(const std::string &xclbin_file_name, unsigned &nb);
char *read_binary_file(const std::string &xclbin_file_name, unsigned &nb);

// ------------------------------------------------------------------------------------
// Main program
// ------------------------------------------------------------------------------------
int main(int argc, char** argv)
int main(int argc, char **argv)
{
// ------------------------------------------------------------------------------------
// Step 1: Initialize the OpenCL environment
// ------------------------------------------------------------------------------------
// ------------------------------------------------------------------------------------
// Step 1: Initialize the OpenCL environment
// ------------------------------------------------------------------------------------
cl_int err;
std::string binaryFile = (argc != 2) ? "vadd.xclbin" : argv[1];
unsigned fileBufSize;
unsigned fileBufSize;
std::vector<cl::Device> devices = get_xilinx_devices();
devices.resize(1);
cl::Device device = devices[0];
cl::Context context(device, NULL, NULL, NULL, &err);
char* fileBuf = read_binary_file(binaryFile, fileBufSize);
char *fileBuf = read_binary_file(binaryFile, fileBufSize);
cl::Program::Binaries bins{{fileBuf, fileBufSize}};
cl::Program program(context, devices, bins, NULL, &err);
cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
cl::Kernel krnl_vector_add(program,"vadd", &err);
cl::Kernel krnl_vector_add(program, "vadd", &err);

// ------------------------------------------------------------------------------------
// Step 2: Create buffers and initialize test values
// ------------------------------------------------------------------------------------
// Create the buffers and allocate memory
cl::Buffer in1_buf(context, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
cl::Buffer in2_buf(context, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
cl::Buffer out_buf(context, CL_MEM_ALLOC_HOST_PTR | CL_MEM_WRITE_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
// ------------------------------------------------------------------------------------
// Step 2: Create buffers and initialize test values
// ------------------------------------------------------------------------------------
// Create the buffers and allocate memory
cl::Buffer in1_buf(context, CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
cl::Buffer in2_buf(context, CL_MEM_READ_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);
cl::Buffer out_buf(context, CL_MEM_WRITE_ONLY, sizeof(int) * DATA_SIZE, NULL, &err);

// Map buffers to kernel arguments, thereby assigning them to specific device memory banks
krnl_vector_add.setArg(0, in1_buf);
@@ -76,40 +76,43 @@ int main(int argc, char** argv)

// Map host-side buffer memory to user-space pointers
int *in1 = (int *)q.enqueueMapBuffer(in1_buf, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int) * DATA_SIZE);
int *in2 = (int *)q.enqueueMapBuffer(in2_buf, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int) * DATA_SIZE);
int *in2 = (int *)q.enqueueMapBuffer(in2_buf, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int) * DATA_SIZE);
int *out = (int *)q.enqueueMapBuffer(out_buf, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, sizeof(int) * DATA_SIZE);

// Initialize the vectors used in the test
for(int i = 0 ; i < DATA_SIZE ; i++){
for (int i = 0; i < DATA_SIZE; i++)
{
in1[i] = rand() % DATA_SIZE;
in2[i] = rand() % DATA_SIZE;
out[i] = 0;
out[i] = 0;
}

// ------------------------------------------------------------------------------------
// Step 3: Run the kernel
// ------------------------------------------------------------------------------------
// ------------------------------------------------------------------------------------
// Step 3: Run the kernel
// ------------------------------------------------------------------------------------
// Set kernel arguments
krnl_vector_add.setArg(0, in1_buf);
krnl_vector_add.setArg(1, in2_buf);
krnl_vector_add.setArg(2, out_buf);
krnl_vector_add.setArg(3, DATA_SIZE);

// Schedule transfer of inputs to device memory, execution of kernel, and transfer of outputs back to host memory
q.enqueueMigrateMemObjects({in1_buf, in2_buf}, 0 /* 0 means from host*/);
q.enqueueMigrateMemObjects({in1_buf, in2_buf}, 0 /* 0 means from host*/);
q.enqueueTask(krnl_vector_add);
q.enqueueMigrateMemObjects({out_buf}, CL_MIGRATE_MEM_OBJECT_HOST);

// Wait for all scheduled operations to finish
q.finish();
// ------------------------------------------------------------------------------------
// Step 4: Check Results and Release Allocated Resources
// ------------------------------------------------------------------------------------

// ------------------------------------------------------------------------------------
// Step 4: Check Results and Release Allocated Resources
// ------------------------------------------------------------------------------------
bool match = true;
for (int i = 0 ; i < DATA_SIZE ; i++){
int expected = in1[i]+in2[i];
if (out[i] != expected){
for (int i = 0; i < DATA_SIZE; i++)
{
int expected = in1[i] + in2[i];
if (out[i] != expected)
{
std::cout << "Error: Result mismatch" << std::endl;
std::cout << "i = " << i << " CPU result = " << expected << " Device result = " << out[i] << std::endl;
match = false;
@@ -119,54 +122,56 @@ int main(int argc, char** argv)

delete[] fileBuf;

std::cout << "TEST " << (match ? "PASSED" : "FAILED") << std::endl;
std::cout << "TEST " << (match ? "PASSED" : "FAILED") << std::endl;
return (match ? EXIT_SUCCESS : EXIT_FAILURE);
}



// ------------------------------------------------------------------------------------
// Utility functions
// ------------------------------------------------------------------------------------
std::vector<cl::Device> get_xilinx_devices()
std::vector<cl::Device> get_xilinx_devices()
{
size_t i;
cl_int err;
std::vector<cl::Platform> platforms;
err = cl::Platform::get(&platforms);
cl::Platform platform;
for (i = 0 ; i < platforms.size(); i++){
for (i = 0; i < platforms.size(); i++)
{
platform = platforms[i];
std::string platformName = platform.getInfo<CL_PLATFORM_NAME>(&err);
if (platformName == "Xilinx"){
if (platformName == "Xilinx")
{
std::cout << "INFO: Found Xilinx Platform" << std::endl;
break;
}
}
if (i == platforms.size()) {
if (i == platforms.size())
{
std::cout << "ERROR: Failed to find Xilinx platform" << std::endl;
exit(EXIT_FAILURE);
}
//Getting ACCELERATOR Devices and selecting 1st such device

//Getting ACCELERATOR Devices and selecting 1st such device
std::vector<cl::Device> devices;
err = platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
return devices;
}
char* read_binary_file(const std::string &xclbin_file_name, unsigned &nb)

char *read_binary_file(const std::string &xclbin_file_name, unsigned &nb)
{
if(access(xclbin_file_name.c_str(), R_OK) != 0) {
if (access(xclbin_file_name.c_str(), R_OK) != 0)
{
printf("ERROR: %s xclbin not available please build\n", xclbin_file_name.c_str());
exit(EXIT_FAILURE);
}
//Loading XCL Bin into char buffer
//Loading XCL Bin into char buffer
std::cout << "INFO: Loading '" << xclbin_file_name << "'\n";
std::ifstream bin_file(xclbin_file_name.c_str(), std::ifstream::binary);
bin_file.seekg (0, bin_file.end);
bin_file.seekg(0, bin_file.end);
nb = bin_file.tellg();
bin_file.seekg (0, bin_file.beg);
char *buf = new char [nb];
bin_file.seekg(0, bin_file.beg);
char *buf = new char[nb];
bin_file.read(buf, nb);
return buf;
}
}
15 changes: 10 additions & 5 deletions Getting_Started/Vitis/example/zcu102/hw/Makefile
@@ -1,9 +1,12 @@
ndef = $(if $(value $(1)),,$(error $(1) must be set prior to running))

all: package/sd_card.img

app.exe: ../../src/host.cpp
aarch64-linux-gnu-g++ -Wall -g -std=c++11 ../../src/host.cpp -o app.exe \
-I${SYSROOT}/usr/include/xrt \
-L${SYSROOT}/usr/lib -lOpenCL -lpthread -lrt -lstdc++ --sysroot=${SYSROOT}
$(call ndef,SDKTARGETSYSROOT)
$(CXX) -Wall -g -std=c++11 ../../src/host.cpp -o app.exe \
-I/usr/include/xrt \
-lOpenCL -lpthread -lrt -lstdc++

vadd.xo: ../../src/vadd.cpp
v++ -c -t ${TARGET} --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
@@ -12,9 +15,11 @@ vadd.xclbin: ./vadd.xo
v++ -l -t ${TARGET} --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin

package/sd_card.img: app.exe vadd.xclbin xrt.ini run_app.sh
v++ -p -t ${TARGET} --config ../../src/zcu102.cfg ./vadd.xclbin \
$(call ndef,ROOTFS)
v++ -p -t ${TARGET} --config ../../src/zcu102.cfg ./vadd.xclbin -o vadd.xclbin \
--package.out_dir package \
--package.rootfs ${ROOTFS}/rootfs.ext4 \
--package.sd_file vadd.xclbin \
--package.sd_file ${ROOTFS}/Image \
--package.sd_file xrt.ini \
--package.sd_file emconfig.json \
@@ -25,4 +30,4 @@ clean:
rm -rf vadd* app.exe *json *csv *log *summary _x package *.json .run .Xil .ipcache *.jou

# Unless specified, use the current directory name as the v++ build target
TARGET ?= $(notdir $(CURDIR))
TARGET ?= $(notdir $(CURDIR))
Empty file modified Getting_Started/Vitis/example/zcu102/hw/build.sh
100644 → 100755
Empty file.
13 changes: 9 additions & 4 deletions Getting_Started/Vitis/example/zcu102/hw_emu/Makefile
@@ -1,10 +1,13 @@

ndef = $(if $(value $(1)),,$(error $(1) must be set prior to running))

all: package/sd_card.img

app.exe: ../../src/host.cpp
aarch64-linux-gnu-g++ -Wall -g -std=c++11 ../../src/host.cpp -o app.exe \
-I${SYSROOT}/usr/include/xrt \
-L${SYSROOT}/usr/lib -lOpenCL -lpthread -lrt -lstdc++ --sysroot=${SYSROOT}
$(call ndef,SDKTARGETSYSROOT)
$(CXX) -Wall -g -std=c++11 ../../src/host.cpp -o app.exe \
-I/usr/include/xrt \
-lOpenCL -lpthread -lrt -lstdc++

vadd.xo: ../../src/vadd.cpp
v++ -c -t ${TARGET} --config ../../src/zcu102.cfg -k vadd -I../../src ../../src/vadd.cpp -o vadd.xo
@@ -13,9 +16,11 @@ vadd.xclbin: ./vadd.xo
v++ -l -t ${TARGET} --config ../../src/zcu102.cfg ./vadd.xo -o vadd.xclbin

package/sd_card.img: app.exe emconfig.json vadd.xclbin xrt.ini run_app.sh
$(call ndef,ROOTFS)
v++ -p -t ${TARGET} --config ../../src/zcu102.cfg ./vadd.xclbin \
--package.out_dir package \
--package.rootfs ${ROOTFS}/rootfs.ext4 \
--package.sd_file vadd.xclbin \
--package.sd_file ${ROOTFS}/Image \
--package.sd_file xrt.ini \
--package.sd_file emconfig.json \
@@ -26,7 +31,7 @@ emconfig.json:
emconfigutil --platform xilinx_zcu102_base_202020_1 --nd 1

clean:
rm -rf vadd* app.exe *json *csv *log *summary _x package *.json .run .Xil .ipcache *.jou
rm -rf vadd* app.exe *json *csv *log *summary _x package *.json .run .Xil .ipcache *.jou *.xclbin

# Unless specified, use the current directory name as the v++ build target
TARGET ?= $(notdir $(CURDIR))
Empty file modified Getting_Started/Vitis/example/zcu102/hw_emu/build_and_run.sh
100644 → 100755
Empty file.