In this tutorial you will learn how to:
- Add profiling APIs in graph and PS code to calculate design performance.
- Generate a VCD file to view design information.
- Cross-check the performance calculated with the profiling APIs against the vitis_analyzer trace view.
- Use profiling features to inspect the design.
Start profiling API,
/// Start profiling and acquire resources needed for profiling. Should be called after graph::init().
/// @param io Platform PLIO or GMIO object.
/// @param option io_profiling_option enum.
/// @param value Optional value for the specified option.
/// @return Return event::handle to be used by read_profiling and stop_profiling. Return event::invalid_handle for error conditions or unsupported use cases.
static handle start_profiling(IoAttr& io, io_profiling_option option, uint32 value = 0);
Read profiling API,
/// Read profiling.
/// @param h event::handle returned from start_profiling.
/// @return Profiling value.
static long long read_profiling(handle h);
Stop profiling API,
/// Stop profiling and release resources needed for profiling.
/// @param h event::handle returned from start_profiling.
static void stop_profiling(handle h);
1. Profiling APIs with AI Engine Simulator
2. Profiling APIs with HW Emulator
3. Profiling APIs with Hardware
4. Cross Check I/O Performance values with VCD
Clone the project source from the Git repository and unzip the zip file.
Use this tutorial's graph.cpp.profile_api, which contains the AI Engine profiling APIs.
cd ${DOWNLOAD_PATH}/AI_Engine_Development/Feature_Tutorials/09-debug-walkthrough/aie
cp graph.cpp.profile_api graph.cpp
cd ..
cp Makefile.emu Makefile
make libadf.a
aiesimulator --pkg-dir=./Work --i=. --profile
vitis_analyzer ./aiesimulator_output/default.aierun_summary
After step 1.1 is completed, use this tutorial's host.cpp.profile_api, which contains the AI Engine profiling APIs, and the Makefile for HW emulation.
cd ${DOWNLOAD_PATH}/AI_Engine_Development/Feature_Tutorials/09-debug-walkthrough/sw
cp host.cpp.profile_api host.cpp
cd ..
cp Makefile.emu Makefile
make
./launch_hw_emu.sh
After PetaLinux boots up:
cd /run/media/mmcblk0p1
./host.exe a.xclbin
After step 1.1 is completed, use this tutorial's Makefile for hardware.
cd ${DOWNLOAD_PATH}/AI_Engine_Development/Feature_Tutorials/09-debug-walkthrough
cp Makefile.profile_hw Makefile
make
Flash the generated sd_card.img to an SD card.
Plug the flashed SD card into the SD slot of the vck190 board, then power up the board.
After the vck190 board boots up and is ready to accept commands at the Linux prompt, issue these commands from the terminal.
cd /run/media/mmcblk0p1
./host.exe a.xclbin
Note: Because of slower memory access, the performance values measured in hardware and hardware emulation are not optimized and are lower than those from AI Engine simulation.
Command to generate the AI Engine simulation VCD file:
aiesimulator --pkg-dir=./Work --i=. --dump-vcd=foo
Command to generate the hardware emulation VCD file:
./launch_hw_emu.sh -add-env AIE_COMPILER_WORKDIR=${PROJECT_FULL_PATH}/Work -aie-sim-options ${PROJECT_FULL_PATH}/aiesim_options.txt
Where the content of aiesim_options.txt is:
AIE_DUMP_VCD=foo
Follow step 2.4 to run the application.
To launch Vitis™ Analyzer for an AI Engine simulation run:
vitis_analyzer ./aiesimulator_output/default.aierun_summary
To launch Vitis Analyzer for a hardware emulation run:
vitis_analyzer ./sim/behav_waveform/xsim/default.aierun_summary
For hardware, the event trace steps are available in AI Engine Debug with Event Trace.
After the default.aierun_summary file is opened with Vitis Analyzer, select the Graph view and locate the output file data/ulbf_out0.txt, which is associated with the ulbfo0 output_plio object from graph.h.
for(unsigned k=0;k<4; k++) {
ulout[k]=output_plio::create("ulbfo"+std::to_string(k), plio_64_bits, "data/ulbf_out"+std::to_string(k)+".txt");
connect<>(ulbf.out[k], ulout[k].in[0]);
}
This output_plio object is profiled with the profiling API to measure output performance. The code can be found in host.cpp.
event::handle handle1 = event::start_profiling(dut.ulout[0], event::io_stream_start_to_bytes_transferred_cycles, OUT_LEN*sizeof(cint16)*2);
...
if (handle1 != event::invalid_handle)
{
cycle_count1 = event::read_profiling(handle1);
}
...
if (cycle_count1)
{
double throughput1 = (double)OUT_LEN*2/(cycle_count1 * 1e-9); //samples per second
printf(" Output: Throughput %f samples\n", throughput1);
} else {
printf("cycle_count1 is ZERO!\n");
}
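The throughput expression in the host code above converts the cycle count to seconds by multiplying by 1e-9, which appears to assume a 1 GHz AI Engine clock (one cycle per nanosecond). The following is a minimal sketch of the same conversion with the clock frequency made explicit. The function name `throughput_from_cycles` and its parameters are hypothetical and are not part of the tutorial sources.

```cpp
#include <cassert>

// Hypothetical helper for illustration; not part of the tutorial sources.
// samples:  number of samples transferred while the counter was running
// cycles:   cycle count, such as the value returned by event::read_profiling
// clock_hz: AI Engine clock frequency in Hz; the default of 1e9 matches the
//           1e-9 factor used in host.cpp (an assumption about the platform)
double throughput_from_cycles(double samples, long long cycles, double clock_hz = 1e9) {
    assert(cycles > 0 && clock_hz > 0.0);
    const double seconds = static_cast<double>(cycles) / clock_hz;
    return samples / seconds;  // samples per second
}
```

For example, 768 samples transferred in 940 cycles at 1 GHz works out to roughly 8.17e8 samples per second.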
Switch to the Trace view and locate tile(21,0).
To measure execution time with the tool, move a marker to the beginning of kernel execution in one iteration. Add a second marker and move it to the beginning of another iteration. The same method measures execution time for AI Engine simulation, hardware emulation, and hardware runs.
The example above shows that 10 iterations take 9.400 microseconds (us), so each iteration takes 0.94 us, or 940 nanoseconds (ns), on average.
The output file ulbf_out0.txt contains 38400 lines for 100 iterations, so each iteration processes and outputs 384 lines. Each line has 2 cint16 samples.
Performance calculation:
1,000,000,000 (AI Engine frequency in Hz) / 940 (clock cycles per iteration) x 384 (lines per iteration) x 2 (samples per line) = 817,021,276.59 samples/s. This number is close to the value reported by the profiling API, 818,527,715.90 samples/s.
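The arithmetic above can be reproduced with a short sketch. The numbers are the ones quoted in this section (9.400 us for 10 iterations, 38400 lines for 100 iterations, 2 samples per line). The struct and function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for illustration. The values are read manually from the
// Vitis Analyzer trace view and from the output file.
struct TraceMeasurement {
    double span_seconds;      // time between the two trace markers
    int iterations;           // kernel iterations between the markers
    int lines_per_iteration;  // output lines written per iteration
    int samples_per_line;     // samples on each output line
};

// Samples per second implied by the trace measurement.
double trace_throughput(const TraceMeasurement& m) {
    assert(m.span_seconds > 0.0 && m.iterations > 0);
    const double seconds_per_iteration = m.span_seconds / m.iterations;
    const double samples_per_iteration =
        static_cast<double>(m.lines_per_iteration) * m.samples_per_line;
    return samples_per_iteration / seconds_per_iteration;
}

// Difference between two throughput values as a fraction of the reference.
double relative_difference(double value, double reference) {
    assert(reference != 0.0);
    return std::fabs(value - reference) / std::fabs(reference);
}
```

With `TraceMeasurement{9.4e-6, 10, 38400 / 100, 2}`, the sketch gives about 817 million samples per second, and its relative difference from the 818,527,715.90 samples/s reported through the profiling API is under 0.2%.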
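Because the number of AI Engine performance counters is limited, calls to the AI Engine profiling APIs can fail. Host code must check the return value of each profiling API call, for example by comparing the handle returned by start_profiling against event::invalid_handle, before relying on the reported values.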
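The start, read, stop sequence with the handle check can be sketched as follows. To keep the example self-contained, it uses a mock in place of the real profiling API: `MockProfiler`, `MockHandle`, `kMockInvalidHandle`, the pool size, and the fixed cycle value are all hypothetical. Only the call order and the invalid-handle check mirror the API described in this tutorial.

```cpp
#include <cassert>

// Hypothetical stand-in for the profiling API so the sketch compiles without
// the AI Engine tools. It models a limited pool of performance counters.
using MockHandle = int;
constexpr MockHandle kMockInvalidHandle = -1;

struct MockProfiler {
    int free_counters;  // assumed pool size, chosen by the caller

    // Acquires a counter, or returns the invalid handle when none is free.
    MockHandle start_profiling() {
        if (free_counters <= 0) return kMockInvalidHandle;
        --free_counters;
        return 0;  // dummy handle value
    }
    // Returns a fixed dummy cycle count.
    long long read_profiling(MockHandle) const { return 940; }
    // Releases the counter so that a later start_profiling can succeed.
    void stop_profiling(MockHandle) { ++free_counters; }
};

// Reads one profiling value and releases the counter afterwards.
// Returns false and leaves `cycles` untouched if no counter was acquired.
bool profile_once(MockProfiler& p, long long& cycles) {
    const MockHandle h = p.start_profiling();
    if (h == kMockInvalidHandle) return false;  // check before reading
    cycles = p.read_profiling(h);
    p.stop_profiling(h);  // release the counter
    return true;
}
```

In this sketch, releasing the counter with stop_profiling is what lets a later measurement acquire it again. If a handle is still held when the pool is exhausted, `profile_once` reports failure instead of returning a meaningless value.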
GitHub issues will be used for tracking requests and bugs. For questions go to support.xilinx.com.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD005 | © Copyright 2021-2022 Xilinx, Inc.