Radeon Compute Profiler


Overview

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.

RCP was formerly delivered as part of CodeXL with the executable name "CodeXLGpuProfiler". Prior to its inclusion in CodeXL, it was known as "sprofile" and was part of the AMD APP Profiler product.

Major Features

  • Measure the execution time of an OpenCL™ or ROCm/HSA kernel.
  • Query the hardware performance counters on an AMD Radeon graphics card.
  • Use the CXLActivityLogger API to trace and measure the execution of segments in the program.
  • Display the IL/HSAIL and ISA (hardware disassembly) code of OpenCL™ kernels.
  • Calculate kernel occupancy information, which estimates the number of in-flight wavefronts on a compute unit as a percentage of the theoretical maximum number of wavefronts that the compute unit can support.
  • When used with CodeXL, all profiler data can be visualized in a user-friendly graphical user interface.
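
The occupancy figure described above is a simple ratio. The following is a minimal sketch of that arithmetic only, not RCP's implementation: the function name and the sample numbers are hypothetical, and the real profiler derives both inputs from kernel resource usage and device limits.

```python
def occupancy_percent(in_flight_wavefronts: int, max_wavefronts: int) -> float:
    """Return in-flight wavefronts as a percentage of the theoretical maximum.

    Hypothetical helper for illustration; both arguments are per compute unit.
    """
    if max_wavefronts <= 0:
        raise ValueError("max_wavefronts must be positive")
    if not 0 <= in_flight_wavefronts <= max_wavefronts:
        raise ValueError("in_flight_wavefronts must be between 0 and max_wavefronts")
    return 100.0 * in_flight_wavefronts / max_wavefronts


if __name__ == "__main__":
    # Illustrative numbers only: 16 wavefronts in flight out of a maximum of 40.
    print(occupancy_percent(16, 40))  # prints 40.0
```

A low percentage suggests that some per-kernel resource limits how many wavefronts can be resident at once; the profiler's own occupancy output is the place to look for which one.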

What's New

  • Version 5.4 (6/22/18)
    • Adds support for additional GPUs and APUs.
    • Support for profiling OpenCL applications running on ROCm.
    • OpenCL: Support for tracing OpenCL 2.1 and 2.2 APIs.
    • ROCm/HSA: Support for ROCm 1.8.
    • ROCm/HSA: Support for tracing AMD vendor extensions.
    • Fixes an issue parsing occupancy data collected on systems with certain locale settings.
    • ROCm/HSA: Fixes an issue with garbage characters in the .atp file for some HSA API string parameters.
    • OpenCL: Fixes profiling on recent amdgpu-pro drivers using the legacy OpenCL stack.
    • OpenCL: Works around a driver issue where GPU clock frequencies remain fixed after profiling on GFX9-based GPUs.

System Requirements

Cloning the Repository

To clone the RCP repository, execute the following git commands

After cloning the repository, run the following Python script to retrieve the required dependencies (see BUILD.md for more information):

  • python Scripts/UpdateCommon.py

UpdateCommon.py has replaced the use of git submodules in the CodeXL repository.

Source Code Directory Layout

  • Build -- contains both Linux and Windows build-related files
  • docs -- contains documentation sources
  • Scripts -- scripts to use to clone/update dependent repositories
  • Src/CLCommon -- contains source code shared by the various OpenCL™ agents
  • Src/CLOccupancyAgent -- contains source code for the OpenCL™ agent which collects kernel occupancy information
  • Src/CLProfileAgent -- contains source code for the OpenCL™ agent which collects hardware performance counters
  • Src/CLTraceAgent -- contains source code for the OpenCL™ agent which collects application trace information
  • Src/Common -- contains source code shared by all of RCP
  • Src/DeviceInfo -- builds a lib containing the Common/Src/DeviceInfo code (Linux only)
  • Src/HSAFdnCommon -- contains source code shared by the various ROCm agents
  • Src/HSAFdnPMC -- contains source code for the ROCm agent which collects hardware performance counters
  • Src/HSAFdnTrace -- contains source code for the ROCm agent which collects application trace information
  • Src/HSAUtils -- builds a lib containing the Common ROCm code (Linux only)
  • Src/MicroDLL -- contains source code for API interception (Windows only)
  • Src/PreloadXInitThreads -- contains source code for a library that calls XInitThreads (Linux only)
  • Src/ProfileDataParser -- contains source code for a library that can be used to parse profiler output data files
  • Src/VersionInfo -- contains version info resource files
  • Src/sanalyze -- contains source code used to analyze and summarize profiler data
  • Src/sprofile -- contains source code for the main profiler executable

Documentation

The documentation for the Radeon Compute Profiler can be found in each GitHub release. In the release RadeonComputeProfiler-v*.zip file or RadeonComputeProfiler-v*.tgz file, there will be a "docs" directory. Simply open the index.html file in a web browser to view the documentation.

The documentation is hosted publicly at: http://radeon-compute-profiler-rcp.readthedocs.io/en/latest/

Why version 5.x?

Although the Radeon Compute Profiler is a newly-branded tool, the technology contained in it has been around for several years. RCP has its roots in the AMD APP Profiler product, which progressed from version 1.x to 3.x. Then the profiler was included in CodeXL, and the codebase was labelled as version 4.x. Now that RCP is being pulled out of CodeXL and into its own codebase again, we've bumped the version number up to 5.x.

Known Issues

  • For the OpenCL Profiler
    • When collecting performance counters on Linux, the current user must have write access to

      /sys/class/drm/card<N>/device/power_dpm_force_performance_level

      where <N> is the index of the card in question. By default, this file is writable only by root, so either the profiler must be run as root or the file's permissions must be changed so that unprivileged users can write to it. The command below changes the permissions. Note, however, that changing the permissions on a system file like this could circumvent security, and that the permissions may be reset when the system is rebooted. On multi-GPU systems, you may have to replace "card0" with the appropriate card number:
      • sudo chmod ugo+w /sys/class/drm/card0/device/power_dpm_force_performance_level
  • For the ROCm Profiler
    • API Trace and Perf Counter data may be truncated or missing if the application being profiled does not call hsa_shut_down.
    • Kernel occupancy information will only be written to disk if the application being profiled calls hsa_shut_down.
    • When collecting a trace for an application that performs memory transfers using hsa_amd_memory_async_copy, if the application asks for the data transfer timestamps directly, it will not get correct timestamps. The profiler will show the correct timestamps, however.
    • When collecting an AQL packet trace, if the application asks for the kernel dispatch timestamps directly, it will not get correct timestamps. The profiler will show the correct timestamps, however.
    • When the rocm-profiler package (.deb or .rpm) is installed along with rocm, it may not be able to generate the default single-pass counter files. If you do not see counter files in /opt/rocm/profiler/counterfiles, you can generate them manually with this command: "sudo /opt/rocm/profiler/bin/CodeXLGpuProfiler --list --outputfile /opt/rocm/profiler/counterfiles/counters --maxpassperfile 1"
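
For the Linux performance-counter issue listed under the OpenCL Profiler, it can help to check up front whether the current user is able to write to the power_dpm_force_performance_level file. The sketch below is not part of RCP: the function name and the sysfs_root parameter are hypothetical, and it only reports the current state without changing any permissions.

```python
import os


def can_write_performance_level(card_index: int = 0,
                                sysfs_root: str = "/sys/class/drm") -> bool:
    """Report whether the current user can write the performance-level file.

    Hypothetical helper: sysfs_root is a parameter only so the check can be
    pointed at another directory; the path layout follows the known issue above.
    """
    path = os.path.join(sysfs_root, f"card{card_index}", "device",
                        "power_dpm_force_performance_level")
    # os.access checks permissions for the real user ID; a missing file is
    # reported as not writable.
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    if can_write_performance_level(0):
        print("card0: performance level file is writable")
    else:
        print("card0: not writable; run the profiler as root or adjust the "
              "file permissions as described above")
```

On a multi-GPU system, call the function once per card index. Because the permissions may be reset on reboot, the check is worth repeating before each profiling session.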