GPA 3.17 updates
eelliott-amd committed Sep 20, 2024
1 parent ebba99f commit 18d231a
Showing 153 changed files with 5,468 additions and 15,322 deletions.
22 changes: 10 additions & 12 deletions README.md
@@ -31,18 +31,16 @@ Prebuilt binaries can be downloaded from the Releases page: https://github.com/G
* Provides access to some raw hardware counters. See [Raw Hardware Counters](#raw-hardware-counters) for more information.

## What's New
-### Version 3.16 (07/01/2024)
-* Added support for additional RDNA 3 based APUs.
-* GPA's OpenCL support has been temporarily disabled on RDNA 3 hardware.
-* Updated error checking in counter splitting to report error if counter group max is zero.
-* Disabled the following counters on RDNA 3 based hardware due to inconsistent results:
-* CBMemRead, CBColorAndMaskRead, CBMemWritten, CBColorAndMaskWritten
-* Disabled the following counters on RDNA 2 based hardware due to inconsistent results:
-* VsGsVerticesIn, VsGsPrimsIn
-* Disabled the following counters on RDNA based hardware due to inconsistent results:
-* VsGsSALUBusy, VsGsSALUBusyCycles, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsVALUInstCount, VsGsSALUInstCount, PSVALUBusy, PSVALUBusyCycles, PSVALUInstCount, PSSALUBusy, PSSALUBusyCycles, PSSALUInstCount
-* Output from pre_build.py script is now generated into build\|win,linux|\ directory.
-* Compiled binaries are now generated into build\output\ directory.
+### Version 3.17 (09/20/2024)
+* OpenGL: GPA is no longer supporting Adrenalin 19.6.3 and older drivers.
+* On all hardware and APIs, the following counters were renamed for clarity:
+* CSWavefronts was renamed to CSWavefrontsLaunched
+* CSThreads was renamed to CSThreadsLaunched
+* CSThreadGroups was renamed to CSThreadGroupsLaunched
+* On all hardware and APIs the following counters were removed, there are already matching counters in the GlobalMemory group:
+* CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
+* CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
+* On Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.

## System Requirements
* An AMD Radeon GPU or APU based on Graphics IP version 8 and newer.
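Code that requests ComputeShader counters by name will need updating for 3.17. The Python helper below is a hypothetical sketch, not part of GPA: it applies the three renames listed in the release notes and reports any of the six removed counters rather than guessing a GlobalMemory replacement, because the notes do not name the matching counters.

```python
# Hypothetical migration helper; counter names are taken from the 3.17 notes.
RENAMED_COUNTERS = {
    "CSWavefronts": "CSWavefrontsLaunched",
    "CSThreads": "CSThreadsLaunched",
    "CSThreadGroups": "CSThreadGroupsLaunched",
}

# Removed in 3.17. The notes point to matching GlobalMemory counters
# without naming them, so no replacement is guessed here.
REMOVED_COUNTERS = {
    "CSMemUnitBusy",
    "CSMemUnitBusyCycles",
    "CSMemUnitStalled",
    "CSMemUnitStalledCycles",
    "CSWriteUnitStalled",
    "CSWriteUnitStalledCycles",
}


def migrate_counter_names(names):
    """Return (updated, dropped) for a list of pre-3.17 counter names.

    updated keeps the original order, with renamed counters replaced;
    dropped lists the counters that no longer exist in 3.17.
    """
    updated = []
    dropped = []
    for name in names:
        if name in REMOVED_COUNTERS:
            dropped.append(name)
        else:
            updated.append(RENAMED_COUNTERS.get(name, name))
    return updated, dropped


print(migrate_counter_names(["CSThreads", "CSMemUnitBusy", "CSVALUInsts"]))
# (['CSThreadsLaunched', 'CSVALUInsts'], ['CSMemUnitBusy'])
```

Reporting removed counters separately keeps the caller in charge of choosing a GlobalMemory substitute.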
11 changes: 11 additions & 0 deletions RELEASE_NOTES.txt
@@ -1,5 +1,16 @@
# GPU Performance API Release Notes
---
+# Version 3.17 (09/20/2024)
+* OpenGL: GPA is no longer supporting Adrenalin 19.6.3 and older drivers.
+* On all hardware and APIs, the following counters were renamed for clarity:
+* CSWavefronts was renamed to CSWavefrontsLaunched
+* CSThreads was renamed to CSThreadsLaunched
+* CSThreadGroups was renamed to CSThreadGroupsLaunched
+* On all hardware and APIs the following counters were removed, there are already matching counters in the GlobalMemory group:
+* CSMemUnitBusy, CSMemUnitBusyCycles, CSMemUnitStalled, CSMemUnitStalledCycles, CSWriteUnitStalled, CSWriteUnitStalledCycles
+* CSALUStalledByLDS and CSALUStalledByLDSCycles are now based on per-wave cycle counts.
+* On Radeon RX 5000 Series and newer hardware, counters in the ComputeShader group now have simplified equations.
+
# Version 3.16 (07/01/2024)
* Added support for additional RDNA 3 based APUs.
* GPA's OpenCL support has been temporarily disabled on RDNA 3 hardware.
14 changes: 12 additions & 2 deletions build/cmake_modules/build_flags.cmake
@@ -33,12 +33,22 @@ endif()

if(${build-32bit})
set(CMAKE_SIZEOF_VOID_P 4)
-set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x86)
+set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x86)
else()
set(CMAKE_SIZEOF_VOID_P 8)
-set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x64)
+set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_x64)
endif()

+if(${BUILD_ANDROID})
+set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_android)
+endif()
+
+# START_REMOVE_PIX_DURING_SANITIZATION
+if (${GPA_PIX_BUILD})
+set(GPA_PIX_BUILD ON)
+set(OUTPUT_SUFFIX ${OUTPUT_SUFFIX}_pix)
+endif()
+# END_REMOVE_PIX_DURING_SANITIZATION

# DX11 variable
if(NOT DEFINED skipdx11)
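The suffix logic in `build_flags.cmake` appends an architecture tag, then `_android`, then `_pix`. The Python function below is an illustrative mirror of that order, not part of the build: `output_suffix` and its parameters are hypothetical stand-ins for the `build-32bit`, `BUILD_ANDROID`, and `GPA_PIX_BUILD` variables, and the starting value of `OUTPUT_SUFFIX` (set outside this hunk) is passed in as `base`.

```python
def output_suffix(base: str = "", build_32bit: bool = False,
                  build_android: bool = False, pix_build: bool = False) -> str:
    """Hypothetical mirror of the OUTPUT_SUFFIX logic in build_flags.cmake."""
    suffix = base
    # if(${build-32bit}) -> _x86, else -> _x64 (pointer size 4 vs 8).
    suffix += "_x86" if build_32bit else "_x64"
    # if(${BUILD_ANDROID}) appends _android after the architecture tag.
    if build_android:
        suffix += "_android"
    # if (${GPA_PIX_BUILD}) appends _pix last; in CMake this block sits
    # between the *_REMOVE_PIX_DURING_SANITIZATION marker comments.
    if pix_build:
        suffix += "_pix"
    return suffix


print(output_suffix())                                      # _x64
print(output_suffix(build_32bit=True, build_android=True))  # _x86_android
print(output_suffix(pix_build=True))                        # _x64_pix
```

Judging by the marker comment names, the PIX branch is presumably stripped from sanitized public sources, in which case only the first two steps apply there.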
4 changes: 2 additions & 2 deletions build/cmake_modules/common.cmake
@@ -1,5 +1,5 @@
-## Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
-cmake_minimum_required(VERSION 3.5.1)
+## Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
+cmake_minimum_required(VERSION 3.10)

include (${GPA_CMAKE_MODULES_DIR}/utils.cmake)

2 changes: 1 addition & 1 deletion build/cmake_modules/defs.cmake
@@ -3,7 +3,7 @@ cmake_minimum_required(VERSION 3.19)

## Define the GPA version
set(GPA_MAJOR_VERSION 3)
-set(GPA_MINOR_VERSION 16)
+set(GPA_MINOR_VERSION 17)
set(GPA_UPDATE_VERSION 0)

if(NOT DEFINED GPA_BUILD_NUMBER)
5 changes: 1 addition & 4 deletions build/cmake_modules/targets.cmake
@@ -1,4 +1,4 @@
-## Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
+## Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
cmake_minimum_required(VERSION 3.10)

## GPA has only Debug and Release
@@ -98,6 +98,3 @@ endif()
if(NOT ${skipdocs})
add_subdirectory(${GPA_SPHINX_DOCS} ${CMAKE_BINARY_DIR}/${GPA_SPHINX_DOCS_REL_PATH})
endif()
-
-
-
6 changes: 3 additions & 3 deletions build/dependencies_map.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All rights reserved.
+# Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
# dependencies_map.py
#
# Map of GitHub project names to clone target paths, relative to the GPUPerfAPI
@@ -14,10 +14,10 @@
"appsdk" : ["external/Lib/AMD/APPSDK", "55a6940ebc963daec69152314a1bb94943287d4c"],
"opengl" : ["external/Lib/Ext/OpenGL", "792c2291a4443ebef17ca5a7e3e24a1f854f0d1d"],
"windows_kits" : ["external/Lib/Ext/Windows-Kits", "51845a3771122a9dc1406b8617e9a67d9a2f55b6"],
-"googletest" : ["external/Lib/Ext/GoogleTest", "191f9336bc9212b5f5410ab663176f685cafed2a"],
+"googletest" : ["external/Lib/Ext/GoogleTest", "542e057c6c5bf45454b43764b881397b71164d62"],
# Src.
"adl_util" : ["external/Src/ADLUtil", "d62c94514326775c83fc129bb89d299c8749ebd1"],
-"device_info" : ["external/Src/DeviceInfo", "00b23198e748e3d235f249cfee6604fce0d43c29"],
+"device_info" : ["external/Src/DeviceInfo", "7379d082f1d8d64c9d1168b84b7f6b2a9702c82f"],
"dynamic_library_module" : ["external/Src/DynamicLibraryModule", "e6451ce26b8509cf724c7cf5d007878791143a58"],
"tsingleton" : ["external/Src/TSingleton", "02e8fa7d98f33cdbd0e1f77d1a8a403a32e35882"],
}
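Each entry in the dependency map pairs a clone target path, relative to the GPUPerfAPI checkout, with a pinned commit SHA. As a hypothetical sketch of how such pins could be consumed (the script that actually reads `dependencies_map.py` is not part of this diff), the function below turns a map of the same shape into `git -C <path> checkout <sha>` argument lists; `checkout_commands` and `repo_root` are illustrative names only.

```python
import posixpath


def checkout_commands(dependency_map, repo_root="."):
    """Build 'git -C <target> checkout <sha>' argv lists from a map shaped
    like {name: [relative_target_path, commit_sha]} (hypothetical helper)."""
    commands = []
    # Sort by project name so the output order is deterministic.
    for _name, (rel_path, sha) in sorted(dependency_map.items()):
        target = posixpath.join(repo_root, rel_path)
        commands.append(["git", "-C", target, "checkout", sha])
    return commands


example = {
    "appsdk": ["external/Lib/AMD/APPSDK", "55a6940ebc963daec69152314a1bb94943287d4c"],
}
for argv in checkout_commands(example):
    print(" ".join(argv))
# git -C ./external/Lib/AMD/APPSDK checkout 55a6940ebc963daec69152314a1bb94943287d4c
```

Returning argv lists rather than shell strings would let a caller pass them straight to `subprocess.run` without quoting concerns.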
2 changes: 1 addition & 1 deletion docs/doxygen/DoxyfilePublic
@@ -31,7 +31,7 @@ PROJECT_NAME = "GPU Perf API"
# This could be handy for archiving the generated documentation or
# if some version control system is used.

-PROJECT_NUMBER = 3.16
+PROJECT_NUMBER = 3.17

# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
# base path where the generated documentation will be put.
4 changes: 2 additions & 2 deletions docs/sphinx/source/conf.py
@@ -61,9 +61,9 @@
# built documents.
#
# The short X.Y version.
-version = u'3.16'
+version = u'3.17'
# The full version, including alpha/beta/rc tags.
-release = u'3.16'
+release = u'3.17'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
14 changes: 4 additions & 10 deletions docs/sphinx/source/graphics_counter_tables_gfx10.rst
@@ -113,9 +113,9 @@ ComputeShader Group
:header: "Counter Name", "Usage", "Brief Description"
:widths: 15, 10, 75

-"CSThreadGroups", "Items", "Total number of thread groups."
-"CSWavefronts", "Items", "The total number of wavefronts used for the CS."
-"CSThreads", "Items", "The number of CS threads processed by the hardware."
+"CSThreadGroupsLaunched", "Items", "Total number of thread groups launched."
+"CSWavefrontsLaunched", "Items", "The total number of wavefronts launched for the CS."
+"CSThreadsLaunched", "Items", "The number of CS threads launched and processed by the hardware."
"CSThreadGroupSize", "Items", "The number of CS threads within each thread group."
"CSVALUInsts", "Items", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
"CSVALUUtilization", "Percentage", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
@@ -127,16 +127,10 @@ ComputeShader Group
"CSVALUBusyCycles", "Cycles", "Number of GPU cycles where vector ALU instructions are processed."
"CSSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
"CSSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are processed."
-"CSMemUnitBusy", "Percentage", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
-"CSMemUnitBusyCycles", "Cycles", "Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account."
-"CSMemUnitStalled", "Percentage", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
-"CSMemUnitStalledCycles", "Cycles", "Number of GPU cycles the memory unit is stalled. Try reducing the number or size of fetches and writes if possible."
-"CSWriteUnitStalled", "Percentage", "The percentage of GPUTime the write unit is stalled."
-"CSWriteUnitStalledCycles", "Cycles", "Number of GPU cycles the write unit is stalled."
"CSGDSInsts", "Items", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
"CSLDSInsts", "Items", "The average number of LDS read/write instructions executed per work-item (affected by flow control)."
"CSALUStalledByLDS", "Percentage", "The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad)."
-"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles the ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
+"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles each wavefronts' ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
"CSLDSBankConflict", "Percentage", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
"CSLDSBankConflictCycles", "Cycles", "Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad)."

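The Cycles rows in this table are documented with a range of 0 to GPUBusyCycles, while the matching Percentage rows run from 0% to 100% of GPUTime. Assuming, for illustration only, that a Percentage value is the cycle count divided by GPUBusyCycles, the relationship looks like the sketch below; `cycles_to_percent` is a hypothetical helper, GPA's real counter equations are hardware-specific, and the 3.17 notes say ComputeShader equations were simplified on Radeon RX 5000 Series and newer. Because `CSALUStalledByLDSCycles` is now described per wavefront, this ratio likely does not apply to it directly.

```python
def cycles_to_percent(counter_cycles: float, gpu_busy_cycles: float) -> float:
    """Express a *Cycles counter value as a percentage of GPUBusyCycles.

    Assumption: illustrates the documented 0..GPUBusyCycles range only;
    this is not GPA's internal counter equation.
    """
    if gpu_busy_cycles <= 0:
        raise ValueError("gpu_busy_cycles must be positive")
    percent = 100.0 * counter_cycles / gpu_busy_cycles
    # Clamp to the documented 0%-100% range.
    return max(0.0, min(100.0, percent))


# Example: 250 LDS bank-conflict cycles out of 1000 busy cycles.
print(cycles_to_percent(250, 1000))  # 25.0
```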
32 changes: 22 additions & 10 deletions docs/sphinx/source/graphics_counter_tables_gfx103.rst
@@ -59,6 +59,12 @@ PreTessellation Group
:header: "Counter Name", "Usage", "Brief Description"
:widths: 15, 10, 75

+"PreTessVALUInstCount", "Items", "Average number of vector ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control."
+"PreTessSALUInstCount", "Items", "Average number of scalar ALU instructions executed for the VS and HS in a pipeline that uses tessellation. Affected by flow control."
+"PreTessVALUBusy", "Percentage", "The percentage of GPUTime vector ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
+"PreTessVALUBusyCycles", "Cycles", "Number of GPU cycles vector where ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
+"PreTessSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
+"PreTessSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are being processed for the VS and HS in a pipeline that uses tessellation."
"PreTessVerticesIn", "Items", "The number of vertices processed by the VS and HS when using tessellation."

PostTessellation Group
@@ -69,6 +75,12 @@ PostTessellation Group
:widths: 15, 10, 75

"PostTessPrimsOut", "Items", "The number of primitives output by the DS and GS when using tessellation."
+"PostTessVALUInstCount", "Items", "Average number of vector ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control."
+"PostTessSALUInstCount", "Items", "Average number of scalar ALU instructions executed for the DS and GS in a pipeline that uses tessellation. Affected by flow control."
+"PostTessVALUBusy", "Percentage", "The percentage of GPUTime vector ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
+"PostTessVALUBusyCycles", "Cycles", "Number of GPU cycles vector where ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
+"PostTessSALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."
+"PostTessSALUBusyCycles", "Cycles", "Number of GPU cycles where scalar ALU instructions are being processed for the DS and GS in a pipeline that uses tessellation."

PrimitiveAssembly Group
%%%%%%%%%%%%%%%%%%%%%%%
@@ -101,20 +113,20 @@ ComputeShader Group
:header: "Counter Name", "Usage", "Brief Description"
:widths: 15, 10, 75

-"CSThreadGroups", "Items", "Total number of thread groups."
-"CSWavefronts", "Items", "The total number of wavefronts used for the CS."
-"CSThreads", "Items", "The number of CS threads processed by the hardware."
+"CSThreadGroupsLaunched", "Items", "Total number of thread groups launched."
+"CSWavefrontsLaunched", "Items", "The total number of wavefronts launched for the CS."
+"CSThreadsLaunched", "Items", "The number of CS threads launched and processed by the hardware."
"CSThreadGroupSize", "Items", "The number of CS threads within each thread group."
-"CSMemUnitBusy", "Percentage", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
-"CSMemUnitBusyCycles", "Cycles", "Number of GPU cycles the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account."
-"CSMemUnitStalled", "Percentage", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
-"CSMemUnitStalledCycles", "Cycles", "Number of GPU cycles the memory unit is stalled. Try reducing the number or size of fetches and writes if possible."
-"CSWriteUnitStalled", "Percentage", "The percentage of GPUTime the write unit is stalled."
-"CSWriteUnitStalledCycles", "Cycles", "Number of GPU cycles the write unit is stalled."
+"CSVALUInsts", "Items", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
+"CSVALUUtilization", "Percentage", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
+"CSSALUInsts", "Items", "The average number of scalar ALU instructions executed per work-item (affected by flow control)."
+"CSVFetchInsts", "Items", "The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control)."
+"CSSFetchInsts", "Items", "The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control)."
+"CSVWriteInsts", "Items", "The average number of vector write instructions to the video memory executed per work-item (affected by flow control)."
"CSGDSInsts", "Items", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
"CSLDSInsts", "Items", "The average number of LDS read/write instructions executed per work-item (affected by flow control)."
"CSALUStalledByLDS", "Percentage", "The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad)."
-"CSALUStalledByLDSCycles", "Cycles", "Number of GPU cycles the ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
+"CSALUStalledByLDSCycles", "Cycles", "The average number of GPU cycles the each wavefronts' ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible."
"CSLDSBankConflict", "Percentage", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
"CSLDSBankConflictCycles", "Cycles", "Number of GPU cycles the LDS is stalled by bank conflicts. Value range: 0 (optimal) to GPUBusyCycles (bad)."
