update docs on NsightCompute profiling with / in syntax #4362

Open: wants to merge 1 commit into base branch development.
Docs/sphinx_documentation/source/External_Profiling_Tools.rst (10 changes: 6 additions, 4 deletions)
...through the ``--nvtx``, ``--nvtx-include`` and ``--nvtx-exclude`` flags. For example:

::

   ncu --nvtx --nvtx-include "Hydro()/" --nvtx-exclude "StencilA(),StencilC()" -o kernels ${EXE} ${INPUTS} amrex.fpe_trap_invalid=0

will write a report named ``kernels`` containing an analysis of the CUDA kernels launched inside
the ``Hydro()`` region, ignoring any kernels launched inside ``StencilA()`` and ``StencilC()``.
To use the NVTX regions built into AMReX's TinyProfiler, the application must be built with
``TINY_PROFILE=TRUE``; the NVTX region names are identical to the TinyProfiler timer names.
Because TinyProfiler creates NVTX push/pop ranges, a ``/`` must be appended to the timer name passed
to ``--nvtx-include`` (e.g. ``"Hydro()/"`` rather than ``"Hydro()"``), as described in the Nsight Compute
official documentation on
`NVTX Filtering <https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvtx-filtering>`_.
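The trailing-slash rule is easy to forget when a command line is assembled in a job script. The following is a minimal sketch, not part of AMReX or Nsight Compute. It assumes bash, and it uses a hypothetical helper named ``pushpop_filter`` and hypothetical default values for ``EXE`` and ``INPUTS``. It appends the ``/`` to a TinyProfiler timer name only if one is missing, then prints the resulting ``ncu`` command as a dry run instead of executing it. The ``--nvtx-exclude`` string is copied verbatim from the example above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble an ncu command for a TinyProfiler region and print it.
set -euo pipefail

# Hypothetical helper: TinyProfiler opens NVTX push/pop ranges, so the name
# given to --nvtx-include needs a trailing "/". Don't add a second one.
pushpop_filter() {
    local name="$1"
    case "$name" in
        */) printf '%s' "$name" ;;
        *)  printf '%s/' "$name" ;;
    esac
}

# Placeholders (hypothetical defaults); override them from the environment.
EXE="${EXE:-./my_amrex_app.ex}"
INPUTS="${INPUTS:-inputs}"

region="Hydro()"
cmd=(ncu --nvtx
     --nvtx-include "$(pushpop_filter "$region")"
     --nvtx-exclude "StencilA(),StencilC()"
     -o kernels "$EXE" "$INPUTS" amrex.fpe_trap_invalid=0)

# Print the shell-quoted command instead of running it.
printf '%q ' "${cmd[@]}"
echo
```

To profile for real, replace the final ``printf``/``echo`` pair with ``"${cmd[@]}"``. Storing the command in an array keeps region names containing parentheses intact without extra quoting.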

Another helpful option for selecting a manageable subset of kernels is ``-c`` (``--launch-count``),
which limits the total number of kernel launches that are profiled. For example:

::

   ncu --nvtx --nvtx-include "GravitySolve()/" -c 10 -o kernels ${EXE} ${INPUTS} amrex.fpe_trap_invalid=0

will profile only the first ten kernel launches inside the ``GravitySolve()`` NVTX region.

For further details on how to choose a subset of CUDA kernels to analyze, or to run a more detailed
analysis, including CUDA hardware counters, refer to the Nsight Compute official documentation.


Roofline