Huge merge for AMD GPUs #497

Merged
mxz297 merged 217 commits into develop from rocprofiler_support on Feb 23, 2022

mxz297
Collaborator

@mxz297 mxz297 commented Jan 5, 2022

This is a huge merge to develop. It includes three main components:

  1. Preliminary rocprofiler support for getting AMD GPU hardware counters.
  2. Even more preliminary support for AOMP (AMD's OpenMP implementation). Merged from https://github.com/HPCToolkit/hpctoolkit/tree/ompt-amd.
  3. Breakdown of hpcrun initialization and reordering. Merged from https://github.com/dejangrubisic/hpctoolkit/tree/papi_fix_libmonitor_merged_dev.

More details about the rocprofiler support:

  1. -e gpu=amd must be provided together with the rocprofiler counter events, because we start and stop counter collection around kernel launches using the subscriber callback provided by roctracer. We could consider creating a new shared library that wraps the HIP kernel launch APIs to decouple rocprofiler from roctracer.
  2. GPU kernel launches are currently serialized because rocprofiler collects counters at the device level.
  3. Extend the correlation and activity channels to support multiple tool threads. Currently, roctracer threads and rocprofiler threads run at the same time.
  4. Implement GPU counters as a variable-length array whose length is the number of counter metrics specified on the command line. This is necessary because there could be hundreds of different GPU counters. The handling is similar to our handling for perf.
  5. Lightly tested with rocm-4.5.2 on ufront. So far I have only checked whether GPU counter metrics show up in the viewer and whether the values are obviously wrong.
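
To illustrate item 4, here is a minimal sketch of a per-launch record whose counter storage is sized at run time. It is not the code in this PR: the type `gpu_counter_record_t` and the helpers `gpu_counter_record_new` and `gpu_counter_record_set` are hypothetical names chosen for the example, and it assumes a C99 flexible array member is an acceptable representation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record, one per kernel launch. The counter values live in a
 * C99 flexible array member, so the record size depends on how many counter
 * metrics were requested on the command line. */
typedef struct gpu_counter_record {
  uint64_t correlation_id;  /* ties the values to one kernel launch */
  size_t   num_counters;    /* number of requested counter metrics */
  uint64_t values[];        /* num_counters entries */
} gpu_counter_record_t;

/* Allocate a zero-initialized record with room for num_counters values.
 * Returns NULL if the allocation fails. */
gpu_counter_record_t *
gpu_counter_record_new(uint64_t correlation_id, size_t num_counters)
{
  size_t bytes = sizeof(gpu_counter_record_t)
               + num_counters * sizeof(uint64_t);
  gpu_counter_record_t *rec = calloc(1, bytes);
  if (rec == NULL) return NULL;
  rec->correlation_id = correlation_id;
  rec->num_counters = num_counters;
  return rec;
}

/* Store one counter value. Returns 0 on success, -1 if idx is out of range. */
int
gpu_counter_record_set(gpu_counter_record_t *rec, size_t idx, uint64_t value)
{
  assert(rec != NULL);
  if (idx >= rec->num_counters) return -1;
  rec->values[idx] = value;
  return 0;
}
```

A caller would allocate one record per launch with the number of requested metrics, fill it as counter values arrive, and release it with `free`. A single allocation per record keeps the header and the values contiguous, which is convenient when records are passed between threads.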

I expect the following things to be done in separate PRs:

  1. Test and enhance the support on more AMD GPU systems and rocm versions
  2. Comprehensively test, reason about, and validate the results with ECP codes, and compare the results directly with rocprofiler.

mxz297 and others added 30 commits June 19, 2020 16:33
This is intended to deal with sample source init code that will dlopen
other libraries (such as PAPI)
1. init load map
2. set up debug flag, log file and measurement directory
3. save vdso
4. fnbounds init
5. register sample source

This re-ordering is intended to monitor dynamic libraries loaded during
sample source initialization
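
The ordering described in this commit message can be sketched as a table of steps run in sequence. This is only an illustration under stated assumptions: every function and variable name below is hypothetical and does not correspond to an actual hpcrun symbol, and each step merely records its name so the order can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACE 8

typedef int (*init_step_fn)(void);

/* Trace of executed step names, kept so the order can be checked. */
static const char *init_trace[MAX_TRACE];
static size_t init_trace_len = 0;

/* Append a step name to the trace; returns 0 to signal success. */
static int record(const char *name)
{
  assert(init_trace_len < MAX_TRACE);
  init_trace[init_trace_len++] = name;
  return 0;
}

/* Hypothetical stand-ins for the five steps listed in the commit message. */
static int step_init_loadmap(void)       { return record("loadmap"); }
static int step_setup_logging(void)      { return record("debug_log_dir"); }
static int step_save_vdso(void)          { return record("vdso"); }
static int step_init_fnbounds(void)      { return record("fnbounds"); }
static int step_register_sources(void)   { return record("sample_sources"); }

/* The load map is set up before sample sources are registered, so libraries
 * that a sample source loads with dlopen (PAPI, for example) can be seen. */
static const init_step_fn init_steps[] = {
  step_init_loadmap,
  step_setup_logging,
  step_save_vdso,
  step_init_fnbounds,
  step_register_sources,
};

/* Run each step in order and stop at the first failure.
 * Returns the number of steps that completed. */
size_t run_init_steps(void)
{
  size_t n = sizeof(init_steps) / sizeof(init_steps[0]);
  size_t i;
  for (i = 0; i < n; i++) {
    if (init_steps[i]() != 0) break;
  }
  return i;
}
```

Keeping the order in one table makes the dependency explicit: moving `step_register_sources` ahead of `step_init_loadmap` would reintroduce the situation the re-ordering is meant to avoid.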
@mxz297
Collaborator Author

mxz297 commented Jan 21, 2022

@jmellorcrummey By the way, I also rebased the branch on develop and fixed some conflicts and compilation issues.

@jmellorcrummey
Member

Does it make sense to split the PR into (1) removing the dependence on the debug API, and (2) adding rocprofiler support?

@mxz297
Collaborator Author

mxz297 commented Jan 22, 2022

"(1) removing the dependence on the debug API" is based on "(2) adding rocprofiler support". So, (1) has to be after (2). The code change for (1) is actually quite small. We can chat next Monday or so to go through the code changes.

@mxz297 mxz297 changed the title Rocprofiler support Huge merge for AMD GPUs Feb 21, 2022
…r needed to escape a character (because Jonathon wrote a script to handle \ properly)
2. Fix merge error causing hpcstruct to fail with cubin
@mxz297
Collaborator Author

mxz297 commented Feb 23, 2022

@jmellorcrummey @Jokeren @blue42u This PR is huge. It includes three different components: rocprofiler support, preliminary AOMP support, and hpcrun initialization breakdown. It is so large that I cannot really rebase it if new commits are pushed to the develop branch.

Today, I wanted to test it with a real AMD GPU application. I picked Quicksilver, but Quicksilver does not seem to run with rocm-4.5.2.

I tried this PR with Quicksilver on an NVIDIA GPU with CUDA-11.6, which revealed some problems that I have since fixed. PeleC does not seem to work with CUDA-11.6 (even without hpctoolkit).

What additional testing and code reviews are needed for this PR to be merged?

@mxz297 mxz297 merged commit 627d4da into develop Feb 23, 2022
@mxz297 mxz297 deleted the rocprofiler_support branch February 23, 2022 22:57
mxz297 added a commit that referenced this pull request Mar 24, 2022
This is a huge merge to develop. It includes three main components:

1. Preliminary rocprofiler support for getting AMD GPU hardware counters.
2. Preliminary support for AOMP (AMD's OpenMP implementation).
3. Breakdown of hpcrun initialization and reordering.

(cherry picked from commit 627d4da)
jmellorcrummey pushed a commit that referenced this pull request Mar 28, 2022
* Merge pull request #497 from HPCToolkit/rocprofiler_support

This is a huge merge to develop. It includes three main components:

1. Preliminary rocprofiler support for getting AMD GPU hardware counters.
2. Preliminary support for AOMP (AMD's OpenMP implementation).
3. Breakdown of hpcrun initialization and reordering.

(cherry picked from commit 627d4da)

* remove some dead code
5 participants