Huge merge for AMD GPUs #497
This is intended to handle sample source initialization code that dlopens other libraries (such as PAPI).
The new initialization order is:
1. init load map
2. set up debug flag, log file, and measurement directory
3. save vdso
4. fnbounds init
5. register sample source

This reordering is intended to monitor dynamic libraries loaded during sample source initialization.
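The effect of the reordering can be illustrated with a toy model. This is only a sketch: `LoadMap`, `register_sample_source`, and `startup` are hypothetical names, not HPCToolkit functions, and it assumes the earlier order registered sample sources before the load map was ready.

```python
# Toy model of the startup order; every name here is hypothetical.

class LoadMap:
    """Records libraries that are loaded after the map has been initialized."""

    def __init__(self):
        self.initialized = False
        self.libraries = []

    def init(self):
        self.initialized = True

    def on_dlopen(self, name):
        # A load that happens before init() goes unobserved.
        if self.initialized:
            self.libraries.append(name)


def register_sample_source(load_map):
    # Stand-in for a sample source (for example one backed by PAPI)
    # that dlopens another library while it initializes.
    load_map.on_dlopen("libpapi.so")


def startup(load_map_first):
    """Run the two steps in the chosen order and return what the map saw."""
    load_map = LoadMap()
    if load_map_first:
        load_map.init()
        register_sample_source(load_map)
    else:
        register_sample_source(load_map)
        load_map.init()
    return load_map.libraries
```

With `load_map_first=True` the library opened during sample source registration is recorded; with the assumed earlier order it is missed.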
…bscriber callback
…to opencl_instrumentation
…oolkit_aaron into opencl_instrumentation_fix
@jmellorcrummey By the way, I also rebased the branch onto develop and fixed some conflicts and compilation issues.
Does it make sense to split the PR into (1) removing the dependence on the debug API, and (2) adding rocprofiler support?
"(1) removing the dependence on the debug API" is based on "(2) adding rocprofiler support", so (1) has to come after (2). The code change for (1) is actually quite small. We can chat next Monday or so to go through the code changes.
- only flush trace during finalization in threads that have launched kernels
- trace "[start, end) op xxx" as the two records "start xxx" and "end no_activity"; don't use end + 1 for no_activity, to avoid overlap with an adjacent interval
- poll for activities after flush
- handle buffer completion event with empty buffer, which happens to be owned
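The half-open interval encoding described in the commit message above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `encode_intervals`, `is_time_ordered`, and the `idle_offset` parameter are hypothetical, and timestamps are plain integers.

```python
NO_ACTIVITY = "no_activity"  # marker name taken from the commit message


def encode_intervals(intervals, idle_offset=0):
    """Turn half-open [start, end) operations into (timestamp, label) records.

    idle_offset=0 places no_activity exactly at `end`;
    idle_offset=1 models the rejected `end + 1` variant.
    """
    records = []
    for start, end, op in intervals:
        records.append((start, op))
        records.append((end + idle_offset, NO_ACTIVITY))
    return records


def is_time_ordered(records):
    """True when record timestamps never go backwards."""
    times = [t for t, _ in records]
    return times == sorted(times)


# Two back-to-back operations: the second starts where the first ends.
ops = [(0, 10, "kernel_a"), (10, 20, "kernel_b")]
print(encode_intervals(ops))
print(is_time_ordered(encode_intervals(ops)))
print(is_time_ordered(encode_intervals(ops, idle_offset=1)))
```

Placing `no_activity` at `end` keeps adjacent intervals in order, whereas `end + 1` would put the idle marker after the start of the next operation.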
…m branch 'papi_fix_libmonitor_merged_dev'
…r needed to escape a character (because Jonathon wrote a script to handle \ properly) 2. Fix merge error causing hpcstruct to fail with cubin
@jmellorcrummey @Jokeren @blue42u This PR is huge. It includes three different components: rocprofiler support, preliminary AOMP support, and hpcrun initialization breakdown. It is so large that I cannot realistically rebase it if new commits are pushed to the develop branch. Today, I wanted to test it with a real AMD GPU application. I picked Quicksilver, but Quicksilver does not seem to run with rocm-4.5.2. I instead tried this PR with Quicksilver on an NVIDIA GPU with CUDA-11.6, which revealed some problems that I have fixed. PeleC does not seem to work with CUDA-11.6 (even without hpctoolkit). What additional testing and code review are needed for this PR to be merged?
This is a huge merge to develop. It includes three main components: 1. Preliminary rocprofiler support for getting AMD GPU hardware counters. 2. Preliminary support for AOMP (AMD's OpenMP implementation). 3. Breakdown of hpcrun initialization and reordering. (cherry picked from commit 627d4da)
* Merge pull request #497 from HPCToolkit/rocprofiler_support This is a huge merge to develop. It includes three main components: 1. Preliminary rocprofiler support for getting AMD GPU hardware counters. 2. Preliminary support for AOMP (AMD's OpenMP implementation). 3. Breakdown of hpcrun initialization and reordering. (cherry picked from commit 627d4da)
* remove some dead code
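This is a huge merge to develop. It includes three main components:
1. Preliminary rocprofiler support for getting AMD GPU hardware counters.
2. Preliminary support for AOMP (AMD's OpenMP implementation).
3. Breakdown of hpcrun initialization and reordering.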
More details about the rocprofiler support:
-e gpu=amd
together with rocprofiler counter events, as we need to start and stop counter collection around kernel launches using the subscriber callback provided by roctracer. We can consider creating a new shared library to wrap HIP kernel launch APIs to decouple rocprofiler from roctracer.

I expect the following things to be done in separate PRs: