Feature/power #1479

maddyscientist · 2024-07-09T05:09:27Z

This PR is the initial step towards more power awareness in QUDA. It also adds OpenMP threading for host kernels.

Adds power, temperature and clock monitoring
- Monitoring is enabled with QUDA_ENABLE_MONITOR=1 (default is off)
- Monitoring is performed on a spawned thread, maintaining the history in a linked list
- Default monitoring period is QUDA_ENABLE_MONITOR_PERIOD=1
- Monitor info, together with the derived energy usage, is dumped to a monitor_*.tsv file, where the * encodes the rank id and the date_time of the dump. All ranks have identical times by construction.
Adds OpenMP threading support for all CPU kernels
- Most kernels seem to get reasonable scaling
- QUDA_OPENMP CMake parameter is no longer marked as advanced
Fixes compiler warning introduced in staggered contraction code #1416
Fixes bug introduced in staggered contraction code #1416
Fixes an issue in endQuda when memory leaks are detected in multi-GPU runs: printfQuda would fail because comm_rank() was called after the comms had been torn down
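
For readers unfamiliar with the pattern, the monitoring scheme described above (a spawned thread that samples periodically, keeps the history in a linked list, and derives energy from it) might look roughly like the sketch below. This is not the code in lib/monitor.cpp: `Sample`, `Monitor`, `energy_joules` and the `read_power` callback are hypothetical names, the sensor read is a stand-in for whatever NVML query the real code makes, and trapezoidal integration is only one plausible way to derive energy from power samples.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <iterator>
#include <list>
#include <mutex>
#include <thread>
#include <utility>

// One monitoring sample. Field names are illustrative, not QUDA's.
struct Sample {
  double time_s;  // seconds since monitoring started
  double power_w; // instantaneous power draw in watts
};

// Energy in joules from a power history, using trapezoidal integration
// (an assumption: the PR only says that energy usage is "derived").
inline double energy_joules(const std::list<Sample> &history)
{
  double energy = 0.0;
  auto prev = history.begin();
  if (prev == history.end()) return energy;
  for (auto it = std::next(prev); it != history.end(); ++it, ++prev)
    energy += 0.5 * (prev->power_w + it->power_w) * (it->time_s - prev->time_s);
  return energy;
}

// Samples a user-supplied sensor callback on a background thread.
class Monitor
{
public:
  Monitor(std::function<double()> read_power, std::chrono::microseconds period) :
    read_power_(std::move(read_power)), period_(period)
  {
  }

  ~Monitor() { stop(); }

  void start()
  {
    if (running_.exchange(true)) return; // already running
    worker_ = std::thread([this] {
      const auto t0 = std::chrono::steady_clock::now();
      while (running_) {
        const std::chrono::duration<double> t = std::chrono::steady_clock::now() - t0;
        {
          std::lock_guard<std::mutex> lock(mutex_);
          history_.push_back({t.count(), read_power_()});
        }
        std::this_thread::sleep_for(period_);
      }
    });
  }

  // Stops the sampling thread and returns a copy of the history.
  std::list<Sample> stop()
  {
    running_ = false;
    if (worker_.joinable()) worker_.join();
    std::lock_guard<std::mutex> lock(mutex_);
    return history_;
  }

private:
  std::function<double()> read_power_; // hypothetical sensor callback
  std::chrono::microseconds period_;   // time between samples
  std::list<Sample> history_;          // linked-list history, as in the PR description
  std::mutex mutex_;                   // guards history_
  std::atomic<bool> running_ {false};
  std::thread worker_;
};
```

In this sketch a caller would construct a `Monitor` with a sensor callback and a period, call `start()` at initialization, and pass the list returned by `stop()` to `energy_joules` before writing the dump.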

bjoo

After a quick scan this looks fine to me. I left only a few OpenMP comments, as single comments, regarding the use of collapse(): I saw you use it in some cases and not in others, and I was wondering if it could be used in one or two more. The only thing I worry about is dropping a functor (Arg::apply) or some such into a reduction clause. I wonder if it ought to be a custom reduction (like done for Multi-Reductions).
This could be just me not being up to spec with my OpenMP though. Modulo these, I approve.
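
To make the two OpenMP points in this comment concrete, here is a minimal sketch of a collapsed loop nest combined with a user-declared reduction. It is not taken from the QUDA host kernels: `Sum2`, `combine`, `sum2` and `reduce2d` are hypothetical names, and the loop body stands in for whatever a functor such as Arg::apply would compute.

```cpp
// Hypothetical two-component reduction value (not a QUDA type).
struct Sum2 {
  double a;
  double b;
};

// Combiner used both inside the loop and by the declared reduction.
inline Sum2 combine(Sum2 x, Sum2 y) { return {x.a + y.a, x.b + y.b}; }

// Custom reduction: tells OpenMP how to merge per-thread partial results
// and how to initialize each thread's private copy.
#pragma omp declare reduction(sum2 : Sum2 : omp_out = combine(omp_out, omp_in)) \
  initializer(omp_priv = Sum2 {0.0, 0.0})

// Sums (i, j) over an nx-by-ny index space. collapse(2) fuses both loops
// into a single iteration space before dividing it among threads.
inline Sum2 reduce2d(int nx, int ny)
{
  Sum2 result {0.0, 0.0};
#pragma omp parallel for collapse(2) reduction(sum2 : result)
  for (int i = 0; i < nx; i++)
    for (int j = 0; j < ny; j++)
      result = combine(result, Sum2 {double(i), double(j)});
  return result;
}
```

Built without OpenMP enabled, the pragmas are ignored and the function runs serially with the same result, which is convenient for checking a custom reduction against a reference.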

maddyscientist added 7 commits July 8, 2024 09:33

Enable OMP threading for host kernels

d604f80

Fix warning in spin taste and minor cleanup

bb97c6d

Fix bug in GammaApply introduced in #1416

230d589

Update device.in.hpp for Blackwell

2628d12

Cleanup tune.cpp to expose has and version as public functions for us…

d2fb537

…e elsewhere. Move assertAllMemFree and device::destroy before we bring down the comms (fixes a potential issue where we call a QUDA i/o function in these functions but the comms is down

Added initial support for monitoring the GPU temperature, power and c…

5a3d310

…lock rate.

Monitoring is now enabled by envarg QUDA_ENABLE_MONITOR, and the moni…

2cf1eb5

…toring period (in microseconds) is set by QUDA_ENABLE_MONITOR_PERIOD (Default = 1000 microseconds = 1 millisecond

maddyscientist requested review from a team as code owners July 9, 2024 05:09

maddyscientist requested review from bjoo, weinbe2 and hummingtree July 9, 2024 05:09

bjoo reviewed Jul 9, 2024

include/targets/generic/kernel_host.h

bjoo reviewed Jul 9, 2024

include/targets/generic/block_reduction_kernel_host.h

bjoo reviewed Jul 9, 2024

include/targets/generic/kernel_host.h

hummingtree reviewed Jul 9, 2024

lib/monitor.cpp (outdated)

hummingtree reviewed Jul 9, 2024

lib/targets/cuda/device.cpp (outdated)

bjoo reviewed Jul 9, 2024

include/targets/generic/reduction_kernel_host.h

bjoo approved these changes Jul 9, 2024

maddyscientist added 4 commits July 9, 2024 12:26

Merge branch 'develop' of github.com:lattice/quda into feature/power

07f9a0b

Add some comments and wee bit of cleanup

3f7939e

Tweak NVML_CHECK macro

c52fac3

Apply clang-format

50db5c1

hummingtree approved these changes Jul 9, 2024

maddyscientist merged commit d199bd3 into develop Jul 9, 2024
13 of 14 checks passed

maddyscientist deleted the feature/power branch July 9, 2024 22:10
