HIP and MPI+HIP updates #361
…MPI+HIP codes. Unify CUDA and HIP code paths (CUDA / HIP => GPU, CUDA_MPIV / HIP_MPIV => MPIV_GPU, etc.).
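Conceptually, the unification collapses the vendor-specific preprocessor guards into shared ones. A minimal sketch (the structure is illustrative; only the macro names GPU, MPIV_GPU, CUDA, HIP, CUDA_MPIV, and HIP_MPIV come from the commit message):

```c
/* Sketch of the unified code-path guards: vendor-specific definitions
 * collapse into shared GPU / MPIV_GPU macros, so device code is guarded
 * once instead of per vendor. */
#if defined(CUDA) || defined(HIP)
  #define GPU
#endif

#if defined(CUDA_MPIV) || defined(HIP_MPIV)
  #define MPIV_GPU
#endif

#ifdef GPU
  /* single code path shared by CUDA and HIP builds */
#endif

#ifdef MPIV_GPU
  /* single code path shared by MPI+CUDA and MPI+HIP builds */
#endif
```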
As noted above, all tests (the full test suite) pass for the CUDA and MPI+CUDA (1 GPU) versions. However, some tests fail for the HIP and MPI+HIP versions. See the logs below from tests on the MI210s on the AMD AAC. Interestingly, the test failures differ slightly between the HIP and MPI+HIP versions. This is a bit difficult to debug, at least when comparing against the working HIP / MPI+HIP versions from the 23.08b release, as there are also a number of test failures there. It may be better to pick a commit before the f-function optimizations and run tests there for comparison. Test configuration on the AMD AAC:
HIP test summary and diffs:
MPI+HIP test summary and diffs:
…preprocessor definitions for performance and storage considerations. Refactor preprocessor definitions to avoid unnecessary arithmetic.
…aths for older HIP builds.
…regarding STORE_OPERATOR). Fix segfault in debug builds of GPU code when ERI f-function support is not enabled but the basis contains f functions. Remove unneeded DGEMM operation in CUDA codes in SCF/USCF methods. Other code clean-up.
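A minimal sketch of the kind of guard that avoids this failure mode, assuming a hypothetical maxAngMom variable (illustrative only, not the actual QUICK fix):

```c
#include <stdio.h>
#include <stdlib.h>

/* Reject f functions up front when ERI f-function support is compiled
 * out, instead of faulting later in device code. Angular momentum:
 * s=0, p=1, d=2; f=3 requires ENABLEF. */
static void checkBasisSupport(int maxAngMom)
{
#ifndef ENABLEF
    if (maxAngMom > 2) {
        fprintf(stderr, "Basis contains f functions but ERI f-function "
                        "support was not compiled in (-DENABLEF=TRUE).\n");
        exit(EXIT_FAILURE);
    }
#endif
    (void) maxAngMom;  /* unused when ENABLEF is defined */
}
```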
…ggled on in CMake build.
…power functions (from inlined device functions calling pow to preprocessor definitions using multiplication operations). Other code clean-up.
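A hedged sketch of that refactor pattern (names hypothetical):

```cuda
/* Before (illustrative): a generic inlined device function calling pow,
 * which is costly for small fixed integer exponents. */
__device__ static inline double dpow(double x, int n)
{
    return pow(x, (double) n);
}

/* After (illustrative): preprocessor definitions expanding to plain
 * multiplications, which the compiler can schedule cheaply. */
#define POW2(x) ((x) * (x))
#define POW3(x) (POW2(x) * (x))
#define POW4(x) (POW2(x) * POW2(x))
```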
Force-pushed from fbd9602 to f937da6
…s. Add CMake option to enable LLVM-based address sanitizer (ASAN) for debugging with HIP builds.
… and replace with emulation at full double precision for pre-Pascal NVIDIA GPUs (previously toggled via USE_LEGACY_ATOMICS). Note that the old code led to slow and possibly failing SCF convergence, which was only exposed during testing with tighter density matrix convergence thresholds and integral cut-offs. This is likely due to the truncation used for energy and gradient calculations (1e-6 and 1e-12, respectively).
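For reference, the standard way to emulate full double precision atomicAdd on pre-Pascal hardware (compute capability < 6.0) is an atomicCAS loop, as in the sketch below (adapted from the well-known pattern in the CUDA C++ Programming Guide; QUICK's actual implementation may differ):

```cuda
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600)
/* Emulate atomicAdd on doubles via compare-and-swap; hardware double
 * atomicAdd only exists on Pascal (sm_60) and newer. */
__device__ static double atomicAddDouble(double *address, double val)
{
    unsigned long long int *addr = (unsigned long long int *) address;
    unsigned long long int old = *addr, assumed;

    do {
        assumed = old;
        old = atomicCAS(addr, assumed,
                __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  /* retry if another thread won the race */

    return __longlong_as_double(old);
}
#endif
```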
…t-offs (abs -> fabs). Tune exchange-correlation code.
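The abs -> fabs change guards against a classic C pitfall: with only the integer abs in scope, a double argument is implicitly converted to int, truncating a small cut-off to zero. A minimal illustration (not QUICK code):

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double cutoff = -1.0e-9;

    /* bug: integer abs() truncates the double argument to 0 */
    printf("abs:  %e\n", (double) abs(cutoff));  /* 0.000000e+00 */

    /* fix: fabs() keeps full double precision */
    printf("fabs: %e\n", fabs(cutoff));          /* 1.000000e-09 */

    return 0;
}
```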
Force-pushed from 52c8e65 to 53c25af
Force-pushed from 1a98223 to 508fc52
…ion (< v5.3.0) due to poor performance and use CPU diagonalization routines instead.
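The version gate presumably resembles the sketch below; HIP_VERSION_MAJOR / HIP_VERSION_MINOR are real macros from hip/hip_version.h, but the exact mechanism used in QUICK is an assumption:

```c
#include <hip/hip_version.h>

/* Hedged sketch: fall back to CPU diagonalization routines on older
 * HIP toolchains (< v5.3.0), where the GPU path performed poorly. */
#if (HIP_VERSION_MAJOR < 5) || \
    (HIP_VERSION_MAJOR == 5 && HIP_VERSION_MINOR < 3)
  #define USE_CPU_DIAGONALIZATION
#endif
```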
…lds on AMD MI300 series GPUs.
… function arguments to save stack space.
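The commit message is truncated, but one common way to trim device stack usage is to pass a single pointer to a parameter bundle instead of a long by-value argument list; the sketch below is purely illustrative (all names hypothetical) and may not match the actual change:

```cuda
/* Hypothetical parameter bundle replacing separate scalar arguments. */
struct EriArgs {
    double expA, expB, expC, expD;
    double cutoff;
};

/* Before (illustrative): five by-value arguments per call. */
__device__ double termWide(double expA, double expB, double expC,
                           double expD, double cutoff)
{
    return exp(-(expA + expB + expC + expD)) * cutoff;
}

/* After (illustrative): one pointer argument, fields read in place. */
__device__ double termNarrow(const EriArgs *a)
{
    return exp(-(a->expA + a->expB + a->expC + a->expD)) * a->cutoff;
}
```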
Force-pushed from d2f68c5 to b43412e
…Remove GPU functions from exported function interfaces (not required and may negatively impact compilation). Other code clean-up.
…code reorganization which led to incorrect preprocessor constants being used by several device functions [incorrect memory layout used in accumulating partial results]).
…I routines. Further device code reorganization to localize scopes for eventual removal of relocatable device code flags. Other code clean-up.
I have tested git commit id ab27c99 on the following platforms: without f functions on delorean and chinotto, and with f functions on Expanse A100 GPU nodes. All tests of the full test suite pass (serial, mpi, cuda, cudampi).
delorean
chinotto
Expanse
Timers are not correct. The outputs below are for test runs of:
QUICK-23.08a
This PR (#361), git commit id ab27c99
This PR ports the updated CUDA and MPI+CUDA codes (including the recently merged f-function optimizations) to the HIP and MPI+HIP versions, respectively. This PR also begins to unify the CUDA and HIP versions to simplify future GPU code maintenance.
Additional work included in this PR concerns the following items:
Limitations / Known Issues:
Building with ERI f-function support enabled (-DENABLEF=TRUE) does not compile on HIP versions due to resource limitations (stack frame size exceeded in device functions). Errors are similar to the following (from a build on the AMD Accelerator Cloud platform):

Closes #344.