Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nqt o2 #433

Merged
merged 20 commits into from
Nov 21, 2024
Merged

nqt o2 #433

merged 20 commits into from
Nov 21, 2024

Conversation

Yurlungur
Copy link
Collaborator

PR Summary

This threads in the second order NQT method developed by Peter Hammond on the AthenaK team. More details soon as I develop them. Also there will be an archive note eventually.

PR Checklist

  • Adds a test for any bugs fixed. Adds tests for new features.
  • Format your changes by using the make format command after configuring with cmake.
  • Document any new features, update documentation for changes made.
  • Make sure the copyright notice on any files you modified is up to date.
  • After creating a pull request, note it in the CHANGELOG.md file.
  • LANL employees: make sure tests pass both on the github CI and on the Darwin CI

If preparing for a new release, in addition please check the following:

  • Update the version in cmake.
  • Move the changes in the CHANGELOG.md file under a new header for the new release, and reset the categories.
  • Ensure that any when='@main' dependencies are updated to the release version in the package.py

@Yurlungur Yurlungur added bug Something isn't working enhancement New feature or request labels Nov 17, 2024
@Yurlungur Yurlungur self-assigned this Nov 17, 2024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if there's code I'm not seeing that addresses this, but do we check if the precision of the typedef Real is double before including this? Is there equivalent 'float' procedures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are not equivalent float procedures at this time. Nor are there little endians. However all functions in this code are explicitly doubles.

You're worried about building for single-precision I assume. Yes, we should be worried about that. Let's extend that in a later PR, though. Getting this working was enough of a lift.

Copy link
Collaborator

@mauneyc-LANL mauneyc-LANL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments for discussion rather than blocking requests.

singularity-eos/base/fast-math/logs.hpp Outdated Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Outdated Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Outdated Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Outdated Show resolved Hide resolved
@Yurlungur
Copy link
Collaborator Author

Yurlungur commented Nov 19, 2024

Timing results

x86-volta

NQT o1

                ...L2 difference = 0.0254583
                ...EOSPAC time/point (microseconds) = 0.493015
                ...spiner host time/point (microseconds) = 0.0335447
                ...spiner device time/point (microseconds) = 0.00160943

NQT o2

                ...L2 difference = 0.00807353
                ...EOSPAC time/point (microseconds) = 0.491144
                ...spiner host time/point (microseconds) = 0.033117
                ...spiner device time/point (microseconds) = 0.00142899

NQT true

                ...L2 difference = 0.0171703
                ...EOSPAC time/point (microseconds) = 0.491845
                ...spiner host time/point (microseconds) = 0.0619208
                ...spiner device time/point (microseconds) = 0.00149041

grace-hopper

NQT o1

                ...L2 difference = 0.0254583
                ...EOSPAC time/point (microseconds) = 0.327759
                ...spiner host time/point (microseconds) = 0.0110745
                ...spiner device time/point (microseconds) = 0.000517656

NQT o2

                ...L2 difference = 0.00807353
                ...EOSPAC time/point (microseconds) = 0.302807
                ...spiner host time/point (microseconds) = 0.0137009
                ...spiner device time/point (microseconds) = 0.000571444

NQT true

                ...L2 difference = 0.0171703
                ...EOSPAC time/point (microseconds) = 0.326109
                ...spiner host time/point (microseconds) = 0.0203759
                ...spiner device time/point (microseconds) = 0.000564959

@Yurlungur
Copy link
Collaborator Author

Pointwise error
copper_nqt_comp

@Yurlungur Yurlungur changed the title [WIP] nqt o2 nqt o2 Nov 19, 2024
@Yurlungur
Copy link
Collaborator Author

Yurlungur commented Nov 20, 2024

Timings for the stellar collapse reader in nanoseconds/point

jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_o1_x86_volta.txt 
host = 22.3432
device = 0.641151
jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_o2_x86_volta.txt 
host = 31.665
device = 0.612225
jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_true_x86_volta.txt 
host = 77.0338
device = 0.665203
jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_o1_grace_hopper.txt 
host = 9.79919
device = 0.443848
jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_o2_grace_hopper.txt 
host = 15.484
device = 0.453125
jonahm@sgrb:~/simulations/spiner$ tail -n 2 timings_SFHo_true_grace_hopper.txt 
host = 22.6683
device = 0.443237

so on GPUs, no difference. On grace, o2/true speedup is ~25% and on x86, its ~2x.

@Yurlungur
Copy link
Collaborator Author

Error plots
SFHo_NQT_comp

@Yurlungur
Copy link
Collaborator Author

@jhp-lanl @dholladay00 @pdmullen @jdolence this is ready for review

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New example showing how to profile. Can be set to profile any EOS, even analytic.

@Yurlungur Yurlungur requested a review from chadmeyer November 20, 2024 04:39
CMakeLists.txt Show resolved Hide resolved
test/test_bounds.cpp Show resolved Hide resolved
test/test_math_utils.cpp Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Show resolved Hide resolved
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to this documentation is where the core of the method is. Essentially the quadratic interpolation of $\lg(mantissa)$ on the interval $[0.5, 1)$ has the advantage that the NQT function is C1 everywhere, which improves convergence of linear interpolation. The function itself is still a 100% accurate version of itself, but improving the continuity class of the method improves convergence of stencil ops.

CMakeLists.txt Show resolved Hide resolved
singularity-eos/base/fast-math/logs.hpp Show resolved Hide resolved
test/test_bounds.cpp Show resolved Hide resolved
test/test_math_utils.cpp Show resolved Hide resolved
doc/sphinx/src/contributing.rst Show resolved Hide resolved
@Yurlungur
Copy link
Collaborator Author

Tests passing on re-git. Any objections to merging this? @AstroBarker lets discuss phoebus when you get the chance.

@Yurlungur Yurlungur merged commit d7af402 into main Nov 21, 2024
6 checks passed
@Yurlungur Yurlungur deleted the jmm/nqt-o2 branch November 21, 2024 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants