Perf: optimize function pow in module_gint #4680

Merged (8 commits) on Jul 14, 2024

@dzzz2001 (Collaborator) commented Jul 13, 2024

Background

While profiling ABACUS with VTune, I found that pow accounted for a significant portion of the runtime of cal_dpsir_ylm. std::pow is slow when the exponent is a small integer: direct multiplication can be dozens of times faster. I therefore overloaded pow for small integer exponents, which speeds up cal_gint_force.
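For illustration only, here is a minimal sketch of the idea. It is not the code merged in this PR: the name pow_int, the set of exponents handled, and the fallback are all assumptions made for the example.

```cpp
#include <cmath>

// Hypothetical helper (not the ABACUS implementation): pow with an integer
// exponent that uses plain multiplication for small exponents and falls back
// to std::pow for everything else, including negative exponents.
inline double pow_int(double base, int exp)
{
    switch (exp)
    {
    case 0:
        return 1.0;
    case 1:
        return base;
    case 2:
        return base * base;
    case 3:
        return base * base * base;
    case 4:
    {
        const double sq = base * base; // reuse the square: two multiplies in total
        return sq * sq;
    }
    default:
        return std::pow(base, exp); // general case, same cost as before
    }
}
```

A call such as pow_int(r, 2) then compiles down to a single multiplication instead of a library call. The multiplied result can differ from std::pow in the last bits, so a replacement like this is worth checking against reference results.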
Flame graph of cal_dpsir_ylm before the optimization:
image
Flame graph of cal_dpsir_ylm after the optimization (the "_INTERN..." part represents the pow function):
image
The share of time spent in pow dropped significantly once pow was overloaded.
Performance comparison (test case: tests/performance/P103_Si128_lcao; command: "OMP_NUM_THREADS=12 mpirun -n 1 abacus"):

| | Before optimization | After optimization |
| --- | --- | --- |
| cal_gint_force (CPU) | 16.81 s | 14.06 s |
| cal_gint_force (GPU) | 3.91 s | 3.01 s |

@dzzz2001 requested a review from mohanchen July 13, 2024 02:22
@mohanchen added the label "The Absolute Zero" (Reduce the "entropy" of the code to 0) Jul 13, 2024
@dzzz2001 closed this Jul 14, 2024
@dzzz2001 reopened this Jul 14, 2024
@mohanchen merged commit 98f0682 into deepmodeling:develop Jul 14, 2024
27 checks passed