Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory record in LCAO code is not correct #3831

Open
8 tasks
dyzheng opened this issue Mar 27, 2024 · 5 comments
Open
8 tasks

Memory record in LCAO code is not correct #3831

dyzheng opened this issue Mar 27, 2024 · 5 comments
Assignees
Labels
Memory Memory issues Refactor Refactor ABACUS codes

Comments

@dyzheng
Copy link
Collaborator

dyzheng commented Mar 27, 2024

Describe the Code Quality Issue

There are memory related issues now:
#3789 there are some errors in memory loggings.
#3675 Memory cost is too large in LCAO code, and the loggings is not enough.
#3710 Memory cost in PW code is not enough too.
#3657 Memory cost is too large in NSPIN=4 calculation in LCAO code.

Additional Context

There are two methods to record the used memory in ABACUS:

  1. cmake -B build -DDEBUG_INFO=1 and then check the remaining memory in every ModuleBase::TITLE() function.
  2. check the MEMORY(MB) block in OUT.{suffix}/running_{calculation}.log .

Ideally, these two method can get same peak memory cost in one calculation, but now they are not.

We should try to follow all memory records in TITLE to fix the omissive ModuleBase::Memory::record() .

Task list for Issue attackers (only for developers)

  • Identify the specific code file or section with the code quality issue.
  • Investigate the issue and determine the root cause.
  • Research best practices and potential solutions for the identified issue.
  • Refactor the code to improve code quality, following the suggested solution.
  • Ensure the refactored code adheres to the project's coding standards.
  • Test the refactored code to ensure it functions as expected.
  • Update any relevant documentation, if necessary.
  • Submit a pull request with the refactored code and a description of the changes made.
@dyzheng dyzheng self-assigned this Mar 27, 2024
@jieli-matrix
Copy link
Collaborator

jieli-matrix commented Mar 27, 2024

As we know, memory usage involves several cases:

  1. Stack Memory
    • Used to store local variables, function parameters, and return addresses within functions.
    • Stack memory is automatically allocated and freed by the compiler.
  2. Heap Memory
  3. Global/Static Storage, Constant Storage Area and Code Segment/Text Segment

So usually we should focus on the 1st and 2nd point, especially the 2nd one. Furthermore, I think it is better to analyze them independently(stack-memory v.s. time, heap-memory v.s. time) rather than analyze the whole memory usage(memory v.s. time).
I suggest some tools for profiling of stack memory and heap memory as follows:

  1. Callgrind
    Do we really care about the exact usage of stack memory? It simply reflects the overly deep recursive calls, so we can use callgrind and related graph tools to show the function call graphs.
  2. Massif
    Massif will show the which part of the code allocates the most memory and the report will display a timeline of memory allocations.

At last, we should define the problem:

  • Do we really need to provide memory tools for general users?
  • Shall we perform memory profiling only in some large-scale computation process?

@jieli-matrix
Copy link
Collaborator

jieli-matrix commented Apr 1, 2024

I suggest to overload new and delete operators to record memory usage.
Reference: https://www.cprogramming.com/tutorial/operator_new.html
cc: @dyzheng @Religious-J

@dyzheng
Copy link
Collaborator Author

dyzheng commented Apr 1, 2024

I suggest to overload new and delete operators to record memory usage. Reference: https://www.cprogramming.com/tutorial/operator_new.html cc: @dyzheng

I don't think so, the first reason is the new and delete would be called many times, the record memory operators would lead to poor performance; the second reason is there are some containers don't use the new or delete, such as containers in STL, maybe they use malloc or others; the third reason is the purpose of memory records in ABACUS is not for record exact memory cost, but for developers to analyze the memory cost of algorithm, only few classes or functions are the hotspots, we only need to find and record them.

@jieli-matrix
Copy link
Collaborator

I don't think so, the first reason is the new and delete would be called many times, the record memory operators would lead to poor performance; the second reason is there are some containers don't use the new or delete, such as containers in STL, maybe they use malloc or others; the third reason is the purpose of memory records in ABACUS is not for record exact memory cost, but for developers to analyze the memory cost of algorithm, only few classes or functions are the hotspots, we only need to find and record them.

To the first reason, the overloaded operators can only used in memory debugging mode but not global scope.
To the second reason, that's right and maybe we should find some solutions.
To the third reason, we need to discuss how we invent tools for developers to analyze the memory cost of algorithm and in which cases.

Solving the current memory record error is important, however we may not depend on developers to find and record memory every time in the future.

@WHUweiqingzhou
Copy link
Collaborator

Please also pay attention to #3852

@mohanchen mohanchen added Refactor Refactor ABACUS codes and removed High Priority labels Oct 17, 2024
@mohanchen mohanchen self-assigned this Oct 23, 2024
@mohanchen mohanchen added the Memory Memory issues label Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Memory Memory issues Refactor Refactor ABACUS codes
Projects
None yet
Development

No branches or pull requests

4 participants