Feature & Refactor: read and write Hexx(R) in CSR format #3727

Merged: 9 commits, Apr 2, 2024

@maki49 (Collaborator) commented Mar 15, 2024

Background

Currently, EXX-NSCF calculations suffer from a bug in cereal when reading Hexx(R) (related issues: #2235, #3323), which forces users to turn to PyATB.

What's changed

  • refactor (unify) the CSR writers
  • read and write Hexx(R) in CSR format to bypass cereal
  • add a parameter "reduce" to the CSR writers: if false, each process writes its own data to its own file.
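For readers unfamiliar with CSR (Compressed Sparse Row), the conversion the change list refers to can be sketched as follows. This is a minimal illustration, assuming a hypothetical `SparseBlock` map that holds one Hexx(R) block for a fixed lattice vector R; `to_csr`, `csr_to_text` and the text layout they produce are illustrative choices, not ABACUS's actual types or file format.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one Hexx(R) block at a fixed lattice vector R:
// (row, col) -> value. This is NOT ABACUS's actual data structure.
using SparseBlock = std::map<std::pair<int, int>, double>;

// Compressed Sparse Row (CSR) storage of an nrow x ncol matrix.
struct CSR {
    std::vector<double> values;  // nonzero values, stored row by row
    std::vector<int> col_index;  // column index of each stored value
    std::vector<int> row_ptr;    // row i occupies [row_ptr[i], row_ptr[i+1]); size nrow + 1
};

// Convert a block to CSR, dropping entries with |value| <= threshold.
// Assumes every row index in `block` lies in [0, nrow).
CSR to_csr(const SparseBlock& block, int nrow, double threshold = 0.0) {
    CSR csr;
    csr.row_ptr.assign(nrow + 1, 0);
    // std::map iterates in (row, col) order, so values come out sorted by row.
    for (const auto& [rc, v] : block) {
        if (std::abs(v) <= threshold) continue;
        csr.values.push_back(v);
        csr.col_index.push_back(rc.second);
        ++csr.row_ptr[rc.first + 1];  // count nonzeros per row
    }
    for (int i = 0; i < nrow; ++i) csr.row_ptr[i + 1] += csr.row_ptr[i];  // prefix sum
    return csr;
}

// Write a vector as one space-separated line.
template <typename T>
void write_line(std::ostringstream& os, const std::vector<T>& xs) {
    for (std::size_t k = 0; k < xs.size(); ++k) os << (k ? " " : "") << xs[k];
    os << '\n';
}

// Serialize one R-block as plain text: a header "Rx Ry Rz nnz", then one line each
// for values, column indices and row pointers (illustrative layout only).
std::string csr_to_text(const std::array<int, 3>& R, const CSR& csr) {
    std::ostringstream os;
    os << R[0] << ' ' << R[1] << ' ' << R[2] << ' ' << csr.values.size() << '\n';
    write_line(os, csr.values);
    write_line(os, csr.col_index);
    write_line(os, csr.row_ptr);
    return os.str();
}
```

Because the text only contains plain numbers, reading it back needs no serialization library, which is the point of bypassing cereal here.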

@PeizeLin (Collaborator)

All of HexxR is reduced to a single process. Will this lead to out-of-memory crashes in large-scale calculations?

@maki49 (Collaborator, Author) commented Mar 22, 2024

exx_lri.Hexxs is already a global matrix (with the same values on all processes), so no extra reduction is needed.
What I did is simply this: if DRANK==0, convert to CSR and write; otherwise, do nothing.
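The rule described here (rank 0 converts and writes, every other rank does nothing), together with the `reduce` switch from the change list, can be sketched as a small decision function. This is a hypothetical sketch: `output_file`, the `_rank` file-name suffix and the `HexxR.csr` name in the usage are illustrative assumptions, and MPI is left out by passing the rank in as a plain integer.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical output rule for a CSR writer with a `reduce` flag.
//  reduce == true : the data is gathered onto rank 0 (or, as stated for
//                   exx_lri.Hexxs, is already identical on every rank), so only
//                   rank 0 writes; all other ranks write nothing.
//  reduce == false: every rank writes its own local data to its own file.
// The rank-suffixed file name is an illustrative choice, not ABACUS's naming.
std::optional<std::string> output_file(const std::string& base, int rank, bool reduce) {
    if (reduce) {
        if (rank == 0) return base;
        return std::nullopt;  // "otherwise, do nothing"
    }
    return base + "_rank" + std::to_string(rank);
}
```

The `reduce == true` branch is only safe if rank 0 really holds the whole matrix, which is exactly what the follow-up comment questions.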

@PeizeLin (Collaborator)

Each process stores only part of Hexx in memory.
Although the same Hexx[I][J] block may appear in multiple processes, that does not mean each process holds all of Hexx.
For large systems, the memory required to store the global Hamiltonian matrices is unacceptable.
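The objection can be made concrete with a toy model of a distributed Hexx. The ownership rule below is a hypothetical stand-in, not ABACUS's actual distribution: on a pr x pc process grid, rank (p, q) keeps every block Hexx[I][J] with I % pr == p or J % pc == q. Blocks end up replicated on several ranks, yet no single rank holds all of them, so writing only from rank 0 would drop blocks, while per-rank output still covers the whole matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using AtomPair = std::pair<int, int>;  // block index (I, J) of Hexx[I][J]

// Hypothetical ownership rule, NOT ABACUS's actual distribution: on a pr x pc
// process grid, rank (p, q) keeps every block with I % pr == p OR J % pc == q.
// Blocks are therefore replicated on several ranks, yet no rank holds all of them.
std::vector<std::set<AtomPair>> distribute(int natom, int pr, int pc) {
    std::vector<std::set<AtomPair>> owned(static_cast<std::size_t>(pr * pc));
    for (int p = 0; p < pr; ++p)
        for (int q = 0; q < pc; ++q)
            for (int I = 0; I < natom; ++I)
                for (int J = 0; J < natom; ++J)
                    if (I % pr == p || J % pc == q) owned[p * pc + q].insert({I, J});
    return owned;
}

// Union of the blocks held by all ranks: what per-rank output (reduce == false)
// collectively covers. Replicated blocks collapse into a single entry.
std::set<AtomPair> union_of(const std::vector<std::set<AtomPair>>& owned) {
    std::set<AtomPair> all;
    for (const auto& s : owned) all.insert(s.begin(), s.end());
    return all;
}
```

With 4 atoms on a 2 x 2 grid, each rank holds 12 of the 16 blocks (for example, rank 0 lacks Hexx[1][1]), while the union over all ranks recovers all 16.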

@maki49 (Collaborator, Author) commented Mar 23, 2024

So there is some wasted memory in Hexx that could be improved; maybe I can try that in another PR.
This PR only aims to circumvent cereal's bug, and having every process "convert to CSR and write" is enough to achieve that goal, right?
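If every process writes its own CSR file, the reading side has to merge them back into one matrix. A minimal sketch, assuming a hypothetical `CSR` struct (values, column indices, row pointers) and that a block replicated on several ranks carries identical values everywhere; `merge_csr` is an illustrative name, not ABACUS's reader.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Illustrative CSR layout: row i occupies [row_ptr[i], row_ptr[i+1]) in
// values/col_index. Hypothetical type, not ABACUS's.
struct CSR {
    std::vector<double> values;
    std::vector<int> col_index;
    std::vector<int> row_ptr;
};

using SparseBlock = std::map<std::pair<int, int>, double>;  // (row, col) -> value

// Expand one CSR matrix (e.g. read from one per-rank file) and merge it into `global`.
// A block replicated on several ranks is assumed to carry identical values, so a
// later copy simply overwrites an earlier one.
void merge_csr(const CSR& csr, SparseBlock& global) {
    const int nrow = static_cast<int>(csr.row_ptr.size()) - 1;
    for (int i = 0; i < nrow; ++i)
        for (int k = csr.row_ptr[i]; k < csr.row_ptr[i + 1]; ++k)
            global[{i, csr.col_index[k]}] = csr.values[k];
}
```

Merging in a map keyed by (row, col) makes duplicated entries harmless, at the cost of holding the merged matrix in memory on the reading process.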

@PeizeLin (Collaborator)

right

Review threads: source/module_ri/Exx_LRI_interface.h (outdated, resolved); source/module_esolver/esolver_ks_lcao.cpp (resolved)
@dyzheng merged commit dc7938b into deepmodeling:develop on Apr 2, 2024 (12 checks passed).
@maki49 deleted the HexxCSR branch on April 17, 2024, 02:39.
3 participants