Double peak memory cost in cast_memory_op #4153

Closed
16 tasks
dyzheng opened this issue May 11, 2024 · 1 comment · Fixed by #4154 or #4160

@dyzheng
Collaborator

dyzheng commented May 11, 2024

Describe the bug

In esolver_ks_pw.cpp:

    this->kspw_psi = GlobalV::device_flag == "gpu" 
                         || GlobalV::precision_flag == "single"
                         ? new psi::Psi<T, Device>(this->psi[0])
                         : reinterpret_cast<psi::Psi<T, Device>*>(this->psi);

The Psi constructor calls cast_memory_op:

template <typename T_out, typename T_in>
struct cast_memory<T_out, T_in, container::DEVICE_CPU, container::DEVICE_GPU> {
    void operator()(
        T_out* arr_out,
        const T_in* arr_in,
        const size_t& size)
    {
        auto * arr = (T_in*) malloc(sizeof(T_in) * size);
        cudaErrcheck(cudaMemcpy(arr, arr_in, sizeof(T_in) * size, cudaMemcpyDeviceToHost));
        for (int ii = 0; ii < size; ii++) {
            arr_out[ii] = static_cast<T_out>(arr[ii]);
        }
        free(arr);
    }
};

The temporary buffer arr is as large as Psi itself, so peak host memory doubles while the copy runs. This should be optimized as soon as possible.
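One possible way to bound the temporary, sketched here under assumptions rather than taken from the actual fix: stage the data through a fixed-size buffer and convert it chunk by chunk, so the extra host memory is at most `chunk` elements instead of `size`. The names `cast_memory_chunked`, `host_copy` and the `chunk` parameter are hypothetical and not part of ABACUS. To keep the sketch runnable without a GPU, `host_copy` (a plain `memcpy`) stands in for `cudaMemcpy(..., cudaMemcpyDeviceToHost)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost),
// so that the sketch can run on a machine without a GPU.
inline void host_copy(void* dst, const void* src, std::size_t bytes)
{
    std::memcpy(dst, src, bytes);
}

// Hypothetical helper (not ABACUS code): converts `size` elements from arr_in
// to arr_out through a staging buffer of at most `chunk` elements.
// Returns the number of elements allocated for the staging buffer.
template <typename T_out, typename T_in, typename CopyFn>
std::size_t cast_memory_chunked(T_out* arr_out,
                                const T_in* arr_in,
                                std::size_t size,
                                std::size_t chunk,
                                CopyFn copy_to_host)
{
    if (size == 0) {
        return 0; // nothing to convert, nothing allocated
    }
    assert(arr_out != nullptr && arr_in != nullptr);

    // A chunk of 0 is treated as 1 so the loop always makes progress.
    const std::size_t staging_len = std::min(size, std::max<std::size_t>(chunk, 1));
    std::vector<T_in> staging(staging_len);

    for (std::size_t offset = 0; offset < size; offset += staging_len) {
        const std::size_t n = std::min(staging_len, size - offset);
        // Bring one chunk to the host, then convert it element by element.
        copy_to_host(staging.data(), arr_in + offset, sizeof(T_in) * n);
        for (std::size_t ii = 0; ii < n; ii++) {
            arr_out[offset + ii] = static_cast<T_out>(staging[ii]);
        }
    }
    return staging_len;
}
```

The trade-off is more, smaller transfers in exchange for a bounded temporary; a suitable chunk size would have to be chosen by measurement.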

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).
@mohanchen mohanchen added the GPU & DCU & HPC GPU and DCU and HPC related any issues label May 11, 2024
@caic99
Member

caic99 commented May 12, 2024

Hi @dyzheng ,
I'm interested in this problem and wonder whether the type conversion really happens.
FYI, you can select code lines and paste the permalink into the input box to show the code inline. This makes the referenced source easier to access.

    this->kspw_psi = GlobalV::device_flag == "gpu"
                         || GlobalV::precision_flag == "single"
                         ? new psi::Psi<T, Device>(this->psi[0])
                         : reinterpret_cast<psi::Psi<T, Device>*>(this->psi);
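Whether a conversion happens depends on whether the source and destination element types differ, which can be decided at compile time. The sketch below is an illustration under assumptions, not ABACUS code: `cast_or_copy` is a hypothetical name, and `std::copy` stands in for a direct device-to-host copy into the output array. When both types are the same, the more specialized overload is selected and no temporary or per-element cast is needed.

```cpp
#include <algorithm>
#include <cstddef>

// Same element type: a plain copy is enough, no staging buffer and no cast.
// Returns false to signal that no element-wise conversion took place.
template <typename T>
bool cast_or_copy(T* arr_out, const T* arr_in, std::size_t size)
{
    std::copy(arr_in, arr_in + size, arr_out);
    return false;
}

// Different element types: each element has to be converted.
// Returns true to signal that an element-wise conversion took place.
template <typename T_out, typename T_in>
bool cast_or_copy(T_out* arr_out, const T_in* arr_in, std::size_t size)
{
    for (std::size_t ii = 0; ii < size; ii++) {
        arr_out[ii] = static_cast<T_out>(arr_in[ii]);
    }
    return true;
}
```

For example, calling it with `double` input and `double` output picks the first overload, while `double` input and `float` output picks the second.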

@denghuilu denghuilu linked a pull request May 15, 2024 that will close this issue
4 participants