Incorrect Free Memory Reporting for Intel Arc(TM) A770 Graphics #750

Open
avimanyu786 opened this issue Jul 31, 2024 · 3 comments
Labels
L0 Sysman Issue related to L0 Sysman

avimanyu786 commented Jul 31, 2024

Description

There is an inconsistency in the reported GPU free memory between the Intel Compute Runtime and tools such as xpu-smi. When using the Intel Compute Runtime on Intel Arc(TM) A770 Graphics, the reported free memory value is incorrect, consistently showing the same value as the total memory, even when memory is being consumed. This issue was observed in both Python (dpctl) and a standalone C++ executable.

Steps to Reproduce

  1. Set up an environment with the Intel Compute Runtime and xpu-smi installed.
  2. Save the following C++ code as mem.cpp:
#include <iostream>
#include <vector>
#include <string>
#include <sycl/sycl.hpp>

int main(void) {
    sycl::queue q{sycl::default_selector_v};

    const sycl::device &dev = q.get_device();
    const std::string &dev_name = dev.get_info<sycl::info::device::name>();
    const std::string &driver_ver = dev.get_info<sycl::info::device::driver_version>();

    std::cout << "Device: " << dev_name << " ["  << driver_ver << "]" << std::endl;

    auto global_mem_size = dev.get_info<sycl::info::device::global_mem_size>();

    std::cout << "Global device memory size: " << global_mem_size << " bytes" << std::endl;

    if (dev.has(sycl::aspect::ext_intel_free_memory)) {
        auto free_memory = dev.get_info<sycl::ext::intel::info::device::free_memory>();
        std::cout << "Free memory: " << free_memory << " bytes" << std::endl;
        std::cout << "Implied memory in use: " << global_mem_size - free_memory << " bytes" << std::endl;
    } else {
        std::cout << "Free memory descriptor is not available" << std::endl;
    }

    return 0;
}
  3. Compile the code to obtain the binary:
icpx -fsycl mem.cpp -o mem.x
  4. Execute the compiled binary with the environment variable ZES_ENABLE_SYSMAN set to 1:
export ZES_ENABLE_SYSMAN=1
./mem.x
  5. Compare the output with the results from xpu-smi:
xpu-smi stats -d 0

Observed Behavior

The C++ code consistently reports the same value for global_mem_size and free_memory, implying 0 bytes of used memory, even when memory is being consumed by the GPU. In contrast, xpu-smi correctly reports non-zero GPU memory usage.

Expected Behavior

The free_memory value reported by the Intel Compute Runtime should reflect the actual free memory, showing a decrease when GPU memory is used, consistent with the output from xpu-smi.
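
The observed and expected behavior above can be expressed as two small checks. This is only a sketch: the helper names `reports_nothing_in_use` and `free_memory_tracks_allocation` are hypothetical and not part of SYCL or the Intel Compute Runtime, and the inputs are assumed to be the byte counts read as in mem.cpp. A driver may also allocate lazily, so a reading taken right after an allocation that has not yet been touched might not show the full drop.

```cpp
#include <cstdint>

// Hypothetical helper: true when a reading shows the symptom from this
// report, i.e. free memory is exactly equal to total memory.
inline bool reports_nothing_in_use(std::uint64_t total_bytes,
                                   std::uint64_t free_bytes) {
    return total_bytes != 0 && free_bytes == total_bytes;
}

// Hypothetical helper: given free-memory readings taken before and after a
// device allocation of `allocated_bytes`, report whether free memory dropped
// by at least that amount. A rise in free memory is treated as "not tracking".
inline bool free_memory_tracks_allocation(std::uint64_t free_before,
                                          std::uint64_t free_after,
                                          std::uint64_t allocated_bytes) {
    if (free_after > free_before) {
        return false;  // avoids unsigned wrap-around in the subtraction below
    }
    return free_before - free_after >= allocated_bytes;
}
```

In a reproduction, the first check could be applied to the `global_mem_size` and `free_memory` values from mem.cpp, and the second to `free_memory` readings taken around a `sycl::malloc_device` call of known size.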

Environment Details

  • OS: HiveOS (Based on Ubuntu 20.04 and 22.04)
  • GPU: Intel(R) Arc(TM) A770 Graphics
  • GPU driver versions tested:
    • 1.3.27642
    • 1.3.29735
  • Intel Compute Runtime: Relevant versions for the above drivers
  • Compiler: Intel DPC++/C++ Compiler (icpx)

Additional Information

This issue is also tracked in the dpctl repository. Because a standalone C++ executable reproduces it without dpctl, the problem appears to stem from the GPU driver or the Intel Compute Runtime itself.

Please let me know if further information or testing is required. Thank you for investigating this issue.

avimanyu786 changed the title from "Incorrect Free Memory Reporting by Intel Compute Runtime for Intel Arc(TM) A770 Graphics" to "Incorrect Free Memory Reporting for Intel Arc(TM) A770 Graphics" on Aug 1, 2024
@avimanyu786
Author

For additional context, xpu-smi reads the value of XPUM_STATS_MEMORY_USED to report used GPU memory. I found this by searching for "GPU Memory Used" in the https://github.com/intel/xpumanager repository.

eero-t commented Aug 19, 2024

XPUM xpumd daemon (providing the data for xpu-smi CLI tool) uses compute-runtime L0 Sysman (not SYCL) API: https://spec.oneapi.io/level-zero/latest/sysman/api.html#zesmemorygetstate

@avimanyu786
Author

> XPUM xpumd daemon (providing the data for xpu-smi CLI tool) uses compute-runtime L0 Sysman (not SYCL) API: https://spec.oneapi.io/level-zero/latest/sysman/api.html#zesmemorygetstate

The link has moved: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/sysman/api.html#zesmemorygetstate
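
To compare the SYCL value against Sysman directly, zesMemoryGetState can be called from a small standalone program. The following is a rough sketch, not the code xpumd uses: it assumes the legacy Sysman initialization path (ZES_ENABLE_SYSMAN=1 set before zeInit, as in the reproduction steps), assumes the Level Zero headers and loader are installed (link with -lze_loader), and the names `used_bytes` and `print_sysman_memory_state` are hypothetical.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical pure helper: used bytes implied by a size/free pair,
// clamped to 0 so the unsigned subtraction cannot wrap around.
inline std::uint64_t used_bytes(std::uint64_t size, std::uint64_t free_bytes) {
    return free_bytes > size ? 0 : size - free_bytes;
}

// The Level Zero part is guarded (C++17 __has_include) so the file still
// compiles on machines where the headers are not installed.
#if __has_include(<level_zero/zes_api.h>)
#include <level_zero/ze_api.h>
#include <level_zero/zes_api.h>

// Prints size/free/used for every memory module of every device.
// Assumes ZES_ENABLE_SYSMAN=1 is set in the environment before this runs.
// Returns false if any Level Zero call fails.
inline bool print_sysman_memory_state() {
    if (zeInit(0) != ZE_RESULT_SUCCESS) return false;

    uint32_t driver_count = 0;
    if (zeDriverGet(&driver_count, nullptr) != ZE_RESULT_SUCCESS) return false;
    std::vector<ze_driver_handle_t> drivers(driver_count);
    if (zeDriverGet(&driver_count, drivers.data()) != ZE_RESULT_SUCCESS) return false;

    for (ze_driver_handle_t driver : drivers) {
        uint32_t device_count = 0;
        if (zeDeviceGet(driver, &device_count, nullptr) != ZE_RESULT_SUCCESS) return false;
        std::vector<ze_device_handle_t> devices(device_count);
        if (zeDeviceGet(driver, &device_count, devices.data()) != ZE_RESULT_SUCCESS) return false;

        for (ze_device_handle_t device : devices) {
            // With legacy Sysman enabled, the core device handle is also
            // used as the Sysman device handle.
            zes_device_handle_t sysman_device = device;

            uint32_t mem_count = 0;
            if (zesDeviceEnumMemoryModules(sysman_device, &mem_count, nullptr) != ZE_RESULT_SUCCESS) return false;
            std::vector<zes_mem_handle_t> modules(mem_count);
            if (zesDeviceEnumMemoryModules(sysman_device, &mem_count, modules.data()) != ZE_RESULT_SUCCESS) return false;

            for (zes_mem_handle_t mem : modules) {
                zes_mem_state_t state = {};
                state.stype = ZES_STRUCTURE_TYPE_MEM_STATE;
                if (zesMemoryGetState(mem, &state) != ZE_RESULT_SUCCESS) return false;
                std::cout << "size: " << state.size << " bytes, free: " << state.free
                          << " bytes, used: " << used_bytes(state.size, state.free)
                          << " bytes" << std::endl;
            }
        }
    }
    return true;
}
#endif
```

If this program reports a non-zero used value while mem.x still reports free_memory equal to global_mem_size, that would point at the SYCL-to-Sysman path rather than at Sysman itself; if both agree, the discrepancy with xpu-smi lies elsewhere.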
