Why do I get the following error : "buffer (1) is not resident in device (0) so migration from device to host fails" ? #412

Closed
Dalhfire opened this issue Oct 5, 2023 · 1 comment
Labels
Generic Questions Issues that are not related to a specific tutorial

Dalhfire commented Oct 5, 2023

Hi,

I followed this tutorial, which introduced me to kernels: https://github.com/Xilinx/Vitis_Accel_Examples/tree/main/cpp_kernels/simple_vadd

I successfully got it working on my VCK190 board, and I am now trying to do something very similar by modifying the host code and the kernel.

Here is the code of the HLS top function krnl_dwtfix:

krnl_dwtfix.cpp:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <assert.h>
 
#include </home/ceos-1/Documents/Learn/VitisTutorial/dwt_v0_host/util.hpp>
#include </home/ceos-1/Documents/Learn/VitisTutorial/dwt_v0_host/variables.hpp>
#include "hls_stream.h"
#include "ap_axi_sdata.h"
#include "ap_int.h"
/*--------------------------- DWT 1D INTEGER/FLOAT --------------------------*/
/*                   Fixed-point version of the float DWT                    */
/*---------------------------------------------------------------------------*/
 
bool first = true;
int cpt_test =0;
 
typedef ap_axis<64, 0, 0, 0> axistream_long;
 
void load_inputs(int64_t *pX, hls::stream<int64_t>& pX_stream) {
mem_rd:
    //Fill pX_stream with the values from pX array
    for (int i = 0; i < 9; i++) {
    //#pragma HLS PIPELINE
    #pragma HLS LOOP_TRIPCOUNT min = 9 max = 9
        pX_stream << pX[i];
    }
}
 
void store_result(int64_t* out, hls::stream<int64_t>& out_stream) {
mem_wr:
    out[0] = out_stream.read();
}
 
static void compute_coef(hls::stream<int64_t>& pX_stream, hls::stream<int64_t>& out_stream) {
    int64_t h[5] = {894119, 395736, -115998, -25008, 39666};
    int64_t pX[9];
 
execute_coef:
    for (int i = 0; i < 9; i++) {
    //#pragma HLS PIPELINE
    #pragma HLS LOOP_TRIPCOUNT min = 9 max = 9
        pX[i] = pX_stream.read();
    }
    int64_t result = 0;
 
compute_result:
    result += h[0] * pX[4];
    for (int i = 1; i < 5; i++) {
        result += h[i] * (pX[4-i] + pX[4+i]);
    }
 
    out_stream << result;
}
 
extern "C" {
/* CalculCoefC
 *  (IN)  h : low-pass coefficients of the DWT
 *  (IN)  pX : vector of 2N data samples
 *  (OUT) pC : vector of N low-pass DWT output samples
 */
    void krnl_dwtfix(int64_t *pX, int64_t *value_out) {
    #pragma HLS INTERFACE m_axi port=pX bundle = gmem0 depth=64
    #pragma HLS INTERFACE m_axi port=value_out bundle = gmem0 depth=64
 
        static hls::stream<int64_t> pX_stream("pX_stream");
        static hls::stream<int64_t> out_stream("output_stream");
        
        // Read pX_stream to local arrays
        #pragma HLS DATAFLOW
        load_inputs(pX, pX_stream);
        compute_coef(pX_stream, out_stream);
        store_result(value_out, out_stream);
    }
}

This one seems to work: when I run the testbench, the results are correct.
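For comparing the value read back from the device with a known-good result, a plain C++ reference of the same arithmetic can help. The following is only a sketch: it copies the coefficients and the symmetric sum from compute_coef above, while the names `kH`, `dwt_lowpass_ref`, `matches_reference` and `run_reference_demo` are hypothetical and not part of the original project.

```cpp
#include <cassert>
#include <cstdint>

// Same coefficients as h[] in compute_coef.
static const std::int64_t kH[5] = {894119, 395736, -115998, -25008, 39666};

// Hypothetical host-side reference: symmetric 9-tap filter centred on pX[4].
// pX must point to at least 9 samples.
std::int64_t dwt_lowpass_ref(const std::int64_t* pX) {
    assert(pX != nullptr);
    std::int64_t result = kH[0] * pX[4];
    for (int i = 1; i < 5; i++) {
        result += kH[i] * (pX[4 - i] + pX[4 + i]);
    }
    return result;
}

// Hypothetical helper: true when the value read back from the kernel
// equals the software reference for the same 9 input samples.
bool matches_reference(std::int64_t kernel_value, const std::int64_t* pX) {
    return kernel_value == dwt_lowpass_ref(pX);
}

// Small usage example: an impulse at the centre tap selects h[0].
void run_reference_demo() {
    const std::int64_t impulse[9] = {0, 0, 0, 0, 1, 0, 0, 0, 0};
    assert(dwt_lowpass_ref(impulse) == kH[0]);
    assert(matches_reference(kH[0], impulse));
}
```

With an impulse at pX[4] the reference returns h[0] = 894119, which makes a convenient first test vector once data transfer from the device works.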

Here is the part of the host code where I try to launch the kernel:

// Fill the h_stream and pX_stream with the appropriate data
    //fillStreams(pX_axistream, pX_extracted);
 
    // Kernel Part
 
    // These commands will allocate memory on the Device. The cl::Buffer objects can
    // be used to reference the memory locations on the device.
    auto start = std::chrono::steady_clock::now();
    bool found_device = false;
 
    // Creates a vector of DATA_SIZE elements with an initial value of 10 and 32
    // using customized allocator for getting buffer alignment to 4k boundary
 
    std::vector<cl::Device> devices;
    cl_int err;
    cl::Context context;
    cl::CommandQueue q;
    cl::Kernel krnl_dwtfix;
    cl::Program program;
    std::vector<cl::Platform> platforms;
 
    // traversing all Platforms To find Xilinx Platform and targeted
    // Device in Xilinx Platform
    cl::Platform::get(&platforms);
    for (size_t i = 0; (i < platforms.size()) && (found_device == false); i++) {
        cl::Platform platform = platforms[i];
        std::string platformName = platform.getInfo<CL_PLATFORM_NAME>();
        if (platformName == "Xilinx") {
            devices.clear();
            platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
            if (devices.size()) {
                found_device = true;
                break;
            }
        }
    }
    if (found_device == false) {
        std::cout << "Error: Unable to find Target Device " << std::endl;
        exit(EXIT_FAILURE);
    }
 
    std::cout << "INFO: Reading " << xclbinFilename << std::endl;
    FILE* fp;
    if ((fp = fopen(xclbinFilename.c_str(), "r")) == nullptr) {
        printf("ERROR: %s xclbin not available please build\n", xclbinFilename.c_str());
        exit(EXIT_FAILURE);
    }
    // Load xclbin
    std::cout << "Loading: '" << xclbinFilename << "'\n";
    std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
    bin_file.seekg(0, bin_file.end);
    unsigned nb = bin_file.tellg();
    bin_file.seekg(0, bin_file.beg);
    char* buf = new char[nb];
    bin_file.read(buf, nb);
 
    // Creating Program from Binary File
    cl::Program::Binaries bins;
    bins.push_back({buf, nb});
    bool valid_device = false;
    for (unsigned int i = 0; i < devices.size(); i++) {
        auto device = devices[i];
        // Creating Context and Command Queue for selected Device
        OCL_CHECK(err, context = cl::Context(device, nullptr, nullptr, nullptr, &err));
        OCL_CHECK(err, q = cl::CommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err));
        std::cout << "Trying to program device[" << i << "]: " << device.getInfo<CL_DEVICE_NAME>() << std::endl;
        cl::Program program(context, {device}, bins, nullptr, &err);
        if (err != CL_SUCCESS) {
            std::cout << "Failed to program device[" << i << "] with xclbin file!\n";
        } else {
            std::cout << "Device[" << i << "]: program successful!\n";
            OCL_CHECK(err, krnl_dwtfix = cl::Kernel(program, "krnl_dwtfix", &err));
            valid_device = true;
            break; // we break because we found a valid device
        }
    }
    if (!valid_device) {
        std::cout << "Failed to program any device found, exit!\n";
        exit(EXIT_FAILURE);
    }
 
    
 
    OCL_CHECK(err, cl::Buffer buffer_pX(context, CL_MEM_READ_ONLY, sizeof(int64_t)*9, NULL, &err));
    OCL_CHECK(err, cl::Buffer buffer_v(context, CL_MEM_WRITE_ONLY, sizeof(int64_t), NULL, &err));
 
    // set the kernel Arguments
    int narg = 0;
    OCL_CHECK(err, err = krnl_dwtfix.setArg(narg++, buffer_pX));
    OCL_CHECK(err, err = krnl_dwtfix.setArg(narg++, buffer_v));
 
    // We then need to map our OpenCL buffers to get the pointers
 
    int64_t* ptr_pX;
    int64_t* ptr_v;
    OCL_CHECK(err, ptr_pX = (int64_t*)q.enqueueMapBuffer(buffer_pX, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int64_t)*9, NULL, NULL, &err));
    OCL_CHECK(err, ptr_v = (int64_t*)q.enqueueMapBuffer(buffer_v, CL_TRUE, CL_MAP_READ, 0, sizeof(int64_t), NULL, NULL, &err));    
 
    int64_t pX_extracted[9];
    extractValues(pX, pX_extracted); 
 
    // Do not assign new values to ptr_pX and ptr_v
    // Instead, copy the data from pX_extracted to the mapped buffers
    std::memcpy(ptr_pX, pX_extracted, sizeof(int64_t)*9);
    
    // Data will be migrated to kernel space
    OCL_CHECK(err, err = q.enqueueMigrateMemObjects({buffer_pX}, 0 /* 0 means from host*/));
 
    // Launch the Kernel
    OCL_CHECK(err, err = q.enqueueTask(krnl_dwtfix));
 
    // The result of the previous kernel execution will need to be retrieved in
    // order to view the results. This call will transfer the data from FPGA to
    // source_results vector
    OCL_CHECK(err, err = q.enqueueMigrateMemObjects({buffer_v}, CL_MIGRATE_MEM_OBJECT_HOST));
 
    OCL_CHECK(err, err = q.finish());
 
    /* if (!last_G_ndwt_out_s.empty()){
    	last_G_ndwt = last_G_ndwt_out_s.read().data;
        fprintf(stdout,"last_G_ndwt : %d",last_G_ndwt);
    }
    fprintf(stdout,"last_G_ndwt : %d",last_G_ndwt);*/
    v = *ptr_v;

Software and hardware emulation both compile and work, but when I tried to launch the hw_emulation, I got the following messages in the console, and then the program stopped:

Loading: 'binary_container_1.xclbin'
 
XRT build version: 2.15.0
Build hash: 64c933573e7e50a8aba939a74209590c2b739e8b
Build date: 2023-04-17 09:18:13
Git branch: 2023.1
PID: 579
UID: 0
[Thu Oct  5 10:14:25 2023 GMT]
HOST: 
EXE: /mnt/dwt_v0_host
[XRT] ERROR: buffer (1) is not resident in device (0) so migration from device to host fails
terminate called after throwing an instance of 'xrt_xocl::error'
  what():  event 5 never submitted
[   83.198147] zocl-drm amba_pl@0:zyxclmm_drm:  ffff0008003f7810 kds_del_context: Client pid(579) del context Domain(0) CU(0x0)
[   83.208894] zocl-drm amba_pl@0:zyxclmm_drm:  ffff0008003f7810 kds_del_context: Client pid(579) del context Domain(65535) CU(0xffff)
INFO: Reading binary_container_1.xclbin
Loading: 'binary_container_1.xclbin'
 
Thread 2 "dwt_v0_host" received signal SIGABRT, Aborted.
[Switching to Thread 0xfffff355a120 (LWP 583)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44      pthread_kill.c: No such file or directory.
 

Using gdb, I found that the error comes from this line:

OCL_CHECK(err, q.enqueueMigrateMemObjects({buffer_v}, CL_MIGRATE_MEM_OBJECT_HOST));

But I don't know why it isn't working. Does anyone have an idea?

Thanks in advance for your help,

David

Contributor

randyh62 commented Oct 5, 2023

The store_result() of your kernel doesn't seem to line up with your load_inputs(). Maybe just try to initialize the buffer_v from the kernel to see if you can transfer data.

Also, this is not an issue with the Vitis_Tutorials, and so should be raised on the user forums instead.
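A side note on the host code, separate from the residency error itself: the last two OCL_CHECK calls in the question pass the call without assigning its return value to err. The definition of OCL_CHECK is not shown in the post; assuming it has the common shape "run the call, then compare the error variable with CL_SUCCESS", a call written that way would be checked against a stale err. The sketch below illustrates this with stand-ins only (`status_t`, `CHECK`, `always_fails` and the other names are hypothetical, and no OpenCL headers are used).

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch builds without OpenCL headers.
using status_t = int;
constexpr status_t kSuccess = 0;
constexpr status_t kFailure = -1;

// Assumed shape of the helper macro: run `call`, then inspect `error`.
// Instead of exiting, this version records the failure in `caught`.
#define CHECK(error, call, caught) \
    call;                          \
    if ((error) != kSuccess) {     \
        (caught) = true;           \
    }

// Hypothetical API call that always reports an error through its return value.
status_t always_fails() { return kFailure; }

// The return value is discarded, so `err` keeps its old value
// and the failure goes unnoticed.
bool check_without_assignment() {
    status_t err = kSuccess;
    bool caught = false;
    CHECK(err, always_fails(), caught);
    return caught;
}

// The return value is stored in `err`, so the check sees the failure.
bool check_with_assignment() {
    status_t err = kSuccess;
    bool caught = false;
    CHECK(err, err = always_fails(), caught);
    return caught;
}

// Small usage example comparing the two variants.
void run_check_demo() {
    assert(!check_without_assignment());
    assert(check_with_assignment());
}
```

In the log above the runtime threw an exception, so this pattern did not hide the failure in this particular run; it only matters for calls that report errors through their return code.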

imrickysu closed this as not planned on Oct 16, 2023
imrickysu added the Generic Questions (Issues that are not related to a specific tutorial) label on Oct 16, 2023