Npu allocator #437

Closed
wants to merge 39 commits into from
Changes from all commits (39 commits)
89094a8
Prototype shared memory allocator on Windows using OV-EP
javier-intel Jul 18, 2024
2e4b205
Partially working allocator.
ericcraw Aug 23, 2024
63e8aee
Hard code onnx perf to use RT NPU allocator for inputs
ericcraw Aug 24, 2024
cd88b0c
Fix allocation lookups coming from different level zero contexts
ericcraw Aug 26, 2024
89127f0
Page align OV allocation
ericcraw Aug 26, 2024
d43219f
Allocate input as WC
ericcraw Aug 26, 2024
274e6af
Only set tensors when they have changed.
ericcraw Aug 26, 2024
6feae84
Revert "Allocate input as WC"
ericcraw Aug 26, 2024
c1f3b3e
Hard code onnx perf to use RT NPU for outputs
ericcraw Aug 27, 2024
fea4752
Merge branch 'microsoft:main' into ovep-release-lnl-1.2
sfatimar Aug 27, 2024
e19f326
fix: Fixed model_proto serialized dump in Debug
ankitm3k Aug 27, 2024
524d766
Merge pull request #428 from intel/ankit/debug_fixes_ovep_lnl_1.2
sfatimar Aug 27, 2024
1e3dadd
Revert "Hard code onnx perf to use RT NPU for outputs"
ericcraw Aug 27, 2024
61a2d4a
Hard code onnx perf to use RT NPU for outputs fixed
ericcraw Aug 27, 2024
5800966
Fix onnx_perf_test app crash on tensor destroy
ericcraw Aug 27, 2024
075b14d
Upgrade Openvino version to 2024.3.0
jatinwadhwa921 Aug 28, 2024
59ba9c7
Merge pull request #433 from intel/jatin/upgarde_ov_to_2024_3
sfatimar Aug 28, 2024
5a3c793
Merge remote-tracking branch 'ericcraw/ericcraw/ort_allocator_hacking…
saurabhkale17 Aug 28, 2024
a7f19aa
refactor: remove redundant ort_shape_to_ovshape lambda function
saurabhkale17 Aug 28, 2024
20bca3b
alocate buffer in NPU visible region from perf test application
saurabhkale17 Aug 29, 2024
df617dd
remove redundant code
saurabhkale17 Aug 29, 2024
331679f
fix: Fixed model_proto serialized dump in Debug
ankitm3k Aug 27, 2024
6ed4988
Prototype shared memory allocator on Windows using OV-EP
javier-intel Jul 18, 2024
92652ed
Partially working allocator.
ericcraw Aug 23, 2024
2a06f44
Hard code onnx perf to use RT NPU allocator for inputs
ericcraw Aug 24, 2024
b83f8ac
Fix allocation lookups coming from different level zero contexts
ericcraw Aug 26, 2024
e812ca6
Page align OV allocation
ericcraw Aug 26, 2024
077881a
Allocate input as WC
ericcraw Aug 26, 2024
0060915
Only set tensors when they have changed.
ericcraw Aug 26, 2024
1468d38
Revert "Allocate input as WC"
ericcraw Aug 26, 2024
2334215
Hard code onnx perf to use RT NPU for outputs
ericcraw Aug 27, 2024
97f9e64
Revert "Hard code onnx perf to use RT NPU for outputs"
ericcraw Aug 27, 2024
6517a12
Hard code onnx perf to use RT NPU for outputs fixed
ericcraw Aug 27, 2024
3c2a997
Fix onnx_perf_test app crash on tensor destroy
ericcraw Aug 27, 2024
abe9f67
refactor: remove redundant ort_shape_to_ovshape lambda function
saurabhkale17 Aug 28, 2024
94b55a7
alocate buffer in NPU visible region from perf test application
saurabhkale17 Aug 29, 2024
966c48a
remove redundant code
saurabhkale17 Aug 29, 2024
a6004c5
add command line parameter in perf test for using remote tensors
saurabhkale17 Aug 29, 2024
ef44c87
add command line parameter in perf test for using remote tensors
saurabhkale17 Aug 29, 2024
9 changes: 9 additions & 0 deletions include/onnxruntime/core/framework/allocator.h
@@ -50,6 +50,15 @@ constexpr const char* HIP = "Hip";
constexpr const char* HIP_PINNED = "HipPinned";
constexpr const char* OpenVINO_CPU = "OpenVINO_CPU";
constexpr const char* OpenVINO_GPU = "OpenVINO_GPU";
constexpr const char* OpenVINO_NPU = "OpenVINO_RT_NPU";

Reviewer comment: OpenVINO_NPU is a redefinition of OpenVINO_RT_NPU. Remove it if it is not referenced in the code.

Author reply: It is not referenced, so it has been removed in the new PR.


// application
// 1. Allocate with ORT::CreateTensor("<custom_allocator_tag>")
// 2. "Manual" allocation

constexpr const char* OpenVINO_RT = "OpenVINO_RT";
constexpr const char* OpenVINO_RT_NPU = "OpenVINO_RT_NPU";
constexpr const char* WIN32_HANDLE = "WIN32_HANDLE";
constexpr const char* WEBGPU_BUFFER = "WebGPU_Buffer";

constexpr size_t kAllocAlignment = 256;
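For context, a minimal application-side sketch of option 1 in the comment above (allocating through ORT with the new allocator name). This is an illustration only: the model path, input name, and shape are placeholders, and it assumes the OpenVINO execution provider has already been appended to the session options so that the EP's preferred NPU allocator (added later in this PR) is registered with the session.

```cpp
// Sketch only: allocate an input tensor from the NPU-visible allocator registered by this PR.
// "model.onnx", "input" and the shape are placeholder values for illustration.
#include <array>
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env;
  Ort::SessionOptions so;
  // Assumes the OpenVINO EP (device_type "NPU") has been appended via the provider options API.
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);

  // The memory-info name must match the allocator name this PR registers ("OpenVINO_RT_NPU").
  Ort::MemoryInfo mem_info("OpenVINO_RT_NPU", OrtDeviceAllocator, 0, OrtMemTypeCPUInput);
  // Assumption: with CreatePreferredAllocators in place, this resolves to the EP's NPU allocator.
  Ort::Allocator npu_allocator(session, mem_info);

  std::array<int64_t, 4> shape{1, 3, 224, 224};
  Ort::Value input = Ort::Value::CreateTensor<float>(npu_allocator, shape.data(), shape.size());
  // Fill input.GetTensorMutableData<float>() and run the session; because the buffer came from
  // the OpenVINO_RT_NPU allocator, the EP can hand it to OpenVINO without an extra copy.
  return 0;
}
```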
4 changes: 4 additions & 0 deletions onnxruntime/core/framework/allocator.cc
@@ -145,6 +145,10 @@ ORT_API_STATUS_IMPL(OrtApis::CreateMemoryInfo, _In_ const char* name1, enum OrtA
*out = new OrtMemoryInfo(
name1, type, OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, static_cast<OrtDevice::DeviceId>(id1)), id1,
mem_type1);
} else if (strcmp(name1, onnxruntime::OpenVINO_RT_NPU) == 0) {
*out = new OrtMemoryInfo(
name1, type, OrtDevice(OrtDevice::NPU, OrtDevice::MemType::DEFAULT, static_cast<OrtDevice::DeviceId>(id1)), id1,
mem_type1);
} else if (strcmp(name1, onnxruntime::CUDA_PINNED) == 0) {
*out = new OrtMemoryInfo(
onnxruntime::CUDA_PINNED, type, OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, static_cast<OrtDevice::DeviceId>(id1)),
124 changes: 101 additions & 23 deletions onnxruntime/core/providers/openvino/backends/basic_backend.cc
@@ -48,14 +48,6 @@
// Set the inference_num_threads property of the CPU
SetNumThreads(device_config);

#ifndef NDEBUG
if (IsDebugEnabled()) {
std::string file_name = subgraph_context.subgraph_name + "_static.onnx";
std::fstream outfile(file_name, std::ios::out | std::ios::trunc | std::ios::binary);
model_proto.SerializeToOstream(outfile);
}
#endif

try {
std::string dev_prec = global_context.device_type + "_" + global_context_.precision_str;

@@ -295,16 +287,99 @@
ORT_THROW(msg);
}
} else {
OVTensorPtr graph_input_blob;
auto tensor = context.GetInput(subgraph_context_.input_names.at(input_name));
auto allocator_name = tensor.GetTensorMemoryInfo().GetAllocatorName();
ov_tensor_data_t ov_tensor_key;
ort_tensor_key_t ort_tensor_key{tensor.GetTensorRawData(), allocator_name};
if (const auto& it = ort_ov_tensor_map.find(ort_tensor_key); it != ort_ov_tensor_map.end()) {
ov_tensor_key = it->second;
} else {
// Does this make sense for both types of allocators?
auto input = ie_cnn_network_->get_parameters().at(input_idx);
ov_tensor_key.tensor_ptr = std::make_shared<ov::Tensor>(input->get_element_type(), input->get_shape(),
(void*)tensor.GetTensorRawData());

[cpplint notice, line 300] Using C-style cast. Use reinterpret_cast<void*>(...) instead [readability/casting] [4]
if (allocator_name == OpenVINO_RT_NPU) {
ov_tensor_key.copy_needed = false;
} else {
ov_tensor_key.copy_needed = true;
}
ort_ov_tensor_map.emplace(ort_tensor_key, ov_tensor_key);

try {
infer_request->SetTensor(input_name, ov_tensor_key.tensor_ptr);
} catch (const char* msg) {
ORT_THROW(msg);
}
}

if (ov_tensor_key.copy_needed) {
const char* ort_tensor_data = tensor.GetTensorData<char>();
size_t tensor_data_size = ov_tensor_key.tensor_ptr->get_byte_size();
auto ort_batch_memory_offset = ort_tensor_data + tensor_data_size * batch_slice_idx;
std::memcpy(ov_tensor_key.tensor_ptr->data(), ort_batch_memory_offset, tensor_data_size);
}
}
input_idx++;
}

// Set the output blob as remote blob
auto graph_output_info = exe_network_.Get().outputs();
auto output_idx = 0;
for (auto output_info_iter = graph_output_info.begin();
output_info_iter != graph_output_info.end(); ++output_info_iter) {
auto output_names = output_info_iter->get_names();
std::string onnx_output_name;
std::string output_name;
bool output_name_found = false;
// using the output name retrieved from ONNX original to match with the output names returned by OV tensors
for (auto it = subgraph_context_.output_names.begin(); it != subgraph_context_.output_names.end(); ++it) {
onnx_output_name = it->first;
if (output_names.find(onnx_output_name) != output_names.end()) {
// Assigning the output_name
output_name = it->first;
output_name_found = true;
break;
}
}
if (!output_name_found) {
ORT_THROW(
log_tag +
"Output names mismatch between OpenVINO and ONNX. [ONNX Output: ] " +
onnx_output_name + " doesn't exist in the list of OpenVINO output tensor names");
}

size_t batch_size = 1;
Ort::UnownedValue tensor = GetOutputTensor(context,
batch_size,
infer_request,
output_name,
subgraph_context_.output_names);
auto allocator_name = tensor.GetTensorMemoryInfo().GetAllocatorName();

ov_tensor_data_t ov_tensor_data;
ort_tensor_key_t ort_tensor_key{tensor.GetTensorRawData(), allocator_name};
if (const auto& it = ort_ov_tensor_map.find(ort_tensor_key); it != ort_ov_tensor_map.end()) {
ov_tensor_data = it->second;
} else {
auto output = ie_cnn_network_->get_results().at(output_idx);
ov_tensor_data.tensor_ptr = std::make_shared<ov::Tensor>(output->get_element_type(), output->get_shape(),
(void*)tensor.GetTensorRawData());

[cpplint notice, line 366] Using C-style cast. Use reinterpret_cast<void*>(...) instead [readability/casting] [4]
if(allocator_name == OpenVINO_RT_NPU) {

[cpplint notice, line 367] Missing space before ( in if( [whitespace/parens] [5]
ov_tensor_data.copy_needed = false;
} else {
ov_tensor_data.copy_needed = true;
}
ort_ov_tensor_map.emplace(ort_tensor_key, ov_tensor_data);

try {
graph_input_blob = infer_request->GetTensor(input_name);
infer_request->SetTensor(output_name, ov_tensor_data.tensor_ptr);
} catch (const char* msg) {
ORT_THROW(msg);
}
FillInputBlob(std::move(graph_input_blob), batch_slice_idx, std::move(input_name), context, subgraph_context_);
}
input_idx++;
output_idx++;
}

// Start Async inference
infer_request->StartAsync();
} catch (const char* msg) {
@@ -430,7 +505,6 @@
auto graph_output_info = exe_network_.Get().outputs();
for (auto output_info_iter = graph_output_info.begin();
output_info_iter != graph_output_info.end(); ++output_info_iter) {
OVTensorPtr graph_output_blob;
auto output_names = output_info_iter->get_names();
std::string onnx_output_name;
std::string output_name;
@@ -454,20 +528,24 @@
" doesn't exist in the "
"list of OpenVINO output tensor names");
}
try {
graph_output_blob = infer_request->GetTensor(output_name);
} catch (const char* msg) {
ORT_THROW(msg);
}

size_t batch_size = 1;
Ort::UnownedValue output_tensor =
GetOutputTensor(context, batch_size, infer_request, std::move(output_name), subgraph_context_.output_names);
auto mem_info = output_tensor.GetTensorMemoryInfo();
if (mem_info.GetAllocatorName() == OpenVINO_GPU) {

Reviewer comment: Check if this has an effect on the OpenVINO_GPU IOBuffer path.

return;
auto allocator_name = output_tensor.GetTensorMemoryInfo().GetAllocatorName();
ov_tensor_data_t ov_tensor_data;

Reviewer comment: Check whether the ov_tensor_data declaration in StartAsyncInference is redundant.

Author reply: The ov_tensor_data is required to create the input/output tensors before inference in StartAsyncInference.

ort_tensor_key_t ort_tensor_key{output_tensor.GetTensorRawData(), allocator_name};
if (const auto& it = ort_ov_tensor_map.find(ort_tensor_key); it != ort_ov_tensor_map.end()) {
ov_tensor_data = it->second;
} else {
size_t batch_slice = 0;
FillOutputBlob(std::move(graph_output_blob), output_tensor, batch_slice);
ORT_THROW(log_tag + "Expected all outputs to have associated OV::Tensor's");
}

if (ov_tensor_data.copy_needed) {
auto ort_tensor_data = output_tensor.GetTensorMutableData<char>();
size_t tensor_data_size = ov_tensor_data.tensor_ptr->get_byte_size();
auto ort_batch_memory_offset = ort_tensor_data /*+ tensor_data_size * batch_size*/;
std::memcpy(ort_batch_memory_offset, ov_tensor_data.tensor_ptr->data(), tensor_data_size);
}
}

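The input and output handling above follows one pattern: an ov::Tensor wrapping the ORT buffer is created once per (raw data pointer, allocator name) key, cached in ort_ov_tensor_map, and bytes are copied only when the buffer did not come from the OpenVINO_RT_NPU allocator. A simplified, self-contained sketch of that bookkeeping follows; OvTensorStub stands in for ov::Tensor, and the helper name is illustrative, not an EP function.

```cpp
// Simplified illustration of the caching scheme used in StartAsyncInference/CompleteAsyncInference.
// OvTensorStub stands in for ov::Tensor; only the lookup and copy_needed decision is shown.
#include <map>
#include <memory>
#include <string>
#include <utility>

struct OvTensorStub { /* would wrap the ORT buffer as an ov::Tensor */ };

struct ov_tensor_data_t {
  std::shared_ptr<OvTensorStub> tensor_ptr;
  bool copy_needed;
};

using ort_tensor_key_t = std::pair<const void*, const std::string>;

ov_tensor_data_t& GetOrCreateOvTensor(std::map<ort_tensor_key_t, ov_tensor_data_t>& cache,
                                      const void* ort_data, const std::string& allocator_name) {
  ort_tensor_key_t key{ort_data, allocator_name};
  auto it = cache.find(key);
  if (it == cache.end()) {
    ov_tensor_data_t entry;
    entry.tensor_ptr = std::make_shared<OvTensorStub>();        // real code wraps ort_data here
    entry.copy_needed = (allocator_name != "OpenVINO_RT_NPU");  // zero-copy only for NPU buffers
    it = cache.emplace(key, std::move(entry)).first;            // SetTensor happens once, on insert
  }
  return it->second;
}
```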
9 changes: 9 additions & 0 deletions onnxruntime/core/providers/openvino/backends/basic_backend.h
@@ -11,6 +11,7 @@
#include <string>
#include <condition_variable>
#include <mutex>
#include <map>

#include "core/session/onnxruntime_cxx_api.h"
#include "core/providers/openvino/contexts.h"
@@ -20,6 +21,11 @@
namespace onnxruntime {
namespace openvino_ep {

struct ov_tensor_data_t {
OVTensorPtr tensor_ptr;
bool copy_needed;
};

class InferRequestsQueue;
class BasicBackend : public IBackend {
public:
@@ -60,6 +66,9 @@
#if defined IO_BUFFER_ENABLED
OVRemoteContextPtr remote_context_;
#endif

using ort_tensor_key_t = std::pair<const void *, const std::string>;

[cpplint notice, line 70] Add #include <utility> for pair<> [build/include_what_you_use] [4]
std::map<ort_tensor_key_t, ov_tensor_data_t> ort_ov_tensor_map;
};

class InferRequestsQueue {
13 changes: 13 additions & 0 deletions onnxruntime/core/providers/openvino/openvino_execution_provider.cc
@@ -10,6 +10,7 @@
#include "core/providers/openvino/onnx_ctx_model_helper.h"
#include "core/providers/openvino/ov_versions/capability.h"
#include "openvino/core/version.hpp"
#include "core/providers/openvino/ov_allocator.h"

#define MEMCPY_S(dest, src, destsz, srcsz) memcpy(dest, src, std::min(destsz, srcsz))

@@ -180,4 +181,16 @@
return Status::OK();
}

std::vector<AllocatorPtr> OpenVINOExecutionProvider::CreatePreferredAllocators() {
AllocatorCreationInfo npu_allocator_info {
[this](OrtDevice::DeviceId device_id) {
return std::make_unique<OVRTAllocator>(global_context_->ie_core.Get(), OrtDevice::NPU, device_id, OpenVINO_RT_NPU);

[cpplint notice, line 187] Lines should be <= 120 characters long [whitespace/line_length] [2]
},
0,
};

// fill in allocator
return std::vector<AllocatorPtr>{CreateAllocator(npu_allocator_info)};
}

} // namespace onnxruntime
onnxruntime/core/providers/openvino/openvino_execution_provider.h
@@ -190,6 +190,8 @@ class OpenVINOExecutionProvider : public IExecutionProvider {
return nullptr;
}

std::vector<AllocatorPtr> CreatePreferredAllocators() override;

private:
std::unique_ptr<openvino_ep::GlobalContext> global_context_;
openvino_ep::EPCtxHandler ep_ctx_handle_{};
54 changes: 54 additions & 0 deletions onnxruntime/core/providers/openvino/ov_allocator.cc
@@ -0,0 +1,54 @@
// Copyright (C) Intel Corporation
// Licensed under the MIT License

#include "core/providers/openvino/ov_allocator.h"
#include "core/providers/openvino/ov_interface.h"
#include "openvino/runtime/intel_npu/level_zero/level_zero.hpp"
#include "openvino/runtime/intel_npu/properties.hpp"

namespace onnxruntime {

using namespace openvino_ep;

[cpplint notice, line 11] Do not use namespace using-directives. Use using-declarations instead. [build/namespaces] [5]

constexpr size_t default_alignment = 4096;

static inline size_t align_up(size_t size, size_t pow2_alignment) {
return (size + pow2_alignment - 1) & ~(pow2_alignment - 1);
}

OVRTAllocator::OVRTAllocator(ov::Core& core, OrtDevice::DeviceType device_type, OrtDevice::DeviceId device_id, const char* name) : IAllocator(OrtMemoryInfo(name, OrtAllocatorType::OrtDeviceAllocator, OrtDevice(device_type, OrtDevice::MemType::DEFAULT, device_id), device_id, OrtMemTypeCPUInput)), core_(core) {

[cpplint notice, line 19] Lines should be <= 120 characters long [whitespace/line_length] [2]
if (device_type == OrtDevice::NPU) {
remote_ctx_ = core_.get_default_context("NPU").as<ov::intel_npu::level_zero::ZeroContext>();
} else {
ORT_THROW("Invalid device type");
}
}

void* OVRTAllocator::Alloc(size_t size) {
try {
size_t alloc_size = align_up(size + sizeof(ov::Tensor*) + default_alignment, default_alignment);
ov::Tensor* tensor = new ov::Tensor(remote_ctx_.create_host_tensor(ov::element::Type_t::u8,
{ alloc_size }));
uintptr_t data_ptr = reinterpret_cast<uintptr_t>(tensor->data());

ov::Tensor** ptr = reinterpret_cast<ov::Tensor**>(align_up(data_ptr + sizeof(ov::Tensor*), default_alignment));
ptr[-1] = tensor;

return reinterpret_cast<void*>(ptr);

[cpplint notice, line 38] Redundant blank line at the end of a code block should be deleted. [whitespace/blank_line] [3]
} catch (const ov::Exception& e) {
ORT_THROW(std::string("Alloc failed: ") + e.what());
}
return nullptr;
}

void OVRTAllocator::Free(void* p) {
try {
ov::Tensor** ptr = reinterpret_cast<ov::Tensor**>(p);
delete ptr[-1];
} catch (const ov::Exception& e) {
ORT_THROW(std::string("Free failed: ") + e.what());

[cpplint notice, line 50] Add #include <string> for string [build/include_what_you_use] [4]
}
}

} // namespace onnxruntime
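Alloc above over-allocates a host tensor, returns a pointer aligned to default_alignment, and stashes the owning ov::Tensor* in the slot just before the returned pointer so that Free can recover and delete it. The following sketch shows the same header-pointer layout with plain heap allocations standing in for remote_ctx_.create_host_tensor(); it is illustrative only, not the allocator's actual code.

```cpp
// Sketch of the header-pointer layout used by OVRTAllocator, with plain new[]/delete[]
// standing in for the Level Zero host tensor. Function names here are illustrative.
#include <cstddef>
#include <cstdint>

constexpr size_t kAlignment = 4096;  // same role as default_alignment above

static inline size_t align_up(size_t size, size_t pow2_alignment) {
  return (size + pow2_alignment - 1) & ~(pow2_alignment - 1);
}

void* AlignedAllocWithHeader(size_t size) {
  // Reserve room for the stashed owner pointer plus alignment slack, as Alloc() does.
  size_t alloc_size = align_up(size + sizeof(void*) + kAlignment, kAlignment);
  char* raw = new char[alloc_size];  // stand-in for the ov::Tensor's data()
  uintptr_t data_ptr = reinterpret_cast<uintptr_t>(raw);
  // Align past the header slot; the owning allocation is recorded at ptr[-1].
  void** ptr = reinterpret_cast<void**>(align_up(data_ptr + sizeof(void*), kAlignment));
  ptr[-1] = raw;
  return ptr;
}

void AlignedFreeWithHeader(void* p) {
  void** ptr = reinterpret_cast<void**>(p);
  delete[] static_cast<char*>(ptr[-1]);  // recover the owner stashed by AlignedAllocWithHeader
}
```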
24 changes: 24 additions & 0 deletions onnxruntime/core/providers/openvino/ov_allocator.h
@@ -0,0 +1,24 @@
// Copyright (C) Intel Corporation
// Licensed under the MIT License

#pragma once

#include "core/common/inlined_containers.h"
#include "core/framework/allocator.h"
#include "openvino/runtime/remote_context.hpp"


namespace onnxruntime {

class OVRTAllocator : public IAllocator {
public:
OVRTAllocator(ov::Core &core, OrtDevice::DeviceType device_type, OrtDevice::DeviceId device_id, const char* name);
void* Alloc(size_t size) override;
void Free(void* p) override;

private:
ov::Core &core_;
ov::RemoteContext remote_ctx_;
};

} // namespace onnxruntime