
Enable TRT provider option configuration for C# #7179

Closed
wants to merge 17 commits

Conversation

@chilo-ms (Contributor)

Description: Enable TRT provider option configuration for C#

      Example for setting:
        SessionOptions.OrtTensorRTProviderOptions trt_options;
        trt_options.device_id = 0;
        trt_options.has_trt_options = 1;
        trt_options.trt_max_workspace_size = (UIntPtr) (1 << 30);
        trt_options.trt_fp16_enable = 1;
        trt_options.trt_int8_enable = 1;
        trt_options.trt_int8_calibration_table_name = @"C:\calibration.flatbuffers";
        trt_options.trt_int8_use_native_calibration_table = 0;


@chilo-ms requested review from stevenlix and jywu-msft March 30, 2021 12:22
@chilo-ms requested a review from a team as a code owner March 30, 2021 12:22
@jywu-msft (Member) commented Mar 30, 2021

The CI failure is due to a failing API doc check.

@jywu-msft (Member)

@hariharans29 can you also help take a look at this PR? Thanks!

@jywu-msft (Member) commented Mar 30, 2021

Is it possible to add test cases?

@chilo-ms (Contributor, Author) commented Apr 1, 2021

I've added some test cases.
Once From/ToProviderOptions() has been implemented on the TRT EP, we can read the options back directly to verify them, which will cover more test cases.

@chilo-ms closed this Apr 5, 2021
@chilo-ms reopened this Apr 5, 2021
@hariharans29 (Member) commented Apr 5, 2021

One of the open-ended questions that @pranavsharma and I were recently discussing is whether we can allow transparent structs in the C API or whether we should make them opaque and have creation of the struct go through an API. Some things to consider:

  1. Will the presence of transparent structs in the C API make it harder for language bindings to support similar config structs?
  2. Versioning of the structs (adding a new field).

This is something to discuss, weighing the pros and cons of each approach, before we merge and release this change, because if we decide to ship this approach it may be hard to switch to the other later (and it will also make things harder on ORT users if we keep switching things around too much).

FWIW- I had a PR for OrtCudaProviderOptions that explores the other approach (make provider struct opaque and expose an API for creation) - #6291.

I feel it is best to agree upon one approach now that we are enabling language bindings for provider config structs.
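
For illustration, here is a rough sketch of how the two approaches tend to surface on the C# side; every type and function name below is hypothetical, chosen only to show the shape of each option:

    using System;
    using System.Runtime.InteropServices;

    // Approach 1: transparent struct, mirrored field-for-field in C#.
    // Every language binding must keep this layout in sync with the C header,
    // and appending a field changes the struct size (a binary-compatibility break).
    [StructLayout(LayoutKind.Sequential)]
    public struct TransparentTrtOptionsSketch   // hypothetical
    {
        public int device_id;
        public UIntPtr trt_max_workspace_size;
        public int trt_fp16_enable;
    }

    // Approach 2: opaque struct. Bindings only hold a handle; fields are set
    // through C API calls, so the library can evolve the layout freely.
    public sealed class OpaqueTrtOptionsHandleSketch : SafeHandle   // hypothetical
    {
        public OpaqueTrtOptionsHandleSketch() : base(IntPtr.Zero, ownsHandle: true) { }

        public override bool IsInvalid => handle == IntPtr.Zero;

        protected override bool ReleaseHandle()
        {
            NativeMethodsSketch.ReleaseTrtOptions(handle);   // hypothetical release call
            return true;
        }
    }

    internal static class NativeMethodsSketch   // stub to keep the sketch self-contained
    {
        [DllImport("onnxruntime")]
        public static extern void ReleaseTrtOptions(IntPtr options);
    }

The transparent struct is trivial to mirror but freezes the layout; the opaque handle keeps the layout private at the cost of per-field setter APIs and a mandatory release call.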

@chilo-ms (Contributor, Author) commented Apr 9, 2021

Thanks @hariharans29 for the explanation. Based on what we discussed offline, we decided to use transparent structs in the C API for provider options. I will create another PR for adding the CUDA options, so that this PR does not involve too many modifications.

int trt_int8_enable; // enable TensorRT INT8 precision. Default 0 = false, nonzero = true
const char* trt_int8_calibration_table_name; // TensorRT INT8 calibration table name.
int trt_int8_use_native_calibration_table; // use native TensorRT generated calibration table. Default 0 = false, nonzero = true
int trt_max_partition_iterations; // maximum number of iterations allowed in model partitioning for TensorRT.
Member:

To add these new options, don't you also need to update the TensorrtExecutionProvider constructor to check these options (alongside the code that checks the environment variables)?
Please verify that all of these options work properly.

Contributor Author:

Yes, I previously planned to do that in another PR, but I will include the changes in this PR: updating the TensorrtExecutionProvider constructor and checking that these options take effect.

Contributor Author:

I have tested, by building the NuGet package and running a C# application, that the internal provider options (e.g. max_workspace_size_, int8_use_native_tensorrt_calibration_table_, ...) can be configured via the C# provider options.

Member:

thanks!

jywu-msft previously approved these changes Apr 13, 2021
@jywu-msft dismissed their stale review April 15, 2021 15:57

revisit

public int trt_min_subgraph_size; // minimum node size in a subgraph after partitioning.
public int trt_dump_subgraphs; // dump the subgraphs that are transformed into TRT engines in onnx format to the filesystem. Default 0 = false, nonzero = true
public int trt_engine_cache_enable; // enable TensorRT engine caching. Default 0 = false, nonzero = true
public IntPtr trt_cache_path; // specify path for TensorRT engine and profile files if engine_cache_enable is enabled, or INT8 calibration table file if trt_int8_enable is enabled.
Member:

trt_cache_path

Need to clarify how to initialize this, since paths on Windows are UTF-16 while on Linux they are UTF-8.
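
One common way bindings sidestep this difference (a sketch of the general pattern, not necessarily what this PR does) is to always hand the C API a null-terminated UTF-8 buffer and leave any platform-specific conversion to the native layer; this is essentially what the NativeOnnxValueHelper.StringToZeroTerminatedUtf8 helper quoted later in this thread is used for:

    using System;
    using System.Text;

    // Sketch: build a null-terminated UTF-8 buffer for the native side,
    // regardless of the platform's native path encoding.
    static byte[] ToNullTerminatedUtf8(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);    // no terminator included
        byte[] result = new byte[utf8.Length + 1];  // zero-initialized, so the
        Buffer.BlockCopy(utf8, 0, result, 0, utf8.Length);  // last byte stays 0
        return result;
    }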

/// <summary>
/// Holds provider options configuration for creating an InferenceSession.
/// </summary>
public class ProviderOptions : SafeHandle
Member:

SafeHandle

Similar comment. Need to decide if we want to keep this as a SafeHandle. If there is little chance that this will ever hold a native resource, then it would be a burden on the user to always dispose of it.

public static SessionOptions MakeSessionOptionWithTensorrtProvider(int deviceId = 0)
{
CheckTensorrtExecutionProviderDLLs();
SessionOptions options = new SessionOptions();
Member:

SessionOptions options = new SessionOptions();

Same comment on handling the leaked options when the code below throws.
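
A minimal sketch of the dispose-on-exception pattern being requested here; the appended-provider call is illustrative:

    public static SessionOptions MakeSessionOptionWithTensorrtProvider(int deviceId = 0)
    {
        CheckTensorrtExecutionProviderDLLs();
        SessionOptions options = new SessionOptions();
        try
        {
            // Anything below may throw; without the catch, 'options' leaks.
            options.AppendExecutionProvider_Tensorrt(deviceId);   // illustrative
            return options;
        }
        catch (Exception)
        {
            options.Dispose();
            throw;
        }
    }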

public static SessionOptions MakeSessionOptionWithTensorrtProvider(OrtTensorRTProviderOptions trt_options)
{
CheckTensorrtExecutionProviderDLLs();
SessionOptions options = new SessionOptions();
Member:

SessionOptions

Ditto. Needs to be disposed of on exception.

@yuslepukhin (Member)

        SessionOptions options = new SessionOptions();

There is a leak here too, although this is not a part of your changes.


Refers to: csharp/src/Microsoft.ML.OnnxRuntime/SessionOptions.cs:152 in 532c899.

var tableNamePinned = GCHandle.Alloc(NativeOnnxValueHelper.StringToZeroTerminatedUtf8(trt_options.trt_int8_calibration_table_name), GCHandleType.Pinned);
using (var pinnedSettingsName = new PinnedGCHandle(tableNamePinned))
{
trt_options_native.trt_int8_calibration_table_name = pinnedSettingsName.Pointer;
@yuslepukhin (Member) commented Apr 15, 2021

pinnedSettingsName

But the using clause will unpin this and invalidate the Pointer value.

var cachePathPinned = GCHandle.Alloc(NativeOnnxValueHelper.StringToZeroTerminatedUtf8(trt_options.trt_cache_path), GCHandleType.Pinned);
using (var pinnedSettingsName2 = new PinnedGCHandle(cachePathPinned))
{
trt_options_native.trt_cache_path = pinnedSettingsName2.Pointer;
@yuslepukhin (Member) commented Apr 15, 2021

pinnedSettingsName2

Same thing here. Memory should stay pinned for the duration of the native call. This is done so the GC does not move the memory while it is accessed from native code. Thus, pinning should take place just prior to the native call that uses it, with unpinning right after the call.
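
A minimal sketch of the corrected scoping, reusing the helper names quoted above; the native call itself is illustrative:

    // Pin the UTF-8 buffers only while native code can read them: the native
    // call happens *inside* the using blocks, so the GC cannot move the memory
    // while the pointers are in use.
    var tableNamePinned = GCHandle.Alloc(
        NativeOnnxValueHelper.StringToZeroTerminatedUtf8(trt_options.trt_int8_calibration_table_name),
        GCHandleType.Pinned);
    var cachePathPinned = GCHandle.Alloc(
        NativeOnnxValueHelper.StringToZeroTerminatedUtf8(trt_options.trt_cache_path),
        GCHandleType.Pinned);
    using (var pinnedTableName = new PinnedGCHandle(tableNamePinned))
    using (var pinnedCachePath = new PinnedGCHandle(cachePathPinned))
    {
        trt_options_native.trt_int8_calibration_table_name = pinnedTableName.Pointer;
        trt_options_native.trt_cache_path = pinnedCachePath.Pointer;

        // Illustrative native call; it must execute before the using blocks
        // above unpin the buffers ('handle' stands in for the session
        // options' native handle).
        NativeApiStatus.VerifySuccess(
            NativeMethods.OrtSessionOptionsAppendExecutionProvider_TensorRT(handle, ref trt_options_native));
    }
    // Past this point the pointers are invalid; do not reuse them.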

trt_options.trt_int8_enable = 1;
trt_options.trt_int8_use_native_calibration_table = 0;

var session = new InferenceSession(modelPath, SessionOptions.MakeSessionOptionWithTensorrtProvider(trt_options));
Member:

session

Session is a disposable class, so this should either be wrapped in a using clause or a try/finally. Or, if you have trouble managing so many disposables, you can add them to a disposable list, so that only the list requires disposal. See examples in this file. The issue here is that people copy this code as examples and then complain about leaks.
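
A hedged sketch of the using-based form being suggested (modelPath is a placeholder):

    trt_options.trt_int8_enable = 1;
    trt_options.trt_int8_use_native_calibration_table = 0;

    // 'using' guarantees both disposables are released even on exception.
    using (var options = SessionOptions.MakeSessionOptionWithTensorrtProvider(trt_options))
    using (var session = new InferenceSession(modelPath, options))
    {
        // ... run the test against 'session' ...
    }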

int trt_min_subgraph_size; // minimum node size in a subgraph after partitioning.
int trt_dump_subgraphs; // dump the subgraphs that are transformed into TRT engines in onnx format to the filesystem. Default 0 = false, nonzero = true
int trt_engine_cache_enable; // enable TensorRT engine caching. Default 0 = false, nonzero = true
const char* trt_cache_path; // specify path for TensorRT engine and profile files if engine_cache_enable is enabled, or INT8 calibration table file if trt_int8_enable is enabled.
@yuslepukhin (Member) commented Apr 15, 2021

specify path for

Need to document the encoding of the path. We seem to require UTF-16 on Windows and UTF-8 on Linux, so how does this fit into that pattern?

int trt_min_subgraph_size; // minimum node size in a subgraph after partitioning.
int trt_dump_subgraphs; // dump the subgraphs that are transformed into TRT engines in onnx format to the filesystem. Default 0 = false, nonzero = true
int trt_engine_cache_enable; // enable TensorRT engine caching. Default 0 = false, nonzero = true
const char* trt_cache_path; // specify path for TensorRT engine and profile files if engine_cache_enable is enabled, or INT8 calibration table file if trt_int8_enable is enabled.
} OrtTensorRTProviderOptions;
@yuslepukhin (Member) commented Apr 15, 2021

We should watch the extensibility of such structs and modify all languages that make use of this C API at the same time.
When a field is added, the size of the struct changes, and if the struct is used in compiled applications it is no longer binary compatible. We can't extend the structs without versioning the API, or else we shouldn't use structs at all.
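
For context, one common mitigation in C-style APIs (a sketch only, not something this PR ships) is a leading size field that the library checks at runtime, so new fields can be appended without breaking older callers:

    using System;
    using System.Runtime.InteropServices;

    // Sketch of a self-versioning options struct: the first field records the
    // struct size the caller was compiled against, so a newer library can tell
    // which trailing fields the caller actually knows about.
    [StructLayout(LayoutKind.Sequential)]
    public struct VersionedTrtOptionsSketch   // hypothetical
    {
        public UIntPtr struct_size;   // caller sets the size of its own copy
        public int device_id;
        public int trt_fp16_enable;
        // New fields are appended here; older callers report a smaller
        // struct_size and the library falls back to defaults for the rest.
    }

    // Caller side:
    //   var opts = new VersionedTrtOptionsSketch();
    //   opts.struct_size = (UIntPtr)Marshal.SizeOf<VersionedTrtOptionsSketch>();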

int trt_min_subgraph_size = 1;
bool trt_dump_subgraphs = false;
bool trt_engine_cache_enable = false;
std::string trt_cache_path = "";
Member:

= ""

Why?

// trt_options.trt_int8_enable = 1;
// trt_options.trt_int8_calibration_table_name = "calibration.flatbuffers";
// trt_options.trt_int8_use_native_calibration_table = 0;
public struct OrtTensorRTProviderOptions
Member:

OrtTensorRTProviderOptions

Here we have two duplicate structures for essentially the same purpose. Can we unify them?

size_t trt_max_workspace_size; // maximum workspace size for TensorRT.
int trt_fp16_enable; // enable TensorRT FP16 precision. Default 0 = false, nonzero = true
int trt_int8_enable; // enable TensorRT INT8 precision. Default 0 = false, nonzero = true
const char* trt_int8_calibration_table_name; // TensorRT INT8 calibration table name.
Member:

TensorRT INT8 calibration table name

Need to document the string encoding.

@chilo-ms mentioned this pull request Apr 28, 2021
@chilo-ms (Contributor, Author) commented May 6, 2021

This PR is currently pending, since we had some concerns about exposing the provider options struct.

@jywu-msft (Member)

Closing; an updated version replaced this one.

@jywu-msft closed this Dec 17, 2021