Reduce test time for TensorRT EP CI #10408

Merged (26 commits, Feb 1, 2022)

Commits
ed344c7  expand model tests name (chilo-ms, Jan 21, 2022)
f3d9772  skip cpu/cuda for trt when running onnxruntime_test_all (chilo-ms, Jan 21, 2022)
3e5267b  only run trt ep for c++ unit test (chilo-ms, Jan 21, 2022)
0034c55  Update CMAKE_CUDA_ARCHITECTURES for T4 (chilo-ms, Jan 21, 2022)
ed5e28c  Use new t4 agent pool (chilo-ms, Jan 22, 2022)
dd70a4b  Update YAML for run T4 on Windows (chilo-ms, Jan 22, 2022)
8ac7ef6  revert code (chilo-ms, Jan 22, 2022)
0c897b2  Update CMAKE_CUDA_ARCHITECTURES (chilo-ms, Jan 22, 2022)
0dfab8e  fix wrong value (chilo-ms, Jan 23, 2022)
333568a  Remove cpu/cuda directly in model tests (chilo-ms, Jan 24, 2022)
e9f48a2  add only CMAKE_CUDA_ARCHITECTURES=75 (chilo-ms, Jan 24, 2022)
efe6d31  remove expanding model test name to see difference (chilo-ms, Jan 25, 2022)
e069b3c  revert code (chilo-ms, Jan 25, 2022)
29c1f52  Add fallback execution provider for unit test (chilo-ms, Jan 25, 2022)
2c6ccd9  Add fallback execution provider for unit test (cont) (chilo-ms, Jan 25, 2022)
6d22886  add conditional to add fackback cuda ep (chilo-ms, Jan 26, 2022)
a5e6a82  Reduction op takes much longer time for TRT 8.2, so we test smaller r… (chilo-ms, Jan 26, 2022)
3b5a998  use M60 (chilo-ms, Jan 26, 2022)
f0e643c  revert code (chilo-ms, Jan 27, 2022)
c04ef19  Merge branch 'master' into update_trt_ci (chilo-ms, Jan 27, 2022)
daa6a1c  revert code (chilo-ms, Jan 27, 2022)
18c5dd8  add comments (chilo-ms, Jan 27, 2022)
93013ec  Modify code and add comment (chilo-ms, Jan 27, 2022)
2a75eb8  modify comment (chilo-ms, Jan 27, 2022)
8880456  update comment (chilo-ms, Jan 29, 2022)
42dc63c  add comment (chilo-ms, Jan 31, 2022)
16 changes: 14 additions & 2 deletions cmake/onnxruntime_unittests.cmake
@@ -15,7 +15,7 @@ endif()

 set(disabled_warnings)
 function(AddTest)
-  cmake_parse_arguments(_UT "DYN" "TARGET" "LIBS;SOURCES;DEPENDS" ${ARGN})
+  cmake_parse_arguments(_UT "DYN" "TARGET" "LIBS;SOURCES;DEPENDS;TEST_ARGS" ${ARGN})
   list(REMOVE_DUPLICATES _UT_SOURCES)

   if (${CMAKE_SYSTEM_NAME} STREQUAL "iOS")
@@ -93,7 +93,7 @@ function(AddTest)
     target_compile_options(${_UT_TARGET} PRIVATE "-Wno-error=uninitialized")
   endif()

-  set(TEST_ARGS)
+  set(TEST_ARGS ${_UT_TEST_ARGS})
   if (onnxruntime_GENERATE_TEST_REPORTS)
     # generate a report file next to the test program
     if (onnxruntime_BUILD_WEBASSEMBLY)
@@ -682,13 +682,25 @@ if (onnxruntime_BUILD_WEBASSEMBLY)
   endif()
 endif()

+set(test_all_args)
+if (onnxruntime_USE_TENSORRT)
+  # TRT EP CI takes much longer when updating to TRT 8.2,
+  # so we only run the TRT EP and exclude the other EPs to reduce CI test time.
+  #
+  # The test names of model tests used sequential numbers in the past.
+  # PR https://github.com/microsoft/onnxruntime/pull/10220 (see the ExpandModelName function in model_tests.cc for details)
+  # made each test name contain the "ep" and "model path" information, so we can easily filter out
+  # tests for the CUDA EP or any other EP with *cpu__* or *xxx__*.
+  list(APPEND test_all_args "--gtest_filter=-*cpu__*:*cuda__*")
Contributor: A comment on what we want to exclude here would be helpful.

chilo-ms (Contributor, Author), Jan 27, 2022: Comments are added here.

Contributor: I don't know where the "cpu__" and "cuda__" in the test names come from - I didn't see them in the code. Explaining that would be helpful.

chilo-ms (Contributor, Author): Made the comment more clear.

+endif ()

 AddTest(
   TARGET onnxruntime_test_all
   SOURCES ${all_tests} ${onnxruntime_unittest_main_src}
   LIBS
     onnx_test_runner_common ${onnxruntime_test_providers_libs} ${onnxruntime_test_common_libs}
     onnx_test_data_proto nlohmann_json::nlohmann_json
   DEPENDS ${all_dependencies}
+  TEST_ARGS ${test_all_args}
 )
 if (MSVC)
   # The warning means the type of two integral values around a binary operator is narrow than their result.
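The `--gtest_filter` value added above uses googletest's filter syntax: an optional colon-separated list of positive patterns, then `-`, then a colon-separated list of negative patterns. Here there is no positive part, so everything runs except names matching `*cpu__*` or `*cuda__*`. A small Python sketch of that matching rule (the model-test names below are hypothetical examples of the `ep__model_path` shape that PR #10220 introduced, not real test names):

```python
from fnmatch import fnmatchcase

def gtest_selected(test_name, gtest_filter):
    """Simplified model of googletest's --gtest_filter matching."""
    positive, _, negative = gtest_filter.partition("-")
    pos = positive.split(":") if positive else ["*"]  # empty positive part means "match all"
    neg = negative.split(":") if negative else []
    return (any(fnmatchcase(test_name, p) for p in pos)
            and not any(fnmatchcase(test_name, n) for n in neg))

names = [
    "ModelTest/ModelTest.Run/cpu__models_zoo_mnist",       # excluded by *cpu__*
    "ModelTest/ModelTest.Run/cuda__models_zoo_mnist",      # excluded by *cuda__*
    "ModelTest/ModelTest.Run/tensorrt__models_zoo_mnist",  # kept
]
kept = [n for n in names if gtest_selected(n, "-*cpu__*:*cuda__*")]
```

Ordinary unit tests such as `ReductionOpTest.*` contain neither substring, so they still run under this filter.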
28 changes: 24 additions & 4 deletions onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc
@@ -1553,9 +1553,15 @@ void test_apex_reduce_sum(
 }

 TEST(ReductionOpTest, ReduceSum_apex_matrix_large) {
+#ifdef USE_TENSORRT
+  // Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs.
+  int64_t threshold = 4096;
+#else
+  int64_t threshold = 32768;
+#endif
   for (int64_t m = 1; m < 2049; m *= 8) {
     for (int64_t n = 2; n < 2049; n *= 8) {
-      if (m * n > 32768) {
+      if (m * n > threshold) {
         continue;
       }
       test_apex_reduce_sum(m, n);
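To see how much the smaller threshold trims this test, one can count the (m, n) pairs the loops actually exercise. A quick sketch mirroring the loop bounds above:

```python
def apex_matrix_large_cases(threshold):
    # Mirror the loops in ReduceSum_apex_matrix_large: m in {1, 8, 64, 512},
    # n in {2, 16, 128, 1024}; a pair is skipped when m * n exceeds the threshold.
    cases = []
    m = 1
    while m < 2049:
        n = 2
        while n < 2049:
            if m * n <= threshold:
                cases.append((m, n))
            n *= 8
        m *= 8
    return cases

default_cases = apex_matrix_large_cases(32768)  # non-TRT builds: 13 pairs
trt_cases = apex_matrix_large_cases(4096)       # TRT builds: 10 pairs
```

The TRT build drops the three largest reductions (8192-plus elements), which are the slowest ones under TRT 8.2.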
@@ -1583,7 +1589,13 @@ TEST(ReductionOpTest, ReduceSum_batch_by_two) {
 }

 TEST(ReductionOpTest, ReduceSum_batch_by_seq_by_128) {
-  for (int i = 1; i < 16; i += 1) {
+#ifdef USE_TENSORRT
+  // Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs.
+  int i_max = 8;
+#else
+  int i_max = 16;
+#endif
+  for (int i = 1; i < i_max; i += 1) {
     test_apex_reduce_sum(i * 128, 128);
     test_apex_reduce_sum(i * 512, 128);
     test_apex_reduce_sum(i * 128, 768);
@@ -1612,8 +1624,16 @@ TEST(ReductionOpTest, ReduceSum_bert_selected_batch_size) {

 TEST(ReductionOpTest, ReduceSum_apex_more) {
   std::srand(0);
-  for (int64_t m = 1; m < 16; ++m) {
-    for (int64_t n = 1; n < 16; ++n) {
+#ifdef USE_TENSORRT
+  // Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs.
+  int64_t m_max = 8;
+  int64_t n_max = 8;
+#else
+  int64_t m_max = 16;
+  int64_t n_max = 16;
+#endif
+  for (int64_t m = 1; m < m_max; ++m) {
+    for (int64_t n = 1; n < n_max; ++n) {
       const auto m_ = 2 * m;
       const auto n_ = 2 * n;
       test_apex_reduce_sum(m_, n_);
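Because both loop bounds are halved here, the call count shrinks quadratically. A back-of-the-envelope check for ReduceSum_apex_more:

```python
def apex_more_calls(m_max, n_max):
    # ReduceSum_apex_more calls test_apex_reduce_sum once per (m, n)
    # with m in [1, m_max) and n in [1, n_max).
    return (m_max - 1) * (n_max - 1)

default_calls = apex_more_calls(16, 16)  # without USE_TENSORRT: 225 calls
trt_calls = apex_more_calls(8, 8)        # with USE_TENSORRT: 49 calls
```

That is roughly a 4.6x reduction in reduce-sum invocations for this one test under the TRT build.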
13 changes: 13 additions & 0 deletions onnxruntime/test/providers/provider_test_utils.cc
@@ -994,6 +994,12 @@ void OpTester::Run(
   std::vector<std::string> output_names;
   FillFeedsAndOutputNames(feeds, output_names);
   // Run the model
+#ifdef USE_TENSORRT
+  // Only run the TRT EP to reduce test time.
+  static const std::string all_provider_types[] = {
+      kTensorrtExecutionProvider,
+  };
+#else
   static const std::string all_provider_types[] = {
       kCpuExecutionProvider,
       kCudaExecutionProvider,
@@ -1008,6 +1014,7 @@
       kRocmExecutionProvider,
       kCoreMLExecutionProvider,
   };
+#endif

 bool has_run = false;

@@ -1168,8 +1175,14 @@
       cur_provider = "not set";
     }

+#ifdef USE_TENSORRT
+    // We are allowing tests to be run with only the TensorRT EP, but the TensorRT EP may not
+    // support all tests and may be in the excluded providers list. So in this situation it is
+    // okay that no registered EP was able to run the model.
+    ORT_UNUSED_PARAMETER(has_run);
Member: Add a comment here to explain this case: we are allowing tests to be run with only the TensorRT EP, but the TensorRT EP may not support all tests and may be in the excluded providers list.
+#else
     EXPECT_TRUE(has_run)
         << "No registered execution providers were able to run the model.";
+#endif
}
}
ORT_CATCH(const std::exception& ex) {
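The control flow this last hunk introduces can be summarized as: in a TRT-only build, "no registered EP ran the model" is tolerated instead of failing the test, because a test may legitimately exclude TensorRT while it is the only EP compiled in. A Python sketch of that decision (the function and argument names are illustrative, not the actual OpTester API):

```python
def run_check_passes(registered_eps, excluded_eps, use_tensorrt_build):
    """Sketch of the final has_run check in OpTester::Run after this PR."""
    # has_run is true if at least one registered EP was not excluded by the test.
    has_run = any(ep not in excluded_eps for ep in registered_eps)
    if use_tensorrt_build:
        # TRT may be the only registered EP *and* be in the excluded providers
        # list, so a run where nothing executed is not treated as a failure.
        return True
    return has_run  # normal builds: EXPECT_TRUE(has_run)

# TRT-only build, test excludes TensorRT: tolerated.
ok_trt = run_check_passes(["TensorrtExecutionProvider"], {"TensorrtExecutionProvider"}, True)
# Normal build with the same single EP excluded: the check would fail.
ok_default = run_check_passes(["TensorrtExecutionProvider"], {"TensorrtExecutionProvider"}, False)
```

This mirrors why `ORT_UNUSED_PARAMETER(has_run)` replaces the `EXPECT_TRUE(has_run)` assertion only under `USE_TENSORRT`.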