Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add verbose and optimization args for parity tests (Gelu, Layernorm, … #9

Merged
merged 1 commit into from
May 30, 2023

Conversation

TedThemistokleous
Copy link

…… (microsoft#14739)

…GPT_Attention)

Some EPs require that onnxruntime and optimum optimizations are turned off in order to run correctly. Allowing this option during test runs allows the EP and library to perform their own optimization and be more representative of actual use case conditions.

Important for EPs like MIGraphX which require optimizations to be offer for certain operations

Allow flags to turn off optimizations and add verbose output to confirm which EP is being used for the inference run and validate fallbacks

Related to: microsoft#14702 & microsoft#14700


Description

Motivation and Context

microsoft#14739)

…GPT_Attention)

Some EPs require that onnxruntime and optimum optimizations are turned
off in order to run correctly. Allowing this option during test runs
allows the EP and library to perform their own optimization and be more
representative of actual use case conditions.

Important for EPs like MIGraphX which require optimizations to be offer
for certain operations

<!-- Describe your changes. -->

Allow flags to turn off optimizations and add verbose output to confirm
which EP is being used for the inference run and validate fallbacks

<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Related to: microsoft#14702 & microsoft#14700

---------

Signed-off-by: Ted Themistokleous <[email protected]>
Co-authored-by: Ted Themistokleous <[email protected]>
@jeffdaily jeffdaily merged commit e5839a1 into rocm5.6_internal_testing May 30, 2023
groenenboomj pushed a commit that referenced this pull request Sep 26, 2023
### Description
Release OrtEnv before main function returns. Before this change, OrtEnv
is deleted when C/C++ runtime destructs all global variables in ONNX
Runtime's core framework.
The callstack is like this:
```
  * frame #0: 0x00007fffee39f5a6 libonnxruntime.so.1.16.0`onnxruntime::Environment::~Environment(this=0x00007fffee39fbf2) at environment.h:20:7
    frame #1: 0x00007fffee39f614 libonnxruntime.so.1.16.0`std::default_delete<onnxruntime::Environment>::operator()(this=0x00007ffff4c30e50, __ptr=0x0000000005404b00) const at unique_ptr.h:85:2
    frame #2: 0x00007fffee39edca libonnxruntime.so.1.16.0`std::unique_ptr<onnxruntime::Environment, std::default_delete<onnxruntime::Environment>>::~unique_ptr(this=0x5404b00) at unique_ptr.h:361:17
    frame #3: 0x00007fffee39e2ab libonnxruntime.so.1.16.0`OrtEnv::~OrtEnv(this=0x00007ffff4c30e50) at ort_env.cc:43:1
    frame #4: 0x00007fffee39fa96 libonnxruntime.so.1.16.0`std::default_delete<OrtEnv>::operator()(this=0x00007fffefff8f78, __ptr=0x00007ffff4c30e50) const at unique_ptr.h:85:2
    frame #5: 0x00007fffee39f394 libonnxruntime.so.1.16.0`std::unique_ptr<OrtEnv, std::default_delete<OrtEnv>>::~unique_ptr(this=0x7ffff4c30e50) at unique_ptr.h:361:17
    frame #6: 0x00007ffff78574b5 libc.so.6`__run_exit_handlers + 261
    frame #7: 0x00007ffff7857630 libc.so.6`exit + 32
    frame #8: 0x00007ffff783feb7 libc.so.6`__libc_start_call_main + 135
    frame #9: 0x00007ffff783ff60 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128
    frame #10: 0x0000000000abbdee node`_start + 46
```
After this change, OrtEnv will be deleted before the main function
returns and nodejs is still alive.
jagadish-amd pushed a commit to jagadish-amd/onnxruntime that referenced this pull request Sep 10, 2024
… transient connection exceptions. (microsoft#21612)

### Description
Improve docker commands to make docker image layer caching works.
It can make docker building faster and more stable.
So far, A100 pool's system disk is too small to use docker cache.
We won't use pipeline cache for docker image and remove some legacy
code.

### Motivation and Context
There are often an exception of
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
Because Onnxruntime pipeline have been sending too many requests to
download Nodejs in docker building.
Which is the major reason of pipeline failing now

In fact, docker image layer caching never works.
We can always see the scrips are still running
```
ROCm#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
ROCm#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
ROCm#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
ROCm#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
ROCm#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
ROCm#9 0.236 +++ tr -dc 0-9.
ROCm#9 0.236 +++ cut -d . -f1
ROCm#9 0.238 ++ os_major_version=8
....
ROCm#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
ROCm#9 60.59 + return 0
...
```

This PR is improving the docker command to make image layer caching
work.
Thus, CI won't send so many redundant request of downloading NodeJS.
```
ROCm#9 [2/5] ADD scripts /tmp/scripts
ROCm#9 CACHED

ROCm#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
ROCm#10 CACHED

ROCm#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
ROCm#11 CACHED

ROCm#12 [5/5] WORKDIR /home/onnxruntimedev
ROCm#12 CACHED
```

###Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants