EfficientNet inference yields incorrect results on GPU #519
Hi, @liamnr2! Welcome to GitHub and thanks for reporting this issue. Seemingly random results are hard to test for, so it is very valuable that you found some. Unfortunately, the RX 470 uses a Polaris 10 chip, which shares the gfx803 compile target with a number of other popular GPUs; for a list of the affected GPUs see #479. There have been quite a few issues with this compile target, only some of which have been resolved so far; for a full list see the gfx803 tag. To find the cause of this behavior, we need to reproduce these issues with various combinations of hardware and software. I can try to help with creating a reproduction procedure. A wild guess would be to try downgrading rocm-opencl, which has helped with gfx803 in some cases: |
Procedure for reproduction:
|
I can reproduce this issue. Using GPU 0 fails:
Using GPU 1 fails:
Using GPU 2 fails:
Using GPU 3 fails:
These results do indeed seem random, or at least non-deterministic:
Using CPU works fine:
I am using R9 Fury X and R9 Nano GPUs, the latest Ubuntu kernel, and ROCm 2.5.27:
Downgrading rocm-opencl does not help in my case:
|
This issue persists with
|
Still a problem with ROCm 2.6. As an observation, setting MIOPEN_DEBUG_GCN_ASM_KERNELS=0 improves the results: there is still jitter, but far less of it. With EfficientNet-B7 it is minimal, but still there. (A sketch of one way to set the variable follows these measurements.)
EfficientNet-B0, MIOPEN_DEBUG_GCN_ASM_KERNELS=1:
EfficientNet-B0, MIOPEN_DEBUG_GCN_ASM_KERNELS=0:
EfficientNet-B7, MIOPEN_DEBUG_GCN_ASM_KERNELS=1:
EfficientNet-B7, MIOPEN_DEBUG_GCN_ASM_KERNELS=0:
|
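For reference, a minimal sketch (not part of the comment above) of one way to apply the MIOPEN_DEBUG_GCN_ASM_KERNELS=0 workaround from Python; exporting the variable in the shell before launching the script is equivalent:

```python
# Sketch only: MIOpen reads MIOPEN_DEBUG_GCN_ASM_KERNELS when it selects
# convolution kernels, so the variable must be set before any GPU work runs.
import os
os.environ["MIOPEN_DEBUG_GCN_ASM_KERNELS"] = "0"  # 0 = skip the hand-written ASM kernels

import tensorflow as tf  # tensorflow-rocm; imported only after the variable is set
```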
FYI, not that it helps you, but it works correctly on gfx900 (Vega 10) with rocm2.6-tf1.14-python3.
Hi @ekuznetsov139, thanks for the data point. While you are at it, could you rerun the test with
|
Regards,
It works correctly with that tag as well. Though in both cases there is something odd: processing takes a very long time (around 1 minute) and GPU usage is near zero all that time. (It definitely uses the GPU; I've confirmed this with HIP_TRACE_API.) Not sure if it's an anomaly or it's just that EfficientNet is not being very efficient.
Hi, to add another data point, I can confirm this working on gfx900 (Vega 64, Vega 10). So the issue seems to affect gfx803 only. Keeping an eye on the card's GPUTach, I also noticed long idle times during the test run.
I found that the issue is caused by the ASM 1x1 kernel on gfx803: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/src/kernels/conv1x1u.s
Recently, to avoid issues like this one, all ASM convolution kernels have been disabled on gfx803 (see ROCm/MIOpen@ce51a4c). But this also significantly reduces gfx803 performance (for ResNet-50 it is almost twice as slow, see #173 (comment)). I have a workload that becomes 10x slower on gfx803 after disabling the ASM kernels. I hope AMD can fix the bugs in the ASM kernels and re-enable them on gfx803.
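Since the misbehavior is pinned on a single solver, a possible middle ground (not verified in this thread) would be to disable only that solver rather than all ASM kernels. A minimal sketch, assuming MIOpen honors a per-solver debug variable named MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U; that name is an assumption based on the conv1x1u.s solver and should be checked against the MIOpen sources for your version:

```python
# Sketch only: try to disable just the direct 1x1 ASM convolution solver while
# keeping the remaining ASM kernels (and most of the gfx803 performance) enabled.
# MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U is assumed; verify it exists in your MIOpen build.
import os
os.environ["MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U"] = "0"

import tensorflow as tf  # import TensorFlow only after the variable is set
```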
Thanks for reaching out.
I'm using ROCm 2.5, tensorflow-rocm 1.13.3, and Python 3.6 with an RX 470.
When running the simple EfficientNet-B0 inference example here:
https://github.com/qubvel/efficientnet/blob/master/examples/inference_example.ipynb
the inference of the example image yields incorrect and non-deterministic results. Some examples:
[[('n01773549', 'barn_spider', 0.4544877), ('n01776313', 'tick', 0.14279026), ('n03271574', 'electric_fan', 0.06995272), ('n01774750', 'tarantula', 0.059890375), ('n01531178', 'goldfinch', 0.04341215)]]
[[('n01776313', 'tick', 0.52216125), ('n01773549', 'barn_spider', 0.24521892), ('n03271574', 'electric_fan', 0.17396057), ('n01774750', 'tarantula', 0.015509106), ('n03982430', 'pool_table', 0.008740869)]]
[[('n02497673', 'Madagascar_cat', 0.24683656), ('n03976657', 'pole', 0.20120004), ('n03710721', 'maillot', 0.078447856), ('n01773549', 'barn_spider', 0.046732053), ('n01774750', 'tarantula', 0.04341184)]]
When forcing execution onto the CPU via CUDA_VISIBLE_DEVICES=, it yields the expected result:
[[('n02510455', 'giant_panda', 0.8347932), ('n02134084', 'ice_bear', 0.015602067), ('n02509815', 'lesser_panda', 0.0045535103), ('n02133161', 'American_black_bear', 0.0024719117), ('n02132136', 'brown_bear', 0.0020707578)]]
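For anyone trying to reproduce this without the notebook, a minimal sketch along the same lines; the efficientnet import path is an assumption (it differs between versions of the package), and a fixed random input is used here simply to expose the run-to-run jitter:

```python
# Sketch only: run the same forward pass twice and compare the outputs.
# On an affected gfx803 GPU the two runs reportedly differ; on CPU they match.
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = ""  # uncomment to force CPU execution, as in the report

import numpy as np
from efficientnet import EfficientNetB0  # import path assumed; varies across package versions

model = EfficientNetB0(weights="imagenet")
x = np.random.RandomState(0).rand(1, 224, 224, 3).astype("float32")

y1 = model.predict(x)
y2 = model.predict(x)
print("max abs difference between identical runs:", np.abs(y1 - y2).max())
```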