EfficientNet inference yields incorrect results on GPU #519
Hi, @liamnr2! Welcome to GitHub and thanks for reporting this issue. Seemingly random results are hard to test for, so it is very valuable that you found some. Unfortunately, the RX 470 uses a Polaris 10 chip, which shares the gfx803 compile target with a number of other popular GPUs; for a list of the affected GPUs see #479. There have been quite a few issues with this compile target, only some of which have been resolved so far; for a full list see the gfx803 tag. To find the cause of this behavior, we need to reproduce these issues with various combinations of hardware and software. I can try to help with creating a reproduction procedure. A wild guess would be to try downgrading rocm-opencl, which has helped with gfx803 in some cases: |
Procedure for reproduction:
|
I can reproduce this issue. Using GPU 0 fails:
Using GPU 1 fails:
Using GPU 2 fails:
Using GPU 3 fails:
These results do indeed seem random, or at least non-deterministic:
Using CPU works fine:
I am using R9 Fury X and R9 Nano GPUs, the latest Ubuntu kernel, and ROCm 2.5.27:
Downgrading rocm-opencl does not help in my case:
|
This issue persists with
|
Still a problem with ROCm 2.6. As an observation, setting MIOPEN_DEBUG_GCN_ASM_KERNELS=0 improves the results: there is still jitter, but far less of it. With EfficientNet-B7 it is minimal, but still there. (A sketch of one way to set the variable follows these measurements.)
EfficientNet-B0, MIOPEN_DEBUG_GCN_ASM_KERNELS=1:
EfficientNet-B0, MIOPEN_DEBUG_GCN_ASM_KERNELS=0:
EfficientNet-B7, MIOPEN_DEBUG_GCN_ASM_KERNELS=1:
EfficientNet-B7, MIOPEN_DEBUG_GCN_ASM_KERNELS=0:
|
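For reference, a minimal sketch (not part of the comment above) of one way to apply the MIOPEN_DEBUG_GCN_ASM_KERNELS=0 workaround from Python; exporting the variable in the shell before launching the script is equivalent:

```python
# Sketch only: MIOpen reads MIOPEN_DEBUG_GCN_ASM_KERNELS when it selects
# convolution kernels, so the variable must be set before any GPU work runs.
import os
os.environ["MIOPEN_DEBUG_GCN_ASM_KERNELS"] = "0"  # 0 = skip the hand-written ASM kernels

import tensorflow as tf  # tensorflow-rocm; imported only after the variable is set
```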
FYI, not that it helps you, but it works correctly on gfx900 (Vega 10) with rocm2.6-tf1.14-python3.
Hi @ekuznetsov139, thanks for the data point. While you are at it, could you rerun the test with
|
Regards,
It works correctly with that tag as well. Though in both cases there is something odd: processing takes a very long time (around 1 minute) and GPU usage is near zero all that time. (It definitely uses the GPU; I've confirmed this with HIP_TRACE_API.) Not sure if it's an anomaly or it's just that EfficientNet is not being very efficient.
Hi, to add another data point, I can confirm this working on gfx900 (Vega 64, Vega 10). So the issue seems to affect gfx803 only. Keeping an eye on the card's GPUTach, I also noticed long idle times during the test run.
I found that the issue is caused by the ASM 1x1 kernel on gfx803: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/src/kernels/conv1x1u.s
Recently, to avoid issues like this one, all ASM convolution kernels have been disabled on gfx803 (see ROCm/MIOpen@ce51a4c). But this also significantly reduces gfx803 performance (for ResNet-50 it is almost twice as slow, see #173 (comment)). I have a workload that becomes 10x slower on gfx803 after disabling the ASM kernels. I hope AMD can fix the bugs in the ASM kernels and re-enable them on gfx803.
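Since the misbehavior is pinned on a single solver, a possible middle ground (not verified in this thread) would be to disable only that solver rather than all ASM kernels. A minimal sketch, assuming MIOpen honors a per-solver debug variable named MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U; that name is an assumption based on the conv1x1u.s solver and should be checked against the MIOpen sources for your version:

```python
# Sketch only: try to disable just the direct 1x1 ASM convolution solver while
# keeping the remaining ASM kernels (and most of the gfx803 performance) enabled.
# MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U is assumed; verify it exists in your MIOpen build.
import os
os.environ["MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U"] = "0"

import tensorflow as tf  # import TensorFlow only after the variable is set
```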
Thanks for reaching out.
I'm using ROCm 2.5, tensorflow-rocm 1.13.3, and Python 3.6 with an RX 470.
When running the simple EfficientNet-B0 inference example here:
https://github.com/qubvel/efficientnet/blob/master/examples/inference_example.ipynb
the inference of the example image yields incorrect and non-deterministic results. Some examples:
[[('n01773549', 'barn_spider', 0.4544877), ('n01776313', 'tick', 0.14279026), ('n03271574', 'electric_fan', 0.06995272), ('n01774750', 'tarantula', 0.059890375), ('n01531178', 'goldfinch', 0.04341215)]]
[[('n01776313', 'tick', 0.52216125), ('n01773549', 'barn_spider', 0.24521892), ('n03271574', 'electric_fan', 0.17396057), ('n01774750', 'tarantula', 0.015509106), ('n03982430', 'pool_table', 0.008740869)]]
[[('n02497673', 'Madagascar_cat', 0.24683656), ('n03976657', 'pole', 0.20120004), ('n03710721', 'maillot', 0.078447856), ('n01773549', 'barn_spider', 0.046732053), ('n01774750', 'tarantula', 0.04341184)]]
When forcing execution onto the CPU via CUDA_VISIBLE_DEVICES=, it yields the expected result:
[[('n02510455', 'giant_panda', 0.8347932), ('n02134084', 'ice_bear', 0.015602067), ('n02509815', 'lesser_panda', 0.0045535103), ('n02133161', 'American_black_bear', 0.0024719117), ('n02132136', 'brown_bear', 0.0020707578)]]
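For anyone trying to reproduce this without the notebook, a minimal sketch along the same lines; the efficientnet import path is an assumption (it differs between versions of the package), and a fixed random input is used here simply to expose the run-to-run jitter:

```python
# Sketch only: run the same forward pass twice and compare the outputs.
# On an affected gfx803 GPU the two runs reportedly differ; on CPU they match.
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = ""  # uncomment to force CPU execution, as in the report

import numpy as np
from efficientnet import EfficientNetB0  # import path assumed; varies across package versions

model = EfficientNetB0(weights="imagenet")
x = np.random.RandomState(0).rand(1, 224, 224, 3).astype("float32")

y1 = model.predict(x)
y2 = model.predict(x)
print("max abs difference between identical runs:", np.abs(y1 - y2).max())
```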