
Add GFX803 to TF-ROCm Continuous Integration #479

Closed
Bengt opened this issue May 30, 2019 · 10 comments
Labels
gfx803: issue specific to gfx803 GPUs

Comments

@Bengt

Bengt commented May 30, 2019

Describe the feature and the current behavior/state.

Currently:

[...] GFX803 is not included in the TF-ROCm CI systems [...].
#431 (comment)

Feature:

Add at least one variant of GFX803 GPUs to the TF-ROCm CI systems.
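
A concrete starting point could be a smoke test at the beginning of each gfx803 CI run that verifies TensorFlow-ROCm registers the GPU at all. The snippet below is only a sketch of such a check, not an existing TF-ROCm script:

```python
# Hypothetical gfx803 CI smoke test (a sketch, not an actual TF-ROCm script):
# verify that TensorFlow-ROCm sees a GPU before running the full test suite.
import tensorflow as tf

if __name__ == "__main__":
    # Returns True when the ROCm runtime exposes a usable GPU device.
    assert tf.test.is_gpu_available(), "No ROCm GPU visible to TensorFlow"
    print("GPU device:", tf.test.gpu_device_name())
```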

Who will benefit with this feature?

There have already been uncaught regressions, which are still present in the master branch:

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues?q=is%3Aissue+is%3Aopen+label%3Agfx803

At the time of writing, these regressions account for over one third (7 out of 20) of all issues in this repository.

These could be caught by regression tests, like this one:

Yes, this is a good [regression] test case[...].
#432 (comment)
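
For illustration, a convergence regression test in the spirit of #432 could look like the sketch below; the model, epoch count, and accuracy threshold are my own assumptions, not the actual test case from that issue:

```python
# Sketch of a convergence regression test (Keras MNIST CNN, as in issue #432).
# Model size, epoch count, and threshold are assumptions for illustration.
import tensorflow as tf

def test_mnist_cnn_converges():
    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train[:10000].reshape(-1, 28, 28, 1).astype("float32") / 255.0
    y_train = y_train[:10000]

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

    # A healthy backend reaches well above chance accuracy after one epoch;
    # the gfx803 regression reported in #432 stays stuck near 10%.
    acc = history.history.get("accuracy", history.history.get("acc"))[-1]
    assert acc > 0.5, "MNIST CNN failed to converge: accuracy %.3f" % acc

if __name__ == "__main__":
    test_mnist_cnn_converges()
```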

Everyone using a gfx803 GPU would benefit from not having these, and possibly many other, regressions in the future.

The gfx803 chips have a large installed base, and their support is therefore important to AMD's reputation, so AMD would benefit, too.

As listed in the LLVM sources, there are three chips under the gfx803 target, namely polaris10, polaris11, and fiji. Because they are only node shrinks, this target also covers polaris20, polaris21, and polaris30:

https://github.com/llvm-mirror/llvm/blob/f26b156fd2f58f49d3190a45c07e25c15b0bc0ae/lib/Support/TargetParser.cpp#L91

This means (unless I am still missing some) the following graphics cards are affected:

  • Fiji
    • Fiji XT
      • Radeon Instinct MI8
      • Radeon R9 Fury X
      • Radeon R9 Fury
      • Radeon R9 Nano
    • Capsaicin XT
      • FirePro S9300x2
      • Radeon Pro Duo 2016
  • Polaris 30
    • Radeon RX 590
  • Polaris 20
    • Radeon Pro 580
    • Radeon RX 580
    • Radeon Pro 575
    • Radeon Pro 570
    • Radeon RX 570
  • Polaris 10
    • Radeon Instinct MI6
    • Radeon Pro Duo 2017
    • Radeon Pro WX 7100
    • Radeon Pro WX 7100 Mobile
    • Radeon RX 480
    • Radeon Pro WX 5100
    • Radeon RX 470
  • Polaris 21
    • Radeon Pro 560X
    • Radeon Pro 560
    • Radeon Pro 555X
    • Radeon Pro 555
  • Polaris 11
    • Radeon Pro WX 4100
    • Radeon Pro WX 4170 Mobile
    • Radeon Pro WX 4150 Mobile
    • Radeon Pro WX 4130 Mobile
    • Radeon RX 560D
    • Radeon RX 460

Note that these GPUs range from the mobile parts and the low-end 460, through mid-range GPUs like the 470/480, up to the then-high-end Fury X, and from the workstation Pro Duos to the server-grade FirePro / Instinct cards. So users in virtually every dGPU market segment would benefit.
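
For anyone unsure whether their card falls under this target, the ROCm runtime can report the gfx ISA directly. The snippet below is a sketch that assumes the rocm_agent_enumerator tool from a standard ROCm install under /opt/rocm:

```python
# Sketch: check whether the local machine reports a gfx803 agent.
# Assumes rocm_agent_enumerator from a standard ROCm install; adjust the path
# if ROCm is installed elsewhere.
import subprocess

def has_gfx803():
    out = subprocess.run(
        ["/opt/rocm/bin/rocm_agent_enumerator"],
        capture_output=True, text=True, check=True,
    ).stdout
    # The tool prints one gfx ISA name per detected agent (plus gfx000 for the CPU).
    return "gfx803" in out.split()

if __name__ == "__main__":
    print("gfx803 GPU detected:", has_gfx803())
```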

Using https://gpu.userbenchmark.com/ as a rough estimate of popularity, 5 of the top 10 AMD GPUs and 9 of the top 20 are affected. Therefore, many users who already have compatible AMD hardware are bound to have a frustrating experience when they try to use their officially supported GPUs with TensorFlow-ROCm and then run into the unfixed regressions.

@gaetanbahl

I have an R9 Fury. I can confirm that I am affected by bugs/regressions and may need to switch to Nvidia cards because of this.

@Bengt

Bengt commented Jun 5, 2019 via email

@sunway513

Hi @Bengt, first of all, thank you for the suggestions. We appreciate your effort to summarize the GFX803 impact, and we understand your concerns.
Due to limited resources, the QA and CI coverage of GFX803 boards is not as comprehensive as for the GFX900 (Vega10) and GFX906 (Vega20) targets.
I will convey your message to the team and see if there is anything we can do to improve it.

Regarding the quoted issues, we have pushed out a set of OpenCL fixes for GFX803 targets in the ROCm 2.5 release; we believe the following two issues should have been fixed:
#301
#302
Please try it out with ROCm 2.5 and let us know your feedback.

@Canadauni

Canadauni commented Jun 9, 2019

It looks like the memory allocation issues you've listed have been solved for me. Running a CNN no longer locks up my display as it did in ROCm 2.4. However, running the MNIST CNN example from Keras still shows a failure to converge, as described in #432.

@sunway513

Hi @Canadauni, thank you for confirming that the memory allocation issues have been fixed!
We have been tracking issue #432 in our internal ticket system and will update the issue thread when there is progress.

@sunway513 added the gfx803 (issue specific to gfx803 GPUs) label Jun 10, 2019
@thegatsbylofiexperience

On this note, I'd also like to say that I recently bought an RX 580 to start doing some deep learning on. Is the “New Era of Open GPU Computing” just filled with promises of things working while nothing actually does?

It might seem like good business sense to put more resources into the higher-end cards (from a management perspective that makes complete sense). From a buyer's perspective, it's the opposite: we start with the cheapest card to see if something works and then move up the chain when/if it does. If it doesn't, we move on.

The fact of the matter is that, right now, I regret my purchase decision. Would I upgrade to a Vega GPU in the future based on my current experience? The answer is no.

I am sorry to lecture on this point, but there is a business case here for better support. I hope you pass this on.

@dagamayank

@dbouius-AMD

@gaetanbahl

On this note, I'd also like to say that I recently bought an RX 580 to start doing some deep learning on. Is the “New Era of Open GPU Computing” just filled with promises of things working while nothing actually does?

It might seem like good business sense to put more resources into the higher-end cards (from a management perspective that makes complete sense). From a buyer's perspective, it's the opposite: we start with the cheapest card to see if something works and then move up the chain when/if it does. If it doesn't, we move on.

The fact of the matter is that, right now, I regret my purchase decision. Would I upgrade to a Vega GPU in the future based on my current experience? The answer is no.

I am sorry to lecture on this point, but there is a business case here for better support. I hope you pass this on.

Yup, that sums it up nicely. I would have bought a Radeon VII if my Fury was supported correctly. I went with a 2080 instead.

I still have my Fury and can still help with GFX803 support testing if needed, though.

@Bengt

Bengt commented Jun 5, 2020

Linus Torvalds recently switched to an RX 580, which underlines the relevance of supporting it:

https://t3n.de/news/threadripper-linus-torvalds-arbeitsrechner-hardware-1287213/

@ROCmSupport

Thanks for reaching out.
gfx8 is not a supported configuration anymore.
We are not officially supporting gfx8 devices with ROCm; please refer to the supported hardware section of the ROCm docs: https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Software-Support
