Add GFX803 to TF-ROCm Continuous Integration #479
Comments
I have an R9 Fury. I can confirm that I am affected by these bugs/regressions and may need to switch to Nvidia cards because of this.
Yes, I feel the same way. I like my Fiji cards and would like my next set of GPUs to be AMDs, too. However, AMD has been tolerating massive regressions on officially supported hardware for half a year now, which makes any further investment in AMD GPUs seem futile.
Gaétan Bahl <[email protected]> wrote on Wed., June 5, 2019, 17:47:
… I have a R9 Fury. I can confirm that I am affected by bugs/regressions and
may need to switch to Nvidia cards because of this.
Hi @Bengt , first of all, thank you for the suggestions; we appreciate your effort to document the GFX803 impact and understand your concerns. Specific to the quoted issues, we have pushed out a set of OpenCL fixes for GFX803 targets in the ROCm 2.5 release, and we believe the following two issues should now be fixed:
It looks like the memory-allocation issues you listed have been solved for me. Running a CNN no longer locks up my display as it did in ROCm 2.4. However, running the MNIST CNN example from Keras still shows a failure to converge, as described in #432.
Hi @Canadauni , thank you for confirming the memory allocation issues have been fixed!
On this note, I'd also like to say: I recently bought an RX 580 to start doing some deep learning. Is the "New Era of Open GPU Computing" filled with promises of things working while nothing actually does? It might seem like good business sense to put more resources into the higher-end cards (from a management perspective that makes complete sense). From a buyer's perspective, it's the opposite: we start with the cheapest card to see if something works and then move up the chain when/if it does; if it doesn't, we can move on. The fact of the matter is that right now I regret my purchase decision. Would I upgrade to a Vega GPU in the future based on my current experiences? The answer is no. I am sorry to lecture on this point, but there is a business case here for better support, and I hope you pass this on.
Yup, that sums it up nicely. I would have bought a Radeon VII if my Fury were supported correctly. I went with a 2080 instead. I still have my Fury and can still help with GFX803 support testing if needed, though.
Linus Torvalds recently switched to an RX 580. That underlines the relevance of its support: https://t3n.de/news/threadripper-linus-torvalds-arbeitsrechner-hardware-1287213/ |
Thanks for reaching out. |
Describe the feature and the current behavior/state.
Currently: no GFX803 GPU is part of the TF-ROCm CI systems, so regressions on these targets go undetected.
Feature:
Add at least one variant of GFX803 GPUs to the TF-ROCm CI systems.
Who will benefit from this feature?
There have already been uncaught regressions, which are still present in the master branch:
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues?q=is%3Aissue+is%3Aopen+label%3Agfx803
At the time of writing, these regressions account for over one third (7 out of 20) of all open issues in this repository.
These could be caught by regression tests, like this one:
Everyone using a gfx803 GPU would benefit from not having these, and possibly many other, regressions in the future.
The gfx803 chips have a large installed base, and their support is thus important to AMD's reputation, so AMD would benefit, too.
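A regression test of the kind mentioned above could look roughly like the following sketch. It is hypothetical and self-contained: it uses plain NumPy instead of TensorFlow so it runs without a GPU, and the model, function names, and thresholds are all illustrative. The point is only the shape of the check: a non-convergence regression like #432 manifests as a loss that never drops, which a CI assertion can catch.

```python
# Hypothetical convergence regression test (illustrative sketch, NumPy
# stand-in for a real TF-ROCm test that would run on a gfx803 CI node).
import numpy as np

def train_tiny_model(steps=200, lr=0.5, seed=0):
    """Train a tiny logistic-regression model and return the loss history."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels
    w = np.zeros(2)
    losses = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # sigmoid predictions
        loss = np.mean(-y * np.log(p + 1e-9) - (1 - y) * np.log(1 - p + 1e-9))
        losses.append(float(loss))
        w -= lr * X.T @ (p - y) / len(y)        # gradient-descent step
    return losses

def test_convergence():
    losses = train_tiny_model()
    # A non-convergence regression (as in #432) would leave the loss flat;
    # this assertion fails in that case and flags the regression in CI.
    assert losses[-1] < 0.5 * losses[0], "model failed to converge"

test_convergence()
```

In a real CI setup, the same assertion would wrap the Keras MNIST CNN example on a gfx803 machine, so a driver or compiler regression that breaks convergence fails the build instead of reaching users.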
As listed in the LLVM docs, there are three chips under the gfx803 target, namely polaris10, polaris11, and fiji. Because they are only node shrinks of these, the target also includes polaris20, polaris21, and polaris30:
https://github.com/llvm-mirror/llvm/blob/f26b156fd2f58f49d3190a45c07e25c15b0bc0ae/lib/Support/TargetParser.cpp#L91
This means (unless I am still missing some) the following graphics cards are affected:
Note that these GPUs span from the mobile parts, via the low-end 460, over mid range GPUs like the 470/80 to the then-high-end Fury-X, as well as the workstation Pro Duos to the server-grade FirePro / Instinct cards. So users of virtually any dGPU market segment would benefit.
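The chip-to-target mapping described above can be captured in a small lookup table. This is a sketch assembled from the LLVM TargetParser source linked above; the dictionary and helper name are my own and not part of any ROCm API.

```python
# Chips belonging to the gfx803 LLVM/ROCm target, per the TargetParser
# source linked above (polaris20/21/30 are node shrinks of polaris10/11
# and fiji, and compile to the same target).
GFX803_CHIPS = {
    "fiji": "gfx803",
    "polaris10": "gfx803",
    "polaris11": "gfx803",
    "polaris20": "gfx803",
    "polaris21": "gfx803",
    "polaris30": "gfx803",
}

def is_gfx803(chip: str) -> bool:
    """Return True if the given chip name belongs to the gfx803 target."""
    return GFX803_CHIPS.get(chip.lower()) == "gfx803"

print(is_gfx803("Fiji"))      # True
print(is_gfx803("vega10"))    # False
```

Any one of these chips on a CI node would exercise the gfx803 code path, which is why a single CI machine with, say, an RX 580 (polaris20) would cover the whole family.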
Using https://gpu.userbenchmark.com/ as a rough estimate of popularity, 5 of the top 10 AMD GPUs and 9 of the top 20 are affected. Therefore, many users who already have compatible AMD hardware are bound to have a frustrating experience when they try to use their officially supported GPUs with Tensorflow-ROCm and run into the unfixed regressions.