kfd not supported on this ASIC for vega frontier edition #57
Comments
Modinfo is showing the original amdgpu module, not the one installed by dkms. The kernel log shows that it's running the original driver as well (based on the amdgpu driver version). The kernel version reported by dkms status and uname is also slightly off (3.10.0-862.el7.x86_64 vs. 3.10.0-862.14.4.el7.x86_64). Do you have multiple kernels installed? |
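For anyone following along, the kernel mismatch described above can be checked with a short shell sketch. It assumes `dkms status` prints comma-separated fields with the kernel release as one whole field; the exact layout differs between DKMS versions, so treat the parsing as an assumption. The helper name `dkms_built_for` is made up for illustration.

```shell
#!/bin/sh
# Sketch: does DKMS report a build for the kernel that is actually running?
# Assumption: `dkms status` lines look roughly like
#   <module>, <version>, <kernel release>, <arch>: installed
# dkms_built_for is a hypothetical helper, not part of DKMS.

# Succeeds if the kernel release "$1" appears as a whole field in
# `dkms status`-style text read from stdin.
dkms_built_for() {
    awk -v want="$1" -F'[,:] *' '
        { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }'
}

# Only runs where dkms is installed.
if command -v dkms >/dev/null 2>&1; then
    running=$(uname -r)
    if dkms status | dkms_built_for "$running"; then
        echo "DKMS has a build for the running kernel $running"
    else
        echo "No DKMS build found for the running kernel $running"
    fi
fi
```

If no build is reported for the output of `uname -r`, the module that `modinfo amdgpu` describes is most likely the one shipped with the distribution kernel.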
If I had to take a guess: you may have run Could you run |
@jlgreathouse , shamefully I didn't realize that there is an existing Now rocminfo works fine. Also I could compile and run all HIP-Examples. There are a few issues though.
When I have some more time I'm gonna try running some tensorflow benchmarks and compare them with other results I find on the internet. It is a hobby project for me, and I would prefer to wait a little to see if ROCm can be fixed instead of messing up my system with proprietary drivers. But maybe I can clone the system to another HDD to try amdgpu pro and see if there is any difference. Any help with the above issues is appreciated. Let me know if I can provide more debug info or if you'd like me to file issues separately. Thanks a lot! |
Nothing shameful about this. The ROCm stack has a lot of moving pieces, and it's not like we've written documentation for everything. But yes, most distributions come with the upstream version of
I just attempted this on a CentOS 7.5 installation. Mind you, I'm working with the ROCm 1.9.1 release that we put out late last week:
So some of those missing firmware images are expected, in particular those for our yet-to-be-released "Vega 12" GPUs and the not-yet-supported-in-ROCm "Vega M" GPUs. That said, your installation seems to be missing many more firmware images. Could you show me what files (if any) exist in The fact that your DKMS install is missing so many firmware files implies to me that something went wrong during the initial download/installation. If you do
I suspect that this issue is because you are trying to load the
I'm unable to offer help with individual applications. If the developer or users of this application can point out where in the ROCm software stack they believe a problem is happening, we're happy to investigate those potential issues. However, "my application is not performing as well as I would like" is a bit too general. There are thousands of apps out there, and we can't promise to personally help optimize every one of them.
Likely a dupe of this issue. Keep an eye on that one. |
Hi @akostadinov Related to the GPU fan being capped at 40% on your GPU. I have given a relatively long description of this effect, why it happens, and some potential steps you can take to bypass it in this response. |
The error in the dmesg is a display issue so it shouldn't affect the performance of a compute application, if that helps. |
Thank you all for chiming in! Here is the log after a clean 2.0-89.el7 installation and reboot (as suggested by @jlgreathouse): dkms_install.log List of firmware: firmware_list.txt In
Seems like firmwares are there but not found for some reason. Any advice? |
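One way to narrow down a "firmware present but not found" situation is to compare the names the module asks for against a specific directory. The sketch below assumes `modinfo -F firmware amdgpu` lists names relative to the firmware root, and uses `/lib/firmware` purely as an example directory; a DKMS install may place its firmware elsewhere. `missing_firmware` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list firmware names that are absent from a given directory.
# missing_firmware is a hypothetical helper, not part of any package.

# Reads firmware names (one per line, relative paths) from stdin and
# prints each one that does not exist under directory "$1".
missing_firmware() {
    dir=$1
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example use (assumption: /lib/firmware is the directory being searched;
# adjust it to wherever the installed module actually looks):
#   modinfo -F firmware amdgpu | missing_firmware /lib/firmware
```

Running it against each candidate directory shows which one actually holds the files the loaded module expects.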
Could you please give me the exact list of directions you're using to do this installation? I cannot reproduce this problem on CentOS 7.5 or CentOS 7.6 in either ROCm 1.9.1 or ROCm 2.0. How are you getting the On my CentOS box, I do see some firmware images missing (primarily for not-yet-supported GPUs like Vega 12 and VegaM, and for older GPUs that we do not support, like Kabini and Bonaire). I can, however, find the Vega10 firmware even though your log shows that your build did not. I see in your log that you are manually using One thing may be worth trying if you're installing on a fresh system: our Experimental ROC project has scripts for installing ROCm from scratch on various Linux distributions. I've tested all of these on various system configurations, and each has yielded a working ROCm installation. For instance, on your setup (as described earlier), you can run the scripts in |
@jlgreathouse , hi, I did a clean install as you suggested. In case I haven't pasted it earlier, here is the repo that I use
What I see in So as explained in my previous comment I play with
|
I figured out what the issue was. Some earlier package version had installed These files do not exist in the latest package versions, so it shouldn't be an issue any more. Apparently I'll close this issue now and create new issues if I hit anything else. Thank you all! FYI a quick |
I can't help but thank you for the amazing work that went into reaching this 2.0-89.el7 release. It has been rock solid the whole afternoon and evening. Overdrive controls are working so nicely with the cli and sysfs interface. This is really a solid base for doing serious work. I'm sure it is only a matter of time before this platform becomes the most popular. A high quality open source platform, the feeling of actually owning your gear instead of having to play by somebody else's rules... |
@akostadinov thank you very much for your feedback, and for your work in tracking down the firmware problem you were describing. Having this information here will definitely be helpful in the future if this problem pops up again! |
This is on a very clean, just installed RHEL 7.5, Core 2 Duo 2GHz, Gigabyte GA-P35-DS3L, Vega Frontier Edition Air cooled vs rocm-dkms-1.9.211-1.x86_64. I just did a clean install of RHEL, ran yum update, and followed the ROCm installation document.
But things are not working well. I see in dmesg.txt:
To get module loaded I had to add the following modprobe option and rebuild initrd (dracut -f):
Some other output as advised in ROCm/ROCm#415:
moved from ROCm/ROCm#572