Use libdrm_amdgpu as an alternative GPU load information source #925
base: master
I don't have a Vega card to physically test this fork, but the general structure looks good and I didn't find anything basic like spelling errors 😃
It works properly on my machine, but I don't have any other Vega cards to test it on. Actually, it should also work on most non-Vega AMD GPUs.
Also, you'd need DRM master authorization, at least with secondary GPUs.
This should not be needed if one uses the renderD node; it seems the card node is currently being used. If the card node is already open, you can use the fd with …
@bsolos Overall, I would encourage you to open a bug on the AMDGPU GitLab, clearly describe the issue (exact kernel version, distro, use case, etc.) and CC Alex Deucher (@agd5f). This is a great workaround, but the upstream GPU metrics should really be fixed.
I didn't open an issue there because https://gitlab.freedesktop.org/drm/amd/-/issues/1932 is already open. It seemed like there had been no progress in the last 10 months, so I thought this workaround might be beneficial to MangoHud. Should I still open a new issue?
One should not assume that devs don't care about an issue just because there's no update. Sometimes they have higher or other priorities, and sometimes things fall through the cracks. By opening a new issue or prodding the existing one, you'll increase visibility and raise its severity. If you can test kernel patches, devs are more likely to get it fixed faster. Sitting quietly does not help, I'm afraid.
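libdrm offers `drmGetRenderDeviceNameFromFd()` for going from an open card fd to its render node. As a library-free alternative, here is a minimal sketch that assumes sysfs lists a card's sibling minors under `<card>/device/drm/` (for example `/sys/class/drm/card0/device/drm/` containing both `card0` and `renderD128`). The helper `find_render_node` is hypothetical and not part of libdrm or MangoHud.

```cpp
// Sketch: locate the render node belonging to a DRM card without libdrm.
// Assumption: sysfs exposes sibling minors under <card>/device/drm/, e.g.
//   /sys/class/drm/card0/device/drm/{card0,renderD128}
// find_render_node() is a hypothetical helper, not a libdrm/MangoHud API.
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

std::optional<std::string> find_render_node(const fs::path& drm_dir) {
    std::error_code ec;  // unreadable/missing dir -> empty range, no throw
    for (const auto& entry : fs::directory_iterator(drm_dir, ec)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("renderD", 0) == 0)  // name starts with "renderD"
            return "/dev/dri/" + name;
    }
    return std::nullopt;  // no render node listed for this card
}
```

The returned path would then be opened with `open(path, O_RDWR | O_CLOEXEC)`. Render nodes only allow unprivileged rendering and query ioctls, which is why they avoid the DRM master requirement mentioned above.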
Wouldn't something like https://www.kernel.org/doc/html/latest/gpu/drm-usage-stats.html make more sense than polling hardware registers? Plus, it's cross-vendor.
Alex
On Mon, Mar 6, 2023 at 12:05 PM bsolos wrote:
> In src/amdgpu_libdrm.cpp:
>
>     @@ -51,6 +52,13 @@ static int libdrm_initialize() {
>             return -1;
>         }
>     +   char *renderD = drmGetRenderDeviceNameFromFd(fd);
>     +   fd = open(renderD, O_RDWR);
>
> Sorry, I've never really worked with libdrm before. Will fix shortly.
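The drm-usage-stats document describes per-client keys of the form `drm-engine-<name>: <value> ns` in `/proc/<pid>/fdinfo/<fd>`, giving cumulative busy time per engine. A minimal sketch, assuming exactly that key format: parse the engine times, sample twice, and divide the busy-time delta by the wall-clock interval. `parse_engine_times` and `engine_busy_percent` are hypothetical names, not MangoHud or kernel APIs.

```cpp
// Sketch: per-engine load from two DRM fdinfo samples (drm-usage-stats format).
// Assumed line format: "drm-engine-<name>:\t<uint> ns".
// Function names are hypothetical, for illustration only.
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Map engine name (e.g. "gfx") to cumulative busy time in nanoseconds.
std::map<std::string, std::uint64_t> parse_engine_times(const std::string& fdinfo) {
    const std::string prefix = "drm-engine-";
    std::map<std::string, std::uint64_t> times;
    std::istringstream in(fdinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string engine = line.substr(prefix.size(), colon - prefix.size());
        std::istringstream value(line.substr(colon + 1));
        std::uint64_t ns = 0;
        std::string unit;
        // Keys without an "ns" unit (e.g. drm-engine-capacity-*) are skipped.
        if (value >> ns >> unit && unit == "ns") times[engine] = ns;
    }
    return times;
}

// Busy percentage of one engine between two samples taken wall_ns apart.
double engine_busy_percent(std::uint64_t before_ns, std::uint64_t after_ns,
                           std::uint64_t wall_ns) {
    assert(wall_ns > 0 && "samples must be taken at different times");
    if (after_ns < before_ns) return 0.0;  // counter went backwards (new fd)
    const double pct = 100.0 * double(after_ns - before_ns) / double(wall_ns);
    return pct > 100.0 ? 100.0 : pct;  // clamp: engines may have several rings
}
```

This only covers the client whose fdinfo is read, much like `top` per process. Reading fdinfo does not touch hardware registers, which sidesteps the gfxoff concern raised later in the thread.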
It seems like the load sensor isn't supported at the hardware level.
Rebase onto master
Rebase onto master
The other problem with polling registers is that it keeps the GPU awake
using more power. The driver has to disable gfxoff when you read back
registers.
Alex
@agd5f Perhaps an in-tree kernel AMDGPU doc outlining the preferred options and their caveats would be great: something people can keep an eye on as things evolve. For example, the team introduces a new method to fetch X, method Y has issues (such as the gfxoff issue mentioned), or approach Z might be deprecated (ETA, reason), etc.
@bsolos
This makes sense now. It seems like finding the correct way is much more difficult than I thought. I use the register-polling approach because that's what radeontop does, and it works.
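The register-polling idea can be sketched as: read a status register many times and report the fraction of reads in which the GPU was busy. This is a minimal sketch under two labeled assumptions: the register is GRBM_STATUS, and bit 31 (GUI_ACTIVE) signals graphics activity. Both should be checked against the register headers for the target ASIC. `read_register` is a hypothetical caller-supplied callback standing in for the real libdrm_amdgpu read.

```cpp
// Sketch of load-by-sampling: count how often a "busy" bit is set.
// Assumption: bit 31 of GRBM_STATUS (GUI_ACTIVE) means the GFX block is busy.
// read_register is a hypothetical stand-in for the real register read.
#include <cstdint>
#include <functional>

constexpr std::uint32_t kGuiActiveMask = 1u << 31;  // assumed GUI_ACTIVE bit

double sampled_load_percent(const std::function<std::uint32_t()>& read_register,
                            unsigned samples) {
    if (samples == 0) return 0.0;
    unsigned busy = 0;
    for (unsigned i = 0; i < samples; ++i) {
        if (read_register() & kGuiActiveMask) ++busy;
        // A real sampler would sleep between reads to spread them over the
        // update interval; omitted here to keep the sketch deterministic.
    }
    return 100.0 * busy / samples;
}
```

The accuracy grows with the number of samples, but so does the cost. As Alex points out, the driver has to disable gfxoff for each register read, so frequent sampling keeps the GPU awake.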
For reference on how to use the fdinfo interface see:
https://www.spinics.net/lists/intel-gfx/msg294401.html
@bsolos The site/link is down; see https://patchwork.freedesktop.org/series/102175/ instead. @agd5f Does that interface provide system-wide statistics? It seems to be per-client and per-fd, whereas MangoHud exposes total system data. Technically, one could iterate over /proc/foo/fdinfo for the total, assuming one has the permissions; however, MangoHud should not be run as root.
Yes, it's per client, similar to top for the CPU.
Alex
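Summing over clients, as discussed above, needs some care. Per the drm-usage-stats document, `drm-client-id` identifies the underlying DRM file, so duplicated or inherited fds show the same id and must be counted once. `drm-pdev` names the GPU. A minimal sketch, assuming the fdinfo texts were already collected (for example from `/proc/<pid>/fdinfo/<fd>` of processes the user may inspect). `fdinfo_value` and `total_engine_ns` are hypothetical helpers.

```cpp
// Sketch: system-wide engine time for one GPU from many fdinfo texts.
// Entries sharing a drm-client-id are the same DRM file and counted once.
// Helper names are hypothetical, for illustration only.
#include <cstdint>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// First whitespace-separated token after "<key>:", or "" if the key is absent.
std::string fdinfo_value(const std::string& fdinfo, const std::string& key) {
    std::istringstream in(fdinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size() + 1, key + ":") == 0) {
            std::istringstream rest(line.substr(key.size() + 1));
            std::string value;
            rest >> value;
            return value;
        }
    }
    return "";
}

// Sum drm-engine-<engine> busy time over distinct clients of GPU `pdev`.
std::uint64_t total_engine_ns(const std::vector<std::string>& fdinfos,
                              const std::string& pdev, const std::string& engine) {
    std::set<std::string> seen_clients;
    std::uint64_t total = 0;
    for (const auto& info : fdinfos) {
        if (fdinfo_value(info, "drm-pdev") != pdev) continue;  // other GPU / not DRM
        const std::string client = fdinfo_value(info, "drm-client-id");
        if (client.empty() || !seen_clients.insert(client).second) continue;
        const std::string ns = fdinfo_value(info, "drm-engine-" + engine);
        if (!ns.empty()) total += std::stoull(ns);  // value is in nanoseconds
    }
    return total;
}
```

Turning the total into a load figure again means sampling twice and dividing by elapsed time. Clients that exit between samples take their time with them, so the delta can go negative and should be clamped. Processes whose fdinfo the user cannot read are silently missing, which is the permission limitation raised above.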
Use libdrm_amdgpu to calculate the GPU load. This should resolve #923.
The GPU load calculation method was inspired by radeontop, but no code in this PR was copied from there.