Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU tracing and providers #124

Open
Lectem opened this issue Oct 2, 2018 · 22 comments
Open

GPU tracing and providers #124

Lectem opened this issue Oct 2, 2018 · 22 comments

Comments

@Lectem
Copy link

Lectem commented Oct 2, 2018

The GPU tracing option only adds the DX provider, while GPU View uses a lot more providers/states, would you be interested in having GPUView presets added ?

From the batch file log.cmd of GpuView, we can see the following providers can be used:


set TRACE_DXVA2=a0386e75-f70c-464c-a9ce-33c44e091623:0xffff:5
set TRACE_MMCSS=f8f10121-b617-4a56-868b-9df1b27fe32c:0xffff:5
set TRACE_WMDRM=6e03dd26-581b-4ec5-8f22-601a4de5f022:0xffff:5
set TRACE_WME=8f2048e0-f260-4f57-a8d1-932376291682
set TRACE_WMP=681069c4-b785-466a-bc63-4aa616644b68:0xffff:5
set TRACE_DVD=75d4a1bb-7cc6-44b1-906d-d5e05be6d060:0xffff:5
set TRACE_DSHOW=28cf047a-2437-4b24-b653-b9446a419a69
set TRACE_MF=f404b94e-27e0-4384-bfe8-1d8d390b0aa3+362007f7-6e50-4044-9082-dfa078c63a73:0x000000000000ffff:0x5
set TRACE_AE=a6a00efd-21f2-4a99-807e-9b3bf1d90285:0x000000000000ffff:0x5
set TRACE_HME=63770680-05F1-47E0-928A-9ACFDCF52147:0xffff:5
set TRACE_HDDVD=779d8cdc-666b-4bf4-a367-9df89d6901e8:0xffff:5
set TRACE_DWMAPIGUID=71dd85bc-d474-4974-b0f6-93ffc5bfbd04::6
set TRACE_SCHEDULEGUID=8cc44e31-7f28-4f45-9938-4810ff517464:0xffff:6
set TRACE_DX=DX
set TRACE_WARP=ee685ec4-8270-4b08-9e4e-8b356f48f92f:0
set TRACE_DXGI=CA11C036-0102-4A2D-A6AD-F03CFED5D3C9:0xf:6:'stack'
set TRACE_DXGIDEBUG=F1FF64EF-FAF3-5699-8E51-F6EC2FBD97D1:0xffffffffffffffff
set TRACE_DXGIDEBUG_LIGHT=%TRACE_DXGIDEBUG%:4
set TRACE_DXGIDEBUG_NORMAL=%TRACE_DXGIDEBUG%:4:'stack'
set TRACE_DXGIDEBUG_VERBOSE=%TRACE_DXGIDEBUG%:6:'stack'
set TRACE_D3D12=5d8087dd-3a9b-4f56-90df-49196cdc4f11:0xffffffffffffffff:6:'stack'
set TRACE_D3D11=db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'
set TRACE_D3D10LEVEL9=7E7D3382-023C-43cb-95D2-6F0CA6D70381:0x1
set TRACE_DXC=802ec45a-1e99-4b83-9920-87c98277ba9d
set TRACE_DXC_ALL=%TRACE_DXC%:0x10FFFF:5:'stack'
set TRACE_DXC_LONGHAUL=%TRACE_DXC%:0x800:5
set TRACE_DXC_MIN=%TRACE_DXC%:0x800:5
set TRACE_DXC_LIGHT=%TRACE_DXC%:0x826:5
set TRACE_DXC_NORMAL=%TRACE_DXC%:0x100236:5
set TRACE_DXC_STACKS=%TRACE_DXC%:0x208041:5:'stack'
set TRACE_UMD=a688ee40-d8d9-4736-b6f9-6b74935ba3b1:0xffff:5
set TRACE_DWM=a42c77db-874f-422e-9b44-6d89fe2bd3e5:0x000000007fffffff:0x5
set TRACE_DWM2=8c9dd1ad-e6e5-4b07-b455-684a9d879900:0xFFFF:6
set TRACE_DWM3=9e9bba3c-2e38-40cb-99f4-9e8281425164:0xFFFF:6
set TRACE_CODEC=ea6d6e3b-7014-4ab1-85db-4a50cda32a82:0xffff:5
set TRACE_KMFD=E7C7EDF9-D0E4-4338-8AE3-BCA3C5B4B4A3
set TRACE_UMFD=a70bc228-e778-4061-86fa-debb03fda64a

set TRACE_TESTFRAMEWORK=31293f4f-f7bb-487d-8b3b-f537b827352f
set TRACE_TEST=42C4E0C1-0D92-46f0-842C-1E791FA78D52
set TRACE_SC=30336ed4-e327-447c-9de0-51b652c86108
set TRACE_XAML=531A35AB-63CE-4BCF-AA98-F88C7A89E455:0xffff
set TRACE_WIN32K=8c416c79-d49b-4f01-a467-e56d3aa8234c:0xffff
set TRACE_D2D=dcb453db-c652-48be-a0f8-a64459d5162e:0:5
set TRACE_D2DSCENARIOS=712909c0-6e57-4121-b639-87c8bf9004e0

set TRACE_D3D9_PRESENT=783ACA0A-790E-4d7f-8451-AA850511C6B9:0xf:6
set TRACE_DXGI_PRESENT=CA11C036-0102-4A2D-A6AD-F03CFED5D3C9:0xf:6
set TRACE_D3D11_PRESENT=db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6
set TRACE_DXC_ALL_PRESENT=%TRACE_DXC%:0x10FFFF:5
set TRACE_DXC_PRESENT=%TRACE_DXC%:0x208041:5

I know that this batch file uses 3 different files however, and then merges them, I am unsure if it would be supported by UIforETW

This log.cmd file can be found under C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\log.cmd

@randomascii
Copy link
Contributor

UIforETW already uses multiple ETL files and merges them - three of them in the case of heap tracing. However that is mostly just an implementation detail - tracing from multiple providers can be done to one or many ETL files (kernel and user-mode providers are separate, but that is the only enforced separation).

The real question is what is the benefit? I'm not at all opposed to increasing the set of DX providers, especially if it helps diagnose some issues, but randomly adding providers without understanding what data they are supplying and how high their data rate will be comes with some risks.

Any ideas which of these add value?

Note that if there is a specific rare scenario that needs investigation then custom providers can be added in the settings dialog. The GPU tracing option should have a reasonable set of defaults for GPU tracing.

@Lectem
Copy link
Author

Lectem commented Oct 3, 2018

I don't know what all of those exactly do, but some are kind of important such as the TRACE_*_PRESENT + TRACE_DWM ones, as you can see the DWM swapchains events, which are useful when tracking VSync issues.
TRACE_DXGI is also important as it will track swapchain resizing and management of directx devices.
In the case of applications using hardware accelerated video decoding, TRACE_DXVA2 TRACE_DSHOW.

This is also why I was talking about adding the presets of log.cmd, which are the following:

  • MIN
  • PRESENT
  • LIGHT
  • LONGHAUL
  • NORMAL
  • VERBOSE

It also mentions GPUView, at a minimum really needs LOADER, PROC_THREAD and CSWITCH., but I suspect those might already be used by UIforETW.

Based on the MIN preset, we would probably need the following providers:
In the capture ETL file

  • TRACE_DXC_MIN
  • TRACE_UMD
  • TRACE_DXGI
  • TRACE_D3D12
  • TRACE_D3D11
  • TRACE_D3D10LEVEL9

In the kernel etl file

  • LOADER
  • PROC_THREAD
  • CSWITCH
  • DISPATCHER
  • POWER

@randomascii
Copy link
Contributor

The kernel providers listed are indeed already recorded - they are key to many types of analysis.

Those six providers seem reasonable, although I'll need to test to make sure they don't have too high a data rate (whatever that means). I've also seen problems with OS dependencies on the graphics providers before - I want UIforETW to continue working on Windows 7.

@NomDeMorte
Copy link

NomDeMorte commented Dec 21, 2018

Any news on this? It would be amazing to be able to use this instead of the log.cmd which comes with GPUView, especially given the keylogger in UIforETW.
Perhaps a method where we could add custom providers in the UI, would suffice? That way, you don't have to add anything that might break your app but it gives us the flexibility to use it for much more.

Oh, and to answer your question about why this is useful: https://graphics.stanford.edu/~mdfisher/GPUView.html and https://www.anandtech.com/show/6857/amd-stuttering-issues-driver-roadmap-fraps/4 show good examples.

@randomascii
Copy link
Contributor

You can already add custom providers in the Settings dialog. There are places to enter extra kernel flags/providers, kernel stack walks, and extra user mode providers.

But, that is not the most intuitive or friendly way of configuring GPU providers. It would be reasonable to change the GPU Tracing checkbox to a Combo box dropdown with choices like None, Min, Medium, and Full. If None was the default and Min (roughly) matched the current settings then Medium and Full/Maximum could include whatever they want since they are opt-in.

But I haven't done any work on this and I don't know when I might. A pull request would be welcomed.

@NomDeMorte
Copy link

Sorry for the delay in replying - I took a break over Christmas, lost my browser tabs and the rest is (not in my browser) history. Anyway Merry Christmas and a Happy New Year to you! LOL

So, It took me a little (a lot of) fiddling to get this working but I did get there in the end. Essentially, I collected all the variables in the log.cmd, concatenated them all, removed the duplicates and redundancies, and was left with the string of provider GUIDs. I believe there are some redundancies here with the provider names used by the GUI, and there are definitely some conflicts, but that is resolved by disabling the setting in the GUI and just letting the GUIDs I've extracted define those options.

This is the configuration I'm using:

image

image

Extra kernel mode flags: LOADER+PROC_THREAD+CSWITCH+DISPATCHER+POWER

user mode providers: 802ec45a-1e99-4b83-9920-87c98277ba9d:0x10FFFF:5:'stack'+a688ee40-d8d9-4736-b6f9-6b74935ba3b1:0xffff:5+CA11C036-0102-4A2D-A6AD-F03CFED5D3C9:0xf:6:'stack'+5d8087dd-3a9b-4f56-90df-49196cdc4f11:0xffffffffffffffff:6:'stack'+db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'+7E7D3382-023C-43cb-95D2-6F0CA6D70381:0x1+DX:0x9+531A35AB-63CE-4BCF-AA98-F88C7A89E455:0xffff:4+ee685ec4-8270-4b08-9e4e-8b356f48f92f:0:1

Two important settings to note:

GPU tracing from the main UI must be unticked. Something about it conflicts with the GUIDs provided. I will be honest and say I can't remember what, but this was necessary.

Tracing to file is mandatory. Circular memory buffers do not work. This option has also been removed from the log.cmd, I cannot find any documentation on it, so I don't honestly know why it isn't working.

Hope this helps! Is there somewhere I should look in the codebase, to see if perhaps I could submit a patch?

@randomascii
Copy link
Contributor

Thanks for looking in to this. If you want to create a PR that fixes up the GPU tracing options to use the new-and-improved settings then that would be great. If you search for bGPUTracing_ in https://github.com/google/UIforETW/blob/master/UIforETW/UIforETWDlg.cpp then you should find all of the relevant code.

I seem to remember that the required providers are different on Windows 7 so we would either need to test there or make no changes for that OS.

I guess the GPU tracing option needs to be disabled for circular-buffer tracing, although it does seem quite odd that that combination doesn't work.

The 'stack' options tend to be quite expensive, in terms of CPU and buffer size, so I wonder if those should be omitted. I guess it depends on how frequently the stacks are actually used. I doubt that GPUView depends on them.

@jeche
Copy link

jeche commented Jul 8, 2019 via email

@randomascii
Copy link
Contributor

What provider level is recommended for general tracing? We don't need the most intense recording available, just enough to diagnose some basic CPU issues. The main thing I watch for is being able to see GPU utilization by process in WPA - the current settings seem to work for that.

@NomDeMorte
Copy link

NomDeMorte commented Jul 8, 2019

It looks like you included the DXC_ALL provider in the additional user mode providers. I'd caution against setting these

Thanks for the heads up. It's entirely possible that I'm doing this entirely wrong. I'm a dev but ETW is not in my skillset at all...

Is it the flags or the 'stack' option that cause this?
Or just the DxgKrnl provider in general? (Doubt this as it's stock to use it, but nothing would surprise me)

I guess the issue here is that the logger handling is different for the batch file and for UIforETW. I knew it was a square peg in a round hole but I figured so long as I kept the user and kernel providers separated I'd be OK - obviously I'm going to need to be more careful.

Bear with me while I butcher this - If there's documentation explaining the difference between enabling the provider and capturing its state, I can't find it,

UIforETW uses the same approach as the batch file does for the 'longhaul' switch:
Enables kernel providers
Starts a session with user providers (802ec45a-1e99-4b83-9920-87c98277ba9d:0x800:5 -same dxgkrnl with different flags)
Captures state of the same user providers above to that session

GPUView's log.cmd for all other normal modes is quite different:
Enables kernel providers
Starts a session with user providers
Captures state of a second set of user providers to that same session - some are the same provider as the first session with different flags
Starts a session with a third state of user providers (why is this separate to the first session?)

What I've done, is mash all the user providers together, and where the same provider was used twice it failed, so I took the one with the most verbose flags, which was always the second set that was intended to be used with the -capturestate switch

I could just omit the duplicates by taking the ones with the less verbose flags from the first set... But perhaps that will cause problems by using them with the capturestate switch? Either way, it's not going to match what the batch file does.

Or perhaps it would be worthwhile, for compatibility with GPUView, for UIforETW to allow two sets of custom user providers, one for the session with the -on switch and one for the -capturestate switch?

Please pardon me if I'm being annoying, I'm aware I'm out of my field here.

@randomascii
Copy link
Contributor

Captures state of a second set of user providers to that same session - some are the same provider as the first session with different flags

I have no idea why that is done. The whole -capturestate dance is something I do not understand. I think that you can safely do a -capturestate on all of the user-mode providers and that is what I would default to doing unless that doesn't work.

log.cmd seems to like using multiple tracing sessions. Lots of tracing systems seem to like that. It can let you specify different amounts of buffering for different providers but that seems like a bad idea because then they wrap at different times. UIforETW uses a kernel session and a user-mode session and I believe that that is the best design. As with -capturestate I think the best thing to do is to put them all in one session as long as it works.

If the same provider is specified twice with different flags then the -start flags are probably the relevant ones. Doing a -capturestate with a more inclusive set of flags probably causes the extra bits to be ignored. That is, it is probably a bug.

Alternately you could OR the two flag sets together, but that would mean capturing more data. Using the -start settings for -capturestate as well seems more sane.

I think that log.cmd is old, crufty, buggy and inconsistent.

@randomascii
Copy link
Contributor

From reading the docs it appears that -capturestate is only needed with -buffering. That is, it is only needed with circular-buffer tracing. If log.cmd doesn't handle that then -capturestate is unnecessary (and UIforETW should be cleaned up to not use -capturestate in that case).

https://docs.microsoft.com/en-us/windows-hardware/test/wpt/start

@NomDeMorte
Copy link

Hey I fixed the notification hurrah! Thanks for working with me on this

I think that log.cmd is old, crufty, buggy and inconsistent.

Sure seems that way. Honestly I think it's trying to do something beyond a batch file's effective scope. It's almost like they need some kind of UI for ETW ;)

It also seems as though the whole thing was built to handle the restrictons of a far older PC, where memory and disk space were at a far greater premium - which would explain the conservative buffering at the expense of buffer wrapping out of sync. ironically, log.cmd uses the same buffer size for everything so having the two sets of usermode sessions each with a differnt set of capturestates seems entirely redundant.

Accordingly, I agree that it would be good to make it work the UIforETW way of having one usermode session and one kernel.

Do you know of any documentation of what the flags actually mean? This might give me a more educated perspective on what flags are actually needed from the perspective of analysing 3d applications (games). The MS doco says to "See providers for a list of valid flags." And of course those commands output a list of providers with no mention of the flags at all.

I think that you can safely do a -capturestate on all of the user-mode providers and that is what I would default to doing unless that doesn't work.

Well, this is what I also thought, but @jeche seems to know of a pitfall there. I'm not sure if it would be OK if I used differnt flags, or dropped the 'state' parameter - which I honestly do not know the prupose of thanks to the aforementioned lack of documentation.

Again, thanks for your help with this. The whole toolchain is basically voodoo magic to me.

@randomascii
Copy link
Contributor

I have never seen documentation on the flags. In the case of Chrome's ETW provider we mapped some flag bits to chrome://tracing categories and you can see that mapping by looking in the Chrome or UIforETW source code, or you can just use the list of categories in the settings box.

One way of decoding the flags is to record a trace with all flags and then look in WPA under Trace->System Configuration-> Trace Statistics in the Keyword column. That Keyword is ANDed with the flags and if the result is non-zero then the event is emitted.

In the case of GPU tracing just copying the flags used for the -start command in the MIN configuration is probably best. Ignore the -capturestate case because I think that is irrelevant for tracing to file, should use exactly the same flags, and should (probably) be done in a separate place - but that is a UIforETW bug (issue #136).

I think @jeche was saying that some DXC_ALL options got pulled in instead of DXC_MIN, leading to overly aggressive tracing that could affect behavior. I didn't read anything about -capturestate into that.

And yeah, batch files for general-purpose ETW tracing are a mess.

@Lectem
Copy link
Author

Lectem commented Jul 9, 2019

Hi,
Sorry for not answering this thread earlier, I asked the DX team and the answer I got was the following:

GPUView needs -capturestate applied to some of its providers, and log.cmd uses a different session for those providers to avoid overhead caused by -capturestate to the providers that don't need it
FWIW, -capturestate maps to the EnableTraceEx2 API with the EVENT_CONTROL_CODE_CAPTURE_STATE control code, which causes all providers to get a callback asking them to emit extra events
GPUView requires this for the graphics kernel provider, but it's not free in terms of runtime overhead to send that callback to all providers, and it's not free in terms of storage for all those extra emitted events, which is why it's only applied selectively

It seems that the latest version of GPUView now ships with a log.wprp that can be used with Windows Performance Recorder which is apparently the official GUI for capturing ETW traces. (I'm now trying to install the latest Windows ADK to check this)

@NomDeMorte
Copy link

OK so it's not just me that failed to find the flags doco, I feel better now.

That flag masking keyworks is an outstanding tip, thank you. How do you even know that? :)

I feel that it would be good to have that drop-down list you mentioned, or at least take the default configuration of NORMAL if we have to choose just one - practically all of the documentation around just says to run log.cmd with no parameters, and it's also the configuration used by nvidia for tracing game performance so I expect it should be sufficiently lightweight as to not interfere even with that load. The only documentation I've seen that refers to using MIN is this blog - although that's about the best GPUView documentation available so it does carry quite some weight. I do know that VERBOSE lives up to it's name and I doubt it would be necessary. For reference, GPUview describes them as:

Log m | Minimum | Log only the minimum basic events
Log l | Light | Log only a few more than the minimum events
Log | Normal | Log the most interesting events
Log v | Verbose | Log everything that GPUView is aware of

I had some other things to add but lectem has just dropped a massive knowledge bomb so I'll come back to that later hah

Looks like having three sessions rather than two might be in order after all...
(And that I'm going to have to upgrade to 1903 to get the new ADK, dang)

@jenatali
Copy link

jenatali commented Jul 9, 2019

FYI, we (the DirectX team) found that Normal logging can have somewhat significant impact for some applications, largely due to stack capture. If all you're looking for are performance metrics, then I'd suggest using log.cmd with the "Present" mode (there's an equivalent mode in the wprp file as well). This is the minimum set of providers that will let you do full performance analysis, tracing application framerates, as well as other metrics like latencies. Check out PresentMon for a tool that can consume that minimal set of providers and produce performance metrics.

@NomDeMorte
Copy link

Sorry for not answering this thread earlier

Please, don't be sorry, that information is golden. Thanks so much!

As I mentioned above, this new information explains why three sessions is necessary and on top of that now we actually know what -capturestate does and why the flags don't conflict.

It's great to see that the toolchain is being updated. I know that one major issue with the log file was handling different locales and I didn't see any easy way around that without heading down the GUI route. I'm sure this will be well received in those countries.

@randomascii is having three sessions in UIforETW out of the question? As good a tool as WPR is, it's not as useful for me as UIforETW is, I really make use of the key/mouse logging.

@jenatali Thanks to you too! Man this thread is officially the single best knowledge base for GPUView online now, not even joking.

It's interesting to know about that performance impact, the only clue I had was Matt's blog (linked above) where he mentioned that the impact was very low - but that was quite some time ago, so it's good to get some more updated information. Since some of what I'm doing is bug hunting (especially related to networking and memory and their effect on the game), I'm essentially trying to capture a mystery, so my previous approach has been 'capture it all and filter it later" but obviously I'm going to need to lay off the gas a little. That being said, I can't help but wonder if using present mode would be sufficient for such a purpose.

I used PresentMon and OCAT in the past and stumbled across UIforETW when I was looking for something that would allow me to trace keyboard inputs.

You guys are a goldmine.

@randomascii
Copy link
Contributor

How do you even know that? :)

I used to work on the team that produces the ETW toolset so I got some training. That plus years of use and exploring.

It is not at all impossible to do three ETW streams in UIforETW, I just wasn't clear on why it was necessary. It's just a bit more code to create them, and I guess a few new failure modes, but nothing too bad.

It would also be great to have an explanation (from @jenatali or @Lectem) on when the --capturestate commands should be issued (at the start or the end) and whether GPU tracing really is incompatible with circular-buffer tracing.

Any advice for dealing with Windows 7 would be useful also, although I'm about ready to stop caring about that.

The GPU tracing check-box could easily be changed to a drop-box with a few different options if that is helpful.

@jenatali
Copy link

jenatali commented Jul 9, 2019

I'm attaching the WPRP version of the log file, since it looks like it inadvertently didn't make it into the public ADK.
log_wprp.zip

As far as where the capturestate commands should go, log.cmd puts them at the beginning, but we have a circular logging version that puts them at the end. The high-level is that GPUView needs an event that either came from object creation, or from a capturestate command, for objects that are referenced by other events. If you try to emit a capturestate at the beginning of a circular log, then it can get erased by the time you finish the trace. But if you capturestate at the end, then you have to walk through the entire trace before you can really start parsing the results (which GPUView does anyway).

One of my explicit goals with PresentMon was to avoid requiring a capturestate, and to be able to do everything with realtime events.

For Win7 support, the only difference is that the graphics kernel provider is a different GUID. You can see the PresentMon change for adding Win7 support.

@NomDeMorte
Copy link

@randomascii I figured that the best way I could help with this, would be to get out of the way ;) But, it's been a little while, so I thought maybe I should ask if there is anything I can do to help?

@randomascii
Copy link
Contributor

This isn't a high priority task for me so it could be months before I get around to it, if ever. If somebody else wants to create a PR then I'd be happy to look at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants