Add profiling support to the HAL API for use in benchmarks #45

benvanik · 2019-10-13T20:14:23Z

We'll want a way to expose various metrics from the HAL implementations in a way that avoids excessive sequencer work. This could be accomplished by a begin/end profiling API and a resulting profile that contains cumulative, sampled, or averaged counter values per-backend. On Vulkan this may mean some vendor-specific performance counters in addition to timestamps inserted into command buffers.
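The begin/end idea above can be sketched in C. This is a minimal illustration under stated assumptions, not IREE's HAL API. Every name here (`profile_t`, `profile_begin`, `profile_record`, `profile_end`, `profile_counter_value`, `counter_kind_t`) is hypothetical. The sketch assumes that a backend pushes raw counter values while a session is active, and that the cumulative, sampled, or averaged reduction is applied when the resulting profile is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical names throughout; this is NOT part of IREE's HAL API.
// How a counter's recorded values are reduced into the final profile.
typedef enum {
  COUNTER_CUMULATIVE,  // sum of all values recorded during the session
  COUNTER_SAMPLED,     // most recently recorded value
  COUNTER_AVERAGED,    // mean of all values recorded during the session
} counter_kind_t;

typedef struct {
  const char* name;
  counter_kind_t kind;
  double sum;
  double last;
  uint64_t count;
} counter_t;

typedef struct {
  bool active;
  size_t counter_count;  // number of valid entries in counters (max 8)
  counter_t counters[8];
} profile_t;

// Starts a session and clears any values left over from a previous one.
void profile_begin(profile_t* profile) {
  assert(!profile->active);
  profile->active = true;
  for (size_t i = 0; i < profile->counter_count; ++i) {
    profile->counters[i].sum = 0.0;
    profile->counters[i].last = 0.0;
    profile->counters[i].count = 0;
  }
}

// Called by a backend; values recorded outside a session are dropped.
void profile_record(profile_t* profile, size_t index, double value) {
  if (!profile->active || index >= profile->counter_count) return;
  counter_t* counter = &profile->counters[index];
  counter->sum += value;
  counter->last = value;
  counter->count += 1;
}

void profile_end(profile_t* profile) {
  assert(profile->active);
  profile->active = false;
}

// Reads a counter from a finished profile using its reduction kind.
double profile_counter_value(const profile_t* profile, size_t index) {
  const counter_t* counter = &profile->counters[index];
  switch (counter->kind) {
    case COUNTER_CUMULATIVE:
      return counter->sum;
    case COUNTER_SAMPLED:
      return counter->last;
    case COUNTER_AVERAGED:
      return counter->count ? counter->sum / (double)counter->count : 0.0;
  }
  return 0.0;
}
```

Recording outside a session is a cheap early return here, which is one way to keep the overhead low when profiling is off. A real backend might instead aggregate on the device or in per-queue buffers rather than making one call per value.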

benvanik · 2020-11-22T02:04:21Z

1+ years on, and progress is being made on this. The new threading system will have statistics built in and will help me define the API we want to expose for this (likely something similar to feedback buffers).

benvanik · 2021-03-30T03:09:16Z

Maybe this year? :P
I think the first step is to get Tracy recording Vulkan times, and to get the task system appearing as a dedicated execution context in Tracy as well. That will unblock performance investigations. I'm still not sure we have solid use cases for programmatic fine-grained profiling yet, though it may be useful in parameter searches (with all the usual caveats about how applicable such timings are).

ThomasRaoux · 2021-03-30T05:20:51Z

Do you have some pointers on what would need to be done to have Tracy record Vulkan times, and what the result would look like? On Android we have pretty much no useful tools for profiling within a command buffer. I hacked up some timestamp queries to get a breakdown for MobileBERT, but it is obviously not a sustainable solution. So this is probably our best bet for doing at least some basic profiling on phones, and I can help out with the implementation.
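For reference, the kind of timestamp-query breakdown described above comes down to converting tick deltas into time. The sketch below assumes the ticks were written with `vkCmdWriteTimestamp` and read back as 64-bit values via `vkGetQueryPoolResults` with `VK_QUERY_RESULT_64_BIT`. It also assumes `timestampValidBits` comes from the queue family's `VkQueueFamilyProperties` and `timestampPeriod` (nanoseconds per tick) from `VkPhysicalDeviceLimits`. The helper names are hypothetical, and the Vulkan headers are deliberately not included so the sketch stays self-contained. The per-dispatch breakdown further assumes that dispatches are serialized, so that each pair of consecutive timestamps brackets exactly one dispatch.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: elapsed nanoseconds between two GPU timestamps.
//   timestamp_valid_bits: VkQueueFamilyProperties::timestampValidBits
//                         (0 means timestamps are unsupported; result is 0)
//   timestamp_period_ns:  VkPhysicalDeviceLimits::timestampPeriod
double timestamp_delta_ns(uint64_t begin_ticks, uint64_t end_ticks,
                          uint32_t timestamp_valid_bits,
                          float timestamp_period_ns) {
  uint64_t mask = timestamp_valid_bits >= 64
                      ? UINT64_MAX
                      : ((UINT64_C(1) << timestamp_valid_bits) - 1);
  // Masking the unsigned difference keeps it correct if the counter
  // wrapped around (at most once) within its valid bits.
  uint64_t delta_ticks = (end_ticks - begin_ticks) & mask;
  return (double)delta_ticks * (double)timestamp_period_ns;
}

// Hypothetical helper: converts dispatch_count + 1 timestamps (one before
// the first dispatch, then one after each dispatch) into dispatch_count
// per-dispatch durations. Only meaningful if dispatches do not overlap.
void timestamp_breakdown_ns(const uint64_t* ticks, size_t dispatch_count,
                            uint32_t timestamp_valid_bits,
                            float timestamp_period_ns,
                            double* out_durations_ns) {
  for (size_t i = 0; i < dispatch_count; ++i) {
    out_durations_ns[i] = timestamp_delta_ns(
        ticks[i], ticks[i + 1], timestamp_valid_bits, timestamp_period_ns);
  }
}
```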

benvanik · 2021-03-30T15:01:55Z

I've got an old set of changes that enables Vulkan in Tracy that I'll revive and get working. The issue I ran into last time (and what prevented me from committing it) was that Tracy could not at the time render disjoint or overlapping zones: if there was any asynchronous or overlapping execution, it would pad every zone out so that the zones were perfectly nested. I remember seeing that fiber support was being added to Tracy (in some form), and if it has landed then we can use it to allow the out-of-order zones. Otherwise, the Tracy support only produces useful results if there is a single dispatch between global barriers, so that no two dispatches ever overlap. That is not very useful for anything but microbenchmarks (which could still be useful, but are not general-purpose and carry all the usual caveats about how applicable GPU microbenchmarks are).
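The rendering limitation described above is about zone nesting. As an illustration only (this does not use Tracy's API, and the type and function names are hypothetical), the C sketch below classifies two GPU zones as disjoint, nested, or partially overlapping. A track whose zones are all pairwise disjoint or nested can be drawn as a plain stack. Any partial overlap is what a stack-only timeline would have to pad out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical zone: a half-open time range [begin_ns, end_ns).
typedef struct {
  uint64_t begin_ns;
  uint64_t end_ns;
} zone_t;

typedef enum {
  ZONES_DISJOINT,         // no shared time at all
  ZONES_NESTED,           // one zone lies entirely inside the other
  ZONES_PARTIAL_OVERLAP,  // shared time, but neither contains the other
} zone_relation_t;

zone_relation_t classify_zones(zone_t a, zone_t b) {
  if (a.end_ns <= b.begin_ns || b.end_ns <= a.begin_ns) {
    return ZONES_DISJOINT;
  }
  bool a_in_b = b.begin_ns <= a.begin_ns && a.end_ns <= b.end_ns;
  bool b_in_a = a.begin_ns <= b.begin_ns && b.end_ns <= a.end_ns;
  return (a_in_b || b_in_a) ? ZONES_NESTED : ZONES_PARTIAL_OVERLAP;
}

// True if every pair of zones is disjoint or nested, meaning the track can
// be drawn as a call stack without padding any zone.
bool zones_stack_renderable(const zone_t* zones, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    for (size_t j = i + 1; j < count; ++j) {
      if (classify_zones(zones[i], zones[j]) == ZONES_PARTIAL_OVERLAP) {
        return false;
      }
    }
  }
  return true;
}
```

Serialized dispatches separated by global barriers produce only disjoint zones, which is why that case renders correctly. Two concurrently executing dispatches on the same track typically produce a partial overlap.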

benvanik · 2021-03-30T15:12:14Z

Fiber support doesn't seem to have landed, but what we really want is wolfpld/tracy#149; that's how I accomplished this in wtf and it worked really well. Unfortunately it looks like it's not planned work, so I'm not sure what to do there.

With the new feature allowing multiple GPU context tracks, I can at least split queues onto separate tracks so that different queues can overlap, but within each queue the numbers you get will not account for overlap :(

benvanik · 2021-03-30T23:41:48Z

I'm going to avoid doing any HAL work here and instead just add Tracy support directly to the Vulkan HAL. When we want programmatic queries we'll need to add explicit APIs to the HAL, but for just seeing timings in Tracy we can avoid that.

TLDR: configure with `-DIREE_ENABLE_RENDERDOC_PROFILING=ON`, pass the `--device_profiling_mode=queue` flag to the IREE tools, and launch the tools from the RenderDoc UI to get a capture (or use `renderdoccmd capture`):

![image](https://user-images.githubusercontent.com/75337/197648585-b34bd661-cfd1-4fbb-a6f9-2b73bec81b6a.png)

Things are set up to allow for other profiling modes in the future, but how best to integrate those is TBD. We can figure out how to scale this with other tooling and on other backends, but the rough shape of the API should be compatible with the various backend APIs we target (D3D/Metal/CUDA/Vulkan/perf/etc).

Note that because RenderDoc can also capture D3D, the CMake flag is generic, but the Vulkan and D3D HAL implementations will each need to load RenderDoc themselves (there's no real code worth sharing, as D3D naturally only needs the Windows API query path). The docs have notes that I've verified on Windows. Someone looking to use this on Android will need to figure that out and can add what they find.

Fixes #45. Forty five. Wow.

powderluv · 2022-10-25T02:26:45Z

this was the year!!! thanks @benvanik


benvanik added the enhancement ➕ New feature or request label Oct 13, 2019

benvanik added this to the Benchmarking Infrastructure milestone Oct 13, 2019

benvanik added the runtime Relating to the IREE runtime library label Mar 19, 2020

benvanik self-assigned this Nov 22, 2020

benvanik added the hal/api IREE's public C hardware abstraction layer API label Nov 22, 2020

benvanik removed this from the Benchmarking Infrastructure milestone Nov 22, 2020

benvanik mentioned this issue Nov 22, 2020 in Add Vulkan tracing via Tracy #1937 (Closed)

benvanik mentioned this issue Mar 30, 2021 in Add an iree_hal_local_executable_t shim that records profiling information #5256 (Closed)

allieculp added this to IREE Codegen May 2, 2022

allieculp removed this from IREE Codegen May 3, 2022

allieculp added this to IREE - ARCHIVED (do not update) May 18, 2022

benvanik added the backlog label Jun 23, 2022

GMNGeoffrey added this to IREE Jun 27, 2022

GMNGeoffrey moved this to Done in IREE Jun 28, 2022

allieculp removed the status in IREE Jun 28, 2022

GMNGeoffrey removed this from IREE Jun 28, 2022

benvanik mentioned this issue Oct 24, 2022 in Adding HAL profiling API and RenderDoc support for Vulkan. #10893 (Merged)

benvanik closed this as completed in #10893 Oct 25, 2022

benvanik moved this to Done in IREE - ARCHIVED (do not update) Oct 25, 2022

GMNGeoffrey mentioned this issue Mar 29, 2023 in Add attention op as transform dialect op #12739 (Merged)

dpackwood mentioned this issue Sep 8, 2023 in RaiseSpecialOps (iree-flow-raise-special-ops) causes compiler crash for some input #14933 (Closed)

stellaraccident pushed a commit that referenced this issue Sep 24, 2023: Update nightly dependencies (#45), 542ccf2

Automatically created. Co-authored-by: OpenXLA Dep Roller <[email protected]>

gabeweisz mentioned this issue Mar 25, 2024 in Failure : unimplemented: found unhandled case of expansion/collapse in aten.view #16887 (Open)

deng-ShiFu mentioned this issue Jul 3, 2024 in Failed to compile Transformer model #17801 (Closed)
