[RFC] First class Triton support in OpenXLA Nvgpu #54
Comments
Neat. My main comment is a meta one: the openxla-nvgpu project is still pretty young and is even missing CI and proper build support/integration. I'm open to moving fast, but we also need to prioritize some project infrastructure work to hold everything together.
There are other ways of doing this that are much better integrated and should all work today. I'll respond on the doc, but the short of it is that custom dispatches (à la samples/custom_dispatch/cuda/) are sufficient and well supported; custom modules and other things should not be required.
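To make the custom-dispatch idea concrete, here is a minimal, hypothetical sketch in plain Python (all names are illustrative, not real IREE APIs): the program references a kernel by name, and the actual implementation (e.g. a Triton-generated blob in the real system) is bound when the program is built, not at run time.

```python
def build_program(dispatch_table):
    """'Compile' a program with its external kernels bound up front.

    In the real system the dispatch table would hold compiled GPU kernels;
    here plain Python functions stand in for them.
    """
    kernel = dispatch_table["matmul_tile_64"]  # resolved at build time

    def program(a, b):
        # The program only ever calls the kernel it was built with.
        return kernel(a, b)

    return program


def matmul_2x2(a, b):
    # Stand-in for a hand-written 2x2 matmul kernel.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


prog = build_program({"matmul_tile_64": matmul_2x2})
result = prog([[1, 0], [0, 1]], [[5, 6], [7, 8]])  # identity matmul
```

The key property this models is that the set of kernels is fixed at program compile time, which is exactly what makes the run-time-compilation case in the next comments harder.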
👍 good point, I think we can start with custom dispatches. Although if we want to push Triton compilation to run time and bundle it with autotuning (tile selection mostly?), then we won't be able to do it as a custom dispatch?
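For concreteness, the tile-selection autotuning loop being described might look like the following hypothetical sketch (pure Python, with a stand-in benchmark and a stand-in for run-time Triton compilation; none of these names are real APIs):

```python
import time


def benchmark(kernel, *args, iters=10):
    """Time a compiled kernel over a few iterations (stand-in for real profiling)."""
    start = time.perf_counter()
    for _ in range(iters):
        kernel(*args)
    return (time.perf_counter() - start) / iters


def autotune(compile_kernel, candidate_tiles, *args):
    """Compile the kernel once per candidate tile size and keep the fastest.

    `compile_kernel(tile)` stands in for a run-time Triton compilation step;
    this is what a plain custom dispatch cannot express, because the set of
    executables is not known when the IREE program is compiled.
    """
    best_tile, best_time = None, float("inf")
    for tile in candidate_tiles:
        kernel = compile_kernel(tile)
        t = benchmark(kernel, *args)
        if t < best_time:
            best_tile, best_time = tile, t
    return best_tile


# Toy example: "kernels" that sum a list in chunks of `tile` elements.
def compile_kernel(tile):
    def kernel(xs):
        return sum(sum(xs[i:i + tile]) for i in range(0, len(xs), tile))
    return kernel


data = list(range(10_000))
best = autotune(compile_kernel, [16, 64, 256, 1024], data)
```

Which tile wins depends on the machine; the point is only that compilation and measurement both happen at run time.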
Ah, so you're intending to use the same compiled IREE program but vary the triton kernels without recompiling the program?
I think we'll have both strategies:

1. Compile Triton kernels ahead of time, together with the IREE program.
2. Compile Triton kernels at run time (bundled with autotuning), reusing the same compiled program.
Cool. For #1 the custom dispatch way should work. For #2 there are some other ways that are potentially easier. Executable specialization constants can be used to parameterize executables when they are loaded, but they may be slightly trickier to integrate with black boxes; it may still be interesting to reuse that mechanism with a custom executable type at runtime, though. Another option would be to have your custom module return a !hal.executable and schedule work as normal, but at that point it's probably best to use streamable custom calls instead: you'd take your params as push constants, do whatever you needed, and then launch the kernel against the stream.
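The two parameterization mechanisms mentioned above can be contrasted with a hypothetical plain-Python sketch (names are illustrative, not real IREE APIs): specialization constants bind values once when an executable loads, while a streamable call receives its parameters as push constants on every dispatch.

```python
class Executable:
    """A 'compiled' kernel whose specialization constants are bound at load time."""

    def __init__(self, body):
        self.body = body  # opaque compiled artifact in the real system

    def load(self, **spec_constants):
        # Specialization constants are fixed once, when the executable loads;
        # a real backend could fold them into the generated code.
        return lambda *args: self.body(spec_constants, *args)


def saxpy_body(spec, x, y):
    a = spec["alpha"]
    return [a * xi + yi for xi, yi in zip(x, y)]


exe = Executable(saxpy_body)
saxpy2 = exe.load(alpha=2.0)  # bound once at load time
loaded_result = saxpy2([1, 2], [10, 20])


def streamable_call(push_constants, x, y):
    # Streamable custom call: parameters arrive as push constants per dispatch,
    # so they can differ on every launch without reloading anything.
    a = push_constants["alpha"]
    return [a * xi + yi for xi, yi in zip(x, y)]


stream_result = streamable_call({"alpha": 3.0}, [1, 2], [10, 20])
```

The trade-off this models: load-time binding suits values that rarely change (e.g. a tuned tile size), while push constants suit per-dispatch parameters.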
Initial implementation of the First class triton integration: #54

Requires Triton + patches from https://github.com/ezhulenev/triton/commits/openxla-triton

```
git submodule update --remote third_party/triton
```

Run tests:

```
ctest --test-dir build -R triton
```
[RFC] First class Triton support in OpenXLA-Nvgpu
We want to improve the state of the Triton and OpenXLA integration and make jax-triton more user- and compiler-friendly.
Please let us know what you think!