
Merge OpenAI Triton commit 4f6f768 #2634

Merged: 14 commits into main from whitneywhtsang/merge on Nov 6, 2024

Conversation

@whitneywhtsang (Contributor) commented Nov 5, 2024

This PR changes the Triton base from 86a2ac7 to 4f6f768 (Oct 30).
Pass rate: 97.41%->96.31% (#2633)

Please do not squash and merge this PR.

CRobeck and others added 13 commits October 29, 2024 08:46
This commit removes the special cases for MFMA -> Dot Operand
LDS shortcuts; they are now supported by the common linear-layout
infrastructure.

No new tests are added; mfma-shortcut.mlir already covers this.
`scaled_dot` is not yet implemented on `gfx11` and `gfx12`,
so the unit tests are disabled for now.
The string representation allows PyTorch Inductor to
serialize/deserialize the `AttrsDescriptor` to the `@triton.heuristics`
block in the generated code.
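A minimal sketch of the pattern this enables, using a hypothetical stand-in class (the field names are illustrative, not the real `AttrsDescriptor` schema): the object's string representation is stable enough that generated source can embed it and reconstruct an equal object later.

```python
# Hypothetical stand-in for AttrsDescriptor (field names are illustrative):
# the point is a repr() that round-trips, so generated source can embed the
# string and rebuild an equal object when the code is executed.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttrsDescriptorSketch:
    divisibility_16: tuple = ()
    equal_to_1: tuple = ()

desc = AttrsDescriptorSketch(divisibility_16=(0, 1), equal_to_1=(3,))
# The default dataclass repr already round-trips for tuple-valued fields:
assert eval(repr(desc)) == desc
```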
Allows for upcasting in DotOp encoding in RF (register file).
This lowering path is not currently in use; pending
triton-lang/triton#5003
…indows (#5014)

The `-A` argument is not compatible with the Ninja generator.

Signed-off-by: Anatoly Myachev <[email protected]>
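
A sketch of the guard this implies (hypothetical build-script logic, not the actual change): only pass `-A` when the generator is a Visual Studio one, since Ninja rejects it.

```python
# Hypothetical build-script snippet: only Visual Studio generators accept the
# "-A" (platform/architecture) flag; Ninja errors out if it is passed.
import subprocess

def configure(generator: str, build_dir: str = "build") -> None:
    args = ["cmake", "-S", ".", "-B", build_dir, "-G", generator]
    if generator.startswith("Visual Studio"):
        args += ["-A", "x64"]  # architecture flag, VS generators only
    subprocess.check_call(args)
```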
In passing, we also improve a few other things:
- `scaled_dot` now accepts both uint8/uint16 fp8/bf16 as inputs (before,
you had to cast to uint8, which was awkward when extending it to bf16).
- Add `scaled_dot` to the docs and improve the docs overall (we have not
rendered them; they may need a few further tweaks). A usage sketch follows below.
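
For illustration, a minimal kernel sketch under the assumption that the entry point is `tl.dot_scaled` taking (value, scale, format) triples per operand, with one scale per 32 K-elements of the lhs; argument names, order, and scale layout may differ between Triton versions, so treat this as a sketch rather than the PR's API.

```python
import triton
import triton.language as tl

@triton.jit
def scaled_dot_kernel(a_ptr, a_scale_ptr, b_ptr, c_ptr,
                      BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                      BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    # fp8 lhs loads as uint8; after this change a bf16 rhs can also be
    # passed as uint16 instead of first being cast to uint8.
    a = tl.load(a_ptr + offs_m[:, None] * BLOCK_K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * BLOCK_N + offs_n[None, :])
    # One e8m0 scale per 32 K-elements of the lhs (assumed layout).
    a_scale = tl.load(a_scale_ptr + offs_m[:, None] * (BLOCK_K // 32)
                      + tl.arange(0, BLOCK_K // 32)[None, :])
    c = tl.dot_scaled(a, a_scale, "e4m3", b, None, "bf16")
    tl.store(c_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :], c)
```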
Example:

```python
>>> import sysconfig

# On Windows
>>> sysconfig.get_config_var("EXE")
'.exe'

# On Linux
>>> sysconfig.get_config_var("EXE")
''
```

---------

Signed-off-by: Anatoly Myachev <[email protected]>
This PR adds more restrictions on when we apply the sched-load
optimization, and un-reverts triton-lang/triton#4823.

We will only apply the optimization when all of the following are
satisfied (a kernel of this shape is sketched after the list):
1. pureMatmulProblem, i.e. one `tt.dot` in the main loop
2. two `tt.load`s in the main loop
3. the 2nd `tt.load` is ahead of the `tt.dot`
4. the 1st user of the 2nd `tt.load` is after the `tt.dot`
5. the tile size is large enough, i.e. nonKDim >= 128 and kDim >= 64
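
For reference, a minimal Triton matmul kernel whose main loop has the shape conditions 1-3 describe: two `tl.load`s followed by a single `tl.dot`. The kernel is illustrative; the names and the mapping of the BLOCK_* tile sizes onto nonKDim/kDim are assumptions, not taken from the PR.

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    # Main loop: masks omitted for brevity (assumes K % BLOCK_K == 0).
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)      # 1st tt.load in the main loop
        b = tl.load(b_ptrs)      # 2nd tt.load, ahead of the tt.dot
        acc = tl.dot(a, b, acc)  # the single tt.dot (pureMatmulProblem)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```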
@whitneywhtsang self-assigned this on Nov 5, 2024
@whitneywhtsang marked this pull request as ready for review on November 6, 2024 at 00:08
@whitneywhtsang merged commit 49a52a2 into main on Nov 6, 2024
4 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch on November 6, 2024 at 00:08
@whitneywhtsang changed the title from "Merge OpenAI Triton commit ebce7f3" to "Merge OpenAI Triton commit 4f6f768" on Nov 6, 2024