
Merge OpenAI Triton commit 4f6f768 #2634

Merged: 14 commits into main from whitneywhtsang/merge on Nov 6, 2024

Conversation

@whitneywhtsang (Contributor) commented Nov 5, 2024

This PR changes the Triton base from 86a2ac7 to 4f6f768 (Oct 30).
Pass rate: 97.41%->96.31% (#2633)

Please do not squash and merge this PR.

CRobeck and others added 13 commits October 29, 2024 08:46
This commit removes the special cases for MFMA -> Dot Operand
LDS shortcuts; they are now supported by the common linear-layout
infrastructure.

No new tests are added; mfma-shortcut.mlir already covers this.
`scaled_dot` is not yet implemented on `gfx11` and `gfx12`,
so the unit tests are disabled for now.
The string representation allows PyTorch Inductor to
serialize/deserialize the `AttrsDescriptor` to the `@triton.heuristics`
block in the generated code.
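A minimal sketch of the pattern this enables, using a hypothetical stand-in class (the field names are illustrative, not the real `AttrsDescriptor` schema): the object's string representation is stable enough that generated source can embed it and reconstruct an equal object later.

```python
# Hypothetical stand-in for AttrsDescriptor (field names are illustrative):
# the point is a repr() that round-trips, so generated source can embed the
# string and rebuild an equal object when the code is executed.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttrsDescriptorSketch:
    divisibility_16: tuple = ()
    equal_to_1: tuple = ()

desc = AttrsDescriptorSketch(divisibility_16=(0, 1), equal_to_1=(3,))
# The default dataclass repr already round-trips for tuple-valued fields:
assert eval(repr(desc)) == desc
```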
Allows for upcasting in DotOp encoding in RF (register file).
This lowering path is not currently in use; pending
triton-lang/triton#5003
…indows (#5014)

The `-A` argument is not compatible with the Ninja generator.

Signed-off-by: Anatoly Myachev <[email protected]>
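
A sketch of the guard this implies (hypothetical build-script logic, not the actual change): only pass `-A` when the generator is a Visual Studio one, since Ninja rejects it.

```python
# Hypothetical build-script snippet: only Visual Studio generators accept the
# "-A" (platform/architecture) flag; Ninja errors out if it is passed.
import subprocess

def configure(generator: str, build_dir: str = "build") -> None:
    args = ["cmake", "-S", ".", "-B", build_dir, "-G", generator]
    if generator.startswith("Visual Studio"):
        args += ["-A", "x64"]  # architecture flag, VS generators only
    subprocess.check_call(args)
```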
In passing, we also improve a few other things:
- `scaled_dot` now accepts both uint8/uint16 fp8/bf16 as inputs (before,
you had to cast to uint8, which was awkward when extending it to bf16).
- Add `scaled_dot` to the docs and improve the docs overall (we have not
rendered them; they may need a few further tweaks). A usage sketch follows below.
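
For illustration, a minimal kernel sketch under the assumption that the entry point is `tl.dot_scaled` taking (value, scale, format) triples per operand, with one scale per 32 K-elements of the lhs; argument names, order, and scale layout may differ between Triton versions, so treat this as a sketch rather than the PR's API.

```python
import triton
import triton.language as tl

@triton.jit
def scaled_dot_kernel(a_ptr, a_scale_ptr, b_ptr, c_ptr,
                      BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                      BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    # fp8 lhs loads as uint8; after this change a bf16 rhs can also be
    # passed as uint16 instead of first being cast to uint8.
    a = tl.load(a_ptr + offs_m[:, None] * BLOCK_K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * BLOCK_N + offs_n[None, :])
    # One e8m0 scale per 32 K-elements of the lhs (assumed layout).
    a_scale = tl.load(a_scale_ptr + offs_m[:, None] * (BLOCK_K // 32)
                      + tl.arange(0, BLOCK_K // 32)[None, :])
    c = tl.dot_scaled(a, a_scale, "e4m3", b, None, "bf16")
    tl.store(c_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :], c)
```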
Example:

```python
>>> import sysconfig

# On Windows
>>> sysconfig.get_config_var("EXE")
'.exe'

# On Linux
>>> sysconfig.get_config_var("EXE")
''
```

---------

Signed-off-by: Anatoly Myachev <[email protected]>
This PR adds more restrictions on when we apply the sched-load
optimization, and un-reverts triton-lang/triton#4823.

We will only apply the optimization when all of the following are
satisfied (a kernel of this shape is sketched after the list):
1. pureMatmulProblem, i.e. one `tt.dot` in the main loop
2. two `tt.load`s in the main loop
3. the 2nd `tt.load` is ahead of the `tt.dot`
4. the 1st user of the 2nd `tt.load` is after the `tt.dot`
5. the tile size is large enough, i.e. nonKDim >= 128 and kDim >= 64
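
For reference, a minimal Triton matmul kernel whose main loop has the shape conditions 1-3 describe: two `tl.load`s followed by a single `tl.dot`. The kernel is illustrative; the names and the mapping of the BLOCK_* tile sizes onto nonKDim/kDim are assumptions, not taken from the PR.

```python
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    # Main loop: masks omitted for brevity (assumes K % BLOCK_K == 0).
    for _ in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs)      # 1st tt.load in the main loop
        b = tl.load(b_ptrs)      # 2nd tt.load, ahead of the tt.dot
        acc = tl.dot(a, b, acc)  # the single tt.dot (pureMatmulProblem)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc)
```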
@whitneywhtsang self-assigned this on Nov 5, 2024
@whitneywhtsang marked this pull request as ready for review on November 6, 2024 at 00:08
@whitneywhtsang merged commit 49a52a2 into main on Nov 6, 2024
4 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch on November 6, 2024 at 00:08
@whitneywhtsang changed the title from "Merge OpenAI Triton commit ebce7f3" to "Merge OpenAI Triton commit 4f6f768" on Nov 6, 2024