Introduce TritonStructured dialect and triton-to-structured pass #82

haishanzzzz · 2024-01-10T05:04:57Z

This PR introduces the TritonStructured dialect and the triton-to-structured pass. Please see #81 for background.

This PR is broken into 5 commits:

Introduce the TritonStructured dialect, which includes three ops: tts.make_tptr, tts.load, and tts.store
Update the current MaskAnalysis, removing unused functions that produce different flavors of subviews
Minor updates to OpFoldResultUtils
Introduce the triton-to-structured pass
Add LIT tests

A few notes:

triton-to-structured does not use DialectConversion but rather manually walks the IR to perform rewriting. The reason is that this pass does not try to legalize Triton operations (in fact, it allows them to remain after the pass) but rather opportunistically rewrites certain ops when the analysis succeeds. Legality marking in DialectConversion does not provide a way to set certain instances of an op as legal, which makes it a poor fit for our purpose.
Compared to triton-to-linalg, triton-to-structured is more robust when encountering IR that it cannot analyze. Specifically, it calls emitRemark when analysis fails, emitWarning when it may (although this is very unlikely) produce wrong results, and only fails when a logic error is detected. This graceful failure can be demonstrated by manually rerunning wraparound_unsupported_add_offset.mlir.
LIT tests are borrowed directly from the current tests for triton-to-linalg. Unrelated tests are removed, and the output is manually examined to verify correctness (except for the tests generated for Triton tutorials).
tts.make_tptr will only be used to handle tensors of pointers.
When pointer analysis fails, or when the original code contains scalar pointer accesses, tt.addptr will be left in the output of the pass. These ops should be converted to memref operations in structured-to-memref.
Analysis code is unfortunately duplicated for the time being. Please bear with the awkward names.
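
As a rough illustration of the first two notes, here is a toy Python model of the walk-and-rewrite strategy. It is not the MLIR API and not the code in this PR: Op, Diagnostics, the analyzable and may_be_wrong flags, and rewrite_opportunistically are all hypothetical names made up for the sketch. The flags stand in for the outcome of pointer/mask analysis, and the hard-failure case for logic errors is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    analyzable: bool = True      # hypothetical: does the analysis succeed for this op?
    may_be_wrong: bool = False   # hypothetical: analysis succeeded, but with a caveat


@dataclass
class Diagnostics:
    remarks: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


# Mapping assumed for this sketch only.
STRUCTURED = {"tt.load": "tts.load", "tt.store": "tts.store"}


def rewrite_opportunistically(ops, diags):
    """Walk ops in order and rewrite only those whose analysis succeeds."""
    result = []
    for op in ops:
        target = STRUCTURED.get(op.name)
        if target is None:
            # Not a candidate for rewriting: keep it untouched.
            result.append(op)
            continue
        if not op.analyzable:
            # Analysis failed: emit a remark and leave the original op in place.
            diags.remarks.append(f"analysis failed for {op.name}; leaving it in place")
            result.append(op)
            continue
        if op.may_be_wrong:
            # Rewrite anyway, but flag the (unlikely) risk with a warning.
            diags.warnings.append(f"rewrite of {op.name} may be incorrect")
        result.append(Op(target))
    return result


ops = [
    Op("tt.load"),
    Op("tt.load", analyzable=False),
    Op("tt.store", may_be_wrong=True),
    Op("tt.addptr"),
]
diags = Diagnostics()
print([op.name for op in rewrite_opportunistically(ops, diags)])
```

The point of the sketch is that the same op kind (tt.load here) can end up both rewritten and left untouched within one run, depending on the per-instance analysis result.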

A main follow-up to this PR is to get consensus on how we'd like to handle block pointers in this pass.

There are a few other misc. items on the wishlist for the new code:

Cache pointer and mask analysis results to avoid introducing duplicated ops
More testing of wraparound behavior
Pretty printer, verifier, and canonicalizer for the newly introduced ops
Investigate why tt.reduce makes the --remove-dead-values pass fail

nhat-nguyen · 2024-01-10T14:08:21Z

Thank you @haishanzzzz, I'll start taking a look :)

haishanzzzz · 2024-01-10T18:14:05Z

@microsoft-github-policy-service agree company="Meta"

manbearian · 2024-01-12T18:54:20Z

@haishanzzzz, can you please tell me more about this:

> Investigate why tt.reduce makes the --remove-dead-values pass fail

haishanzzzz · 2024-01-13T00:58:15Z

> @haishanzzzz, can you please tell me more about this:
>
> Investigate why tt.reduce makes the --remove-dead-values pass fail

This is technically not related to this PR, but if I just run triton-opt --remove-dead-values on IR that contains tt.reduce, it fails:

triton-opt --remove-dead-values reducemax_32_256_bf16.mlir
reducemax_32_256_bf16.mlir:35:7: error: null operand found
      tt.reduce.return %22 : bf16
      ^
reducemax_32_256_bf16.mlir:35:7: note: see current operation: "tt.reduce.return"(<<NULL VALUE>>) : (<<NULL TYPE>>) -> ()

Is this what you see too?

nhat-nguyen · 2024-01-15T21:21:02Z

@haishanzzzz I reviewed the changes and left some comments. Overall, I think the approach is good and makes the code much simpler. Love it.

nhat-nguyen · 2024-01-18T18:21:13Z

@haishanzzzz Would you mind updating the description with the decision around keeping tt.addptr for scalars after this pass for structured-to-memref to process later?

haishanzzzz · 2024-01-18T22:37:24Z

> @haishanzzzz Would you mind updating the description with the decision around keeping tt.addptr for scalars after this pass for structured-to-memref to process later?

Updated the description to include more info on relying on tt.addptr for scalar pointers, and that we should expect to see tt.addptr at the output of the pass if analysis fails or for scalar pointers.

haishanzzzz · 2024-01-18T22:39:02Z

@nhat-nguyen @manbearian Please let me know if there are other things I can do before closing this PR.

nhat-nguyen

@haishanzzzz This looks good to me. Thank you for addressing all the comments. You just need to update this branch with main; once that's done, we will be able to land it.

* Introduce TritonStructured dialect
* Updated mask analysis
* Update OpFoldResultUtils
* triton-to-structured pass
* LIT tests
* Address review comments
* Revert header name change
* Remove copied version of mask analysis

haishanzzzz and others added 6 commits January 9, 2024 20:21

introduce TritonStructured dialect

095d419

Updated mask analysis

c697bfc

Update OpFoldResultUtils

db396c6

triton-to-structured pass

7df7930

LIT tests

e5a8b0c

Merge branch 'main' into haishanz/triton-to-structured

fd02f38
