Port InsertGpuAlloc, SetSPIRVCapabilities & SetAbiAttrPass from main #310
Conversation
Sorry, but we need all these tests in main first
Wow, we are getting some real functionality into refactor! Good to see comprehensive (user-)docs and tests!
We should follow the upstream conventions and keep the docs in the .td files (yes, they have some separate files in docs, but that's the exception). If we follow this process we get the standard layout, formatting and a maintained TOC for free.
One reason for the refactor was to make sure that all code is in a state which allows non-authors to easily understand it. The code here has almost no comments; that's not good. I know this is how it is on main. However here, at least a summary of the functional flow in each file, as well as of each (non-trivial) function, should be added. I don't think a meaningful review can be done without it.
@nbpatel if you install the pre-commit hook in your local repository (from where you push) - as mentioned in the README - you will get most of the formatting issues reported by the pre-commit check automatically corrected during your commit. If you have issues using it, we should fix it (you might raise an issue for that).
@@ -38,3 +39,22 @@ int main(int argc, char **argv) {
  return ::mlir::asMainReturnCode(
      ::mlir::MlirOptMain(argc, argv, "Imex optimizer driver\n", registry));
}
Would be better if registration could be done through "::imex::registerAllPasses()" automatically by using files like Passes.td/Passes.h, but that could be done in a follow-up PR.
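For illustration, a minimal sketch of what imex-opt.cpp could look like with such generated registration, assuming "::imex::registerAllPasses()" (the name proposed above) is emitted from a Passes.td/Passes.h pair; the imex header and exact MLIR header paths are assumptions and vary by version:

```
#include <mlir/IR/DialectRegistry.h>
#include <mlir/Support/MlirOptMain.h> // path differs on newer MLIR versions

#include <imex/Transforms/Passes.h> // assumed: exposes the generated hooks

int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  // One call registers every IMEX pass with the imex-opt command line,
  // instead of one manual registration per pass.
  ::imex::registerAllPasses(); // assumed generated from Passes.td
  return ::mlir::asMainReturnCode(
      ::mlir::MlirOptMain(argc, argv, "Imex optimizer driver\n", registry));
}
```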
I agree, the conversions already go into AllPasses thingy.
LGTM
return {{store.memref()}};
} else if (auto call = mlir::dyn_cast<mlir::func::CallOp>(op)) {
  mlir::SmallVector<mlir::Value, 4> ret;
  for (auto arg : call.operands()) {
what is the reason we care about call op's operands, not others?
because there might be a func.call whose operands would be memrefs. Basically we are just looking for all the memref producers.
This is something you could add as a comment :)
because there might be a func.call whose operands would be memrefs. basically, we are just looking for all the memref producers
I am a bit confused by this statement. Store doesn't produce a memref, and there are so many other dialects/ops producing memrefs. Why do we only care about load/store/call ops? What exact analysis and transformation does this pass do for these three ops?
getMemRef() is a very generic name, and the implementation just handles three op categories. I think it should be renamed to something narrower that exactly describes what it does.
I guess this is to find the uses of memrefs (produced or consumed) within the device kernel, whose memory buffers are likely prepared outside the device kernel. That's why it looks for load/store.
For call, the parameters may or may not be defined outside the device kernel. I guess they are collected and then verified later.
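For reference, a hedged sketch of that collection step with the narrower name suggested above; "getAccessedMemrefs" and the memref-type filter on call operands are assumptions for illustration, not the PR's actual code:

```
#include <llvm/ADT/SmallVector.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <mlir/Dialect/MemRef/IR/MemRef.h>

// Returns every memref value that `op` reads or writes, so the pass can
// later check whether its producer must become a gpu.alloc.
static llvm::SmallVector<mlir::Value, 4>
getAccessedMemrefs(mlir::Operation *op) {
  if (auto load = mlir::dyn_cast<mlir::memref::LoadOp>(op))
    return {load.memref()};  // memref consumed by a read
  if (auto store = mlir::dyn_cast<mlir::memref::StoreOp>(op))
    return {store.memref()}; // memref consumed by a write
  if (auto call = mlir::dyn_cast<mlir::func::CallOp>(op)) {
    // A callee may receive buffers allocated outside the launch region;
    // collect every memref-typed operand and verify its origin later.
    llvm::SmallVector<mlir::Value, 4> ret;
    for (mlir::Value arg : call.operands())
      if (arg.getType().isa<mlir::MemRefType>())
        ret.push_back(arg);
    return ret;
  }
  return {};
}
```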
}
return std::move(ret);
} else {
  op->emitError("Uhhandled mem op in gpu region");
what does "emitError" mean? Is this really an error?
// Traverse through all the memory access ops under GPU launch Op
// and add device memory allocation appropriately.
if (func.walk([&](mlir::Operation *op) {
      if (!op->getParentOfType<mlir::gpu::LaunchOp>())
This seems like a significant limitation. Any plan to support gpu.launch_func? Any code or design changes needed to support it?
how would it be any different after the kernels are outlined? Currently this pass runs before the kernels get outlined. The drawback I see is that the gpu-level tests have to use the gpu.launch block format...
maybe I am missing something
how would it be any different after the kernels are outlined?
I think that may involve cross device-func and host-code analysis, since a memref used inside the device kernel needs to be allocated via gpu.alloc in the host code. I would expect the code would need to handle launch_func. I don't know how important it is so far for our use cases, but I think we need to tell users upfront that we are not handling gpu.launch_func.
currently this pass runs before the kernels get outlined.
Maybe you can put this in the pass introduction, so users know the limitation.
This pass was specifically written to be run before kernel outlining; it is much harder to work on an outlined kernel and generally not worth it.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {
  gpuBufferAllocs.insert({op, {}});
} else if (mlir::isa<mlir::func::CallOp>(op)) {
Is the call op treated like the scf dialect? If so, maybe list the three conditions together.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
gpuGetMemrefGlobalParams.insert({op, {}});
continue;
}
if (op->getDialect() == scfDialect ||
What is the purpose of making these two conditions parallel? They seem unrelated to each other.
Also, why only scfDialect? There are so many other dialects; linalg/affine loops could produce memrefs too.
This pass should be run after all linalg/affine ops have been lowered to scf; supporting them is too much work without any actual benefit.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
auto param = block.getArgument(it.first);
it.second = getAccessType(param);

it.second.hostRead = true;
Does this mean that the function always assumes the input/output is on the host side? This is not TRUE for XLA usage, where the caller controls the runtime resources, so the inputs/outputs passed would be on the device side most of the time.
Is there a use case with two consecutive function calls, where the first one produces a buffer for the second one to use? Can the intermediate buffer stay in device memory? How would the current code be restructured to support that need?
I don't mean that this PR needs to fix these limitations. But it would be useful to list these considerations as future development opportunities.
yes, you are right, the function assumes that the input/output is on the host side. We don't have a test case for the case mentioned above; maybe I will do a subsequent PR in the future to address this?
alloc->replaceAllUsesWith(gpuAlloc);
alloc.erase();
if (access.hostRead || access.hostWrite)
  gpuAlloc->setAttr(imex::getAllocSharedAttrName(),
why is there no DeallocOp in this case, like in the parameter and global case: "builder.create<mlir::gpu::DeallocOp>(loc, llvm::None, allocResult);"?
What if a func first allocates a temp memory and then releases it? We only replace the memref.alloc, not the memref.dealloc?
Don't we need negative test cases as well?
Good catch. I will add the code for inserting the gpu.dealloc in this case.
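For reference, a hypothetical sketch of that fix (not the PR's final code), reusing the gpu::DeallocOp form quoted in the question above: once memref.alloc has been replaced by gpu.alloc, any memref.dealloc of the same buffer must become a gpu.dealloc, otherwise device memory leaks.

```
#include <llvm/ADT/STLExtras.h>
#include <mlir/Dialect/GPU/GPUDialect.h> // GPU/IR/GPUDialect.h on newer MLIR
#include <mlir/Dialect/MemRef/IR/MemRef.h>
#include <mlir/IR/Builders.h>

static void rewriteDeallocs(mlir::OpBuilder &builder, mlir::Value allocResult) {
  // allocResult is the memref produced by the new gpu.alloc; after
  // replaceAllUsesWith it owns all uses of the original buffer.
  for (mlir::Operation *user :
       llvm::make_early_inc_range(allocResult.getUsers())) {
    if (auto dealloc = mlir::dyn_cast<mlir::memref::DeallocOp>(user)) {
      builder.setInsertionPoint(dealloc);
      // Same call form as quoted from the parameter/global case above.
      builder.create<mlir::gpu::DeallocOp>(dealloc.getLoc(), llvm::None,
                                           allocResult);
      dealloc.erase();
    }
  }
}
```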
if (op->getDialect() == scfDialect ||
    mlir::isa<mlir::ViewLikeOpInterface>(op))
  continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {
these are not handled as of now in this PR; only memref::AllocOp is handled
Are these cases important to handle? Or do we choose to tell users upfront that we are not handling them? I would prefer the functionality to cover broad cases if the effort is reasonable.
We don't want users to discover these corner cases on their own. We should either implement them or document them.
It is fine for this PR not to implement them, but it would be good to know the limitation and the plan.
added a limitations section in the doc description
builder.setInsertionPoint(alloc);
auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
    loc, alloc.getType(), /*asyncToken*/ nullptr,
    /*asyncDependencies*/ llvm::None, alloc.dynamicSizes(),
Why are the getType and size handling different from the global and parameter cases?
for (auto i : llvm::seq(0u, rank)) {
if (memrefType.isDynamicDim(i)) {
auto op = builder.create<mlir::memref::DimOp>(loc, param, i);
dims.push_back(op);
filter.insert(op);
}
}
auto allocType = mlir::MemRefType::get(
memrefType.getShape(), memrefType.getElementType(),
mlir::MemRefLayoutAttrInterface{}, memrefType.getMemorySpace());
auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
loc, allocType, /*asyncToken*/ nullptr,
/*asyncDependencies*/ llvm::None, dims,
/*symbolOperands*/ llvm::None);
because here you directly have a memref.alloc to convert to a gpu.alloc, while in the global and params case you don't have one; you add the allocs and the subsequent memref.copy ops.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
// This is the case where the inputs are passed as arguments to the
// function. This code will add the IR for memeory allocation on the device
// with gpu.alloc and insert a memref.copy from host to device
for (auto it : gpuBufferParams) {
Is it possible for the global and parameter cases to share code as much as possible, if most of the handling is the same?
The InsertGpuAllocs pass, as the name suggests, inserts the gpu allocs in the IR. Memref alloc is an operation in the memref dialect that can be used to allocate memory on the host side and/or on the device side. The MLIR IR is a mix of host and device code.
To distinguish between host-side and device-side memory allocation, we convert all the memref.allocs that refer to device (gpu) side memory allocations and references into gpu.alloc, which is an operation of the upstream GPU dialect. This distinction helps in lowering to llvm and calling the appropriate memory allocation operation at runtime.
The pass traverses all the memref (load/store) operations inside the gpu launch op in the IR and checks for their aliases and defining ops. If the defining op is a memref.alloc op, it replaces that op in the IR with a gpu.alloc op, because all the operations under the gpu.launch op are device-side computations and will execute on the device.
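For illustration, a minimal sketch of the decision just described (not the PR's verbatim code); "getAccessedMemrefs" and "markForGpuAlloc" are hypothetical helper names:

```
// Classify the producer of every memref accessed inside a gpu.launch region.
static void collectDeviceAllocs(mlir::func::FuncOp func) {
  func.walk([&](mlir::Operation *op) {
    // Only memory accesses nested inside the device region matter.
    if (!op->getParentOfType<mlir::gpu::LaunchOp>())
      return;
    for (mlir::Value memref : getAccessedMemrefs(op)) { // hypothetical helper
      // Follow each accessed memref to its producer; a host-side
      // memref.alloc feeding device code is later rewritten to gpu.alloc.
      if (auto alloc = mlir::dyn_cast_or_null<mlir::memref::AllocOp>(
              memref.getDefiningOp()))
        markForGpuAlloc(alloc); // hypothetical bookkeeping
    }
  });
}
```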
I suggest moving this last line into the file with the implementation. This is a good example of the kind of high-level description of the flow of the implementation which I suggest having in all files. A "user" of this pass does not care about such details, but the reader of the code greatly benefits from it.
All the limitations should be documented here. We don't want users to find them out during debugging.
If the defining op is a memref.alloc op it replaces that op in the IR with gpu.alloc op, because all the operations under the gpu.launch op are device side computations and will execute on the device.
Also, regarding this sentence: I wonder how the upstream Vulkan runner works, since upstream MLIR doesn't have this pass.
done
  return
}
```
Examples are great. Can you provide a smaller example? Much of this is not really needed to show what's happening and is rather distracting.
Are there any ways to remove or reduce these lines?
%3 = memref.cast %0 : memref<8xf32> to memref<?xf32>
%4 = memref.cast %1 : memref<8xf32> to memref<?xf32>
%5 = memref.cast %2 : memref<8xf32> to memref<?xf32>
call @fillResource1DFloat(%3, %cst_0) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%4, %cst) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%5, %cst_1) : (memref<?xf32>, f32) -> ()
removed
The SetSPIRVAbiAttribute pass adds a kernel attribute called spv.entry_point_abi to the kernel function. SPIR-V programs themselves are not enough for running workloads on a GPU; a companion host application is needed to manage the resources referenced by the SPIR-V programs and dispatch the workload. It is also quite possible that those programs are written in different front-end languages; hence the need to add the entry point ABI.
spv.entry_point_abi is a struct attribute that should be attached to the entry function. Some of the lowering passes expect this attribute to perform the lowering.
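For illustration, a hedged sketch of what attaching the attribute can look like on MLIR of this era; helper names and header paths vary by version, and the (1, 1, 1) workgroup size is a placeholder assumption, not the pass's documented choice:

```
#include <mlir/Dialect/GPU/GPUDialect.h> // GPU/IR/GPUDialect.h on newer MLIR
#include <mlir/Dialect/SPIRV/IR/TargetAndABI.h>

void setEntryPointAbi(mlir::gpu::GPUFuncOp gpuFunc) {
  // getEntryPointABIAttrName() returns "spv.entry_point_abi".
  auto attrName = mlir::spirv::getEntryPointABIAttrName();
  if (gpuFunc->hasAttr(attrName))
    return; // do not overwrite an ABI the frontend already chose
  // Placeholder workgroup size for illustration only.
  auto abi = mlir::spirv::getEntryPointABIAttr(
      llvm::ArrayRef<int32_t>{1, 1, 1}, gpuFunc.getContext());
  gpuFunc->setAttr(attrName, abi);
}
```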
I do not understand how this can be useful if all it does is add a constant attribute. If it gets added to all entry functions, how does it make a difference? I guess I am missing something fundamental.
mlir::StringRef getAllocSharedAttrName() { return "gpu.alloc_shared"; }

struct InsertGPUAllocs
A summary in Doxygen style would let us auto-generate docs.
  return false;
};

// Traverse through all the memory access ops under GPU launch Op
Since you asked: until here, there is no comment or anything saying what happens in the code.
Agree. Some reasonable comments are needed. Usually the questions from readers/reviewers are good hints where a comment might be needed.
added more comments in the code
lib/Transforms/InsertGpuAllocs.cpp
Outdated
// GetMemrefGlobal Op Case:
// This is the case where the inputs are globals contants and accessed using
// memref.get_global op. This code will add the IR for memeory allocation on
typo: memory
@@ -0,0 +1,61 @@
//===- SetSPIRVCapabalities.cpp - SetSPIRVCapabalities Pass -------*- C++ |
filename has a typo
struct SetSPIRVCapabilitiesPass
    : public mlir::PassWrapper<SetSPIRVCapabilitiesPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
  void runOnOperation() override {
Is this intel-specific? If so, shouldn't this be called accordingly?
module attributes {gpu.container_module} {
// CHECK: module attributes {gpu.container_module, spv.target_env = #spv.target_env<#spv.vce<v1.0, [Addresses, Float16Buffer, Int64, Int16, Int8, Kernel, Linkage, Vector16, GenericPointer, Groups, Float16, Float64, AtomicFloat32AddEXT, ExpectAssumeKHR], [SPV_EXT_shader_atomic_float_add, SPV_KHR_expect_assume]>, #spv.resource_limits<>>} { |
Any particular reason why this test is so big? Most of the code here is not checked, so why is it needed?
It might be worthwhile to generally reduce the suggested tests. Minimal tests are easier to maintain and understand.
changed
@@ -8,4 +8,4 @@ set(LIBS
add_llvm_executable(imex-opt imex-opt.cpp)

llvm_update_compile_flags(imex-opt)
target_link_libraries(imex-opt PRIVATE ${LIBS})
target_link_libraries(imex-opt PRIVATE ${LIBS} IMEXTransforms)
Why is IMEXTransforms not in ${LIBS}? It should be.
return mlir::WalkResult::interrupt();
}

} else {
this is the gpu params case. Here you may want to put an assert statement to assert your assumption.
The logic is not straightforward: you assume that if alias.getDefiningOp() returns a null pointer, then alias can give you the parent blocks where you can find the arguments. What if the loop body has a nested loop inside? Did you write a test case to verify?
  return mlir::WalkResult::advance();
})
.wasInterrupted()) {
  signalPassFailure();
Does this pass work with the Vulkan runner?
if (mlir::isa<mlir::func::CallOp>(user)) {
  bool onDevice = user->getParentOfType<mlir::gpu::LaunchOp>();
  (onDevice ? ret.deviceRead : ret.hostRead) = true;
  (onDevice ? ret.deviceWrite : ret.hostWrite) = true;
What is the use of "deviceRead" and "deviceWrite"? Does the code do any optimization based on these tags?
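For orientation, the access-summary record implied by the quoted snippets; the field names come from the code above, while the struct name and comments are assumptions. The alloc rewrite quoted earlier uses hostRead/hostWrite to set the gpu.alloc_shared attribute; the quoted code does not show any optimization keyed on deviceRead/deviceWrite.

```
struct AccessType {
  bool hostRead = false;    // buffer touched by host-side reads
  bool hostWrite = false;   // buffer touched by host-side writes
  bool deviceRead = false;  // buffer read inside the gpu.launch region
  bool deviceWrite = false; // buffer written inside the gpu.launch region
};
```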
lib/Transforms/InsertGpuAllocs.cpp
Outdated
auto rank = static_cast<unsigned>(memrefType.getRank());
filter.clear();
dims.clear();
for (auto i : llvm::seq(0u, rank)) {
Does this cover shapes with unknown rank, unknown dims, and known rank/dim? Do you have test cases for those?
it works with known rank and unknown dims, and obviously with static shapes... not the other cases. I will add a test case for the dynamic dim case
include/imex/Transforms/Transforms.h
Outdated
#include <llvm/ADT/SmallBitVector.h>
#include <memory>
#include <mlir/Analysis/BufferViewFlowAnalysis.h>
#include <mlir/Conversion/ArithmeticToSPIRV/ArithmeticToSPIRV.h>
#include <mlir/Conversion/ControlFlowToSPIRV/ControlFlowToSPIRV.h>
#include <mlir/Conversion/FuncToSPIRV/FuncToSPIRV.h>
#include <mlir/Conversion/GPUToSPIRV/GPUToSPIRV.h>
#include <mlir/Conversion/MathToSPIRV/MathToSPIRV.h>
#include <mlir/Conversion/SCFToSPIRV/SCFToSPIRV.h>
#include <mlir/Dialect/Affine/IR/AffineOps.h>
#include <mlir/Dialect/ControlFlow/IR/ControlFlowOps.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <mlir/Dialect/GPU/Transforms/ParallelLoopMapper.h>
#include <mlir/Dialect/GPU/Transforms/Passes.h>
#include <mlir/Dialect/MemRef/IR/MemRef.h>
#include <mlir/Dialect/SCF/IR/SCF.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVDialect.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVOps.h>
#include <mlir/Dialect/SPIRV/IR/TargetAndABI.h>
#include <mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h>
#include <mlir/Pass/Pass.h>
#include <mlir/Target/SPIRV/Serialization.h>
#include <mlir/Transforms/DialectConversion.h>
#include <mlir/Transforms/GreedyPatternRewriteDriver.h>
it is better to move all these includes into the cpp file to limit the inclusion scope.
if (!isMemReadWriteOp(op))
  return mlir::WalkResult::advance();

auto memref = getMemReadWriteOp(op);
getMemReadWriteOp() sounds like it returns an op, not a memref.
…sable workflows (intel#310)
* Fixed Nightly CI workflow
* Added secrets inherit when calling reusable workflows
* Added missing dependency for sending report
* Fixed conditional operator
* Fixed runs-on for generate_report
* Removed report because it cannot be revived
* Execute cpu and gpu builds in parallel
Signed-off-by: Gregory Shimansky <[email protected]>
This PR ports the InsertGpuAllocs, SetSPIRVCapabilities & SetAbiAttrPass passes from the main branch. The passes have test cases and a doc attached to them.