
Port InsertGpuAlloc, SetSPIRVCapabilities & SetAbiAttrPass from main #310

Merged

merged 14 commits into refactor from nishant_add_passes on Sep 15, 2022

Conversation

Contributor

@nbpatel nbpatel commented Sep 5, 2022

Please review these guidelines to help with the review process:

  • Have you provided a meaningful PR description?
  • Have you added a test, a reproducer, or a reference to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?
  • Have you organized your commits logically and ensured each can be built by itself?

This PR ports the InsertGpuAllocs, SetSPIRVCapabilities & SetAbiAttrPass from main branch. The passes have test cases and a doc attached to them.

@nbpatel nbpatel requested review from fschlimb and silee2 September 5, 2022 18:42
@Hardcode84 Hardcode84 self-requested a review September 5, 2022 19:03
Contributor

@Hardcode84 Hardcode84 left a comment


Sorry, but we need all these tests in main first

Contributor

@fschlimb fschlimb left a comment


Wow, we are getting some real functionality into refactor! Good to see comprehensive (user-)docs and tests!

We should follow the upstream conventions and have the docs in the .td files (yes, they have some separate files in docs, but that's the exception). If we follow this process we get the standard layout, formatting, and a TOC maintained for free.

One reason for the refactor was to make sure that all code is in a state which allows non-authors to easily understand it. The code here has almost no comments, which is not good. I know this is how it is on main. Here, however, at least a summary of the functional flow in each file, as well as of each (non-trivial) function, should be added. I don't think a meaningful review can be done without it.

Contributor

fschlimb commented Sep 6, 2022

@nbpatel if you install the "precommit" hook in your local repository (from where you push) - as mentioned in the README - you will get most of the formatting issues reported by pre-commit-check automatically corrected during your commit. If you have issues using it, we should fix it (you might raise an issue for that).

@Hardcode84 Hardcode84 self-requested a review September 7, 2022 15:57
@@ -38,3 +39,22 @@ int main(int argc, char **argv) {
return ::mlir::asMainReturnCode(
::mlir::MlirOptMain(argc, argv, "Imex optimizer driver\n", registry));
}

Contributor


It would be better if registration could be done through "::imex::registerAllPasses()" automatically, by using files like Passes.td/Passes.h, but that could be done in a follow-up PR.

Contributor


I agree, the conversions already go into AllPasses thingy.

Contributor

@silee2 silee2 left a comment


LGTM

return {{store.memref()}};
} else if (auto call = mlir::dyn_cast<mlir::func::CallOp>(op)) {
mlir::SmallVector<mlir::Value, 4> ret;
for (auto arg : call.operands()) {
Contributor


What is the reason we care about the call op's operands, and not others?

Contributor Author


because there might be a func.call whose operands would be memrefs. basically we are just looking for all the memref producers

Contributor


This is something you could add as a comment :)

Contributor


because there might be a func.call whose operands would be memrefs. basically, we are just looking for all the memref producers

I am a bit confused by this statement. Store doesn't produce a memref. There are so many other dialects/ops producing memrefs. Why do we only care about load/store/call ops? What exact analysis and transformation does this pass do for these three ops?

getMemRef() is a very generic name, and the implementation just takes three op categories. I think it should be renamed to something narrower that exactly describes what it does.

Contributor


I guess this is to find the uses of memrefs (produced or consumed) within the device kernel, whose memory buffers are likely being prepared outside the device kernel. That's why it looks for load/store.

For call, the parameters may or may not be defined outside the device kernel. I guess they are collected and then verified later.

}
return std::move(ret);
} else {
op->emitError("Unhandled mem op in gpu region");
Contributor

@Jianhui-Li Jianhui-Li Sep 7, 2022


what does "emitError" mean? Is this really an error?

// Traverse through all the memory access ops under GPU launch Op
// and add device memory allocation appropriately.
if (func.walk([&](mlir::Operation *op) {
if (!op->getParentOfType<mlir::gpu::LaunchOp>())
Contributor


This seems like a significant limitation. Any plan to support gpu.launch_func? Any code or design changes needed to support gpu.launch_func?

Contributor Author

@nbpatel nbpatel Sep 8, 2022


How would it be any different after the kernels are outlined? Currently this pass runs before the kernels get outlined. The drawback I see is that the gpu-level tests have to have the gpu.launch blocks ... format...
maybe I am missing something

Contributor


how would it be any different after the kernels are outlined?
I think that may involve cross device-func and host-code analysis, since a memref used inside the device kernel needs to be allocated with gpu.alloc in the host code. I would expect the code to need to handle launch_func. I don't know how important it is so far for our use cases, but I think we need to tell users upfront that we are not handling gpu.launch_func.

currently this pass runs before the kernels get outlined.
Maybe you can put this in the pass introduction, so the user knows the limitation.

Contributor


This pass was specifically written to be run before kernel outlining; it is much harder to work on an outlined kernel and generally not worth it.
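For illustration, a minimal sketch of the two forms under discussion, assuming 2022-era upstream MLIR syntax (the kernel and buffer names are hypothetical):

```mlir
// Before outlining: the memref uses are visible right inside the launch
// region, so the pass can walk them directly.
%c1 = arith.constant 1 : index
%c8 = arith.constant 8 : index
%buf = memref.alloc() : memref<8xf32>
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%lx = %c8, %ly = %c1, %lz = %c1) {
  %v = memref.load %buf[%tx] : memref<8xf32>
  gpu.terminator
}

// After outlining: the body lives in a separate gpu.module, and the pass
// would have to trace %buf across the gpu.launch_func boundary instead.
gpu.launch_func @kernels::@main_kernel
    blocks in (%c1, %c1, %c1) threads in (%c8, %c1, %c1)
    args(%buf : memref<8xf32>)
```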

continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {
gpuBufferAllocs.insert({op, {}});
} else if (mlir::isa<mlir::func::CallOp>(op)) {
Contributor


Is the call op treated like the scf dialect? If so, maybe list those three conditions together.

gpuGetMemrefGlobalParams.insert({op, {}});
continue;
}
if (op->getDialect() == scfDialect ||
Contributor


What is the purpose of making these two conditions parallel? They seem unrelated to each other.

Contributor


Also, why only scfDialect? There are so many other dialects; couldn't linalg/affine loops produce memrefs as well?

Contributor


This pass should be run after all linalg/affine ops have been translated to scf; supporting them is too much work without any actual benefit.

auto param = block.getArgument(it.first);
it.second = getAccessType(param);

it.second.hostRead = true;
Contributor


Does this mean that the function always assumes the input/output is on the host side? This is not TRUE for XLA usage, where the caller controls the runtime resources, so the input/output passed in would be on the device side most of the time.

Is there a use case where there are two consecutive function calls, and the first one produces a buffer for the second one to use? Can the intermediate buffer stay in device memory? How would the current code be restructured to support that need?

I don't mean that this PR needs to fix these limitations. But it would be useful to list these considerations as future development opportunities.

Contributor Author


Yes, you are right, the function assumes that the input/output is on the host side. We don't have a test case for the above-mentioned scenario. Maybe I will do a subsequent PR to address this in the future?

alloc->replaceAllUsesWith(gpuAlloc);
alloc.erase();
if (access.hostRead || access.hostWrite)
gpuAlloc->setAttr(imex::getAllocSharedAttrName(),
Contributor


Why is there no dealloc op in this case, like in the parameter and global case, i.e. "builder.create<mlir::gpu::DeallocOp>(loc, llvm::None, allocResult);"?

Contributor


What if a func first allocates temp memory and then releases it? We only replace the memref.alloc, not the memref.dealloc?
Don't we need negative test cases as well?

Contributor Author


Good catch. I will add the code for adding the gpu.dealloc in this case
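For reference, a minimal sketch of the intended fix, assuming 2022-era gpu dialect syntax (the exact builder code in the pass may differ):

```mlir
// Before: a device-used temporary allocated and released via the memref dialect.
%tmp = memref.alloc() : memref<8xf32>
memref.dealloc %tmp : memref<8xf32>

// After: both ends of the buffer's lifetime move to the gpu dialect.
%tmp2 = gpu.alloc () : memref<8xf32>
gpu.dealloc %tmp2 : memref<8xf32>
```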

if (op->getDialect() == scfDialect ||
mlir::isa<mlir::ViewLikeOpInterface>(op))
continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {

Contributor Author


These are not handled as of now in this PR; only memref::AllocOp is handled.

Contributor


Are these cases important to handle? Or do we choose to tell users upfront that we are not handling them? I would prefer the functionality to cover broad cases if the effort is reasonable.

We don't want users to find out about these corner cases. We should either implement them or document them.

It is fine for this PR not to implement them. But it would be good to know the limitation and the plan.

Contributor Author


added a limitations section in the doc description

builder.setInsertionPoint(alloc);
auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
loc, alloc.getType(), /*asyncToken*/ nullptr,
/*asyncDependencies*/ llvm::None, alloc.dynamicSizes(),
Contributor


Why are the getType and size different from the global and parameter cases?

  for (auto i : llvm::seq(0u, rank)) {
    if (memrefType.isDynamicDim(i)) {
      auto op = builder.create<mlir::memref::DimOp>(loc, param, i);
      dims.push_back(op);
      filter.insert(op);
    }
  }
  auto allocType = mlir::MemRefType::get(
      memrefType.getShape(), memrefType.getElementType(),
      mlir::MemRefLayoutAttrInterface{}, memrefType.getMemorySpace());
  auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
      loc, allocType, /*asyncToken*/ nullptr,
      /*asyncDependencies*/ llvm::None, dims,
      /*symbolOperands*/ llvm::None);

Contributor Author

@nbpatel nbpatel Sep 8, 2022


Because here you directly have a memref.alloc to convert to a gpu.alloc; in the global and params case you don't have one, so you add the allocs and the subsequent memory copies.

// This is the case where the inputs are passed as arguments to the
// function. This code will add the IR for memory allocation on the device
// with gpu.alloc and insert a memref.copy from host to device
for (auto it : gpuBufferParams) {
Contributor


Is it possible for the global and parameter cases to share as much code as possible, if most of the handling is the same?


The InsertGpuAllocs pass, as the name suggests, inserts the gpu allocs in the IR. memref.alloc is an operation in the memref dialect that can be used to allocate memory on the host side and/or on the device side. The MLIR IR is a mix of host and device code.
To distinguish between host-side and device-side memory allocation, we convert all the memref.allocs that refer to device (gpu) side memory allocations and references into gpu.alloc, which is an operation of the upstream GPU dialect. This distinction helps in lowering to LLVM and calling the appropriate memory allocation operation at runtime.
The pass traverses all the memref (load/store) operations inside the gpu launch op in the IR and checks for their aliases and defining ops. If the defining op is a memref.alloc op, it replaces that op in the IR with a gpu.alloc op, because all the operations under the gpu.launch op are device-side computations and will execute on the device.
Contributor


I suggest moving this last line into the file with the implementation. This is a good example of the kind of high-level description of the flow of the implementation which I suggest having in all files. A "user" of this pass does not care about such details, but the reader of the code greatly benefits from it.

Contributor


All the limitations should be documented here. We don't want users to find them out during the debugging process.

If the defining op is a memref.alloc op, it replaces that op in the IR with a gpu.alloc op, because all the operations under the gpu.launch op are device-side computations and will execute on the device.

Also, regarding this sentence, I wonder how the upstream Vulkan runner works, since upstream MLIR doesn't have this pass.

Contributor Author


done
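To make the described rewrite concrete, a minimal before/after sketch (hypothetical input, 2022-era syntax), smaller than the full example in the doc:

```mlir
// Input: a host-side memref.alloc whose only uses are inside gpu.launch.
func.func @main() {
  %c1 = arith.constant 1 : index
  %c8 = arith.constant 8 : index
  %buf = memref.alloc() : memref<8xf32>
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%lx = %c8, %ly = %c1, %lz = %c1) {
    %v = memref.load %buf[%tx] : memref<8xf32>
    gpu.terminator
  }
  return
}

// After InsertGpuAllocs, the defining op becomes a device-side allocation
// (plus a gpu.alloc_shared attribute when the host also accesses the buffer):
//   %buf = gpu.alloc () : memref<8xf32>
```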

return
}
```

Contributor

@fschlimb fschlimb Sep 8, 2022


Examples are great. Can you provide a smaller example? Much of this is not really needed to show what's happening and is rather distracting.

Contributor

@Jianhui-Li Jianhui-Li Sep 8, 2022


Are there any ways to remove or reduce these lines?
%3 = memref.cast %0 : memref<8xf32> to memref<?xf32>
%4 = memref.cast %1 : memref<8xf32> to memref<?xf32>
%5 = memref.cast %2 : memref<8xf32> to memref<?xf32>
call @fillResource1DFloat(%3, %cst_0) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%4, %cst) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%5, %cst_1) : (memref<?xf32>, f32) -> ()

Contributor Author


removed



The SetSPIRVAbiAttribute pass adds a kernel attribute called spv.entry_point_abi to the kernel function. SPIR-V programs themselves are not enough for running workloads on a GPU; a companion host application is needed to manage the resources referenced by SPIR-V programs and dispatch the workload, and it is quite possible that the two programs are written in different front-end languages. Hence the need to add the entry point ABI.
spv.entry_point_abi is a struct attribute that should be attached to the entry function. Some of the lowering passes expect this attribute to perform the lowering.
Contributor

@fschlimb fschlimb Sep 8, 2022


I do not understand how this can be useful if all it does is add a constant attribute. If it gets added to all entry functions, how does it make a difference? I guess I am missing something fundamental.
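For readers following along, a minimal sketch of the pass's effect, assuming 2022-era spv attribute syntax (the kernel name is hypothetical); the attribute records the workgroup size that the later GPU-to-SPIR-V lowering reads:

```mlir
gpu.module @kernels {
  // After SetSPIRVAbiAttribute: the entry function carries the ABI attribute
  // that the SPIR-V lowering passes expect.
  gpu.func @add(%arg0: memref<8xf32>) kernel attributes
      {spv.entry_point_abi = #spv.entry_point_abi<local_size = dense<[1, 1, 1]> : vector<3xi32>>} {
    gpu.return
  }
}
```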


mlir::StringRef getAllocSharedAttrName() { return "gpu.alloc_shared"; }

struct InsertGPUAllocs
Contributor


Some summary comments in Doxygen style would auto-generate docs.

return false;
};

// Traverse through all the memory access ops under GPU launch Op
Contributor


Since you asked: up to here, there is no comment or anything saying what happens in the code.

Contributor


Agree. Some reasonable comments are needed. Usually the questions from readers/reviewers are good hints where a comment might be needed.

Contributor Author


added more comments in the code


// GetMemrefGlobal Op Case:
// This is the case where the inputs are globals contants and accessed using
// memref.get_global op. This code will add the IR for memeory allocation on
Contributor


typo: memory

@@ -0,0 +1,61 @@
//===- SetSPIRVCapabalities.cpp - SetSPIRVCapabalities Pass -------*- C++
Contributor


filename has a typo

struct SetSPIRVCapabilitiesPass
: public mlir::PassWrapper<SetSPIRVCapabilitiesPass,
mlir::OperationPass<mlir::ModuleOp>> {
void runOnOperation() override {
Contributor


Is this Intel-specific? If so, shouldn't it be named accordingly?


module attributes {gpu.container_module} {

// CHECK: module attributes {gpu.container_module, spv.target_env = #spv.target_env<#spv.vce<v1.0, [Addresses, Float16Buffer, Int64, Int16, Int8, Kernel, Linkage, Vector16, GenericPointer, Groups, Float16, Float64, AtomicFloat32AddEXT, ExpectAssumeKHR], [SPV_EXT_shader_atomic_float_add, SPV_KHR_expect_assume]>, #spv.resource_limits<>>} {
Contributor


Any particular reason why this test is so big? Most of the code here is not checked, so why is it needed?
It might be worthwhile generally reducing the suggested tests. Minimal tests are easier to maintain and understand.

Contributor Author


changed

@@ -8,4 +8,4 @@ set(LIBS
add_llvm_executable(imex-opt imex-opt.cpp)

llvm_update_compile_flags(imex-opt)
target_link_libraries(imex-opt PRIVATE ${LIBS})
target_link_libraries(imex-opt PRIVATE ${LIBS} IMEXTransforms)
Contributor


Why is IMEXTransforms not in ${LIBS}? It should be.



return mlir::WalkResult::interrupt();
}

} else {
Contributor


This is the gpu params case; here you may want to put an assert statement to assert your assumption.

The logic is not straightforward: you assume that if alias.getDefiningOp() returns a null pointer, then alias can give you the parent block where you can find the arguments. What if the loop body has a nested loop inside? Did you write a test case to verify?


return mlir::WalkResult::advance();
})
.wasInterrupted()) {
signalPassFailure();
Contributor


Does this pass work with the Vulkan runner?


if (mlir::isa<mlir::func::CallOp>(user)) {
bool onDevice = user->getParentOfType<mlir::gpu::LaunchOp>();
(onDevice ? ret.deviceRead : ret.hostRead) = true;
(onDevice ? ret.deviceWrite : ret.hostWrite) = true;
Contributor


What is the use of "deviceRead" and "deviceWrite"? Does the code do any optimization based on these tags?

auto rank = static_cast<unsigned>(memrefType.getRank());
filter.clear();
dims.clear();
for (auto i : llvm::seq(0u, rank)) {
Contributor


Does this cover shapes with unknown rank, unknown dims, and known rank/dims? Do you have test cases for those?

Contributor Author


It works with known rank and unknown dims, and obviously with static shapes ... not other cases. I will add a test case for the dynamic dim case.
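A sketch of the dynamic-dim case the pass handles, mirroring the DimOp loop in the quoted code (2022-era syntax; %arg0 is a hypothetical function argument):

```mlir
// Static shape: sizes are known, gpu.alloc takes no size operands.
%a = gpu.alloc () : memref<8xf32>

// Known rank with a dynamic dim: the size is recovered via memref.dim
// and passed as an operand to gpu.alloc.
%c0 = arith.constant 0 : index
%d0 = memref.dim %arg0, %c0 : memref<?xf32>
%b = gpu.alloc (%d0) : memref<?xf32>
```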

Comment on lines 17 to 40
#include <llvm/ADT/SmallBitVector.h>
#include <memory>
#include <mlir/Analysis/BufferViewFlowAnalysis.h>
#include <mlir/Conversion/ArithmeticToSPIRV/ArithmeticToSPIRV.h>
#include <mlir/Conversion/ControlFlowToSPIRV/ControlFlowToSPIRV.h>
#include <mlir/Conversion/FuncToSPIRV/FuncToSPIRV.h>
#include <mlir/Conversion/GPUToSPIRV/GPUToSPIRV.h>
#include <mlir/Conversion/MathToSPIRV/MathToSPIRV.h>
#include <mlir/Conversion/SCFToSPIRV/SCFToSPIRV.h>
#include <mlir/Dialect/Affine/IR/AffineOps.h>
#include <mlir/Dialect/ControlFlow/IR/ControlFlowOps.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <mlir/Dialect/GPU/Transforms/ParallelLoopMapper.h>
#include <mlir/Dialect/GPU/Transforms/Passes.h>
#include <mlir/Dialect/MemRef/IR/MemRef.h>
#include <mlir/Dialect/SCF/IR/SCF.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVDialect.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVOps.h>
#include <mlir/Dialect/SPIRV/IR/TargetAndABI.h>
#include <mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h>
#include <mlir/Pass/Pass.h>
#include <mlir/Target/SPIRV/Serialization.h>
#include <mlir/Transforms/DialectConversion.h>
#include <mlir/Transforms/GreedyPatternRewriteDriver.h>
Contributor


It is better to move all these includes into the .cpp file to limit the inclusion scope.

if (!isMemReadWriteOp(op))
return mlir::WalkResult::advance();

auto memref = getMemReadWriteOp(op);
Contributor


getMemReadWriteOp() sounds like it returns an op, not a memref.

@nbpatel nbpatel merged commit e5fdf12 into refactor Sep 15, 2022
@nbpatel nbpatel deleted the nishant_add_passes branch October 4, 2022 21:35
Hardcode84 pushed a commit to Hardcode84/mlir-extensions that referenced this pull request Feb 19, 2024
…sable workflows (intel#310)

* Fixed Nightly CI workflow

Signed-off-by: Gregory Shimansky <[email protected]>

* Added secrets inherit when calling reusable workflows

Signed-off-by: Gregory Shimansky <[email protected]>

* Added missing dependency for sending report

Signed-off-by: Gregory Shimansky <[email protected]>

* Fixed conditional operator

Signed-off-by: Gregory Shimansky <[email protected]>

* Fixed runs-on for generate_report

Signed-off-by: Gregory Shimansky <[email protected]>

* Removed report because it cannot be revived

Signed-off-by: Gregory Shimansky <[email protected]>

* Execute cpu and gpu builds in parallel

Signed-off-by: Gregory Shimansky <[email protected]>

---------

Signed-off-by: Gregory Shimansky <[email protected]>