Port InsertGpuAlloc, SetSPIRVCapabilities & SetAbiAttrPass from main #310
Conversation
Sorry, but we need all these tests in main first
Wow, we are getting some real functionality into refactor! Good to see comprehensive (user-)docs and tests!
We should follow the upstream conventions and keep the docs in the .td files (yes, they have some separate files in docs, but that's the exception). If we follow this process we get the standard layout, formatting and a maintained TOC for free.
One reason for the refactor was to make sure that all code is in a state which allows non-authors to easily understand it. The code here has almost no comments; that's not good. I know this is how it is on main. However here, at least a summary of the functional flow in each file, as well as of each (non-trivial) function, should be added. I don't think a meaningful review can be done without it.
@nbpatel if you install the pre-commit hook in your local repository (from where you push) - as mentioned in the README - you will get most of the formatting issues reported by the pre-commit check automatically corrected during your commit. If you have issues using it, we should fix it (you might raise an issue for that).
@@ -38,3 +39,22 @@ int main(int argc, char **argv) {
  return ::mlir::asMainReturnCode(
      ::mlir::MlirOptMain(argc, argv, "Imex optimizer driver\n", registry));
}
Would be better if registration could be done through "::imex::registerAllPasses()" automatically by using files like Passes.td/Passes.h, but that could be done in a follow-up PR.
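For illustration, a minimal sketch of what imex-opt.cpp could look like with such generated registration, assuming "::imex::registerAllPasses()" (the name proposed above) is emitted from a Passes.td/Passes.h pair; the imex header and exact MLIR header paths are assumptions and vary by version:

```
#include <mlir/IR/DialectRegistry.h>
#include <mlir/Support/MlirOptMain.h> // path differs on newer MLIR versions

#include <imex/Transforms/Passes.h> // assumed: exposes the generated hooks

int main(int argc, char **argv) {
  mlir::DialectRegistry registry;
  // One call registers every IMEX pass with the imex-opt command line,
  // instead of one manual registration per pass.
  ::imex::registerAllPasses(); // assumed generated from Passes.td
  return ::mlir::asMainReturnCode(
      ::mlir::MlirOptMain(argc, argv, "Imex optimizer driver\n", registry));
}
```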
I agree, the conversions already go into AllPasses thingy.
LGTM
return {{store.memref()}};
} else if (auto call = mlir::dyn_cast<mlir::func::CallOp>(op)) {
  mlir::SmallVector<mlir::Value, 4> ret;
  for (auto arg : call.operands()) {
what is the reason we care about call op's operands, not others?
because there might be a func.call whose operands would be memrefs. Basically we are just looking for all the memref producers.
This is something you could add as a comment :)
because there might be a func.call whose operands would be memrefs. basically, we are just looking for all the memref producers
I am a bit confused by this statement. Store doesn't produce a memref, and there are so many other dialects/ops producing memrefs. Why do we only care about load/store/call ops? What exact analysis and transformation does this pass do for these three ops?
getMemRef() is a very generic name, and the implementation just handles three op categories. I think it should be renamed to something narrower that exactly describes what it does.
I guess this is to find the uses of memrefs (produced or consumed) within the device kernel, whose memory buffers are likely prepared outside the device kernel. That's why it looks for load/store.
For call, the parameters may or may not be defined outside the device kernel. I guess they are collected and then verified later.
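For reference, a hedged sketch of that collection step with the narrower name suggested above; "getAccessedMemrefs" and the memref-type filter on call operands are assumptions for illustration, not the PR's actual code:

```
#include <llvm/ADT/SmallVector.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <mlir/Dialect/MemRef/IR/MemRef.h>

// Returns every memref value that `op` reads or writes, so the pass can
// later check whether its producer must become a gpu.alloc.
static llvm::SmallVector<mlir::Value, 4>
getAccessedMemrefs(mlir::Operation *op) {
  if (auto load = mlir::dyn_cast<mlir::memref::LoadOp>(op))
    return {load.memref()};  // memref consumed by a read
  if (auto store = mlir::dyn_cast<mlir::memref::StoreOp>(op))
    return {store.memref()}; // memref consumed by a write
  if (auto call = mlir::dyn_cast<mlir::func::CallOp>(op)) {
    // A callee may receive buffers allocated outside the launch region;
    // collect every memref-typed operand and verify its origin later.
    llvm::SmallVector<mlir::Value, 4> ret;
    for (mlir::Value arg : call.operands())
      if (arg.getType().isa<mlir::MemRefType>())
        ret.push_back(arg);
    return ret;
  }
  return {};
}
```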
}
return std::move(ret);
} else {
  op->emitError("Uhhandled mem op in gpu region");
what does "emitError" mean? Is this really an error?
// Traverse through all the memory access ops under GPU launch Op
// and add device memory allocation appropriately.
if (func.walk([&](mlir::Operation *op) {
      if (!op->getParentOfType<mlir::gpu::LaunchOp>())
This seems like a significant limitation. Any plan to support gpu.launch_func? Any code or design changes needed to support it?
how would it be any different after the kernels are outlined? Currently this pass runs before the kernels get outlined. The drawback I see is that the gpu-level tests have to use the gpu.launch block format...
maybe I am missing something
how would it be any different after the kernels are outlined?
I think that may involve cross device-func and host-code analysis, since a memref used inside the device kernel needs to be allocated via gpu.alloc in the host code. I would expect the code would need to handle launch_func. I don't know how important it is so far for our use cases, but I think we need to tell users upfront that we are not handling gpu.launch_func.
currently this pass runs before the kernels get outlined.
Maybe you can put this in the pass introduction, so users know the limitation.
This pass was specifically written to be run before kernel outlining; it is much harder to work on an outlined kernel and generally not worth it.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {
  gpuBufferAllocs.insert({op, {}});
} else if (mlir::isa<mlir::func::CallOp>(op)) {
Is the call op treated like the scf dialect? If so, maybe list the three conditions together.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
gpuGetMemrefGlobalParams.insert({op, {}});
continue;
}
if (op->getDialect() == scfDialect ||
What is the purpose of making these two conditions parallel? They seem unrelated to each other.
Also, why only scfDialect? There are so many other dialects; linalg/affine loops could produce memrefs too.
This pass should be run after all linalg/affine ops have been lowered to scf; supporting them is too much work without any actual benefit.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
auto param = block.getArgument(it.first);
it.second = getAccessType(param);

it.second.hostRead = true;
Does this mean that the function always assumes the input/output is on the host side? This is not TRUE for XLA usage, where the caller controls the runtime resources, so the inputs/outputs passed would be on the device side most of the time.
Is there a use case with two consecutive function calls, where the first one produces a buffer for the second one to use? Can the intermediate buffer stay in device memory? How would the current code be restructured to support that need?
I don't mean that this PR needs to fix these limitations. But it would be useful to list these considerations as future development opportunities.
yes, you are right, the function assumes that the input/output is on the host side. We don't have a test case for the case mentioned above; maybe I will do a subsequent PR in the future to address this?
alloc->replaceAllUsesWith(gpuAlloc);
alloc.erase();
if (access.hostRead || access.hostWrite)
  gpuAlloc->setAttr(imex::getAllocSharedAttrName(),
why is there no DeallocOp in this case, like in the parameter and global case: "builder.create<mlir::gpu::DeallocOp>(loc, llvm::None, allocResult);"?
What if a func first allocates a temp memory and then releases it? We only replace the memref.alloc, not the memref.dealloc?
Don't we need negative test cases as well?
Good catch. I will add the code for inserting the gpu.dealloc in this case.
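For reference, a hypothetical sketch of that fix (not the PR's final code), reusing the gpu::DeallocOp form quoted in the question above: once memref.alloc has been replaced by gpu.alloc, any memref.dealloc of the same buffer must become a gpu.dealloc, otherwise device memory leaks.

```
#include <llvm/ADT/STLExtras.h>
#include <mlir/Dialect/GPU/GPUDialect.h> // GPU/IR/GPUDialect.h on newer MLIR
#include <mlir/Dialect/MemRef/IR/MemRef.h>
#include <mlir/IR/Builders.h>

static void rewriteDeallocs(mlir::OpBuilder &builder, mlir::Value allocResult) {
  // allocResult is the memref produced by the new gpu.alloc; after
  // replaceAllUsesWith it owns all uses of the original buffer.
  for (mlir::Operation *user :
       llvm::make_early_inc_range(allocResult.getUsers())) {
    if (auto dealloc = mlir::dyn_cast<mlir::memref::DeallocOp>(user)) {
      builder.setInsertionPoint(dealloc);
      // Same call form as quoted from the parameter/global case above.
      builder.create<mlir::gpu::DeallocOp>(dealloc.getLoc(), llvm::None,
                                           allocResult);
      dealloc.erase();
    }
  }
}
```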
if (op->getDialect() == scfDialect ||
    mlir::isa<mlir::ViewLikeOpInterface>(op))
  continue;
if (mlir::isa<mlir::memref::AllocOp>(op)) {
these are not handled as of now in this PR; only memref::AllocOp is handled
Are these cases important to handle? Or do we choose to tell users upfront that we are not handling them? I would prefer the functionality to cover broad cases if the effort is reasonable.
We don't want users to discover these corner cases on their own. We should either implement them or document them.
It is fine for this PR not to implement them, but it would be good to know the limitation and the plan.
added a limitations section in the doc description
builder.setInsertionPoint(alloc);
auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
    loc, alloc.getType(), /*asyncToken*/ nullptr,
    /*asyncDependencies*/ llvm::None, alloc.dynamicSizes(),
Why are the getType and size handling different from the global and parameter cases?
for (auto i : llvm::seq(0u, rank)) {
if (memrefType.isDynamicDim(i)) {
auto op = builder.create<mlir::memref::DimOp>(loc, param, i);
dims.push_back(op);
filter.insert(op);
}
}
auto allocType = mlir::MemRefType::get(
memrefType.getShape(), memrefType.getElementType(),
mlir::MemRefLayoutAttrInterface{}, memrefType.getMemorySpace());
auto gpuAlloc = builder.create<mlir::gpu::AllocOp>(
loc, allocType, /*asyncToken*/ nullptr,
/*asyncDependencies*/ llvm::None, dims,
/*symbolOperands*/ llvm::None);
because here you directly have a memref.alloc to convert to a gpu.alloc, while in the global and params case you don't have one; you add the allocs and the subsequent memref.copy ops.
lib/Transforms/InsertGpuAllocs.cpp
Outdated
// This is the case where the inputs are passed as arguments to the
// function. This code will add the IR for memeory allocation on the device
// with gpu.alloc and insert a memref.copy from host to device
for (auto it : gpuBufferParams) {
Is it possible for the global and parameter cases to share code as much as possible, if most of the handling is the same?
The InsertGpuAllocs pass, as the name suggests, inserts the gpu allocs in the IR. Memref alloc is an operation in the memref dialect that can be used to allocate memory on the host side and/or on the device side. The MLIR IR is a mix of host and device code.
To distinguish between host-side and device-side memory allocation, we convert all the memref.allocs that refer to device (gpu) side memory allocations and references into gpu.alloc, which is an operation of the upstream GPU dialect. This distinction helps in lowering to llvm and calling the appropriate memory allocation operation at runtime.
The pass traverses all the memref (load/store) operations inside the gpu launch op in the IR and checks for their aliases and defining ops. If the defining op is a memref.alloc op, it replaces that op in the IR with a gpu.alloc op, because all the operations under the gpu.launch op are device-side computations and will execute on the device.
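For illustration, a minimal sketch of the decision just described (not the PR's verbatim code); "getAccessedMemrefs" and "markForGpuAlloc" are hypothetical helper names:

```
// Classify the producer of every memref accessed inside a gpu.launch region.
static void collectDeviceAllocs(mlir::func::FuncOp func) {
  func.walk([&](mlir::Operation *op) {
    // Only memory accesses nested inside the device region matter.
    if (!op->getParentOfType<mlir::gpu::LaunchOp>())
      return;
    for (mlir::Value memref : getAccessedMemrefs(op)) { // hypothetical helper
      // Follow each accessed memref to its producer; a host-side
      // memref.alloc feeding device code is later rewritten to gpu.alloc.
      if (auto alloc = mlir::dyn_cast_or_null<mlir::memref::AllocOp>(
              memref.getDefiningOp()))
        markForGpuAlloc(alloc); // hypothetical bookkeeping
    }
  });
}
```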
I suggest moving this last line into the file with the implementation. This is a good example of the kind of high-level description of the flow of the implementation which I suggest having in all files. A "user" of this pass does not care about such details, but the reader of the code greatly benefits from it.
All the limitations should be documented here. We don't want users to find them out during debugging.
If the defining op is a memref.alloc op it replaces that op in the IR with gpu.alloc op, because all the operations under the gpu.launch op are device side computations and will execute on the device.
Also, regarding this sentence: I wonder how the upstream Vulkan runner works, since upstream MLIR doesn't have this pass.
done
  return
}
```
Examples are great. Can you provide a smaller example? Much of this is not really needed to show what's happening and is rather distracting.
Are there any ways to remove or reduce these lines?
%3 = memref.cast %0 : memref<8xf32> to memref<?xf32>
%4 = memref.cast %1 : memref<8xf32> to memref<?xf32>
%5 = memref.cast %2 : memref<8xf32> to memref<?xf32>
call @fillResource1DFloat(%3, %cst_0) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%4, %cst) : (memref<?xf32>, f32) -> ()
call @fillResource1DFloat(%5, %cst_1) : (memref<?xf32>, f32) -> ()
removed
The SetSPIRVAbiAttribute pass adds a kernel attribute called spv.entry_point_abi to the kernel function. SPIR-V programs themselves are not enough for running workloads on a GPU; a companion host application is needed to manage the resources referenced by the SPIR-V programs and dispatch the workload. It is also quite possible that those programs are written in different front-end languages; hence the need to add the entry point ABI.
spv.entry_point_abi is a struct attribute that should be attached to the entry function. Some of the lowering passes expect this attribute to perform the lowering.
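For illustration, a hedged sketch of what attaching the attribute can look like on MLIR of this era; helper names and header paths vary by version, and the (1, 1, 1) workgroup size is a placeholder assumption, not the pass's documented choice:

```
#include <mlir/Dialect/GPU/GPUDialect.h> // GPU/IR/GPUDialect.h on newer MLIR
#include <mlir/Dialect/SPIRV/IR/TargetAndABI.h>

void setEntryPointAbi(mlir::gpu::GPUFuncOp gpuFunc) {
  // getEntryPointABIAttrName() returns "spv.entry_point_abi".
  auto attrName = mlir::spirv::getEntryPointABIAttrName();
  if (gpuFunc->hasAttr(attrName))
    return; // do not overwrite an ABI the frontend already chose
  // Placeholder workgroup size for illustration only.
  auto abi = mlir::spirv::getEntryPointABIAttr(
      llvm::ArrayRef<int32_t>{1, 1, 1}, gpuFunc.getContext());
  gpuFunc->setAttr(attrName, abi);
}
```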
I do not understand how this can be useful if all it does is add a constant attribute. If it gets added to all entry functions, how does it make a difference? I guess I am missing something fundamental.
mlir::StringRef getAllocSharedAttrName() { return "gpu.alloc_shared"; }

struct InsertGPUAllocs
A summary in Doxygen style would let us auto-generate docs.
  return false;
};

// Traverse through all the memory access ops under GPU launch Op
Since you asked: until here, there is no comment or anything saying what happens in the code.
Agree. Some reasonable comments are needed. Usually the questions from readers/reviewers are good hints where a comment might be needed.
added more comments in the code
lib/Transforms/InsertGpuAllocs.cpp
Outdated
// GetMemrefGlobal Op Case:
// This is the case where the inputs are globals contants and accessed using
// memref.get_global op. This code will add the IR for memeory allocation on
typo: memory
@@ -0,0 +1,61 @@
//===- SetSPIRVCapabalities.cpp - SetSPIRVCapabalities Pass -------*- C++ |
filename has a typo
struct SetSPIRVCapabilitiesPass
    : public mlir::PassWrapper<SetSPIRVCapabilitiesPass,
                               mlir::OperationPass<mlir::ModuleOp>> {
  void runOnOperation() override {
Is this intel-specific? If so, shouldn't this be called accordingly?
module attributes {gpu.container_module} {
// CHECK: module attributes {gpu.container_module, spv.target_env = #spv.target_env<#spv.vce<v1.0, [Addresses, Float16Buffer, Int64, Int16, Int8, Kernel, Linkage, Vector16, GenericPointer, Groups, Float16, Float64, AtomicFloat32AddEXT, ExpectAssumeKHR], [SPV_EXT_shader_atomic_float_add, SPV_KHR_expect_assume]>, #spv.resource_limits<>>} { |
Any particular reason why this test is so big? Most of the code here is not checked, so why is it needed?
It might be worthwhile to generally reduce the suggested tests. Minimal tests are easier to maintain and understand.
changed
@@ -8,4 +8,4 @@ set(LIBS
add_llvm_executable(imex-opt imex-opt.cpp)

llvm_update_compile_flags(imex-opt)
target_link_libraries(imex-opt PRIVATE ${LIBS})
target_link_libraries(imex-opt PRIVATE ${LIBS} IMEXTransforms)
Why is IMEXTransforms not in ${LIBS}? It should be.
return mlir::WalkResult::interrupt();
}

} else {
this is the gpu params case. Here you may want to put an assert statement to assert your assumption.
The logic is not straightforward: you assume that if alias.getDefiningOp() returns a null pointer, then alias can give you the parent blocks where you can find the arguments. What if the loop body has a nested loop inside? Did you write a test case to verify?
  return mlir::WalkResult::advance();
})
.wasInterrupted()) {
  signalPassFailure();
Does this pass work with the Vulkan runner?
if (mlir::isa<mlir::func::CallOp>(user)) {
  bool onDevice = user->getParentOfType<mlir::gpu::LaunchOp>();
  (onDevice ? ret.deviceRead : ret.hostRead) = true;
  (onDevice ? ret.deviceWrite : ret.hostWrite) = true;
What is the use of "deviceRead" and "deviceWrite"? Does the code do any optimization based on these tags?
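For orientation, the access-summary record implied by the quoted snippets; the field names come from the code above, while the struct name and comments are assumptions. The alloc rewrite quoted earlier uses hostRead/hostWrite to set the gpu.alloc_shared attribute; the quoted code does not show any optimization keyed on deviceRead/deviceWrite.

```
struct AccessType {
  bool hostRead = false;    // buffer touched by host-side reads
  bool hostWrite = false;   // buffer touched by host-side writes
  bool deviceRead = false;  // buffer read inside the gpu.launch region
  bool deviceWrite = false; // buffer written inside the gpu.launch region
};
```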
lib/Transforms/InsertGpuAllocs.cpp
Outdated
auto rank = static_cast<unsigned>(memrefType.getRank());
filter.clear();
dims.clear();
for (auto i : llvm::seq(0u, rank)) {
Does this cover shapes with unknown rank, unknown dims, and known rank/dim? Do you have test cases for those?
it works with known rank and unknown dims, and obviously with static shapes... not the other cases. I will add a test case for the dynamic dim case
include/imex/Transforms/Transforms.h
Outdated
#include <llvm/ADT/SmallBitVector.h>
#include <memory>
#include <mlir/Analysis/BufferViewFlowAnalysis.h>
#include <mlir/Conversion/ArithmeticToSPIRV/ArithmeticToSPIRV.h>
#include <mlir/Conversion/ControlFlowToSPIRV/ControlFlowToSPIRV.h>
#include <mlir/Conversion/FuncToSPIRV/FuncToSPIRV.h>
#include <mlir/Conversion/GPUToSPIRV/GPUToSPIRV.h>
#include <mlir/Conversion/MathToSPIRV/MathToSPIRV.h>
#include <mlir/Conversion/SCFToSPIRV/SCFToSPIRV.h>
#include <mlir/Dialect/Affine/IR/AffineOps.h>
#include <mlir/Dialect/ControlFlow/IR/ControlFlowOps.h>
#include <mlir/Dialect/Func/IR/FuncOps.h>
#include <mlir/Dialect/GPU/Transforms/ParallelLoopMapper.h>
#include <mlir/Dialect/GPU/Transforms/Passes.h>
#include <mlir/Dialect/MemRef/IR/MemRef.h>
#include <mlir/Dialect/SCF/IR/SCF.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVDialect.h>
#include <mlir/Dialect/SPIRV/IR/SPIRVOps.h>
#include <mlir/Dialect/SPIRV/IR/TargetAndABI.h>
#include <mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h>
#include <mlir/Pass/Pass.h>
#include <mlir/Target/SPIRV/Serialization.h>
#include <mlir/Transforms/DialectConversion.h>
#include <mlir/Transforms/GreedyPatternRewriteDriver.h>
it is better to move all these includes into the cpp file to limit the inclusion scope.
if (!isMemReadWriteOp(op))
  return mlir::WalkResult::advance();

auto memref = getMemReadWriteOp(op);
getMemReadWriteOp() sounds like it returns an op, not a memref.
…sable workflows (intel#310)
* Fixed Nightly CI workflow
* Added secrets inherit when calling reusable workflows
* Added missing dependency for sending report
* Fixed conditional operator
* Fixed runs-on for generate_report
* Removed report because it cannot be revived
* Execute cpu and gpu builds in parallel
Signed-off-by: Gregory Shimansky <[email protected]>
This PR ports the InsertGpuAllocs, SetSPIRVCapabilities & SetAbiAttrPass passes from the main branch. The passes have test cases and a doc attached to them.