
RFC: Support pushing custom ops through backend-contract using torch.operator #1959

Closed

Conversation

@makslevental (Collaborator) commented Mar 20, 2023

Pitch

I think lots of people want to be able to push opaque kernels through the backend contract (#1519, #1947, #1514). Indeed, there is also a similarly themed proposal that goes through backend-legal-ops (and is specific to TOSA). I believe there's also a need for supporting custom ops, i.e., ops that are effectively placeholders for some possible lowering/implementation on the other side of the backend contract (the poster child here is quantized ops).

Approach

The approach here is an adaptation of the other approach: we go through backend-legal-ops using torch.operator. The catch is that because OperatorOp doesn't possess the HasValueSemantics trait, ReduceOpVariants and MaximizeValueSemantics will stumble and ultimately the backend contract won't be satisfied. Thus, we add HasValueSemantics to torch.operator. In addition, we extend wrapWithCalculateOpIfLibraryFunctionAvailable to support user-provided shape and dtype functions; in particular we provide these two (trivial) refinement functions:

func.func @__torch_mlir_shape_fn.operator.goofy.identity(%arg0: !torch.list<int>) -> !torch.list<int> {
  return %arg0 : !torch.list<int>
}
func.func @__torch_mlir_dtype_fn.operator.goofy.identity(%arg0: !torch.int, %arg1: !torch.int) -> !torch.int {
  return %arg1 : !torch.int
}

With the example/demo here I am able to fully lower to

module attributes {torch.debug_module_name = "CustomOpExampleModule"} {
  func.func @forward(%arg0: !torch.vtensor<[3,4],f32>) -> !torch.vtensor<[3,4],f32> {
    %int2 = torch.constant.int 2
    %0 = torch.aten.mul.Scalar %arg0, %int2 : !torch.vtensor<[3,4],f32>, !torch.int -> !torch.vtensor<[3,4],f32>
    %1 = torch.operator "goofy.identity"(%0) {has_value_semantics = true} : (!torch.vtensor<[3,4],f32>) -> !torch.vtensor<[3,4],f32>
    return %1 : !torch.vtensor<[3,4],f32>
  }
}
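
For reference, a minimal Python setup that traces to a graph like the one shown in the review thread below and lowers to the IR above might look as follows (a sketch only; it assumes the goofy.identity registration shown later in this thread, and the actual demo script may differ):

import torch

# Sketch: assumes goofy.identity has already been registered via torch.library
# (see the registration snippet in the review thread below).
class CustomOpExampleModule(torch.nn.Module):
    def forward(self, a):
        b = a * 2
        return torch.ops.goofy.identity(b)

module = CustomOpExampleModule()
traced = torch.jit.trace(module, torch.rand(3, 4))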

Certainly there are other ways to do this, so I'm open to suggestions/advice.

cc @powderluv @qedawkins @AmosLewis

@makslevental changed the title from "RFC: Support custom ops using torch.operator" to "RFC: Support pushing custom ops through backend-contract using torch.operator" on Mar 20, 2023
Comment on lines +13 to +15
goofy_lib = torch.library.Library("goofy", "DEF")
goofy_lib.define("identity(Tensor t) -> Tensor")
goofy_lib.impl("identity", identity)
@makslevental (author) commented Mar 20, 2023

Same thing as the "classical" torch custom op registration; this torch.jit.traces to

graph(%self : __torch__.CustomOpExampleModule,
      %a : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu)):
  %4 : Long(requires_grad=0, device=cpu) = prim::Constant[value={2}]() 
  %5 : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::mul(%a, %4) 
  %6 : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu) = goofy::identity(%5) 
  return (%6)
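
The Python implementation backing the registration in the diff hunk above isn't visible here; a plausible (hypothetical) version consistent with the trivial shape/dtype refinement functions would be a simple pass-through:

# Hypothetical backing implementation for goofy.identity; the real one in the
# example may differ, but a pass-through matches the trivial refinement functions.
def identity(t):
    return t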

if (!libFunc)
return success();
} else {
libFuncNamesUsed.push_back(libFuncName);
@makslevental (author) commented Mar 20, 2023

libFuncNamesUsed is ultimately used to "import" shape functions - since the user-provided shape functions are already in the user module, this isn't necessary (and causes a segfault somewhere around ReifyAbstractInterpCalculationsUtils.cpp#L159).

if (isa<OperatorOp>(op)) {
  auto opOp = cast<OperatorOp>(op);
  auto opName = opOp->getAttr("name").cast<StringAttr>().getValue();
  name_ = "operator." + opName.str();
@makslevental (author) commented Mar 20, 2023

shape/dtype functions for operator ops should be namespaced one level deeper.

@makslevental force-pushed the operator_op_has_value_sem branch from cc5e2c2 to 7e69413 on March 20, 2023 at 22:36
resultType.isa<Torch::NoneType>() ||
(resultType.isa<Torch::ListType>() && cast<Torch::ListType>(resultType)
.getContainedType()
.isa<Torch::IntType>());
@makslevental (author) commented:

shape functions return list<int>.

@powderluv (Collaborator) commented

This is potentially a way to support LLaMA 4bit quant via Torch-mlir.

@silvasean (Contributor) commented

We already have a plan for this (#1462) and it is in progress. The next step is to migrate to the dtype functions (see #1807). Any help on that would be appreciated.

@makslevental (author) commented Mar 21, 2023

We already have a plan for this (#1462) and it is in progress. The next step is to migrate to the dtype functions (see #1807). Any help on that would be appreciated.

Okay but the approach proposed here works today with minimal expansion of the API surface. Is it possible to merge this in before the other roadmap is complete and then remove it afterwards? In fact, given this

From a design perspective, I don't see any option other than to require all custom ops be value-semantic.

if we instead simply add HasValueSemantics to torch.operator, that will eliminate the special-case checking in ReduceOpVariants, and then this PR is completely forward-compatible with your proposal (the rest of the changes are necessary for importing user-provided shape/dtype functions, which your proposal calls for as well).

@makslevental force-pushed the operator_op_has_value_sem branch from ebbf9a5 to ae35567 on March 21, 2023 at 13:28
@silvasean (Contributor) commented

We already have a plan for this (#1462) and it is in progress. The next step is to migrate to the dtype functions (see #1807). Any help on that would be appreciated.

Okay but the approach proposed here works today with minimal expansion of the API surface. Is it possible to merge this in before the other roadmap is complete and then remove it afterwards?

No, I don't think that's a wise engineering decision. Removing workarounds is usually 10x the work of adding them. It is better to push forward and complete the plan as originally specified. I think we will probably end up with something kind of similar to this in some aspects, but until we have migrated to the dtype functions there isn't much point, since we won't be able to compile real models (see the explanation in #1807 for why it happens on a branch). An equally big problem is how this feature is exposed to users -- the approach in this patch "happens to work", but does not rely on a supported API surface area.

@makslevental (author) commented

We already have a plan for this (#1462) and it is in progress. The next step is to migrate to the dtype functions (see #1807). Any help on that would be appreciated.

Okay but the approach proposed here works today with minimal expansion of the API surface. Is it possible to merge this in before the other roadmap is complete and then remove it afterwards?

No, I don't think that's a wise engineering decision. Removing workarounds is usually 10x the work of adding them. It is better to push forward and complete the plan as originally specified. I think we will probably end up with something kind of similar to this in some aspects, but until we have migrated to the dtype functions there isn't much point, since we won't be able to compile real models (see the explanation in #1807 for why it happens on a branch). An equally big problem is how this feature is exposed to users -- the approach in this patch "happens to work", but does not rely on a supported API surface area.

And how about with the change I just pushed?

@silvasean (Contributor) commented

And how about with the change I just pushed?

It doesn't fundamentally address any of the issues I mentioned.

@makslevental (author) commented Mar 21, 2023

It doesn't fundamentally address any of the issues I mentioned.

It does:

  1. from your proposal

    From a design perspective, I don't see any option other than to require all custom ops be value-semantic. This seems like it will be enough for the use cases that the community has presented so far.

    So you will need to mark torch.operator as HasValueSemantics inevitably as well.

  2. An equally big problem is how this feature exposed to users -- the approach in this patch "happens to work", but does not rely on a supported API surface area.

    With the explicit trait, it becomes explicit API surface.

  3. we won't be able to compile real models

    I can compile real models today by creating a torch custom op quantized.matmul, then lowering the opaque torch.operator "quantized.matmul", and then replacing it with a custom implementation of quantized.matmul (in terms of linalg) on the other side of the backend contract (sketched below).

There is nothing here that won't have to look exactly the same to support your proposal - it just frontloads HasValueSemantics and the user-provided shape functionality.
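
A rough sketch of that workflow, mirroring the goofy.identity example (the namespace, names, schema, and reference implementation below are illustrative only, not taken from an actual model or PR):

import torch

# Register an opaque quantized matmul so the importer emits it as an opaque
# torch.operator; a fresh namespace avoids clashing with PyTorch's built-in
# "quantized" ops. The real lowering (e.g. in terms of linalg) would be
# substituted on the other side of the backend contract.
quantized_lib = torch.library.Library("myquant", "DEF")
quantized_lib.define("matmul(Tensor a, Tensor b, float scale, int zero_point) -> Tensor")

def quantized_matmul_reference(a, b, scale, zero_point):
    # Reference semantics used only for tracing/testing.
    return ((a.float() - zero_point) @ (b.float() - zero_point)) * scale

quantized_lib.impl("matmul", quantized_matmul_reference)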

@@ -191,7 +191,10 @@ static bool isValidNonContainerResultType(Type resultType) {
          resultType.isa<Torch::FloatType>() ||
          resultType.isa<Torch::IntType>() ||
          resultType.isa<Torch::BoolType>() ||
-         resultType.isa<Torch::NoneType>();
+         resultType.isa<Torch::NoneType>() ||
+         (resultType.isa<Torch::ListType>() && cast<Torch::ListType>(resultType)
A collaborator commented:

We shouldn't be doing this. This function was created with the goal of preventing something like a ListType return to reach the backend contract. This would lead to invalid IR being generated.

@makslevental (author) replied:

That's fine, but if you want to enable user-provided shape and dtype functions in the same parent module, then there needs to be special casing for them. The alternative is to provide some mechanism for passing handles to a ModuleOp all the way down into wrapWithCalculateOpIfLibraryFunctionAvailable.

The collaborator replied:

User provided shape and dtype functions will be handled exactly the same way that current shape and dtype functions are handled. The plan is to load them from a .mlir file when the shape or dtype pass is taking place, then remove them once the pass is done. This is all outlined in the RFC: #1462. There shouldn't be any coupling between the shape and dtype pipelines and the rest of the passes in torch-mlir.

@ramiro050 (Collaborator) commented

So you will need to mark torch.operator as HasValueSemantics inevitably as well.

We will likely require all custom ops to have value semantics. However, this does not mean we want all torch.operators generated to have value semantics. This would generate invalid IR in cases where the op in the torch.operator is an actual torch op that does not have value semantics, which could lead to errors that are very difficult to debug. What we should do is have a method of identifying ops that have a shape and dtype function in the abstract interpretation library, and mark those ops as having value semantics.

I agree with Sean's comments; it would be really appreciated if we could focus our efforts on getting the custom op support RFC, published a few months ago, across the finish line rather than duplicating work.

@makslevental (author) commented Mar 21, 2023

@ramiro050

We will likely require all custom ops to have value semantics. However, this does not mean we want all torch.operators generated to have value semantics. This would generate invalid IR in cases where the op in the torch.operator is an actual torch op that does not have value semantics, which could lead to errors that are very difficult to debug. What we should do is have a method of identifying ops that have a shape and dtype function in the abstract interpretation library, and mark those ops as having value semantics.

If you check the first commit in this stack, this is literally exactly what I had:

static bool operatorOpHasValueSemantics(OperatorOp opOp) {
  if (!opOp->hasAttr("has_value_semantics"))
    return false;
  auto hasValueSemantics =
      opOp->getAttr("has_value_semantics").cast<BoolAttr>().getValue();
  return hasValueSemantics;
}

and then special case checks in ConvertHasValueSemanticsOpsToValueTensors. I removed this because Sean believes "the approach in this [commit] 'happens to work', but does not rely on a supported API surface area."

I'm happy to develop more safety and docs in order to get it to cross the threshold from "happens to work" to "supported API surface area."

@powderluv (Collaborator) commented

Today torch-mlir lacks the ability to lower state-of-the-art quantized models (llama 4bit, stable diffusion 8bit, and other custom models).

IIUC this PR allows for a pragmatic but short-term way to support 4bit llama and 8bit SD today for our customers. It is forward progress. We can continue to invest in the best approach and can get dedicated resources to help with it, but we do have to ship something and can't wait O(months), since customers will just bypass torch-mlir.

@ramiro050 (Collaborator) commented Mar 21, 2023

If you check the first commit in this stack, this is literally exactly what I had:

static bool operatorOpHasValueSemantics(OperatorOp opOp) {
  if (!opOp->hasAttr("has_value_semantics"))
    return false;
  auto hasValueSemantics =
      opOp->getAttr("has_value_semantics").cast<BoolAttr>().getValue();
  return hasValueSemantics;
}

and then special case checks in ConvertHasValueSemanticsOpsToValueTensors. I removed this because Sean believes "the approach in this [commit] 'happens to work', but does not rely on a supported API surface area."

I'm happy to develop more safety and docs in order to get it to cross the threshold from "happens to work" to "supported API surface area."

Sorry, this isn't what I meant. Users shouldn't have to modify the IR. We should have a very simple pass that checks if the torch.operator has a dtype+shape function in the library, and if so, turn the operator into a value semantics one.

@makslevental (author) commented Mar 21, 2023

Sorry, this isn't what I meant. Users shouldn't have to modify the IR. We should have a very simple pass that checks if the torch.operator has a dtype+shape function in the library, and if so, turn the operator into a value semantics one.

I'm sorry, I'm not trying to be argumentative but I'm not sure why

hasDtypeFn && hasShapeFn => HasValueSemantics

is an appropriate contract/model/invariant; can't (conceptually) non-valsem ops support dtype and shape refinement? Like I don't see what type refinement even has to do (conceptually) with value semantics.

And having to implement those functions isn't "hidden API surface" (combined with a whole pass) but

def Torch_OperatorOp : Torch_Op<"operator", [
    AllowsTypeRefinement
  ]> {
  let arguments = (ins StrAttr:$name,  
    DefaultValuedOptionalAttr<BoolAttr, "false">:$has_value_semantics,
    Variadic<AnyTorchType>:$operands
  );
}

is hidden API surface? Or onerous? Or bad practice? There are ample examples of just this pattern in mlir-hlo; in particular
https://github.com/tensorflow/mlir-hlo/blob/cad6055ad2e9779ba01d318ca2c0a9ce5dc968e7/lhlo/IR/lhlo_ops.td#L642.

@ramiro050 (Collaborator) commented

I'm sorry, I'm not trying to be argumentative but I'm not sure why

hasDtypeFn && hasShapeFn => HasValueSemantics

Not exactly. It's more like

hasDtypeFn && hasShapeFn && isTorchOperator => isCustomOp

The RFC then makes the proposal of assuming isCustomOp => hasValueSemantics.

can't (conceptually) non-valsem ops support dtype and shape refinement? Like I don't see what type refinement even has to do (conceptually) with value semantics.

Indeed, type refinement on its own has nothing to do with value semantics. Once again, this is a fundamental assumption of the RFC. Since it was posted several months ago, no member of the community has disagreed with this assumption in the design, but if you have concerns, we should definitely discuss them in the RFC.

And having to implement those functions isn't "hidden API surface" (combined with a whole pass) but

def Torch_OperatorOp : Torch_Op<"operator", [
    AllowsTypeRefinement
  ]> {
  let arguments = (ins StrAttr:$name,  
    DefaultValuedOptionalAttr<BoolAttr, "false">:$has_value_semantics,
    Variadic<AnyTorchType>:$operands
  );
}

is hidden API surface? Or onerous? Or bad practice? There are ample examples of just this pattern in mlir-hlo; in particular https://github.com/tensorflow/mlir-hlo/blob/cad6055ad2e9779ba01d318ca2c0a9ce5dc968e7/lhlo/IR/lhlo_ops.td#L642.

Not sure I understand what your point is here. Implementing shape+dtype functions is the design proposed in the RFC, and so far no one has had issues with it. It uses an approach people are already familiar with thanks to the shape library. Again, if you have design concerns, do bring them up on the RFC.

@silvasean (Contributor) commented

Okay folks, I want to take a step back here. For each custom op, we need shape/dtype functions and knowing that it has value semantics. This will be specified in the library. The mechanism for specifying value semantics is TBD but a simple way to do it is a dummy function __torch_mlir_has_value_semantics.foo.bar which, if present in the library, indicates value semantics for the foo.bar op (we probably don't want to bother using this mechanism for the existing non-custom ops, though in theory we could migrate to using this uniformly).

The biggest part that this PR doesn't address is how users interact with the custom ops API, and that is the part that currently needs design work. Torch-MLIR's only supported "stable" interface is the torch_mlir.compile Python API. There are no specific guarantees about the internal stability of various passes and IR constructs, though of course we can adjust those over time to user requirements. Monkey-patching the input like this PR does is definitely not supportable. I pitched something like $ torch-mlir-opt -pass-pipeline='torchscript-module-to-torch-backend-pipeline{extra-library=my_custom_ops.mlir}' in the original RFC, and this would translate to an input MLIR module to torch_mlir.compile. To avoid an MLIR module on the torch_mlir.compile API, we might want the input to torch_mlir.compile to be a list of shape/dtype/value-semantics functions at the Python level and have them be compiled internally. This needs design work, but is totally tractable.

I will point out that this custom ops RFC has been in place since October and was in large part developed for (and signed off by) the Torch-MLIR users in this thread asking for short-term workarounds. We have not seen contributions to the RFC by these users, not even at the baseline level of pings for "when will this be done? we need this", so the RFC has languished. We do not want to set the precedent that community members can deprioritize contributing and then expect short-term workarounds to be approved. That is not in line with LLVM community engineering practices and does not scale to a vibrant community upholding high engineering standards.

Going forward we should be more intentional about indicating and following up on the relative priority and timelines for features that are expected to be needed, so that they can be ready at the right time. This is a reality of open-source development across a variety of industry partners -- unlike within a single monolithic organization, the community doesn't have a direct sense of the importance of different work, so it falls on everyone to drive/champion/follow-up on the features that are important to them, especially when work is being done for them by other community members. (there are lots of times where users ask for things but then drop off the radar and end up not needing the feature they asked for, so slowing down work on things people don't actively ask for is "a feature, not a bug", though here it seems like the outcome wasn't great)

Concretely, to move forward here, what would be useful is if we can do the following:

  1. We need help migrating to the dtype functions -- this is largely mechanical work and can be done by anybody. In particular though it would greatly free up Ramiro's time and allow him to work in parallel on the other aspects.
  2. Ramiro and I can prioritize designing and implementing the API and other aspects to complete the custom ops feature.

This is definitely an O(weeks) type thing and not O(months). We originally specced the RFC as a 1-2 month project and significant work has already happened, so expect less than that.

@powderluv (Collaborator) commented

Few notes to consider:

1: Ouch.

Concretely, to move forward here, what would be useful is if we can do the following:

  1. We need help migrating to the dtype functions -- this is largely mechanical work and can be done by anybody. In particular though it would greatly free up Ramiro's time and allow him to work in parallel on the other aspects.
  2. Ramiro and I can prioritize designing and implementing the API and other aspects to complete the custom ops feature.

That sounds like "yeah, the community can do all the grunt/mechanical work while Sean/Ramiro handle the architectural work". Besides, that is what we have been doing since September 2022, when we created this branch https://github.com/llvm/torch-mlir/tree/custom-op-example for existing custom op users and the RFC was floated for the new way forward. We are six months in, so we see this PR as a pragmatic stopgap. We are happy to hash out what the compile API should look like.

2: Ouch again.

We have not seen contributions to the RFC by these users, not even at the baseline level of pings for "when will this be done? we need this", so the RFC has languished. We do not want to set the precedent that community members can deprioritize contributing and then expect short-term workarounds to be approved. That is not in line with LLVM community engineering practices and does not scale to a vibrant community upholding high engineering standards.

As the largest contributor in terms of commits to torch-mlir, including the grunt work of keeping RollPyTorch, the CI, and the LLVM rotations up to date, we would like to know how we can do better. Besides, we don't think this is a short-term workaround - it accelerates custom op support (happy to discuss any API surface changes); see point three below.

3: Most importantly, porting the existing dtypes is not required for custom op support.
The dtype migration isn't a blocker for torch.operator (def Torch_OperatorOp : Torch_Op<"operator", [ ...) because torch.operator is not currently (and never was) handled by the RefineTypes pass (Ctrl+F OperatorOp in https://github.com/llvm/torch-mlir/blob/c2ef5f41652776cbbec66988c6d3048fbe8e6319/lib/Dialect/Torch/Transforms/RefineTypes.cpp).

All that said, as I offered in late January, we are happy to get someone to help with the mechanical dtype porting work, but we shouldn't treat it as a mandatory requirement for custom op support, as proven by this PR (modulo API refinements).

@ramiro050 (Collaborator) commented

3: Most importantly, porting the existing dtypes is not required for custom op support. The dtype migration isn't a blocker for torch.operator (def Torch_OperatorOp : Torch_Op<"operator", [ ...) because torch.operator is not currently (and never was) handled by the RefineTypes pass (Ctrl+F OperatorOp in https://github.com/llvm/torch-mlir/blob/c2ef5f41652776cbbec66988c6d3048fbe8e6319/lib/Dialect/Torch/Transforms/RefineTypes.cpp).
All that said, as I offered in late January, we are happy to get someone to help with the mechanical dtype porting work, but we shouldn't treat it as a mandatory requirement for custom op support, as proven by this PR (modulo API refinements).

I want to quickly comment on this since I am one of the people most familiar with this issue, and I think it is important for guiding future decisions on this.

This is a blocker for custom op support. Any op that has to be handled by the new shape+dtype inference pipeline is affected by the dtype function transition. The reason is explained in detail in #1807, but I will try to give a different explanation, since it still seems to be a point of a lot of confusion.

First, when we get to the type refinement level, custom ops and regular ops are essentially the same. We need a pass that goes to that op and uses the information from the inputs of the op to calculate the shape and dtype of the outputs. For dtypes, this currently can happen in two places: RefineTypes and the dtype-functions pipeline. Because for most ops you need to know the types of the inputs to determine their result type, type information in the graphs Torch-MLIR deals with almost always flows down the graph in the order the ops are written.

If your graph contains some ops that are handled by RefineTypes and some ops that are handled by dtype-functions, you can basically think of your graph as a series of blocks, where each block has one or more ops in it:

RefineTypes block
dtype-functions block
RefineTypes block
dtype-functions block
...

Because the information for dtypes flows in order down the graph, when RefineTypes happens, it will only handle the first RefineTypes block and stop when it reaches the first dtype-functions block. Then the dtype-functions pipeline will handle the dtype-functions block and stop when the next RefineTypes block is encountered. This will keep happening until the entire graph is covered, which leads to our issue. If the graph has many custom ops, it will require many iterations of the LowerToBackendContract pipeline. If the number of iterations reaches the iteration limit, this will cause LowerToBackendContract to fail. So for users with large graphs and many RefineTypes blocks and dtype-functions blocks, the custom op support offered here will not work.

Note: this has nothing to do with torch.operator appearing in the RefineTypes file. You are indeed correct that RefineTypes has never handled this op.

@makslevental (author) commented Mar 22, 2023

I want to quickly comment on this since I am one of the people most familiar with this issue, and I think it is important for guiding future decisions on this.

Okay, it's true that wasn't clear to me, so thank you for the clarification, but

If the graph has many custom ops, it will require many iterations of the LowerToBackendContract pipeline. If the number of iterations reaches the iteration limit, this will cause LowerToBackendContract to fail.

is a perfectly conservative and explicit graceful failure for anyone attempting to use this PR (if it were merged) - users who are willing to pay the price will bump maxIterations until things go through, and everyone else will get the same conservative result they get without the PR - torch.operator fails to pass through the backend contract. In addition, torch.operator is literally the only op for which this partial implementation is satisfactory even from the maxIterations perspective; no user will ever have more than a few of these, so the (RefineTypes block, dtype-functions block) iterations will look like

RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
dtype-functions block // for the lone torch.operator
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
RefineTypes block
...

@silvasean (Contributor) commented

@powderluv what is your specific timeline that you would like to have this done by? If 2-3 weeks is okay with your timeline, then let's work together, prioritize this, and implement it as we already had specced and planned. If this is needed sooner, then I really can't think of a way to move forward with this on the main branch while sticking to our engineering principles and overarching LLVM community practices.

That sounds like "yeah, the community can do all the grunt/mechanical work while Sean/Ramiro handle the architectural work". Besides, that is what we have been doing since September 2022, when we created this branch https://github.com/llvm/torch-mlir/tree/custom-op-example for existing custom op users and the RFC was floated for the new way forward. We are six months in, so we see this PR as a pragmatic stopgap. We are happy to hash out what the compile API should look like.

Sorry, I didn't intend it to come off like "grunt work", but there is a separation between the more mechanical changes here and the parts that need a deeper analysis of the architectural implications, and Ramiro and I happen to be the ones with more extensive knowledge here. I am happy to help other folks who want to work on this, but since delivery time is the primary criterion here, having Ramiro and I work on it seems most practical.

As the largest contributor in terms of commits to torch-mlir, including the grunt work of keeping RollPyTorch, the CI, and the LLVM rotations up to date, we would like to know how we can do better. Besides, we don't think this is a short-term workaround - it accelerates custom op support (happy to discuss any API surface changes); see point three below.

I think that pinging PRs or RFCs that aren't moving but that are important would be my specific request, along with giving expected timelines for things. This PR as-is is definitely a short-term workaround. Undoing these things is really, really difficult. Once you start punching layering/API holes, things fall apart very quickly, and repairing that is always 10x more work.

All that said, as I offered in late January, we are happy to get someone to help with the mechanical dtype porting work, but we shouldn't treat it as a mandatory requirement for custom op support, as proven by this PR (modulo API refinements).

I agree, but the "modulo API refinements" is probably 2-3 weeks of end-to-end time to implement. And as Ramiro said, you are likely to hit a significant compilation-time issue if we don't port the dtype functions, so it is kind of risky to try to ignore them (it could result in a 100x+ compilation time blowup for a model like llama with a lot of layers).

@powderluv (Collaborator) commented

2-3 weeks is great. Let's split up the tasks and prioritize them. @gpetters94 is back on Friday and can help with dtype porting.

@makslevental (author) commented Mar 22, 2023

And as Ramiro said you are likely to hit a significant compilation time issue if we don't port the dtype functions too so it is kind of risky to try to ignore the dtype functions too (could result in 100x+ compilation time blowup for a model like llama with a lot of layers)

That's only a 100x (at worst) blowup in compile time from PyTorch to the backend contract, not in the entire compilation to the LLVM target. Let's say it's a 10s compile time (I would hope it doesn't take that long to go from PyTorch to the backend contract). Wouldn't it be okay if some people are fine with waiting ~16 minutes to lower llama/SD today? And again, it's a completely opt-in slowdown; no one is going to hit this accidentally for custom/unimplemented ops, because they won't provide shape/dtype functions for their custom torch.operator and thus TorchDtypeRefinementPipeline will never run at all.

I don't have SD or llama on hand, but I timed EfficientNet_B7, which has ~1100 ops (491 convs), and it took ~8.043s. So again, if someone is willing to spend ~13 minutes waiting for a large model to compile, what's the harm in enabling them to do that?

@silvasean mentioned this pull request on Mar 22, 2023
@silvasean (Contributor) commented

Hey folks, I've created another issue to shepherd this feature to completion with concrete action items: #1963

I know some folks are on a deadline; hopefully this will help show the remaining work and how on track we are, for transparency.
