Fusion for pad op in Linalg #2783
Comments
Not sure if this is a dupe of #1605, but this issue has a nice description so I figured I'd put it here :) #2 here talks more about what he is doing, but the code is the easiest to understand, and is as straightforward as you'd expect. We have the same ability to do what he does here: our dynamic shapes are available as push constants (like his). This has the not-at-all-insignificant side effect of avoiding the need to pad out buffers in memory: it uses less space, and it allows direct transfer from producers to consumers without reallocs/copies if the pad could not be fused into the producer, etc.
This will be critical for the GPU burndown.
I feel this is related to the discussion that @MaheshRavishankar and I had today. We are thinking of having a pad_subview op in Linalg, and seeing if we can handle it at vector.transfer_read/write with a mask. This avoids padding out buffers in memory (i.e., using less memory), and it's really good to know that this is that fast. Every detail is still unclear to me now, but it's good to know that we have a similar option here.
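To make the masked-read idea concrete, here is a rough sketch of the kind of IR it could produce. This is purely illustrative: the value names, shapes, and the `%valid_rows`/`%valid_cols` bounds are invented for this example, and the exact op spellings have changed across MLIR versions.

```mlir
// Hypothetical sketch: instead of materializing a padded buffer, read the
// in-bounds region of the unpadded source with a mask; out-of-bounds lanes
// take the padding value supplied to the transfer.
%pad  = arith.constant 0.0 : f32
%mask = vector.create_mask %valid_rows, %valid_cols : vector<4x4xi1>
%v    = vector.transfer_read %src[%i, %j], %pad, %mask
          : memref<?x?xf32>, vector<4x4xf32>
```

The point of the sketch is that the padding value only ever exists inside the vector read; no padded copy of `%src` is allocated in memory.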
The PR #9194 fuses
@MaheshRavishankar Bumping this up as a stale P1 - please update or deprioritize as needed.
Funny. I was just going to start working on some of this, and was looking for this bug :)
One option is to have a `linalg.tensor_pad`. If the `linalg.tensor_pad` is the consumer, we can lower it to the buffers world by adding a `linalg.fill` op and a `subview` op at the top and passing the result of the `subview` op to the `generic` op as the output argument. For example: ->
This only works when the pad is the "last op" (or, say, when there is a buffer for the result tensor of the pad op), because we need to make the subview and pass it to the generic op.
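A hand-written sketch of that fill + subview lowering, on buffers. All names, shapes, and the layout map `#map` are made up for illustration, and `linalg.fill`'s spelling here follows the older `(dst, value)` form from around the time of this issue:

```mlir
// Hypothetical: %dst is the buffer for the padded result, %cst is the
// padding value, and %d0/%d1 are the dims of the unpadded producer result.

// 1. Fill the whole destination with the padding value.
linalg.fill(%dst, %cst) : memref<?x?xf32>, f32

// 2. Take a subview covering only the interior (unpadded) region.
%inner = subview %dst[0, 0] [%d0, %d1] [1, 1]
    : memref<?x?xf32> to memref<?x?xf32, #map>

// 3. Run the producer with the subview as its output argument, so it
//    writes directly into the interior of the already-filled buffer.
linalg.generic ... outs(%inner : memref<?x?xf32, #map>) { ... }
```

The fill writes the border once, and the generic op overwrites the interior, so no separate pad op or extra buffer is needed.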
I haven't got an idea yet for the lowering when a `linalg.tensor_pad` op is a producer. We probably need the #2782 feature, so we can have a temp memory in the middle.

Tag @nicolasvasilache for visibility. I will discuss this approach with @nicolasvasilache.