
[Relax] Implement relax.op.view #16955

Merged
merged 3 commits into apache:main from Lunderberg:relax_implement_view_operator on May 9, 2024

Conversation

Lunderberg
Contributor

This commit implements relax.op.view (R.view in TVMScript) to produce a view into an existing array. This returned view shares the same backing allocation as the existing array.

Because R.view comes with potential trade-offs, such as an increased memory footprint, the performance cost of applying a non-zero DLTensor::byte_offset, and potential misalignment for vectorized operators, this PR does not use R.view outside of unit tests. Applications of R.view, either in specific compute kernels or in optimization passes, are instead left for follow-up PRs.
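As a usage illustration (an editor's sketch, not code from this PR), the view operator can be constructed from the Python API roughly as follows. This sketch assumes the relax.op.memory.view name that the op receives later in this thread, the optional shape, dtype, and relative_byte_offset arguments described in the review, and arbitrary example shapes.

```python
# Editor's sketch, not code from this PR.  Assumes the relax.op.memory.view
# signature discussed in this thread: optional shape, dtype, and
# relative_byte_offset arguments, each given as a relax expression.
from tvm import relax

x = relax.Var("x", relax.TensorStructInfo([16, 16], "float32"))

# View the first half of x's backing allocation as an (8, 16) tensor.
upper = relax.op.memory.view(x, shape=relax.ShapeExpr([8, 16]))

# View the second half: offset by 8 * 16 float32 elements = 512 bytes
# into the same allocation, with no copy performed.
lower = relax.op.memory.view(
    x,
    shape=relax.ShapeExpr([8, 16]),
    relative_byte_offset=relax.PrimValue(512),
)
```

Both calls share x's backing allocation; only the shape and byte offset of the resulting view differ.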

@Lunderberg Lunderberg requested a review from masahi April 29, 2024 17:15
@Lunderberg
Contributor Author

By line count, this PR looks bigger than it actually is. Because this functionality can easily be misused, there's a lot more error checking than usual, and a lot more test cases to validate those error-checking paths. Breaking down the changes in this commit, the majority are test cases, and the majority of what remains are error-checking paths.

  • ~100 lines implementation in view.cc
  • ~260 lines error checking in view.cc
  • ~100 lines exposing the functionality through the Python API
  • ~750 lines testing the functionality in test_op_view.py

@Lunderberg Lunderberg force-pushed the relax_implement_view_operator branch from 0cb58ce to ed4fd50 Compare April 29, 2024 17:34
@masahi masahi requested review from tqchen and Hzfengsy April 29, 2024 18:36
}

StructInfoDeriveFunc infer_sinfo_env_func;
infer_sinfo_env_func = EnvFunc::Get("tvm.relax.struct_info.infer_view_sinfo");
Member

Why does this need to be a packed func? Can't we use the C++ function directly?

Contributor Author

If we want to define the StructInfo for the generated Call node, we could call the C++ function directly. Ideally, though, if the arguments change due to some downstream transform, the shape inference should be repeated with the new arguments. There are a few ways to do that:

  • Using an operator's FInferStructInfo function. This only applies to tvm.ir.Op instances, and we've just removed that as part of LegalizeOps.
  • Using the params and ret fields from the FuncStructInfo. This works for static cases, and most dynamic shapes, but doesn't support inferring the output shape based on a ShapeExpr argument.
  • Using FuncStructInfo::OpaqueFunc with a derivation func. This is the most general method, and essentially lets you pack a FInferStructInfo into a FuncStructInfo.

The third option doesn't come up very often, and I could only find one previous location where this functionality is used. (Previous usage is here, which defines the use of CallNode::sinfo_args as the default inferred struct info for external functions.)
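For illustration, a minimal sketch of the third option from the Python side (an editor's example, not code from this PR). It assumes the tvm.relax.struct_info.infer_view_sinfo EnvFunc registered by this change; the rest is the pre-existing FuncStructInfo API.

```python
# Editor's sketch, not code from this PR.  The EnvFunc name is the one
# registered by this change for R.view's shape inference.
import tvm
from tvm import relax

# Look up the registered struct-info derivation function by its global name.
derive = tvm.ir.EnvFunc.get("tvm.relax.struct_info.infer_view_sinfo")

# An opaque FuncStructInfo whose output StructInfo is re-derived from the
# call's arguments each time, rather than frozen when the call is constructed.
sinfo = relax.FuncStructInfo.opaque_func(derive_func=derive)
```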

@tqchen
Member

tqchen commented Apr 30, 2024

Just want to note here: having a view operation can in general cause problems, mainly because most ops (including generated and external ones) assume elem_offset = 0 for performance reasons (alignment, fewer kernel arguments).

Ideally we would like to ensure this assumption holds. That being said, there are cases that need to slice out sub-arrays, e.g. LoRA elements. There are a few ways to go with this:

  • Enable special ops that handle inputs which can come from a view (likely only the LoRA ones), which can inline view operations into the ops.
  • Add an R.memory.ensure_compact operation, which may result in a copy for backends that do not have direct memory pointer access, but can do pointer editing for backends that support it (this would need a target-dependent lowering).
  • R.view should perhaps be renamed to R.memory.view; this is a more advanced operator that carries certain assumptions and is likely not something we want to advertise generally for now.

tqchen previously requested changes Apr 30, 2024
@Lunderberg
Contributor Author

Having a view operation can in general cause problems, mainly because most ops (including generated and external ones) assume elem_offset = 0

So long as the operations assert that the element offset is zero when this assumption is being made, this makes sense. This is what we do in MakePackedAPI for PrimFuncs that require the elem_offset to be zero.

For external operations that accept an aligned pointer to data, I like your suggestion of ensuring that we provide aligned data. My plan was to only introduce views that maintain the same alignment that is provided by existing allocations.

For external operations that accept an NDArray and assume the offset is zero without validating that assumption, I think we should view this as a bug in those external operations. If we know of specific cases that ignore the offset, we can definitely insert alignment operators to provide aligned buffers. I wouldn't want to add it for every single external operation, though, because at some point we need to trust that functions accept the arguments they say they accept.

  • Add an R.memory.ensure_compact operation, which may result in a copy for backends that do not have direct memory pointer access, but can do pointer editing for backends that support it (this would need a target-dependent lowering).

To be clear, do you mean R.memory.ensure_aligned instead of R.memory.ensure_compact? This has come up a few times in this discussion, and I want to make sure that we are discussing the same thing. The view operation cannot be applied to a strided NDArray, and its output is always compact.

Having the dedicated operation for it would also work well for dynamically-shaped arguments. In those cases, we wouldn't know until runtime whether the operation requires a copy or not in order to provide an aligned argument.

Enable special ops that handle inputs which can come from a view (likely only the LoRA ones), which can inline view operations into the ops.

I agree with aiming to have views be fused with later operations where possible, though I'd add that this is not LoRA-specific functionality. Anywhere that CombineParallelMatmul can be used to improve the matmul performance, a view into the result can be used to avoid an unnecessary copy from the output.
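To make that concrete, a hedged sketch (an editor's illustration, not code from this PR): assuming the fused matmul result is laid out with the three projections stacked along the leading axis, so that each slice is contiguous, the outputs can be exposed as zero-copy views rather than as copies. The shapes and the 4-byte float32 element size below are illustrative assumptions.

```python
# Editor's sketch, not code from this PR.  Assumes a fused (3*seq, hidden)
# float32 result with the three outputs stacked along the leading axis.
from tvm import relax

seq, hidden = 128, 256
fused = relax.Var("fused", relax.TensorStructInfo([3 * seq, hidden], "float32"))

slice_bytes = seq * hidden * 4  # bytes in one (seq, hidden) float32 slice
q = relax.op.memory.view(fused, shape=relax.ShapeExpr([seq, hidden]))
k = relax.op.memory.view(
    fused,
    shape=relax.ShapeExpr([seq, hidden]),
    relative_byte_offset=relax.PrimValue(slice_bytes),
)
v = relax.op.memory.view(
    fused,
    shape=relax.ShapeExpr([seq, hidden]),
    relative_byte_offset=relax.PrimValue(2 * slice_bytes),
)
```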

  • R.view should perhaps be renamed to R.memory.view; this is a more advanced operator that carries certain assumptions and is likely not something we want to advertise generally for now.

I think the only assumption it makes is that a platform supports casting of pointers.

Regarding names, I agree that R.memory.view is a better name for it, and will update the PR.

- Rename `R.view` to `R.memory.view`
- Rename `relax.op.view` to `relax.op.memory.view`
@Lunderberg
Contributor Author

Changes have been made as requested, ready for re-review.

@tqchen
Member

tqchen commented May 1, 2024

R.memory.ensure_compact will actually return a new NDArray whose elem_offset = 0, and for devices that allow explicit pointer moving, this is something that can accelerate backends.

@Lunderberg
Contributor Author

Having R.memory.ensure_aligned makes sense, as that will allow us to explicitly mark where additional copies are required after applying R.memory.view, before passing to a compute kernel. I'm putting it together as a follow-up PR.

Any other concerns? I think this PR is ready to merge.

@tqchen tqchen dismissed their stale review May 2, 2024 18:46

dismissing my previous requests as they are addressed

@tqchen
Member

tqchen commented May 2, 2024

Thank you! My previous comments are addressed. Unfortunately I didn't get a chance to do a thorough read; it would be good to get a review from another person, so I just dismissed my previous comment.

@Lunderberg
Contributor Author

Sounds good, and thank you for the feedback!

# specific language governing permissions and limitations
# under the License.

"""Operations that act on the DLTensor container """
Member

Need update here

Contributor Author

Sounds good, and updated.

dtype: Optional[Expr] = None,
relative_byte_offset: Optional[Expr] = None,
) -> Expr:
"""Broadcasts a tensor to a specified shape.
Member

Need update here

Contributor Author

Thank you for the catch, and updated.

@@ -296,13 +298,17 @@ class CallableProxy(StructInfoProxy):
purity : bool
Whether the callable is pure.

derive_func: Optional[Union[str, tvm.ir.EnvFunc]]
The derivation function for the outputq
Member

Typo outputq?

And what is a derivation function?

Contributor Author

Typo, updated to output.

The derivation function is equivalent to the FInferStructInfo attribute set for relax operations, but can be applied to an arbitrary function. It can be used in cases where a PackedFunc may be called and the output StructInfo should be derived from the input arguments, rather than taken from sinfo_args. This functionality has existed in the C++ API for a while, but it looks like this is the first time it's been exposed through the Python API, so I fleshed out the description here.

@masahi
Member

masahi commented May 8, 2024

@tvm-bot rerun

@masahi masahi merged commit 4c1ebcf into apache:main May 9, 2024
20 checks passed
@Lunderberg Lunderberg deleted the relax_implement_view_operator branch May 9, 2024 13:21
Lunderberg pushed a commit that referenced this pull request Aug 6, 2024
… R.view (#17145)

Previously, `R.view` was legalized to an extern call to
`runtime.TVMArrayCreateView` during `LegalizeOps`. This call to an extern
func can't be properly handled by `StaticBlockPlanMemory`, because it
assumes the extern func does not retain the input buffer; an extern func
returning a view of its input would break the ref count of the
buffer. This PR defers the legalization of `R.view` so that it can be
explicitly handled by memory planning.

A new op `R.ensure_aligned` is added as discussed in #16955