-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bounded Dynamism RFC #194
Bounded Dynamism RFC #194
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your proposal and for your patience as we're working out the StableHLO evolution process!
I think that this RFC provides a good summary of the current state of bounded dynamism in *HLO opsets in (P1)-(P3). Both the statically-shaped spec (spec_draft.md) and the unbounded dynamism RFC (#8) are still in the works, but once they are finished I believe that this RFC will work well as the delta that specifically introduced bounded dynamism.
My main comment is about (P4) which I believe needs further work, both with respect to technical details of the proposal and the organizational details of what this proposal means for the community. PTAL!
3861065
to
2c3021c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some drive-by comments.
Thanks for the feedback! I am talking with stakeholders to further refine this proposal and expand on the future vision and roadmap. Should be able to have an update in a week. |
Implements TODO to add verification for bounds. This is in preparation for moving bounds out of experimental as part of #194 I considered doing this in the RFC itself but better to do the code change separately and immediately given that these invariants are already assumed during type inference.
Addressed review comments. Thanks to @burmako for extensive offline discussions and help! Please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing this. Overall it looks good, and it's great to see this being documented thoroughly. I'm not an expert on bounded dynamism, so I'll reserve judgement for folks who have more experience here.
I mostly left nits on wording and examples, though I'd like to better understand how we plan to unify the different mechanisms here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Smit! Thank you for the updated RFC and for your continued hard work on working out a story for bounded dynamism in StableHLO.
To the best of my understanding, at the high level this RFC can be summarized as follows:
- Let's document bounded tensor type, GetDimensionSizeOp and SetDimensionSizeOp that have been part of the StableHLO opset since its creation when it was forked from MHLO.
- Bounded dynamism can probably be reframed as a special case of unbounded dynamism, so let's pave the way towards migrating from the former to the latter in the long term.
Below, I will provide feedback on these two parts separately:
1) Some of the concepts that are used to discuss part 1 are rather informal in comparison with the spec, but I think that's fine. We haven't yet fully worked out the StableHLO formalism (although it feels that we're just a few weeks away from it being relatively complete), and in the meanwhile we have to be able to make forward progress, which this RFC achieves.
The only major change to part 1 that I would request (in addition to a bunch of smaller comments) is for this PR to also update spec.md with sections for GetDimensionSizeOp and SetDimensionSizeOp.
2) As far as part 2 goes, my understanding is that it is currently a hypothesis that bounded dynamism can be reframed a special case of unbounded dynamism. This hypothesis looks plausible to me, but until we have strong theoretical and/or experimental evidence, I don't think we should recommend that "Producers are encouraged to use the unbounded dynamic operations" just yet.
Similarly, I would request to mark P4 as aspirational and sharpen it a little bit. For example, at the moment, it is not quite clear to me what the expected takeaway from P4 should be. It compares unbounded and bounded programs that do something similar, calls out the potential (i.e. plausible but not yet realized) benefits of unbounded programs and... that's it. Perhaps reformulate P4 or add some kind of a conclusion at the end?
3) Finally, I would like to better understand the proposed sequencing of this RFC with the upcoming dynamism RFC. E.g. P2 says "All ops that support dynamic operands or results can have bounds specified for them", but we haven't yet documented which StableHLO ops should document dynamic operands or result.
Does this mean that the proposal is to wait until the dynamism RFC is worked out and then review this RFC? Or the idea is to discuss this RFC in the context of what's currently roughly planned for the dynamism RFC and seek an advance approval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. I made a few comments below for you to help address.
Overall, I take it that the existing set_dimension_size
and get_dimension_size
APIs work as intended (i.e. identical to what the downstream XLA provides); the new dynamism APIs proposed here unlock unbounded dynamism which is theorized to be more flexible than the XLA bounded dynamism API. Correct me if I missed a key takeaway here plz.
rfcs/20220925-bounded-dynamism.md
Outdated
|
||
Bounded dynamism allows programs to represent the maximum runtime size that a | ||
particular dynamic dimension of a tensor can have. This makes it possible to run | ||
such programs on hardware that doesn't support dynamic tensors but could support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mostly a limitation of the software stack rather than the hardware when this happens. Now it may be that the hardware makes it difficult and or the resultant program's performance or space requirements get too great for the HW, but it's not an intrinsic HW issue. Perhaps hardware -> platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, better to be generic.
to separate approval. Note that producers that generate unbounded programs | ||
don't need this | ||
in StableHLO. | ||
* Provide a transformation that converts StableHLO programs to bounded StableHLO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is bound propagation in scope? E.g., given program with bounds on inputs/certain ops, propagate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. There is some related work in #622
|
||
## Detailed Proposal | ||
|
||
### (P1) Bounded tensor type using the encoding field in the RankedTensorType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets start with adding a "wrapper" class so that one can use LLVM style RTTI (e.g., dyn_cast) and then use its convenience methods. In particular this would allow flipping extension attribute to an interface later if needed without changes in users. The implementation details of how stored and actual type remains the same, but C++ API is more shielded from underlying [in particular I do want to be able to reference a context as in value inference above]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea!
Filed #812
`RankedTensorType` encoding field. Bounds in `TypeExtensionsAttr` is an | ||
`int64_t` array of size equal to rank of the tensor. Values corresponding to | ||
static dimensions must be `ShapedType::kDynamicSize` which is printed as `?` in | ||
the IR. Disallowing a static bound value for static dimensions makes the IR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative is to only represent bounds for dynamic dims (e.g., size of bounds == number of dynamic dims).
Now XLA today uses bounds here to effectively represent a view into a padded structure (that and the API implementations thereof are implementation details of the interpretations there rather than represented in StableHLO or type system) and in that world padding of static shapes make sense. As the padded values represent the shape of the underlying memory. Now Tensor type here does not represent memory, but just to say one can still have dynamic or static dims in tensor type while having static bounds and it would make sense for some users. (its not a use case I'm particularly interested in as those should be memrefs as there is explicit memory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we went with the simple list because it will provide easier accessor but that doesn't have to part of the IR given your comment about wrapper that could provide other views. I think we should at least weigh this.
%arg3: tensor<2xf32>, | ||
%arg4: tensor<4xf32>) { | ||
// %arg0 is compatible with %arg1, %arg2 and %arg3 as bounded types could have | ||
// tensor<2xf32> type during runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same could be said for a tensor<?xf32>
and tensor<*xf32>
input too. Is that allowed operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
%arg4: tensor<4xf32>) { | ||
// %arg0 is compatible with %arg1, %arg2 and %arg3 as bounded types could have | ||
// tensor<2xf32> type during runtime. | ||
%0 = "stablehlo.add"(%arg0, %arg1) : (tensor<?xf32, #stablehlo.type_extensions<bounds = [3]>>, tensor<?xf32, #stablehlo.type_extensions<bounds = [2]>>) -> tensor<?xf32, #stablehlo.type_extensions<bounds = [2]>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From type inference point of view, this op here and its behavior effectively adds constraint that arg0 is bounded by 2 and the next does the same for arg2, and 70 makes it required to be. So given elementwise constraint on add we can therefore backpropagate that these aren't actually dynamic but all are tensor<2xf32>
. Is that consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we could do that. For verification, we end up only checking the direct and not doing aggressive analysis to infer more information but that can be done separately.
``` | ||
|
||
Currently, the StableHLO dialect uses the MLIR ranked tensor type to represent | ||
bounds. In the future we plan to introduce StableHLO type which supports bounds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where it is good if you used the little wrapper class approach above to shield C++ changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #812
### (P3) get\_dimension\_size / set\_dimension\_size ops | ||
|
||
The `get_dimension_size` op takes a tensor and a dimension index and returns the | ||
runtime size as `tensor<i32>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not use tensor<index>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of generalizing to i64 and index as well for consistency. But given there is no requirement now. Seemed better to follow XLA until some use-case comes up.
Currently, the StableHLO dialect uses the MLIR ranked tensor type to represent | ||
bounds. In the future we plan to introduce StableHLO type which supports bounds, | ||
along with a custom pretty printing format. There will be a separate RFC on | ||
this. Also, the proposal will follow StableHLO backward compatibility policies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be good to show how function calls will be handled. E.g., are we keeping to the constraints imposed by func.func and func.call ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is imposed by func op and we are not making any changes there.
Update: We decided to merge this with #8 and have a single proposal covering both bounded and unbounded. Follow that issue to get updates and make any suggestions. |
The current design of dynamism in MHLO and StableHLO has been practically useful. There are success stories of it connecting JAX, PyTorch and TensorFlow to a number of compilers, in a mix of research and production environments. This RFC aims to leverage existing support for dynamism in the StableHLO dialect, discuss improvements to the existing design and then formalize the improved design in the StableHLO opset. The main challenge with writing this RFC was that it affects the entire opset. The current design involves a lot of corner cases, so it took about a year of practical evaluation by the author - primarily within JAX native serialization, but also correlating with other ongoing and prior projects - to distill the design into just a few general design principles. I think I'm happy with the outcome, but please take a look at the "Summary" section for what that entails. This RFC addresses a considerable chunk of community feedback on the existing design, but some feedback is deliberately left out of scope for this RFC to enable incremental progress while areas which require additional alignment are developed in parallel. See sections "Community feedback" and "Out of scope" for details. The only open question at the moment is interoperability with HLO with respect to bounded dynamism. The proposed representation for bounded dynamic types in StableHLO is in parity with the representation for bounded dynamic types in HLO. However, I'm not sure whether this parity covers 100% of bounded dynamism functionality. For example: 1) there appears to be a mismatch in how broadcasts are represented, 2) there is misalignment in representations of dynamic windows (HLO has a high-level representation: VALID and SAME, whereas StableHLO expects the producers to dynamically compute window sizes). Nonetheless, I think that this shouldn't block the initial review of the RFC, since there's a lot of stuff to discuss - and in the meanwhile, I'll be working on confirming interoperability with HLO. Finally, I'd like to acknowledge Smit Hinsu's work on the [Bounded Dynamism RFC](#194) from Q4 2022, which was superseded by this work. The representation for bounded dynamic types in the StableHLO dialect was designed and implemented by Smit, and Smit's proposal to allow bounded dynamic types everywhere is compatible with the more general proposal from this RFC to enable dynamism for all size-related program elements. Furthermore, Smit contributed the formal spec for get_dimension_size as well as the informal spec for set_dimension_size.
RFC describing the current state and future plans for bounded dynamism in StableHLO. This includes the tensor type with bounds, op semantics and bounded dynamism ops.