Concretization replays loop transforms #3950
base: main
!test
Review updated until commit 971a5e3
!test
@@ -831,7 +831,36 @@ TensorView* DynamicTransformConcretizer::concretizeNonEmptyReshape(
NVF_ERROR(
old_logical.size() == new_logical.size(),
"Concretized reshape logical size does not match symbolic logical size");
for (auto idx : c10::irange((int64_t)new_logical.size())) {

IterDomainMap old_logical_to_new;
After getting the PR to work, I'll merge this into fullReplay.
@@ -7787,87 +7787,6 @@ TEST_F(NVFuserTest, Reduction3DConstantIterationDomain) {
testValidate(executor_cache.fusion(), cg_outputs, {t0}, __LINE__, __FILE__);
}

// don't cache if the input tv is used by slice.
// https://github.com/NVIDIA/Fuser/issues/1697
TEST_F(NVFuserTest, AvoidCachingSliceInput) {
Moved to test_resize because test_gpu3 is gigantic.
cc @jacobhinkle I tried to replay logical-to-leaf transforms from the original TV to the concretized TV. It works except for Fuser/tests/python/test_python_frontend.py Line 3369 in 7a9dbc6
According to Fuser/csrc/dynamic_transform.cpp Line 811 in 7a9dbc6
I'm not sure if reduction dimensions have to follow a certain order. If not, I could fix this by appending all unmapped reduction dimensions to the end of the loop domain.
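To make the "append unmapped reduction dimensions" idea concrete, here is a minimal standalone sketch. It does not use nvFuser types: `ToyId` and `appendUnmappedReductions` are hypothetical names invented for illustration, and IDs are matched by name purely as a simplifying assumption. It also assumes that reduction dimensions are allowed at the end of the loop domain, which is exactly the open question above.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an iteration domain; NOT an nvFuser type.
struct ToyId {
  std::string name;
  bool is_reduction;
};

// Given the loop domain produced by the replay (which only covers logical
// IDs that were mapped from the original TV), append every logical ID of the
// concretized TV that the replay did not map. The sketch assumes that only
// reduction IDs can be left unmapped and checks that assumption.
std::vector<ToyId> appendUnmappedReductions(
    std::vector<ToyId> replayed_loop,
    const std::vector<ToyId>& concrete_logical,
    const std::unordered_set<std::string>& mapped_names) {
  for (const ToyId& id : concrete_logical) {
    if (mapped_names.count(id.name) != 0) {
      continue; // already covered by the replayed loop domain
    }
    assert(id.is_reduction && "only reduction IDs are expected to be unmapped");
    replayed_loop.push_back(id);
  }
  return replayed_loop;
}
```

For example, with a concretized logical domain `[i0, r1, i2]` and a replayed loop domain `[i0o, i0i, i2]`, the result keeps the replayed order and places `r1` last.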
I am not sure I understand the issue. IIUC the purpose of this PR is to replay loop transforms when we do concretizations in order to preserve DID scheduling, right? When you concretize a reshape, neither the original nor the replacement should have any reduction domains, should they?
Correct.
I thought so until I hit #1691, which I'm sure you know everything about.
Ah ok, actually I had forgotten about that entirely. In a case like this we are actually replacing the ViewOp output with its input, so we're not creating a new TensorView, are we? In that case the loop domain would still be intact. However, if that TV was also replaced during concretization, then it might no longer have a non-trivial loop domain... I assume this is the case you are worried about.
Here's a case I want to support:
Before concretization:
with all TVs' first dimension being outer-split by
After concretization:
with all TVs' first dimension (i.e. i0) being split the same way as before. Therefore, this PR tries to replay loop transforms in addition.
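As a rough illustration of what "replaying loop transforms" means here, the following standalone sketch records outer splits and reapplies them to a concretized logical domain. All names (`ToyAxis`, `ToyOuterSplit`, `applyOuterSplit`, `replayLoopTransforms`) are hypothetical and are not nvFuser APIs. The sketch also assumes that positions in the symbolic and concretized logical domains line up one-to-one, which the real replay would have to establish through an explicit ID mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an iteration domain; NOT an nvFuser type.
struct ToyAxis {
  std::string name;
  int64_t extent;
};

// One recorded loop transform: outer-split the axis at `pos` so that the
// outer axis has extent `factor`.
struct ToyOuterSplit {
  std::size_t pos;
  int64_t factor;
};

// Apply an outer split: [N] -> [factor, ceilDiv(N, factor)].
std::vector<ToyAxis> applyOuterSplit(
    std::vector<ToyAxis> loop,
    const ToyOuterSplit& split) {
  assert(split.pos < loop.size() && split.factor > 0);
  const ToyAxis in = loop[split.pos];
  const int64_t inner_extent = (in.extent + split.factor - 1) / split.factor;
  loop[split.pos] = ToyAxis{in.name + "o", split.factor};
  loop.insert(
      loop.begin() + static_cast<std::ptrdiff_t>(split.pos) + 1,
      ToyAxis{in.name + "i", inner_extent});
  return loop;
}

// Replay the recorded splits, in order, starting from a logical domain.
std::vector<ToyAxis> replayLoopTransforms(
    const std::vector<ToyAxis>& logical,
    const std::vector<ToyOuterSplit>& recorded) {
  std::vector<ToyAxis> loop = logical;
  for (const ToyOuterSplit& split : recorded) {
    loop = applyOuterSplit(loop, split);
  }
  return loop;
}
```

Replaying a single recorded split `{0, 2}` onto a concretized logical domain `[i0=8, i1=3, i2=4]` yields the loop domain `[i0o=2, i0i=4, i1=3, i2=4]`, so the first dimension is split the same way regardless of how the reshape was concretized.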
That's right -- no new TensorViews are created. In fact, I can fix this error by
diff --git a/csrc/dynamic_transform.cpp b/csrc/dynamic_transform.cpp
index a016f0af..3c9a3919 100644
--- a/csrc/dynamic_transform.cpp
+++ b/csrc/dynamic_transform.cpp
@@ -795,6 +795,9 @@ TensorView* DynamicTransformConcretizer::concretizeNonEmptyReshape(
TensorView* incomplete_out_tv,
const AnalyzeViewResult& view_analysis) {
TensorView* concrete_reshape_out_tv = reshape(inp_tv, view_analysis);
+ if (concrete_reshape_out_tv == inp_tv) {
+ return inp_tv;
+ }
// Extent expressions often change when concretizing a reshape. Here we
// replace these in all downstream expressions so that the Fusion looks just
Why wasn't #1691 fixed this way? Can concrete_reshape_out_tv be different from inp_tv but still have more reduction dimensions than incomplete_out_tv? (It's currently fixed by removing reduction dimensions from concrete_reshape_out_tv before registering concretization.)
For #2563