-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF-#4445: Stop recomputing both indices for axis-wide applies. #4460
base: master
Are you sure you want to change the base?
PERF-#4445: Stop recomputing both indices for axis-wide applies. #4460
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4460 +/- ##
==========================================
- Coverage 86.82% 86.19% -0.63%
==========================================
Files 222 222
Lines 18041 18100 +59
==========================================
- Hits 15664 15602 -62
- Misses 2377 2498 +121
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
@@ -1864,6 +1864,7 @@ def apply_full_axis( | |||
new_index=None, | |||
new_columns=None, | |||
dtypes=None, | |||
func_may_change_complementary_index_size=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we would pass new row lengths and new column widths from the QC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QC probably shouldn't know anything about this (partition lengths) though. However, what do you think on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func_may_change_complementary_index_size
looks a bit clumsy.
…applies. Signed-off-by: mvashishtha <[email protected]>
Signed-off-by: mvashishtha <[email protected]>
68c4302
to
a1b861d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvashishtha I like the idea of splitting the functionality like this, but maybe we need a new operator instead of handling it like this?
The difference between the new operator and the current |
If we do not want to pass lengths from the QC to the core dataframe, I would suggest renaming |
@mvashishtha For the algebra, typically there are two things that qualify an operator:
We typically want each operator to have certain guarantees without requiring too much information from upper layers. This is why I believe it should be a different operator, since both metadata and data can change at the same time, which isn't something the current operators explicitly support. |
@devin-petersohn The new param I introduced to modin/modin/core/dataframe/pandas/dataframe/dataframe.py Lines 2101 to 2102 in cca9468
|
@mvashishtha for now, let's keep this fix local to the axis-wide apply. we should revisit after. |
Signed-off-by: mvashishtha [email protected]
What do these changes do?
Stop recomputing both indices for axis-wide applies. These changes make the slow computation in the snippet from #4445 (comment) asynchronous.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date