Remove or split up Frame methods that use the index #10439

Merged
merged 18 commits into from
Mar 21, 2022

vyasr
Contributor

@vyasr vyasr commented Mar 15, 2022

This PR contributes to removing indexes from Frame and moving them entirely to IndexedFrame. Several methods (such as copy, equals, and _mimic_inplace) have either been removed from Frame or had their index-related logic moved to the corresponding overrides in IndexedFrame. The rewrite also includes some optimizations. Of particular interest: because indexes are immutable, IndexedFrame.copy now always shallow copies the index, which should significantly reduce memory pressure in copy-heavy applications. It also makes copying noticeably faster when the index is not a RangeIndex (roughly 20% in this example):

In [2]: df = cudf.DataFrame({"a": range(10000), "b": range(10000)}, index=[i*2 for i in range(10000)])
# Before
In [3]: %timeit df.copy(deep=True)
151 µs ± 2.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# After
In [3]: %timeit df.copy(deep=True)
119 µs ± 804 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
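
The split described above can be sketched with toy classes. This is only an illustration of the pattern, not cuDF's actual implementation: the class names mirror Frame and IndexedFrame, but the `_data`, `_index`, and `_copy_data` attributes here are hypothetical, and the index is assumed to be an immutable tuple.

```python
import copy


class Frame:
    """Toy stand-in for a container of named columns with no index."""

    def __init__(self, data):
        self._data = dict(data)

    def _copy_data(self, deep):
        # Deep copies duplicate the column buffers; shallow copies share them.
        return copy.deepcopy(self._data) if deep else dict(self._data)

    def copy(self, deep=True):
        return Frame(self._copy_data(deep))

    def equals(self, other):
        # Only column data is compared here; Frame knows nothing about indexes.
        return self._data == other._data


class IndexedFrame(Frame):
    """Toy stand-in that adds an index on top of Frame (hypothetical layout)."""

    def __init__(self, data, index):
        super().__init__(data)
        self._index = index  # assumed immutable (a tuple in this sketch)

    def copy(self, deep=True):
        # The index cannot be mutated, so sharing it is safe even when the
        # column data is deep copied.
        return IndexedFrame(self._copy_data(deep), self._index)

    def equals(self, other):
        # Index-related logic lives in the override, not in Frame.
        return super().equals(other) and self._index == other._index


frame = IndexedFrame({"a": [1, 2, 3]}, index=(0, 2, 4))
deep = frame.copy(deep=True)
print(deep._index is frame._index)          # index object is shared
print(deep._data["a"] is frame._data["a"])  # column data is not
```

Sharing the index is what saves both memory and time in this sketch: a deep copy only pays for the column data.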

Other methods rewritten in this PR have also been sped up, although the speedups are typically more modest (about 12% for clip here). For example:

# Using the same DataFrame as above
# Before
In [4]: %timeit df.clip(100, 1000)
489 µs ± 4.39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# After
In [4]: %timeit df.clip(100, 1000)
431 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
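
To reproduce timings like these outside IPython, a small harness built on the standard-library `timeit` module may help. This is a sketch under stated assumptions: `best_time_us` and `run_cudf_benchmarks` are hypothetical helper names, the harness reports the best run rather than the mean ± std. dev. that `%timeit` prints, and the cuDF part needs a GPU environment with cuDF installed.

```python
import timeit


def best_time_us(func, number=1000, repeat=7):
    """Return the best per-call time of func in microseconds.

    timeit.repeat returns one total time (in seconds) per repetition,
    each covering `number` calls.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6


def run_cudf_benchmarks():
    """Time copy and clip on the DataFrame from the PR description."""
    import cudf  # imported lazily; requires a GPU environment with cuDF

    df = cudf.DataFrame(
        {"a": range(10000), "b": range(10000)},
        index=[i * 2 for i in range(10000)],
    )
    print(f"copy(deep=True): {best_time_us(lambda: df.copy(deep=True)):.1f} µs")
    print(f"clip(100, 1000): {best_time_us(lambda: df.clip(100, 1000)):.1f} µs")
```

Calling `run_cudf_benchmarks()` on the branch before and after the change gives a comparable pair of numbers, although absolute values will depend on the hardware.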

@vyasr vyasr added the labels 3 - Ready for Review (Ready for review by team), code quality, Python (Affects Python cuDF API.), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) Mar 15, 2022
@vyasr vyasr added this to the CuDF Python Refactoring milestone Mar 15, 2022
@vyasr vyasr self-assigned this Mar 15, 2022
@vyasr vyasr requested a review from a team as a code owner March 15, 2022 22:39
@vyasr vyasr requested review from galipremsagar and isVoid March 15, 2022 22:39
@codecov

codecov bot commented Mar 15, 2022

Codecov Report

Merging #10439 (ddbc468) into branch-22.04 (21ed251) will increase coverage by 0.19%.
The diff coverage is 100.00%.

❗ Current head ddbc468 differs from pull request most recent head 4f109fe. Consider uploading reports for the commit 4f109fe to get more accurate results

@@               Coverage Diff                @@
##           branch-22.04   #10439      +/-   ##
================================================
+ Coverage         85.95%   86.15%   +0.19%     
================================================
  Files               139      139              
  Lines             22435    22447      +12     
================================================
+ Hits              19285    19340      +55     
+ Misses             3150     3107      -43     
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/core/column_accessor.py | 93.47% <100.00%> (+0.04%) | ⬆️ |
| python/cudf/cudf/core/dataframe.py | 93.58% <100.00%> (+<0.01%) | ⬆️ |
| python/cudf/cudf/core/frame.py | 91.84% <100.00%> (+<0.01%) | ⬆️ |
| python/cudf/cudf/core/index.py | 92.27% <100.00%> (+0.04%) | ⬆️ |
| python/cudf/cudf/core/indexed_frame.py | 92.97% <100.00%> (+0.67%) | ⬆️ |
| python/cudf/cudf/core/multiindex.py | 92.14% <100.00%> (-0.02%) | ⬇️ |
| python/cudf/cudf/core/series.py | 95.16% <100.00%> (ø) | |
| python/cudf/cudf/core/single_column_frame.py | 96.85% <100.00%> (-0.17%) | ⬇️ |
| python/cudf/cudf/core/column/string.py | 88.91% <0.00%> (+0.12%) | ⬆️ |
| ... and 4 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vyasr
Contributor Author

vyasr commented Mar 16, 2022

rerun tests

python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/frame.py (resolved)
python/cudf/cudf/core/frame.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
@vyasr vyasr requested a review from galipremsagar March 17, 2022 00:42
Contributor

@isVoid isVoid left a comment

Just a small comment below.

python/cudf/cudf/core/frame.py (outdated, resolved)
@vyasr vyasr requested a review from isVoid March 17, 2022 20:55
@vyasr
Contributor Author

vyasr commented Mar 21, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2426faf into rapidsai:branch-22.04 Mar 21, 2022
@vyasr vyasr deleted the refactor/remove_frame_index branch March 31, 2022 20:09
3 participants