Cython API Refactor: transpose.pyx, sort.pyx #10675

@isVoid isVoid commented Apr 16, 2022

This PR contributes to #10153. It refactors all Cython APIs in transpose.pyx and sort.pyx to accept a list of columns as input.

This PR also includes several minor improvements to the code base; see the comments below for details.

@isVoid isVoid requested a review from a team as a code owner April 16, 2022 00:06
@isVoid isVoid requested review from vyasr and skirui-source April 16, 2022 00:06
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 16, 2022
See Also
--------
cudf.core.DataFrame.transpose
def transpose(list source_columns):

@isVoid isVoid Apr 16, 2022

This change concerns the step in transpose that converts categorical columns into their numerical codes. Those calls depend on higher-level and external APIs, which I would like to avoid in Cython, so I moved them to the Python API.
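
A minimal sketch of this layering, in plain Python rather than cuDF: a low-level transpose that only sees a list of columns, and a higher-level wrapper that encodes categorical values to integer codes before the call and decodes them afterwards. All names here (`transpose_columns`, `encode_categories`, `transpose_frame`) are hypothetical and are not the cuDF API.

```python
from typing import Hashable, List, Tuple


def transpose_columns(source_columns: List[list]) -> List[list]:
    """Low-level layer: knows nothing about categories, only columns."""
    if not source_columns:
        return []
    n_rows = len(source_columns[0])
    if any(len(col) != n_rows for col in source_columns):
        raise ValueError("all columns must have the same length")
    # Each row of the input becomes one column of the output.
    return [list(row) for row in zip(*source_columns)]


def encode_categories(
    columns: List[List[Hashable]],
) -> Tuple[List[List[int]], List[Hashable]]:
    """Map every distinct value to an integer code shared by all columns."""
    categories = sorted({value for col in columns for value in col})
    code_of = {value: code for code, value in enumerate(categories)}
    codes = [[code_of[value] for value in col] for col in columns]
    return codes, categories


def transpose_frame(columns: List[List[Hashable]]) -> List[List[Hashable]]:
    """High-level layer: encode, call the low-level transpose, decode."""
    codes, categories = encode_categories(columns)
    transposed = transpose_columns(codes)
    return [[categories[code] for code in col] for col in transposed]
```

Keeping the encoding in the wrapper means the low-level function depends only on its list-of-columns input, which is the separation the comment above argues for.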

Comment on lines +320 to +323
along with referencing an owner Python object that owns the memory
lifetime. owner must be either None or a list of column. If owner
is a list of columns, the owner of the `i`th ``cudf::column_view``
in the table view is ``owners[i]``. For more about memory ownership,

@isVoid isVoid Apr 16, 2022

The current docstring refers to `owner` as an argument, in backticks, but the actual argument here is `owners` (plural).
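
To illustrate the `owners` semantics that the docstring describes, here is a plain-Python sketch. `ColumnView`, `Column`, and `wrap_views` are hypothetical stand-ins, not cuDF or libcudf types; the only point is that the i-th view is paired with `owners[i]`, or with no owner when `owners` is None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnView:
    """Stand-in for a non-owning view of column memory."""
    name: str


@dataclass
class Column:
    """Stand-in for a Python column that keeps its memory owner alive."""
    view: ColumnView
    owner: Optional[object] = None


def wrap_views(
    views: List[ColumnView], owners: Optional[List[object]] = None
) -> List[Column]:
    """Pair the i-th view with owners[i]; no owner when owners is None."""
    if owners is not None and len(owners) != len(views):
        raise ValueError("owners must be None or match the number of views")
    return [
        Column(view=view, owner=None if owners is None else owners[i])
        for i, view in enumerate(views)
    ]
```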

@@ -1288,89 +1288,6 @@ def _quantiles(
result._copy_type_metadata(self)
return result

@_cudf_nvtx_annotate
def rank(

This method was moved to indexed_frame, since Index.rank is not supported in pandas.
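
The effect of the move can be pictured with a toy class hierarchy. The class names echo cuDF's modules, but the bodies are placeholders written for this illustration, and the ranking logic is a simple ordinal rank rather than cuDF's implementation; only the placement of `rank` matters.

```python
class Frame:
    """Toy base class shared by indexed frames and indexes."""

    def __init__(self, values):
        self._values = list(values)


class IndexedFrame(Frame):
    """Toy base for objects that carry an index (Series, DataFrame)."""

    def rank(self):
        # Ordinal rank: ties are broken by original position,
        # because sorted() is stable.
        order = sorted(range(len(self._values)), key=lambda i: self._values[i])
        ranks = [0] * len(self._values)
        for position, original_index in enumerate(order):
            ranks[original_index] = position + 1
        return ranks


class Series(IndexedFrame):
    pass


class Index(Frame):
    """Derives from Frame only, so it does not inherit rank."""
```

With `rank` defined on `IndexedFrame`, `Series` picks it up while `Index` does not expose it at all.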

Comment on lines +1787 to +1792
@pytest.mark.parametrize("num_rows", [1, 100])
@pytest.mark.parametrize("num_bins", [1, 10])
@pytest.mark.parametrize("right", [True, False])
@pytest.mark.parametrize("dtype", NUMERIC_TYPES + ["bool"])
@pytest.mark.parametrize("series_bins", [True, False])
def test_series_digitize(num_rows, num_bins, right, dtype, series_bins):

@isVoid isVoid Apr 16, 2022

This test was moved from test_dataframe.py. I also reduced the number of parameter combinations from 700+ to ~100.
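
Stacked `pytest.mark.parametrize` decorators generate the Cartesian product of their value lists, so the number of test cases is the product of the list lengths. The sketch below counts cases for a decorator stack shaped like the one above. `EXAMPLE_DTYPES` is a made-up placeholder for `NUMERIC_TYPES + ["bool"]`, whose contents are not shown in this thread, so the resulting total is illustrative only.

```python
from itertools import product
from math import prod

# Placeholder dtype list; the real NUMERIC_TYPES is not shown here.
EXAMPLE_DTYPES = ["int32", "int64", "float32", "float64", "bool"]

PARAMS = {
    "num_rows": [1, 100],
    "num_bins": [1, 10],
    "right": [True, False],
    "dtype": EXAMPLE_DTYPES,
    "series_bins": [True, False],
}


def count_cases(params):
    """Number of generated tests: the product of the value-list lengths."""
    return prod(len(values) for values in params.values())


def enumerate_cases(params):
    """Yield one dict per generated test case."""
    names = list(params)
    for combo in product(*(params[name] for name in names)):
        yield dict(zip(names, combo))
```

Because the count is multiplicative, trimming a single long value list shrinks the whole suite proportionally, which is usually the cheapest way to cut the number of cases.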

@isVoid isVoid added non-breaking Non-breaking change 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function labels Apr 16, 2022
codecov bot commented Apr 16, 2022

Codecov Report

Merging #10675 (e2f7c27) into branch-22.06 (94a5d41) will increase coverage by 0.63%.
The diff coverage is 89.13%.

@@               Coverage Diff                @@
##           branch-22.06   #10675      +/-   ##
================================================
+ Coverage         86.38%   87.02%   +0.63%     
================================================
  Files               142      142              
  Lines             22334    23485    +1151     
================================================
+ Hits              19294    20438    +1144     
- Misses             3040     3047       +7     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/frame.py | 93.55% <66.66%> (-0.12%) ⬇️ |
| python/cudf/cudf/core/indexed_frame.py | 91.70% <89.47%> (-0.07%) ⬇️ |
| python/cudf/cudf/core/dataframe.py | 95.38% <95.00%> (+1.63%) ⬆️ |
| python/cudf/cudf/core/column/numerical.py | 96.17% <100.00%> (ø) |
| python/cudf/cudf/core/column/string.py | 89.22% <0.00%> (+0.12%) ⬆️ |
| python/cudf/cudf/core/groupby/groupby.py | 91.72% <0.00%> (+0.22%) ⬆️ |
| python/cudf/cudf/core/tools/datetimes.py | 84.49% <0.00%> (+0.30%) ⬆️ |
| python/cudf/cudf/core/column/categorical.py | 90.29% <0.00%> (+0.51%) ⬆️ |
| python/cudf/cudf/core/column/lists.py | 92.79% <0.00%> (+1.27%) ⬆️ |
| ... and 2 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 94a5d41...e2f7c27.

@isVoid isVoid self-assigned this Apr 18, 2022

isVoid commented Apr 19, 2022

rerun tests


isVoid commented Apr 19, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 31a5f44 into rapidsai:branch-22.06 Apr 19, 2022