Cython API Refactor: `transpose.pyx`, `sort.pyx` #10675
See Also
--------
cudf.core.DataFrame.transpose
def transpose(list source_columns):
This change concerns the part of `transpose` that converts categorical columns into numerical column codes. These calls depend on higher-level and external APIs, which I would like to avoid in Cython, so I moved them to the Python API.
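To illustrate the split described in this comment, here is a minimal pure-Python sketch. Every name in it (`CategoricalColumn`, `lowlevel_transpose`, the wrapper `transpose`) is a hypothetical stand-in rather than cudf's actual code, and it assumes all input columns share the same categories.

```python
# Illustrative stand-ins only; none of these names are cudf's real API.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CategoricalColumn:
    """Toy categorical column: integer codes plus the category labels."""

    codes: List[int]
    categories: List[str]


def lowlevel_transpose(source_columns: Sequence[List[int]]) -> List[List[int]]:
    """Stand-in for the low-level layer: it only sees plain numerical columns."""
    return [list(row) for row in zip(*source_columns)]


def transpose(columns: Sequence[CategoricalColumn]) -> List[CategoricalColumn]:
    """Stand-in for the Python layer: it owns the categorical handling."""
    # Assumption: every column uses the same categories.
    categories = columns[0].categories
    # Convert to numerical codes before calling into the low-level layer.
    code_columns = [col.codes for col in columns]
    result = lowlevel_transpose(code_columns)
    # Re-wrap the transposed codes as categorical columns afterwards.
    return [CategoricalColumn(codes, categories) for codes in result]
```

With this layering, the low-level function needs no knowledge of categorical dtypes; it receives and returns only lists of numerical columns.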
along with referencing an owner Python object that owns the memory
lifetime. owner must be either None or a list of column. If owner
is a list of columns, the owner of the `i`th ``cudf::column_view``
in the table view is ``owners[i]``. For more about memory ownership,
The current docstring refers to `owner` as an argument, in backticks, but the actual argument here is `owners` (plural).
@@ -1288,89 +1288,6 @@ def _quantiles(
result._copy_type_metadata(self)
return result

@_cudf_nvtx_annotate
def rank(
This method is moved to `indexed_frame`, since `index.rank` is unsupported in pandas.
@pytest.mark.parametrize("num_rows", [1, 100])
@pytest.mark.parametrize("num_bins", [1, 10])
@pytest.mark.parametrize("right", [True, False])
@pytest.mark.parametrize("dtype", NUMERIC_TYPES + ["bool"])
@pytest.mark.parametrize("series_bins", [True, False])
def test_series_digitize(num_rows, num_bins, right, dtype, series_bins):
This test is moved from `test_dataframe.py`. I also reduced the number of parameter combinations from 700+ to ~100.
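For context on why trimming parameter lists matters: stacked `pytest.mark.parametrize` decorators generate the Cartesian product of their value lists, so the case count is the product of the list lengths. The sketch below counts cases for a set of lists; the `dtype` list is a short placeholder rather than cudf's `NUMERIC_TYPES`, so the totals are illustrative and are not the counts from this PR.

```python
from itertools import product

# Hypothetical parameter lists. The dtype list is a short placeholder,
# not cudf's NUMERIC_TYPES.
params = {
    "num_rows": [1, 100],
    "num_bins": [1, 10],
    "right": [True, False],
    "dtype": ["int32", "float64", "bool"],
    "series_bins": [True, False],
}


def count_cases(params):
    """Count the cases produced by stacking one parametrize per key."""
    return sum(1 for _ in product(*params.values()))


total = count_cases(params)  # 2 * 2 * 2 * 3 * 2
```

Adding a single extra value to any one list multiplies the total, which is how a test can grow to hundreds of cases without anyone intending it.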
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10675 +/- ##
================================================
+ Coverage 86.38% 87.02% +0.63%
================================================
Files 142 142
Lines 22334 23485 +1151
================================================
+ Hits 19294 20438 +1144
- Misses 3040 3047 +7
Continue to review full report at Codecov.
rerun tests
@gpucibot merge
This PR contributes to #10153 and refactors all Cython APIs in `transpose.pyx` and `sort.pyx` to accept a list of columns as input. It also includes several minor improvements to the code base; see the review comments for details.
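As a rough picture of what "accept a list of columns" means for a sort-style API, the following pure-Python sketch takes a list of columns and per-column sort directions and returns the row order. The name `order_by` and its signature are assumptions made for illustration; the real functions operate on cudf columns through libcudf and are not shown here.

```python
from typing import List, Sequence


def order_by(source_columns: Sequence[Sequence], ascending: Sequence[bool]) -> List[int]:
    """Return row indices that sort the rows lexicographically by column.

    Illustrative stand-in only: the input is a plain list of columns
    rather than a frame-like object.
    """
    indices = list(range(len(source_columns[0])))
    # Apply stable sorts from the last key to the first so that earlier
    # columns take precedence over later ones.
    for col, asc in reversed(list(zip(source_columns, ascending))):
        indices.sort(key=lambda i: col[i], reverse=not asc)
    return indices


# The caller unpacks its columns into a list before calling the function.
columns = [[2, 1, 2, 1], [10, 20, 5, 30]]
row_order = order_by(columns, [True, True])
```

Because the function depends only on a list of columns, any caller that can produce such a list can reuse it without the low-level layer knowing about higher-level frame types.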