Refactor cython interface: copying.pyx #10359

@isVoid isVoid commented Feb 25, 2022

Part of #10153

Aside from the two harder cases, boolean_mask_scatter and sample, which were addressed in #10202 and #10262, this PR tackles the rest of the refactors in copying.pyx. Combined with those two PRs, it should complete the interface refactor of copying.pyx.

@isVoid isVoid requested a review from a team as a code owner February 25, 2022 21:39
@github-actions github-actions bot added the Python (Affects Python cuDF API.) label Feb 25, 2022
@isVoid isVoid added the non-breaking (Non-breaking change), improvement (Improvement / enhancement to an existing function), and tech debt labels Feb 25, 2022
codecov bot commented Feb 25, 2022

Codecov Report

Merging #10359 (f4349bd) into branch-22.04 (a7d88cd) will increase coverage by 75.73%.
The diff coverage is n/a.

@@                Coverage Diff                @@
##           branch-22.04   #10359       +/-   ##
=================================================
+ Coverage         10.42%   86.15%   +75.73%     
=================================================
  Files               119      139       +20     
  Lines             20603    22450     +1847     
=================================================
+ Hits               2148    19342    +17194     
+ Misses            18455     3108    -15347     
Impacted Files Coverage Δ
...ython/custreamz/custreamz/tests/test_dataframes.py 99.39% <0.00%> (-0.01%) ⬇️
python/custreamz/custreamz/_version.py 0.00% <0.00%> (ø)
python/dask_cudf/dask_cudf/_version.py 0.00% <0.00%> (ø)
...ython/dask_cudf/dask_cudf/io/tests/test_parquet.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/udf/pipeline.py
...thon/dask_cudf/dask_cudf/tests/test_distributed.py 86.79% <0.00%> (ø)
python/dask_cudf/dask_cudf/tests/test_dispatch.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/udf/utils.py 98.63% <0.00%> (ø)
python/dask_cudf/dask_cudf/tests/test_accessor.py 98.41% <0.00%> (ø)
python/dask_cudf/dask_cudf/tests/test_join.py 100.00% <0.00%> (ø)
... and 105 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e9876cf...f4349bd.

@isVoid isVoid self-assigned this Feb 28, 2022

@vyasr vyasr left a comment

One important question about duplicating work and a couple of questions about establishing good patterns for APIs like this, but generally looks good.

cdef table_view target_table_view = table_view_from_columns(
    (target_column,))
cdef bool c_bounds_check = bounds_check
cdef scatter_scalar(list source_device_slrs,

Something to think about: is there a better way for us to handle scalar vs column functions? We have this division in a lot of our Cython layers, can we think of a way to reduce it? One option would be to define generic scalar/column functions (not sure exactly what that would look like) and then pass specialized functions as arguments to handle different features. This is a very speculative idea, so no need to have an answer in this PR, but I do dislike duplicating this scalar/column division over and over.
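
For illustration only, here is one rough shape the "generic function plus specialized callbacks" idea could take. This is a plain-Python sketch rather than Cython, and every name in it (scatter_generic, write_from_column, write_from_scalar, scatter_columns, scatter_scalars) is hypothetical and not part of cuDF or libcudf. Plain lists stand in for columns. The shared skeleton owns the length check, the bounds check, and the copy of the target, while the scalar- or column-specific step is passed in as an argument.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical stand-ins: a "column" is a plain list, a "scalar" is a single value.
Column = List[Any]
WriteFn = Callable[[Any, Sequence[int], Column], None]


def scatter_generic(
    sources: Sequence[Any],
    scatter_map: Sequence[int],
    targets: Sequence[Column],
    write_fn: WriteFn,
    bounds_check: bool = True,
) -> List[Column]:
    """Shared skeleton: validation and copying live here, exactly once."""
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    results = []
    for source, target in zip(sources, targets):
        out = list(target)  # never mutate the caller's target
        if bounds_check and any(not 0 <= i < len(out) for i in scatter_map):
            raise IndexError("scatter_map index out of bounds")
        write_fn(source, scatter_map, out)  # the only specialized step
        results.append(out)
    return results


def write_from_column(source: Column, scatter_map: Sequence[int], out: Column) -> None:
    """Column case: row i of the source goes to position scatter_map[i]."""
    for src_row, dest in enumerate(scatter_map):
        out[dest] = source[src_row]


def write_from_scalar(source: Any, scatter_map: Sequence[int], out: Column) -> None:
    """Scalar case: the same value goes to every position in scatter_map."""
    for dest in scatter_map:
        out[dest] = source


def scatter_columns(sources, scatter_map, targets, bounds_check=True):
    return scatter_generic(sources, scatter_map, targets, write_from_column, bounds_check)


def scatter_scalars(sources, scatter_map, targets, bounds_check=True):
    return scatter_generic(sources, scatter_map, targets, write_from_scalar, bounds_check)
```

With this layout the two public entry points shrink to one-line wrappers, so a new shared feature (for example another validation step) would only need to be added to scatter_generic. Whether the same pattern carries over cleanly to typed Cython signatures is an open question.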

python/cudf/cudf/_lib/utils.pyx (resolved)
python/cudf/cudf/core/_base_index.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)

@vyasr vyasr left a comment

Some minor suggestions for improvement but it LGTM! I'll let you address the comments before merging.

python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)
isVoid and others added 4 commits March 8, 2022 11:53

isVoid commented Mar 8, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b3dc9d6 into rapidsai:branch-22.04 Mar 8, 2022