Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate method parameters to DataFrame.join, DataFrame.merge. #9291

Merged
merged 7 commits into from
Oct 1, 2021

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Sep 23, 2021

The method parameter to DataFrame.join and DataFrame.merge isn't used internally after changes in #7454. This PR updates the docstrings and adds deprecation notices via FutureWarning as discussed in #9347.

The parameter is now deprecated in the public API. I removed all internal uses of the method parameter.

@bdice bdice added 3 - Ready for Review Ready for review by team doc Documentation labels Sep 23, 2021
@bdice bdice requested a review from a team as a code owner September 23, 2021 20:28
@github-actions github-actions bot added the Python Affects Python cuDF API. label Sep 23, 2021
@bdice bdice added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels Sep 23, 2021
@bdice bdice marked this pull request as draft September 23, 2021 20:47
@codecov
Copy link

codecov bot commented Sep 23, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@079af45). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.12    #9291   +/-   ##
===============================================
  Coverage                ?   10.77%           
===============================================
  Files                   ?      116           
  Lines                   ?    19394           
  Branches                ?        0           
===============================================
  Hits                    ?     2090           
  Misses                  ?    17304           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 079af45...4c060c0. Read the comment docs.

@bdice bdice changed the title Add docstring for method argument to DataFrame.join. Indicate for method parameters to DataFrame.join, DataFrame.merge are unused. Sep 24, 2021
@bdice bdice changed the title Indicate for method parameters to DataFrame.join, DataFrame.merge are unused. Indicate method parameters to DataFrame.join, DataFrame.merge are unused. Sep 24, 2021
@bdice bdice self-assigned this Sep 24, 2021
@bdice bdice added the non-breaking Non-breaking change label Sep 28, 2021
@bdice bdice marked this pull request as ready for review September 28, 2021 16:51
@bdice bdice added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 28, 2021
@bdice
Copy link
Contributor Author

bdice commented Sep 28, 2021

I just noticed the method argument is tested here:

@pytest.mark.parametrize("aa,bb,how,method", make_params())
def test_dataframe_join_how(aa, bb, how, method):

The tests should be updated to skip the method argument, then this will be ready for a final pass of review.

@bdice
Copy link
Contributor Author

bdice commented Sep 29, 2021

rerun tests

@bdice
Copy link
Contributor Author

bdice commented Sep 29, 2021

@charlesbluca @brandon-b-miller This is now ready for review.

Copy link
Contributor

@brandon-b-miller brandon-b-miller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a kwarg for this in pandas. Looking back through git it looks like this is a legacy parameter leftover from pygdf days. I do not see anywhere in the libcudf join header that consumes this either.

Should we deprecate or deprecate and remove this instead?

@bdice
Copy link
Contributor Author

bdice commented Sep 29, 2021

There doesn't seem to be a kwarg for this in pandas. Looking back through git it looks like this is a legacy parameter leftover from pygdf days. I do not see anywhere in the libcudf join header that consumes this either.

Should we deprecate or deprecate and remove this instead?

@shwina suggested "I think we want to expose it, but perhaps with a warning that it does literally nothing." @shwina do you have additional thoughts on this? I vote for deprecation here.

@brandon-b-miller
Copy link
Contributor

At some point we discussed what a python interface to hash_join would look like. Could be that we left it in planning to convert it over to be part of that API. But there were nuances around the memory behaving in puzzling ways to the user IIRC and we never built it out. I would still be weakly in favor of removing it until then though.

Generally the pattern is that we write the function signature such that it either matches the pandas API exactly or matches exactly plus a superset of kwargs. Any parameters we don't support we detect and throw. this avoids users converting their code from pandas and possibly picking up quiet bugs from args/kwargs not being in the right spots.

@shwina
Copy link
Contributor

shwina commented Sep 29, 2021

This was my mistake @bdice. When we discussed this, my assumption was that Pandas exposed a method= parameter, and we needed to match that API. As (TIL) this is not the case, can we modify this PR to throw a deprecation warning instead? Let's plan to remove it entirely in the release after 21.12.

@bdice
Copy link
Contributor Author

bdice commented Sep 29, 2021

@shwina No problem! I'll make that addition. 👍

@bdice bdice changed the title Indicate method parameters to DataFrame.join, DataFrame.merge are unused. Deprecate method parameters to DataFrame.join, DataFrame.merge. Oct 1, 2021
@shwina
Copy link
Contributor

shwina commented Oct 1, 2021

I created #9353 to track removal.

@shwina
Copy link
Contributor

shwina commented Oct 1, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 34b54ca into rapidsai:branch-21.12 Oct 1, 2021
@bdice bdice deleted the join-method-docstring branch October 4, 2021 13:39
rapids-bot bot pushed a commit that referenced this pull request Oct 5, 2021
This PR improves deprecation warnings. I fixed some typos to make it easier to grep for deprecations, and added the warning type `DeprecationWarning` where appropriate. (Found these issues while looking for examples of deprecations for #9291.)

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ashwin Srinath (https://github.com/shwina)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #9347
rapids-bot bot pushed a commit that referenced this pull request Jan 13, 2022
This PR removes the deprecated `method` parameter from `DataFrame.merge` and `DataFrame.join`. This resolves #9353 and follows up on #9291.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #9944
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team doc Documentation non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants