[PROPOSAL] Enforce and automate compatibility with Pandas methods #6135
Comments
How can we overwrite docstrings in cases where we need to mention shortcomings or unsupported types?
The docstring is not copied -- only the signature line. We still need to write docstrings and document the arguments we support and their behaviour. We can add a footnote to docstrings that all undocumented members are provided for Pandas compatibility, or something similar.
Is it possible to add extra arguments (apart from pandas API arguments) with
Yes, any extra arguments that our own APIs have are automatically added to the resulting API's signature. In my example:

    def foo(a, b=None, c=0):  # imagine `foo` is the "Pandas API"
        pass

    @mimic_signature(foo)
    def bar(a, c=1, d=[], **kwargs):  # `bar` is the equivalent "cuDF API"
        print("a = ", a)
        print("c = ", c)
        print("d = ", d)
This PR removes `**kwargs` from the string/categorical accessors where unnecessary, and exposes keyword arguments like `inplace` to the user directly. If we want to maintain parity with Pandas APIs for Dask/others using cuDF internally, we can consider using the approach described in #6135, which will automatically raise `NotImplementedError` when unsupported kwargs are passed.
Authors: Ashwin Srinath
Approvers: GALI PREM SAGAR, Keith Kraus
URL: #6750
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Still relevant, but it's unclear if this is the route we want to go down. So far, we have been matching the Pandas API manually.
Let's start exploring our options for this in 0.20.
The systematic approach for automating this compatibility is cudf.pandas testing, and in particular running the pandas test suite on cudf.pandas. We can investigate incompatibilities more systematically once #15724 is implemented.
One of the goals of cuDF is to provide a Pandas-like API. This means that our classes and functions (including methods) should have the same names as those of Pandas. It also means that our functions should accept the same positional and keyword arguments as their Pandas counterparts.
Currently, we have functions with varying degrees of consistency with Pandas:
The problem
The way we currently enforce API consistency is "by hand", i.e., we write functions with the same signature as Pandas, and document unsupported arguments as such. A good example of this is `quantile`, where we support a subset of the arguments Pandas supports, and document the others as "non functional":
The manual approach is problematic for a few reasons:
`NotImplementedError`.
Proposed solution
The proposal is to both enforce and automate consistency using a `@mimic_signature` decorator (still a WIP).
Here it is in action:
First, the signature of `bar` is a merge of the signatures of `foo` and `bar`:
`bar` is called exactly like `foo`:
But it retains its own default values for its args:
And it can be called with its own additional args:
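The WIP decorator itself is not shown in this issue, so the following is only a minimal sketch of how the behaviour described above could be obtained with the standard `inspect` module. Everything in it is an assumption made for illustration: the merge order (reference parameters first, then our extras), the choice to drop `**kwargs` from the public signature, and the restriction to plain positional-or-keyword parameters in the reference. `foo` and `bar` mirror the example from this thread, except that `bar` returns its arguments and uses `d=None`.

```python
import functools
import inspect

_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def mimic_signature(reference):
    """Hypothetical sketch, not the WIP decorator from this issue.

    Assumes `reference` only has plain positional-or-keyword parameters.
    """
    ref_params = inspect.signature(reference).parameters

    def decorator(func):
        own_params = inspect.signature(func).parameters
        # Reference order first, preferring our own parameter (and its
        # default) whenever we define one with the same name.
        merged = [own_params.get(name, p) for name, p in ref_params.items()]
        # Then append the parameters that only our function defines.
        merged += [
            p for name, p in own_params.items()
            if name not in ref_params and p.kind not in _VARIADIC
        ]
        public_sig = inspect.Signature(merged)
        unsupported = [name for name in ref_params if name not in own_params]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind against the public (calling) signature, not the definition.
            bound = public_sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name in unsupported:
                if bound.arguments[name] != ref_params[name].default:
                    raise NotImplementedError(f"{name!r} is not supported")
            supported = {
                k: v for k, v in bound.arguments.items() if k not in unsupported
            }
            return func(**supported)

        wrapper.__signature__ = public_sig  # what inspect.signature() reports
        return wrapper

    return decorator


def foo(a, b=None, c=0):  # stand-in for the "Pandas API"
    pass


@mimic_signature(foo)
def bar(a, c=1, d=None, **kwargs):  # stand-in for the "cuDF API"
    return {"a": a, "c": c, "d": d}


print(inspect.signature(bar))  # merged calling signature
print(bar(1))                  # keeps bar's own default for `c`
print(bar(1, None, 2))         # called positionally, exactly like foo
print(bar(1, d=[3]))           # bar's own extra argument
```

In this sketch, passing a non-default value for `b` (which `bar` does not define) raises `NotImplementedError` inside the wrapper, so `bar` itself never sees the unsupported argument.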
Pros
No boilerplate. We don't have to include unsupported arguments in our signature and documentation manually. We can also have the decorator do the work of raising `NotImplementedError` if unsupported arguments with non-default values are passed. Our own implementations can remain completely unaware of unsupported arguments.
We are always guaranteed to be consistent with the Pandas API.
Changes to the Pandas API (e.g., a new kwarg, or a change in the order of appearance of kwargs) don't require changes to our code. Our own API will change automatically.
Cons
The big disadvantage here is that we (cuDF developers) should be careful when using APIs decorated with `@mimic_signature` internally. It's important to remember that the signature in the function definition is not the same as the calling signature. Thus, we should always use keyword arguments explicitly. That is, we should prefer:
not:
This applies only to cuDF developers as we're the only ones with visibility into the "internal" signature.
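To make the pitfall concrete, here is a small sketch that uses only `inspect.Signature` to show how the same call binds differently under the calling signature than a reader of the definition would expect. The calling signature `(a, b=None, c=1, d=None)` is a hypothetical merge of a Pandas-like `foo(a, b=None, c=0)` with a definition `bar(a, c=1, d=None, **kwargs)`. It is assumed here for illustration and is not taken from the actual decorator.

```python
import inspect
from inspect import Parameter


def make_param(name, default=Parameter.empty):
    # Helper for this sketch: a plain positional-or-keyword parameter.
    return Parameter(name, Parameter.POSITIONAL_OR_KEYWORD, default=default)


# Hypothetical calling signature produced by the merge: (a, b=None, c=1, d=None).
calling_sig = inspect.Signature(
    [make_param("a"), make_param("b", None), make_param("c", 1), make_param("d", None)]
)

# Reading the definition `bar(a, c=1, ...)`, a developer might expect the
# second positional argument to be `c`. Under the calling signature it is `b`.
positional = dict(calling_sig.bind(1, 2).arguments)

# Passing `c` by keyword is unambiguous under both signatures.
keyword = dict(calling_sig.bind(1, c=2).arguments)

print(positional)  # {'a': 1, 'b': 2}
print(keyword)     # {'a': 1, 'c': 2}
```

This is why explicit keyword arguments are the safer choice for internal calls: they do not depend on where the reference API happens to place its parameters.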