[FEA] Offer more control over CPU fallback in cudf.pandas #14975
A python Note we are not
@lmeyerov
Re: cudf, for some reason I thought a few cudf methods would fall back to the CPU (for example in parsing) rather than raising NotImplementedError or a warning. Separately and more broadly, there are some performance gotchas in cudf, such as places where it makes copies or sorts that well-written code would avoid. A perf-tips flag or mode that warns in these cases would be helpful for us, not just for the CPU fallback case. But that is a bigger story.
Good feedback! There are a few cases in I/O where cudf does not offer a GPU-accelerated reader/writer for a given format. That's the only case I can think of right now where cudf executes CPU-only code (it copies the data to the device and returns a GPU DataFrame at the end). Those cases are documented in the notes on this page: https://docs.rapids.ai/api/cudf/stable/user_guide/io/io/ I can think of a few algorithms where cudf has cut down on extraneous copies/sorting over the last few releases (like
Yes, my broader point is that a perf-warnings mode would be very helpful, e.g. flagging cases where the defaults are slow for conformance reasons and a special calling pattern would be faster :)
If it's okay with you @mroeschke, can I still work on this component since it covers the issue I opened?
Yes go for it @Matt711!
We could have two debugging mode options (note: we can use different names):
(1.) is for when fallback does not occur. It checks that the results from cudf and pandas agree and emits a warning if they do not. I'm working on that option in PR #15837.
(2.) is for when fallback does occur. It could raise errors on the specific types of fallback mentioned:
What do we think about these two options?
Making these modes independently configurable is definitely what we want, yes. As I commented in #15837, though, I don't think options are the right way to expose this. Options are user-facing, whereas what we're trying to accomplish here is something for developers. Environment variables documented in the developer guide are probably closer to what I would envision, especially for the first mode (pandas_debugging); I don't see a reason for a user to ever need that one. I could envision exposing some internal APIs to control the second case (fallback_debugging), because in that scenario it could be useful to have the profiler hook into them so that users could collect information on why fallback occurred.
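A minimal sketch of the profiler-hook idea, assuming a hypothetical internal registry: the dispatch function notifies registered listeners with the reason for each fallback, and a profiler (or a test harness) subscribes to collect them. `FallbackEvent`, `register_fallback_listener`, and `call_with_fallback` are illustrative names, not cudf APIs, and plain Python functions stand in for the GPU and CPU paths.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class FallbackEvent:
    """Hypothetical record of why a fast (GPU) call fell back."""

    func_name: str
    reason: str


# Hypothetical internal registry of callbacks interested in fallback events.
_listeners: List[Callable[[FallbackEvent], None]] = []


def register_fallback_listener(listener: Callable[[FallbackEvent], None]) -> None:
    """Subscribe a callback (e.g. a profiler) to fallback events."""
    _listeners.append(listener)


def call_with_fallback(fast: Callable[..., Any], slow: Callable[..., Any], *args: Any) -> Any:
    """Try the fast path; on any exception, notify listeners, then run the slow path."""
    try:
        return fast(*args)
    except Exception as exc:
        event = FallbackEvent(
            func_name=getattr(fast, "__name__", repr(fast)),
            reason=f"{type(exc).__name__}: {exc}",
        )
        for listener in _listeners:
            listener(event)
        return slow(*args)


# Usage: a minimal "profiler" that just collects events.
collected: List[FallbackEvent] = []
register_fallback_listener(collected.append)


def gpu_sum(xs):
    raise NotImplementedError("dtype not supported on GPU")


def cpu_sum(xs):
    return sum(xs)


result = call_with_fallback(gpu_sum, cpu_sum, [1, 2, 3])
```

Keeping the hook as an internal registration function, rather than a user-facing option, fits the developer-facing intent described in the comment: end users still get the silent fallback, while tooling can observe why it happened.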
Using an environment variable instead of an option is fine with me. Do you have a more specific place in mind in the Developer Guide for documenting the environment variable?
Maybe we can add a new section on the fast-slow proxy wrapping scheme. It can start out mostly stubbed, and we can add information over time.
Yes, and I could add that in a new cudf.pandas section in the Developer Guide?
This PR provides documentation for cudf.pandas in the Developer Guide. It will describe the fast-slow proxy wrapping scheme as well as document the `CUDF_PANDAS_DEBUGGING` environment variable created in PR #15837 for issue #14975. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #15889
(#15837) Part of #14975. This PR adds a pandas debugging option to `_fast_slow_function_call` that runs the slow path after the fast path and emits a warning if the results differ. Authors: - Matthew Murray (https://github.com/Matt711) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #15837
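A minimal sketch of the comparison mode described in this PR, not the actual `_fast_slow_function_call` code. It assumes the mode is switched on through the `CUDF_PANDAS_DEBUGGING` environment variable named in #15889; the accepted values ("1", "true", "yes") and the helper names `debugging_enabled` and `fast_slow_call` are assumptions for illustration. Plain `!=` stands in for a real comparison, which for DataFrame/Series results would need dtype- and NaN-aware equality.

```python
import os
import warnings
from typing import Any, Callable


def debugging_enabled() -> bool:
    # Assumption: "1", "true", or "yes" (case-insensitive) enables the mode.
    value = os.environ.get("CUDF_PANDAS_DEBUGGING", "")
    return value.strip().lower() in {"1", "true", "yes"}


def fast_slow_call(fast: Callable[..., Any], slow: Callable[..., Any], *args: Any) -> Any:
    """Run the fast path; in debugging mode, also run the slow path and compare."""
    try:
        result = fast(*args)
    except Exception:
        # Ordinary fallback: there is no fast result to compare against.
        return slow(*args)
    if debugging_enabled():
        expected = slow(*args)
        # Stand-in comparison; real DataFrame/Series results need
        # dtype- and NaN-aware equality rather than plain `!=`.
        if result != expected:
            warnings.warn(
                f"fast and slow results differ: {result!r} != {expected!r}",
                RuntimeWarning,
            )
    # The fast result is still returned; a mismatch is only reported.
    return result


# Usage: two "implementations" that disagree on rounding halves.
def fast_round(x: float) -> int:
    return round(x)  # Python rounds halves to even: round(2.5) == 2


def slow_round(x: float) -> int:
    return int(x + 0.5)  # rounds halves up: 2.5 -> 3


os.environ["CUDF_PANDAS_DEBUGGING"] = "true"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = fast_slow_call(fast_round, slow_round, 2.5)
```

Returning the fast result and only warning keeps the debugging mode non-disruptive: a workflow behaves as it would without the flag, but semantic mismatches between the two libraries become visible.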
(#16562) This PR makes further progress on #14975 by adding an environment variable that causes cudf.pandas to fail when fallback occurs. It also adds some tests that do __not__ fall back. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16562
Thanks for the reminder! I'll create a PR that raises on specific kinds of fallback, which I think should close this issue.
Is your feature request related to a problem? Please describe.
The default execution model for `cudf.pandas` is to try to execute an operation on the GPU, then fall back to the CPU if it fails for any reason. This approach is desirable for end users to maximize the number of cases where `cudf.pandas` "just works", but it makes it difficult to analyze when failures are occurring and why. The former can be addressed by running under the profiler, but that is more cumbersome than we would like in many cases where we would rather get a quick signal in the form of a failure (e.g. when running a workflow or a test suite to analyze unsupported cases). Furthermore, there is no easy way to determine whether cudf and pandas return the same results for a given operation, which is a different failure mode that is currently not possible to capture.
Describe the solution you'd like
We should generalize `_fast_slow_function_call` to support a wider range of fallback options. These options could be configurable by an environment variable or by some global configuration option (the former is probably fine to start with). The different behaviors we would want to support are:

We may want to support warning instead of raising errors in some cases, but I don't think that's critical to start.
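A minimal sketch of a generalized dispatcher driven by a single environment variable, covering both "fail fast" and "warn instead of raise". The variable name `CUDF_PANDAS_FALLBACK_MODE`, its values (`fallback`, `warn`, `raise`), `FallbackMode`, and `FallbackError` are hypothetical, chosen only for illustration; they are not cudf's actual configuration, and plain Python functions stand in for the GPU and CPU paths.

```python
import enum
import os
import warnings
from typing import Any, Callable


class FallbackMode(enum.Enum):
    FALLBACK = "fallback"  # current default: silently run the slow path
    WARN = "warn"          # run the slow path, but emit a warning
    RAISE = "raise"        # fail immediately instead of falling back


class FallbackError(RuntimeError):
    """Hypothetical error raised when fallback is disallowed."""


def current_mode() -> FallbackMode:
    # Unset or unrecognized values keep the default behavior.
    value = os.environ.get("CUDF_PANDAS_FALLBACK_MODE", "fallback").strip().lower()
    try:
        return FallbackMode(value)
    except ValueError:
        return FallbackMode.FALLBACK


def fast_slow_call(fast: Callable[..., Any], slow: Callable[..., Any], *args: Any) -> Any:
    try:
        return fast(*args)
    except Exception as exc:
        mode = current_mode()
        reason = f"{type(exc).__name__}: {exc}"
        if mode is FallbackMode.RAISE:
            # Chain the original exception so the root cause stays visible.
            raise FallbackError(f"fallback to slow path: {reason}") from exc
        if mode is FallbackMode.WARN:
            warnings.warn(f"falling back to slow path: {reason}", RuntimeWarning)
        return slow(*args)


# Usage with stand-in implementations.
def gpu_median(xs):
    raise NotImplementedError("median is not implemented on GPU")


def cpu_median(xs):
    return sorted(xs)[len(xs) // 2]


os.environ["CUDF_PANDAS_FALLBACK_MODE"] = "warn"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    median = fast_slow_call(gpu_median, cpu_median, [3, 1, 2])

os.environ["CUDF_PANDAS_FALLBACK_MODE"] = "raise"
try:
    fast_slow_call(gpu_median, cpu_median, [3, 1, 2])
    raised = False
except FallbackError:
    raised = True
```

Reading the variable on each fallback (rather than once at import) lets a test suite flip modes between runs; a real implementation might cache it for speed.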
Describe alternatives you've considered
This could be configured by the cudf.pandas profiler, or a similar context manager?
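A minimal sketch of the context-manager alternative, assuming a hypothetical setting that the dispatch function consults on every call; `fallback_policy` and `dispatch` are illustrative names, not cudf APIs. Using `contextvars` scopes the override to the current thread or async task and restores the previous value on exit, even if the block raises.

```python
import contextlib
import contextvars
from typing import Any, Callable, Iterator

# Hypothetical setting consulted on every fast/slow dispatch.
_allow_fallback = contextvars.ContextVar("allow_fallback", default=True)


@contextlib.contextmanager
def fallback_policy(allow: bool) -> Iterator[None]:
    """Temporarily allow or forbid fallback within the ``with`` block."""
    token = _allow_fallback.set(allow)
    try:
        yield
    finally:
        # Restore whatever was in effect before entering the block.
        _allow_fallback.reset(token)


def dispatch(fast: Callable[..., Any], slow: Callable[..., Any], *args: Any) -> Any:
    try:
        return fast(*args)
    except Exception:
        if not _allow_fallback.get():
            # Strict mode: surface the original fast-path error.
            raise
        return slow(*args)


# Usage with stand-in implementations.
def gpu_op(x):
    raise NotImplementedError("unsupported on GPU")


def cpu_op(x):
    return x * 2


outside = dispatch(gpu_op, cpu_op, 21)  # falls back: 42
try:
    with fallback_policy(allow=False):
        dispatch(gpu_op, cpu_op, 21)
    strict_raised = False
except NotImplementedError:
    strict_raised = True
after = dispatch(gpu_op, cpu_op, 21)  # default restored after the block
```

Compared with an environment variable, a context manager lets a test scope strict behavior to a single block without affecting the rest of the process.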
Additional context
Feedback from @ianozsvald and @lmeyerov would be welcome!