Enable patching #27
Conversation
I was testing this branch out and I think it may have some issues. I added some prints in Dask to help me with debugging:

```diff
diff --git a/dask/dataframe/__init__.py b/dask/dataframe/__init__.py
index 87e945ff..3a619ad2 100644
--- a/dask/dataframe/__init__.py
+++ b/dask/dataframe/__init__.py
@@ -161,6 +161,7 @@ else:
         raise ImportError(msg) from e
     if dask.config.get("dataframe.query-planning-warning"):
+        print("dask.dataframe warning enabled", flush=True)
         warnings.warn(
             """The current Dask DataFrame implementation is deprecated.
 In a future release, Dask DataFrame will use a new implementation that
@@ -196,6 +197,10 @@ To disable this warning in the future, set dask config:
             DeprecationWarning,
             stacklevel=2,
         )
+    else:
+        print("dask.dataframe warning disabled", flush=True)
     from dask.dataframe._testing import test_dataframe
+
+print("dask.dataframe imported", flush=True)
```

With the changes above and current:

```pycon
>>> import dask
>>> import dask.dataframe
dask.dataframe warning enabled
<stdin>:1: DeprecationWarning: The current Dask DataFrame implementation is deprecated.
In a future release, Dask DataFrame will use a new implementation that
contains several improvements including a logical query planning.
The user-facing DataFrame API will remain unchanged.
The new implementation is already available and can be enabled by
installing the dask-expr library:

    $ pip install dask-expr

and turning the query planning option on:

    >>> import dask
    >>> dask.config.set({'dataframe.query-planning': True})
    >>> import dask.dataframe as dd

API documentation for the new implementation is available at
https://docs.dask.org/en/stable/dask-expr-api.html
Any feedback can be reported on the Dask issue tracker
https://github.com/dask/dask/issues
To disable this warning in the future, set dask config:

    # via Python
    >>> dask.config.set({'dataframe.query-planning-warning': False})

    # via CLI
    dask config set dataframe.query-planning-warning False

dask.dataframe imported
>>>
```

In the case above:

```pycon
>>> import dask
>>> import dask.dataframe
>>> dask.test_attr
'hello world'
>>>
```

It seems that with the changes here |
It's likely because I wasn't careful in my handling of submodules and am only dealing with the top-level package right now. |
I've pushed fixes that should resolve the issue above.

To be clear, I don't know enough about all the ways in which dask/distributed can be imported or interact to know the exact right solutions here. I'll definitely need help from you all in writing the logic so that it only intercepts the desired imports and only patches the required modules. For instance, with the current version of the code every submodule of dask and distributed will also have a `test_attr` attribute.

Currently I have dask and distributed patches separately, but it might instead make sense to have a single list of patches and require each patch to determine what action to take based on the module name. |
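For concreteness, here is a minimal sketch of the kind of import interception being discussed. This is not the actual rapids-dask-dependency code; `PatchingFinder`, `PatchingLoader`, and `add_test_attr` are hypothetical names, and the demo patches the stdlib `colorsys` module rather than dask. Note how the `fullname != self._name` check is exactly the kind of filtering needed to leave submodules alone:

```python
import importlib.abc
import importlib.util
import sys


class PatchingLoader(importlib.abc.Loader):
    """Wrap the real loader and apply a patch after the module executes."""

    def __init__(self, real_loader, patch):
        self._real_loader = real_loader
        self._patch = patch

    def create_module(self, spec):
        return self._real_loader.create_module(spec)

    def exec_module(self, module):
        self._real_loader.exec_module(module)
        # The module is fully initialized at this point; patch it once.
        self._patch(module)


class PatchingFinder(importlib.abc.MetaPathFinder):
    """Intercept one named top-level module; ignore submodules and everything else."""

    def __init__(self, name, patch):
        self._name = name
        self._patch = patch

    def find_spec(self, fullname, path, target=None):
        if fullname != self._name:
            return None
        # Temporarily remove ourselves so the default finders locate the real spec.
        sys.meta_path.remove(self)
        try:
            spec = importlib.util.find_spec(fullname)
        finally:
            sys.meta_path.insert(0, self)
        if spec is None or spec.loader is None:
            return None
        spec.loader = PatchingLoader(spec.loader, self._patch)
        return spec


def add_test_attr(module):
    module.test_attr = "hello world"


sys.meta_path.insert(0, PatchingFinder("colorsys", add_test_attr))
sys.modules.pop("colorsys", None)  # force a fresh import for the demo

import colorsys

print(colorsys.test_attr)  # hello world
```

Because the finder only matches the exact module name, `import colorsys.something` or any other module would pass through the normal import machinery untouched.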
Also just FYI this approach will not work with editable installs. I assume nobody is trying that, just a warning (I don't know how to make that work). |
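As background on the `.pth` mechanism referenced in this thread (and one reason install layout matters): during site processing, Python executes any line in a `.pth` file that begins with `import`, which is how a package can run hook-installation code merely by being installed. A self-contained demonstration, using a temporary site directory and a hypothetical `demo.pth` (real installs rely on site-packages being processed automatically at startup; here we trigger processing explicitly with `site.addsitedir()` in a child interpreter):

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    # A .pth line starting with "import" is executed, not treated as a path.
    with open(os.path.join(d, "demo.pth"), "w") as f:
        f.write("import builtins; builtins.PATCHED = True\n")
    child_code = (
        "import builtins, site; "
        f"site.addsitedir({d!r}); "
        "print(getattr(builtins, 'PATCHED', False))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code], capture_output=True, text=True
    )
    print(result.stdout.strip())  # True
```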
I added a tests module. Please feel free to check out this code and push additional tests that you think are appropriate. I can help fix failing cases once I know what they are. |
Thanks Vyas! I can confirm the import now works, but for some reason:

On a separate note, I was thinking of one problem that we (or at least I) may have overlooked: with this change we now change Dask's behavior for everyone who has rapids-dask-dependency installed. |
I just ran into this as well. We do want to avoid having an effect on "normal" dask/distributed behavior unless it is behavior that we are specifically targeting in a patch.
This is definitely an important consideration when deciding to add a new patch. For the case of a "hot fix" that we would have preferred to merge into dask/distributed proper if time allowed, this is a no-brainer to me: there is little to no downside in modifying CPU-only behavior. For the "query-planning" config change, this is a much tougher question to answer. While I am very comfortable silencing the deprecation warning for our CPU and GPU users alike, I am less comfortable changing the default for dask>=2024.3.0. It may ultimately make sense to embrace this. |
Is the dask warning you're referring to a true Python DeprecationWarning? If so, I would be curious how it's being surfaced by default. The default behavior of DeprecationWarning in Python is to be hidden. If dask is not using a custom warning class that looks like |
IMO if someone is installing RAPIDS and dask into the same environment it is OK to have patching take effect even if they're not using RAPIDS. There's really no meaningful way to differentiate those two cases. Even our more drastic solutions to the patching problem (like forking) would have the same problem. |
I think we should generally leave this discussion for later, but can you elaborate on why that is, @rjzamora?
@vyasr This is what they're doing: https://github.com/dask/dask/blob/d66c5c88906ed51b5cb47dcfa0030717657675d2/dask/dataframe/__init__.py#L173-L207 AFAIK, you don't need to do anything besides adjusting to the proper `stacklevel`. |
I'm not stating that we shouldn't do it, just that it has larger consequences than the other cases mentioned, and so I am relatively less comfortable. With that said, I tend to agree with Vyas that this is no worse than forking/publishing our own proxy version of dask. |
Yes, I agree with that statement as well, but this is something I didn't consider for either case (forking or pth) until earlier today. If you both think we're still ok eventually disabling the query-planning warning even though this will disable for CPU-only users who install RAPIDS packages as well, I'm ok with that too but wanted to raise this point to make sure everyone is aware of it. |
- This rolls back the changes in #25 until we have a plan to deal with the loud `dask.dataframe` deprecation warning in `dask>=2024.2.0`
- This adds `dask_expr` as a dependency in preparation for rapidsai/cudf#14805 (happy to add this in a follow-up PR if others prefer)
- We may be able to remove the pin for 24.04 if/when #27 is merged (fingers crossed). However, the *default* plan is still to keep the 2024.1.1 dask pin in place until 24.06 (see: #26)
The reason the DeprecationWarning is currently being hidden is that DeprecationWarnings in Python are hidden by default unless they are triggered from `__main__`.
If I trigger the warning directly it displays, but if I put that same code into a file and import it, it won't display, because the warning is coming from one level deeper in the stack, not `__main__`.
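This is easy to verify in isolation. In the sketch below, `mylib` is a throwaway module name used only for the demonstration:

```python
import os
import subprocess
import sys
import tempfile

env = dict(os.environ)
env.pop("PYTHONWARNINGS", None)  # make sure ambient settings don't interfere

# Case 1: the warning is triggered directly in __main__ -> shown by default,
# thanks to Python's "default::DeprecationWarning:__main__" filter.
direct = subprocess.run(
    [sys.executable, "-c",
     "import warnings; warnings.warn('old', DeprecationWarning)"],
    capture_output=True, text=True, env=env,
)
print("direct:", "DeprecationWarning" in direct.stderr)

# Case 2: the same warning raised while importing a module -> hidden,
# because the triggering frame belongs to mylib, not __main__.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "mylib.py"), "w") as f:
        f.write("import warnings\nwarnings.warn('old', DeprecationWarning)\n")
    env["PYTHONPATH"] = d
    imported = subprocess.run(
        [sys.executable, "-c", "import mylib"],
        capture_output=True, text=True, env=env,
    )
print("imported:", "DeprecationWarning" in imported.stderr)
```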
If we wanted the warning to show in the second case, we could set the stacklevel so that Python would instead check whether one frame up is `__main__`.
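A sketch of that fix, again with a throwaway `mylib`: with `stacklevel=2` the warning is attributed to the caller's frame, so a call made from `__main__` surfaces it under the default filters:

```python
import os
import subprocess
import sys
import tempfile

env = dict(os.environ)
env.pop("PYTHONWARNINGS", None)

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "mylib.py"), "w") as f:
        # stacklevel=2 attributes the warning to whoever calls api().
        f.write(
            "import warnings\n"
            "def api():\n"
            "    warnings.warn('old', DeprecationWarning, stacklevel=2)\n"
        )
    env["PYTHONPATH"] = d
    result = subprocess.run(
        [sys.executable, "-c", "import mylib; mylib.api()"],
        capture_output=True, text=True, env=env,
    )
print("shown:", "DeprecationWarning" in result.stderr)
```

The same reasoning explains why extra frames matter: any wrapper that sits between the warning and `__main__` shifts what `stacklevel` points at.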
The problem with the patching approach in this PR is that it adds many extra stack frames to the import of `dask.dataframe`. We have a few options for how to deal with this:
|
Signed-off-by: Vyas Ramasubramani <[email protected]>
I think this PR is good to go on my end. Packaging has been fully updated. I'll leave it as a draft for now so that we don't accidentally merge before 24.06. |
Seems good to me. Just some possible/minor README suggestions.
Co-authored-by: Richard (Rick) Zamora <[email protected]>
Thanks so much for this @vyasr !
/merge |
This PR adds the ability to monkey-patch imports of dask and distributed whenever those imports occur by simply installing rapids-dask-dependency. There's a tiny bit of scope creep here because this PR added real Python code to the repo for the first time, so I also added pre-commit hooks that in turn modified some unrelated files (only minimally, though).

TODO:
- [x] Update conda CI and packaging
- [ ] Stress test extensively

---------

Signed-off-by: Vyas Ramasubramani <[email protected]>
Co-authored-by: Richard (Rick) Zamora <[email protected]>
Currently, the tests added in rapidsai#27 (backported to 24.04 in rapidsai#36) do not check the exit codes of their subprocesses. This means that failing tests are not caught. This PR fixes the test utilities to check the exit codes and print any stdout/stderr outputs.
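The fix amounts to something like the following (a hedged sketch; `run_python` is a hypothetical helper, not the repo's actual test utility):

```python
import subprocess
import sys


def run_python(code: str) -> subprocess.CompletedProcess:
    """Run a snippet in a fresh interpreter; fail loudly on a nonzero exit."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if result.returncode != 0:
        # Surface the child's output so the failing assertion is visible in CI.
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
        raise RuntimeError(f"subprocess exited with code {result.returncode}")
    return result


print(run_python("print('ok')").stdout.strip())  # ok
```

Without the `returncode` check, a child process whose assertions fail would exit nonzero and the parent test would still pass silently, which is exactly the bug described above.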
This PR backports #27, #37, and #39 to 24.04 --------- Signed-off-by: Vyas Ramasubramani <[email protected]> Co-authored-by: Richard (Rick) Zamora <[email protected]> Co-authored-by: Bradley Dice <[email protected]>