-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose MultiDeviceExecutor
and overlapped AG+matmul to python API
#3923
base: main
Are you sure you want to change the base?
Conversation
!test |
Review updated until commit d27031f Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
416eaec
to
bbb7ac1
Compare
!test |
MultiDeviceExecutor
to python APIMultiDeviceExecutor
and overlapped AG+matmul to python API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for changing the Python side of the thing! I have one question for @rdspring1. Depending on that, I can either merge this or do some more cleanups in a separate PR before merging this.
@@ -367,6 +368,11 @@ class NVF_API FusionDefinition : public FusionState { | |||
|
|||
private: | |||
mutable std::optional<std::string> debug_output_ = std::nullopt; | |||
|
|||
public: | |||
//! (Experimental) toggle using MultiDeviceExecutor directly instead of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, I'd create an enable option to let the Python frontend use MultiDeviceExecutor. Enable options can be controlled by the user like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see how this fits in the _enable_option
param which needs to correspond to options in csrc/options.h
, whereas our toggle only deals with python API's internal.
Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused that this was a per-execution flag. Now I agree inside FusionDefinition makes sense.
Can we specify "use multidevice executor" via the constructor? This way, readers can know for sure the underlying executor won't be switched over. While it's technically possible to fd.execute via FEC
, set_mde
, and then fd.execute via MDE
, I'd like to keep the support simple to save troubles.
Given that multi-device executor is separate from our mainline of execution, could we use a separate execute function completely until the functionality is integrated into FusionExecutorCache? We are very sensitive to adding latency to our hot path from python down the stack. |
I suspect we'll need to replicate a lot of code from |
…eEXecutor_to_python
I agree with Jingyue. For now the patch only adds one boolean check in the data path so it shouldn't affect performance at all. Adding a whole new "execute" function to the API sounds much bigger of change for something that can fit in the current structure |
!test |
!test |
!test |
There are merge conflicts in your PR so the build couldn't start. 😢 |
…eEXecutor_to_python
…DIA/Fuser into expose_MultiDeviceEXecutor_to_python
!test |
As requested by myself at #3923 (comment) This PR tries to fix the Python frontend to construct `FusionExecutorCache`s with a ready-to-run fusion. Previously, the code constructs FusionExecutorCache with an empty fusion and populates it later. This is not how we normally use and test FusionExecutorCache and is indeed problematic in some situations. For example, `FusionExecutorCache::exact_map_` was always empty because the constructor thought it was going to run an empty fusion. See one of the tests fixed by this PR. cc @samnordmann
MultiDeviceExecutor
instead ofFusionExecutorCache
to the python API. The feature is disabled by default -- to activate it, call FusionDefinition.use_multidevice_executor() in the python fusion instance.