-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Re-design call_hook
interface
#8506
Comments
call_hook
interfacecall_hook
interface
I'd also add to the drawbacks:
In an ideal state, do we really need |
It should be public as the users should be able to call custom hooks or change the order/location with loop customization. This is why I feel like this is an important issue to resolve - the users are limited because it currently does too much.
The only benefit then would be not having to deal with the profiler.
If we go with option (a), since the callable itself is passed to
Sure. Although I think we could do what this issue proposes which would already improve the current state - if #8509 is resolved, then potentially we would just need to remove its use. |
I agree, I don't think this and #8509 are at odds.
I think we could keep it private for now and expose this as and when the definition of custom loops is clearer. @carmocca I wonder what we really want My view:
What do you think? |
@carmocca i have a stupid question: when would someone pass a Callable vs the hook name in option A? |
I actually wasn't sure if we would need Although, if we want to do the following:
We might want to call it like |
We could also do the following: def _profile_and_call(self, obj: object, hook_name: str, *args: Any, **kwargs: Any) -> Optional[Any]:
fn = getattr(obj, hook_name)
with self.profiler.profile(f"{obj.__class__.__name__}.{hook_name}"):
return fn(*args, **kwargs)
def call_hook(
self,
obj: object,
hook_name: str,
*args: Any,
pl_module: Optional["pl.LightningModule"] = None,
**kwargs: Any,
) -> Optional[Any]:
pl_module = pl_module or self.lightning_module
if pl_module is not None:
prev_fx_name = pl_module._current_fx_name
pl_module._current_fx_name = hook_name
output = None
if isinstance(obj, Trainer):
for obj in self.callbacks:
if hook_name in ("on_init_start", "on_init_end"):
# these `Callback` hooks are the only ones that do not take a lightning module
output = self._profile_and_call(obj, hook_name, self, *args, **kwargs)
else:
output = self._profile_and_call(obj, hook_name, self, pl_module, *args, **kwargs)
else:
output = self._profile_and_call(obj, hook_name, *args, **kwargs)
if pl_module is not None:
# restore current_fx when nested context
pl_module._current_fx_name = prev_fx_name
return output Usage: self.trainer.call_hook(self.trainer, "on_before_zero_grad", optimizer)
self.trainer.call_hook(self.trainer.lightning_module, "on_before_zero_grad", optimizer) Which would allow profiling each hook per class in the case of callbacks. It would also remove the need for with the disadvantage of losing traceability |
i think the latter proposal is risky: what guarantees on does having separate functions resolve this? i think this is a slightly restricted view of option a but still addresses the drawbacks |
I don't think it would significantly. Would that be the same code split over 3 functions with an I'm a bit conflicted on this because I wanted to reduce the responsability of this method, however, managing the profiler and the profiled names complicates this. Pros:
Cons:
|
Hey @carmocca, I believe your proposal is sane and I agree any misusage would be on the caller, but at the same time the behaviour is quite clear. Best, |
@carmocca a few questions
|
I like your proposal above in #8506 (comment) (let's call it option d), since it allows us to deprecate |
Got dragged into other things. I plan to finish it for 1.6
When I say "loses traceability", I mean that tools cannot track its calls and usage anymore. Compare: # no traceability
fn = getattr(obj, "method_name")
fn()
# vs
# Can be understood by IDEs
obj.method_name()
Every hook should, if it doesn't then it's an oversight (hook == user-overrideable method with no default impl).
Sure. |
@ananthsub are you okay with option d above in #8506 (comment)? We can add a check to make sure this is only called on Callback, LM, or Accelerator? |
Another thought I had is do we expect users to sometimes want to call a hook on a callback directly? i.e. instead of calling |
Also @carmocca why do we need to pass |
Mostly for legacy reasons. Some methods of the |
Ah, sorry I meant |
Thanks for explaining why we need to pass
Got it, do you think it would be feasible to remove support for it? I think it would be nice if we can, separate from this issue, since it can be confusing that some functions are passed a LM and others just use self.lightning_module, resulting in lots of lines like this |
@carmocca @ananthsub I have a first draft implementation in #10575, would appreciate if you could take a look. I created a new PR since #9029 is quite old and has a lot of merge conflicts |
Haven't followed the full discussion, came here from the PR review. I find it irritating that we are choosing the call_hook to be protected yet accessing it everywhere as if it's public. A protected member is meant to be accessed only within the class itself. This pattern is recurring in Lightning and becoming a standard, at which point people are probably annoyed that I bring it up again and again. I get it that you don't want to make the methods public, but have you considered making it a function instead? The function would be public and take the trainer instance as input. |
@awaelchli I agree, we should not be accessing protected methods from outside the class. Why would making them public functions be better than public methods on the Trainer class? |
Since we want to call hooks from components outside the Trainer, and a trainer instance is required to do it, there would naturally be two options:
Option 1) does not go well these days, because #7740 has put a lot of pressure not to expose any Trainer attributes to the user unless highly stable. |
I do prefer having a method on the trainer as only the trainer itself and the loops (which have a
It's true, however, the loops right now include many methods important for customization which are "protected", yet we advertise loop customization. Just as we should review these Loop's protected methods to make them public at some point in the future, we can do the same for call hook right now, start protected and reevaluate later. Not a strong opinion anyways. |
🚀 Feature
The current
Trainer.call_hook
implementation has several drawbacks:Callback
hooks are called beforeLightningModule
hooks of the same name, however, this is not always true. An example ison_load_checkpoint
, where the order is the opposite.Callback
hook API) with the same name as the hook to call will be silently called. This is potentially dangerous if somebody names a trainer method with the same name as a hook.setup
andteardown
. Also dangerous.Callback
hook to return somethinghttps://github.com/PyTorchLightning/pytorch-lightning/blob/e1442d247e0e4967dd2772bdcf5166226c974f89/pytorch_lightning/trainer/trainer.py#L1282-L1317
Motivation
The previously mentioned reasons mean that the current implementation is error-prone. Additionally, when we support loop customization, we will want users to be able to call these hooks themselves.
Pitch
Provide a lambda call_hook function that just handles any necessary state maintenance before the hook call.
Option (a):
The state maintenance is currently just the
_current_fx_name
required forself.log()
This option means that the code will be more verbose but the calls will be made explicit and the user will have more responsibility.
This extra responsibility means that the user might break things if the hooks are not called in the proper order - whatever the proper order is for the hook in particular
Alternatives
Option (b):
Does not resolve (1) and (4)
Option (c):
Does not resolve (1) and (4)
Questions
Do the profilers merge profile timings with the same name?
If you enjoy PL, check out our other projects:
cc: @tchaton @awaelchli @ananthsub
cc @Borda @justusschock @awaelchli @akihironitta @tchaton
The text was updated successfully, but these errors were encountered: