[PyTorch] Don't use autograd hook for bwd reduction #781
The use of `torch.autograd.graph.register_multi_grad_hook`, introduced in #575, is the suspected cause of hangs in certain workloads. This PR replaces it with a different design: the amax reduction for gradient tensors is called directly from the modules' backward pass when needed, instead of being triggered by an autograd hook.
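A minimal sketch of the two designs, assuming a hypothetical `reduce_amaxes` helper (the actual reduction logic, function names, and module integration in this repo will differ):

```python
import torch

def reduce_amaxes(grads):
    # Hypothetical helper: compute the amax (max absolute value) over the
    # given gradient tensors, e.g. as an input to FP8 scale computation.
    vals = [g.abs().max() for g in grads if g is not None]
    if vals:
        return torch.stack(vals).max()
    return None

# Old design (#575): register a graph-level hook that fires once all
# watched gradients are ready. Suspected to hang in some workloads.
def attach_amax_hook(tensors):
    def hook(grads):
        reduce_amaxes(grads)
    return torch.autograd.graph.register_multi_grad_hook(tensors, hook)

# New design (this PR, sketched): perform the reduction directly inside
# the backward pass via a custom autograd.Function, so no graph-level
# hook is involved.
class _AmaxReducingIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, do_reduce):
        ctx.do_reduce = do_reduce
        return x

    @staticmethod
    def backward(ctx, grad_out):
        if ctx.do_reduce:
            reduce_amaxes((grad_out,))
        # One gradient per forward input; do_reduce gets no gradient.
        return grad_out, None
```

In the hook-based design, the reduction runs at a point in backward execution that autograd chooses, which is where the hangs are suspected to originate; in the sketched replacement, the module controls exactly when the reduction runs by routing tensors through the identity `Function`.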