Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx #7081

deepcharm
Contributor

@deepcharm deepcharm commented Feb 26, 2025

This PR is a continuation of the efforts to improve DeepSpeed performance when using PyTorch compile.

The `instrument_w_nvtx` decorator instruments code with NVIDIA Tools Extension (NVTX) markers for profiling and visualizing code execution on GPUs.

Along with executing the function itself, `instrument_w_nvtx` calls `nvtx.range_push` and `nvtx.range_pop`, which can't be traced by Dynamo.

That's why this decorator causes a graph break.
Because the decorator is used in numerous places throughout the code, the impact on performance can be significant.

We propose a simple solution: Don't invoke the sourceless functions when torch is compiling.
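
As a rough illustration of the idea (not the actual diff in this PR), the sketch below builds a decorator that skips the range push/pop calls whenever a compile check reports that tracing is in progress. The factory `make_nvtx_decorator` and its three injected callables are hypothetical names chosen so the example runs without a GPU or PyTorch; in a real setup one would presumably pass something like `torch.cuda.nvtx.range_push`, `torch.cuda.nvtx.range_pop`, and a compile check such as `torch.compiler.is_compiling`.

```python
import functools


def make_nvtx_decorator(range_push, range_pop, is_compiling):
    """Build an instrumenting decorator from injected hooks.

    All three parameters are assumptions made for this sketch:
      range_push(name) -> opens a profiling range
      range_pop()      -> closes the most recent range
      is_compiling()   -> True while the compiler is tracing
    """

    def instrument(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Evaluate the check once so push and pop always stay paired.
            emit = not is_compiling()
            if emit:
                range_push(func.__name__)
            ret_val = func(*args, **kwargs)
            if emit:
                range_pop()
            return ret_val

        return wrapped

    return instrument


# Stand-in hooks that record events instead of calling NVTX.
events = []
compiling = {"flag": False}

instrument = make_nvtx_decorator(
    range_push=lambda name: events.append(("push", name)),
    range_pop=lambda: events.append(("pop",)),
    is_compiling=lambda: compiling["flag"],
)


@instrument
def step(x):
    return x + 1
```

With the flag off, calling `step` records a push/pop pair around the call; with the flag on, the function still runs but no markers are emitted, so the tracer never encounters the untraceable calls.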

Signed-off-by: Max Kovalenko <[email protected]>
tjruwase
tjruwase previously approved these changes Feb 26, 2025
@tjruwase tjruwase dismissed their stale review February 26, 2025 16:51

Requested usage of DeepSpeed utility to address CI failures.

@loadams loadams enabled auto-merge March 3, 2025 19:28
@loadams loadams added this pull request to the merge queue Mar 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 3, 2025
@loadams loadams added this pull request to the merge queue Mar 3, 2025
Merged via the queue into deepspeedai:master with commit a88f56a Mar 3, 2025
10 checks passed
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request Mar 6, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: yisheng <[email protected]>
raza-sikander pushed a commit to raza-sikander/DeepSpeed that referenced this pull request Mar 6, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
raza-sikander pushed a commit to raza-sikander/DeepSpeed that referenced this pull request Mar 6, 2025
…deepspeedai#7081)

Signed-off-by: Max Kovalenko <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: Shaik Raza Sikander <[email protected]>