
Add Profiler Support for Performance Analysis #2883

Merged · 27 commits · Jul 1, 2024

Conversation

@yhna940 (Contributor) commented Jun 23, 2024

What does this PR do?

This PR introduces profiler support to the Accelerate library, enabling users to collect performance metrics during model training and inference. The profiler allows for detailed analysis of execution time and memory consumption of model operators and can generate profiling traces for visualization in Chrome's tracing tool. Additionally, it provides options to profile long-running jobs with customizable scheduling options.

[Image: profile_export — screenshot of an exported profiling trace]

Key changes include:

  • Addition of ProfileKwargs to customize profiling options.
  • Implementation of profiling context manager in the Accelerate class.
  • New documentation explaining how to use the profiler with examples.
  • Updated examples to demonstrate profiling capabilities (profiler.py).
  • Tests for profiling functionality with various scheduling options.

Context and Motivation

Other frameworks like MMEngine and PyTorch Lightning offer profiling techniques based on the Torch Profiler. Inspired by these tools, we aimed to bring similar profiling capabilities to Accelerate. This enhancement helps users optimize and improve model performance by providing insights into the computational and memory aspects of their models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment:

Thanks for implementing this cool feature. At a first glance, the PR already looks super clean, really great job.

I checked the rendered profiler docs and there seems to be an issue with </hfoption>, not sure what's going on there. Could you please check? Edit: It seems you were faster than me :)

I also saw that the dates of the copyright headers you added are outdated, could you please update them to 2024?

Otherwise, this PR looks great to me. I don't have a ton of experience with the profiler -- except that I found that it can be super slow :D. In terms of the design of the feature, it looks quite nice to me, but let's wait for Zach's return to office for a full review.

@yhna940 (Contributor, Author) commented Jun 24, 2024


Thank you for the quick review @BenjaminBossan !

I appreciate you pointing out the issue with the </hfoption> tag in the docs. I've corrected it now. Additionally, I've updated the copyright headers to 2024 as requested.

However, I noticed that one image in the docs is broken. Could you please provide some guidance on how to fix this? 🙏

Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?

Thanks again for your feedback :)

@BenjaminBossan (Member) left a comment:

This looks pretty good already, I have only some minor comments. As mentioned, the full review should come from Zach when he's back.

> However, I noticed that one image in the docs is broken.

I wonder if it would have been fine as it was, once the PR was merged. Currently, the image is indeed not at the original location, but that would change after the merge. Unfortunately, I couldn't find an example in the existing docs to verify this.

> Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?

From memory, when I tested it, I found some sources of slow down:

  • Processing the data seems to take quite some time when the model is big.
  • Some options (IIRC with_stack) would considerably slow down the run itself.
  • The generated trace could be huge (>1GB) and bring Chrome to its knees.

Of course, this is to be expected to some extent but it made me not want to use the feature ;)

@@ -0,0 +1,254 @@
# Copyright 2021 The HuggingFace Inc. team. All rights reserved.
Member:

Still old year.

Contributor (Author):
Done 20f377b

os.makedirs(profile_handler.output_trace_dir, exist_ok=True)
profiler.export_chrome_trace(
    os.path.join(
        profile_handler.output_trace_dir, PROFILE_PATTERN_NAME.format(suffix=f"_{self.process_index}")
    )
)
Member:

How about adding the _ to the PROFILE_PATTERN_NAME directly?

Contributor (Author):

Done 9cfbbf3



@dataclass
class ProfileKwargs(KwargsHandler):
Member:

I think for this docstring to be included in the API docs, you need to add an entry to docs/source/package_references/kwargs.md.

Contributor (Author):

Done 06a2430

@@ -248,6 +249,10 @@ def test_early_stopping(self):
testargs = ["examples/by_feature/early_stopping.py"]
run_command(self.launch_args + testargs)

def test_profiler(self):
Member:

At least on my machine, this takes a significant amount of time to run, should @slow be added?

Contributor (Author):

Thank you :) I totally agree with your comment. I added a slow decorator to the function.

def on_trace_ready(prof):
    nonlocal count
    count += 1
    print(prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=-1))
Member:

I wonder: Instead of printing, could we, for instance, just add this to a list, then do a basic sanity check on the output, for example what columns we should expect?

Contributor (Author):

Changed it to check whether `CPU time total:` exists in the output.
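The kind of sanity check discussed here can be sketched as a small standalone snippet using plain `torch.profiler` (not the PR's actual test code):

```python
# Sanity-check sketch: profile a small op and assert on the summary table
# instead of printing it, mirroring the reviewer's suggestion.
import torch
from torch.profiler import ProfilerActivity, profile

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    torch.randn(8, 8) @ torch.randn(8, 8)

table = prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=-1)
# The table's footer reports "Self CPU time total: ...", so checking for
# the substring "CPU time total:" confirms the profiler produced output.
assert "CPU time total:" in table
```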

@yhna940 (Contributor, Author) commented Jun 25, 2024

Thank you for the review and the feedback! I appreciate the insights and will wait for Zach's full review for any additional comments or suggestions.

I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.

Additionally, I propose updating the user guide to include tips or warnings about handling large or complex tasks using the PyTorch profiler's API for long-running jobs. As highlighted in the PyTorch official documentation:

PyTorch profiler offers an additional API to handle long-running jobs (such as training loops). Tracing all of the execution can be slow and result in very large trace files. To avoid this, use optional arguments:

  • schedule - specifies a function that takes an integer argument (step number) as an input and returns an action for the profiler, the best way to use this parameter is to use the torch.profiler.schedule helper function that can generate a schedule for you;
  • on_trace_ready - specifies a function that takes a reference to the profiler as an input and is called by the profiler each time the new trace is ready.
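The two options above can be combined in a short `torch.profiler` sketch. The step counts here are illustrative assumptions, not values from the PR:

```python
# Sketch: bound trace size for a long-running loop by combining schedule
# and on_trace_ready, per the PyTorch docs quoted above.
import torch
from torch.profiler import ProfilerActivity, profile, schedule

tables = []

def on_trace_ready(prof):
    # Called once per completed profiling cycle; export or summarize here
    # instead of accumulating one huge trace for the whole run.
    tables.append(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

model = torch.nn.Linear(16, 16)
with profile(
    activities=[ProfilerActivity.CPU],
    # skip 1 step, warm up for 1, record 2, then stop (repeat=1 cycle)
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=on_trace_ready,
) as prof:
    for _ in range(5):
        model(torch.randn(2, 16))
        prof.step()  # tell the scheduler that one training step finished
```

With `repeat=1`, the callback fires once, after the two active steps complete, so only that window's events are kept.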

By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.

Let me know if there's anything else I should address or if you have further suggestions!

Thanks again for the review :)

@BenjaminBossan (Member) left a comment:

Thanks a lot for the updates. From my point of view, this looks good. As mentioned, we'll wait for Zach for the final review.

> I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.

> By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.

I wouldn't put too much time into this, as I'm sure this is a moving target and we can't constantly monitor the PyTorch docs for updates. I think it's sufficient to refer users to the PyTorch docs for caveats and performance tips.

@muellerzr (Collaborator) left a comment:

This is an exceedingly clean, thorough, and excellent PR @yhna940, very nice work! Let's get this in 🔥

cc @stas00

@muellerzr muellerzr merged commit 5d5d07a into huggingface:main Jul 1, 2024
23 checks passed