-
Notifications
You must be signed in to change notification settings - Fork 976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Profiler Support for Performance Analysis #2883
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this cool feature. At a first glance, the PR already looks super clean, really great job.
I checked the rendered profiler docs and there seems to be an issue with </hfoption>
, not sure what's going on there. Could you please check? Edit: It seems you were faster than me :)
I also saw that the dates of the copyright headers you added are outdated, could you please update them to 2024?
Otherwise, this PR looks great to me. I don't have a ton of experience with the profiler -- except that I found that it can be super slow :D. In terms of the design of the feature, it looks quite nice to me, but let's wait for Zach's return to office for a full review.
Thank you for the quick review @BenjaminBossan ! I appreciate you pointing out the issue with the However, I noticed that one image in the docs is broken. Could you please provide some guidance on how to fix this? 🙏 Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness? Thanks again for your feedback :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good already, I have only some minor comments. As mentioned, the full review should come from Zach when he's back.
However, I noticed that one image in the docs is broken.
I wonder if it would ahve been okay as it was, once the PR would be merged. Currently, the image is indeed not at the original location, but that would change after the merge. Unfortunately, I couldn't find any example in the existing docs to verify this.
Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?
From memory, when I tested it, I found some sources of slow down:
- Processing the data seems to take quite some time when the model is big.
- Some options (IIRC
with_stack
) would considerably slow down the run itself. - The generated trace could be huge (>1GB) and bring Chrome to its knees.
Of course, this is to be expected to some extent but it made me not want to use the feature ;)
examples/by_feature/profiler.py
Outdated
@@ -0,0 +1,254 @@ | |||
# Copyright 2021 The HuggingFace Inc. team. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still old year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 20f377b
src/accelerate/accelerator.py
Outdated
os.makedirs(profile_handler.output_trace_dir, exist_ok=True) | ||
profiler.export_chrome_trace( | ||
os.path.join( | ||
profile_handler.output_trace_dir, PROFILE_PATTERN_NAME.format(suffix=f"_{self.process_index}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding the _
to the PROFILE_PATTERN_NAME
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 9cfbbf3
|
||
|
||
@dataclass | ||
class ProfileKwargs(KwargsHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this docstring to be included in the API docs, you need to add an entry to docs/source/package_references/kwargs.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 06a2430
@@ -248,6 +249,10 @@ def test_early_stopping(self): | |||
testargs = ["examples/by_feature/early_stopping.py"] | |||
run_command(self.launch_args + testargs) | |||
|
|||
def test_profiler(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on my machine, this takes a significant amount of time to run, should @slow
be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :) I totally agree with your comment. I added a slow
decorator to the function.
tests/test_kwargs_handlers.py
Outdated
def on_trace_ready(prof): | ||
nonlocal count | ||
count += 1 | ||
print(prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=-1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder: Instead of printing, could we, for instance, just add this to a list, then do a basic sanity check on the output, for example what columns we should expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to check if CPU time total:
exists.
Thank you for the review and the feedback! I appreciate the insights and will wait for Zach's full review for any additional comments or suggestions. I will conduct some experiments to further investigate the performance impacts of Additionally, I propose updating the user guide to include tips or warnings about handling large or complex tasks using the PyTorch profiler's API for long-running jobs. As highlighted in the PyTorch official documentation:
By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models. Let me know if there's anything else I should address or if you have further suggestions! Thanks again for the review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the updates. From my point of view, this looks good. As mentioned, we'll wait for Zach for the final review.
I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.
By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.
I wouldn't put too much time into this, as I'm sure this is a moving target and we can't constantly monitor the PyTorch docs for updates. I think it's sufficient to refer users to the PyTorch docs for caveats and performance tips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this PR do?
This PR introduces profiler support to the Accelerate library, enabling users to collect performance metrics during model training and inference. The profiler allows for detailed analysis of execution time and memory consumption of model operators and can generate profiling traces for visualization in Chrome's tracing tool. Additionally, it provides options to profile long-running jobs with customizable scheduling options.
Key changes include:
ProfileKwargs
to customize profiling options.Accelerate
class.profiler.py
).Context and Motivation
Other frameworks like MMEngine and PyTorch Lightning offer profiling techniques based on the Torch Profiler. Inspired by these tools, we aimed to bring similar profiling capabilities to Accelerate. This enhancement helps users optimize and improve model performance by providing insights into the computational and memory aspects of their models.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.