
Add Profiler Support for Performance Analysis #2883

Merged · 27 commits · Jul 1, 2024

Conversation

@yhna940 (Contributor) commented Jun 23, 2024

What does this PR do?

This PR introduces profiler support to the Accelerate library, enabling users to collect performance metrics during model training and inference. The profiler allows for detailed analysis of execution time and memory consumption of model operators and can generate profiling traces for visualization in Chrome's tracing tool. Additionally, it provides options to profile long-running jobs with customizable scheduling options.

[Image: profile_export — screenshot of an exported profiling trace]

Key changes include:

  • Addition of ProfileKwargs to customize profiling options.
  • Implementation of profiling context manager in the Accelerate class.
  • New documentation explaining how to use the profiler with examples.
  • Updated examples to demonstrate profiling capabilities (profiler.py).
  • Tests for profiling functionality with various scheduling options.

Context and Motivation

Other frameworks like MMEngine and PyTorch Lightning offer profiling techniques based on the Torch Profiler. Inspired by these tools, we aimed to bring similar profiling capabilities to Accelerate. This enhancement helps users optimize and improve model performance by providing insights into the computational and memory aspects of their models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment:

Thanks for implementing this cool feature. At a first glance, the PR already looks super clean, really great job.

I checked the rendered profiler docs and there seems to be an issue with </hfoption>, not sure what's going on there. Could you please check? Edit: It seems you were faster than me :)

I also saw that the dates of the copyright headers you added are outdated, could you please update them to 2024?

Otherwise, this PR looks great to me. I don't have a ton of experience with the profiler -- except that I found that it can be super slow :D. In terms of the design of the feature, it looks quite nice to me, but let's wait for Zach's return to office for a full review.

@yhna940 (Contributor, Author) commented Jun 24, 2024


Thank you for the quick review @BenjaminBossan !

I appreciate you pointing out the issue with the </hfoption> tag in the docs. I've corrected it now. Additionally, I've updated the copyright headers to 2024 as requested.

However, I noticed that one image in the docs is broken. Could you please provide some guidance on how to fix this? 🙏

Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?

Thanks again for your feedback :)

@BenjaminBossan (Member) left a comment:

This looks pretty good already, I have only some minor comments. As mentioned, the full review should come from Zach when he's back.

> However, I noticed that one image in the docs is broken.

I wonder if it would have been fine as it was, once the PR was merged. Currently, the image is indeed not at the original location, but that would change after the merge. Unfortunately, I couldn't find an example in the existing docs to verify this.

> Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?

From memory, when I tested it, I found some sources of slow down:

  • Processing the data seems to take quite some time when the model is big.
  • Some options (IIRC with_stack) would considerably slow down the run itself.
  • The generated trace could be huge (>1GB) and bring Chrome to its knees.

Of course, this is to be expected to some extent but it made me not want to use the feature ;)

@@ -0,0 +1,254 @@
# Copyright 2021 The HuggingFace Inc. team. All rights reserved.
Member:

Still old year.

Contributor (Author):
Done 20f377b

os.makedirs(profile_handler.output_trace_dir, exist_ok=True)
profiler.export_chrome_trace(
    os.path.join(
        profile_handler.output_trace_dir, PROFILE_PATTERN_NAME.format(suffix=f"_{self.process_index}")
    )
)
Member:

How about adding the _ to the PROFILE_PATTERN_NAME directly?

Contributor (Author):

Done 9cfbbf3



@dataclass
class ProfileKwargs(KwargsHandler):
Member:

I think for this docstring to be included in the API docs, you need to add an entry to docs/source/package_references/kwargs.md.

Contributor (Author):

Done 06a2430

@@ -248,6 +249,10 @@ def test_early_stopping(self):
testargs = ["examples/by_feature/early_stopping.py"]
run_command(self.launch_args + testargs)

def test_profiler(self):
Member:

At least on my machine, this takes a significant amount of time to run, should @slow be added?

Contributor (Author):

Thank you :) I totally agree with your comment. I added a slow decorator to the function.

def on_trace_ready(prof):
    nonlocal count
    count += 1
    print(prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=-1))
Member:

I wonder: Instead of printing, could we, for instance, just add this to a list, then do a basic sanity check on the output, for example what columns we should expect?

Contributor (Author):

Changed it to check whether `CPU time total:` exists in the output.
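The kind of sanity check discussed here can be sketched as a small standalone snippet using plain `torch.profiler` (not the PR's actual test code):

```python
# Sanity-check sketch: profile a small op and assert on the summary table
# instead of printing it, mirroring the reviewer's suggestion.
import torch
from torch.profiler import ProfilerActivity, profile

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    torch.randn(8, 8) @ torch.randn(8, 8)

table = prof.key_averages().table(sort_by="cpu_memory_usage", row_limit=-1)
# The table's footer reports "Self CPU time total: ...", so checking for
# the substring "CPU time total:" confirms the profiler produced output.
assert "CPU time total:" in table
```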

@yhna940 (Contributor, Author) commented Jun 25, 2024

Thank you for the review and the feedback! I appreciate the insights and will wait for Zach's full review for any additional comments or suggestions.

I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.

Additionally, I propose updating the user guide to include tips or warnings about handling large or complex tasks using the PyTorch profiler's API for long-running jobs. As highlighted in the PyTorch official documentation:

PyTorch profiler offers an additional API to handle long-running jobs (such as training loops). Tracing all of the execution can be slow and result in very large trace files. To avoid this, use optional arguments:

  • schedule - specifies a function that takes an integer argument (step number) as an input and returns an action for the profiler, the best way to use this parameter is to use the torch.profiler.schedule helper function that can generate a schedule for you;
  • on_trace_ready - specifies a function that takes a reference to the profiler as an input and is called by the profiler each time the new trace is ready.
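The two options above can be combined in a short `torch.profiler` sketch. The step counts here are illustrative assumptions, not values from the PR:

```python
# Sketch: bound trace size for a long-running loop by combining schedule
# and on_trace_ready, per the PyTorch docs quoted above.
import torch
from torch.profiler import ProfilerActivity, profile, schedule

tables = []

def on_trace_ready(prof):
    # Called once per completed profiling cycle; export or summarize here
    # instead of accumulating one huge trace for the whole run.
    tables.append(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

model = torch.nn.Linear(16, 16)
with profile(
    activities=[ProfilerActivity.CPU],
    # skip 1 step, warm up for 1, record 2, then stop (repeat=1 cycle)
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=on_trace_ready,
) as prof:
    for _ in range(5):
        model(torch.randn(2, 16))
        prof.step()  # tell the scheduler that one training step finished
```

With `repeat=1`, the callback fires once, after the two active steps complete, so only that window's events are kept.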

By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.

Let me know if there's anything else I should address or if you have further suggestions!

Thanks again for the review :)

@BenjaminBossan (Member) left a comment:

Thanks a lot for the updates. From my point of view, this looks good. As mentioned, we'll wait for Zach for the final review.

> I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.

> By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.

I wouldn't put too much time into this, as I'm sure this is a moving target and we can't constantly monitor the PyTorch docs for updates. I think it's sufficient to refer users to the PyTorch docs for caveats and performance tips.

@muellerzr (Collaborator) left a comment:

This is an exceedingly clean, thorough, and excellent PR @yhna940, very nice work! Let's get this in 🔥

cc @stas00

@muellerzr muellerzr merged commit 5d5d07a into huggingface:main Jul 1, 2024
23 checks passed