
Feature: Profiling for a training run #753

Closed
jeremyjordan opened this issue Jan 26, 2020 · 12 comments · Fixed by #782

@jeremyjordan
Contributor

🚀 Feature

It'd be nice if the PyTorch Lightning Trainer had a way to profile a training run so that I could easily identify where bottlenecks occur.

Motivation

I have been developing a model and had been using a small toy dataset to verify that everything is working properly. When I started a training run on a larger dataset, the estimated epoch duration was much longer than I anticipated. I have many ideas about where the bottleneck might be, but I don't have an easy way to validate them with data.

Pitch

We should include a flag when constructing the Trainer which signifies that we would like to profile this training run. The default will be False since the profiling output may be quite verbose and unnecessary in most cases.

trainer = Trainer(args, profile=True)
trainer.fit(model)

I was initially thinking that we could write a simple decorator and apply it to all the actions (i.e. functions) we wish to profile. The benefit of writing a decorator is that the scope of an action is clearly defined (so we know when to start and stop the timing).

Something along these lines (non-functional example):

from functools import wraps
import time


def profile(enabled=False):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not enabled:
                return func(*args, **kwargs)
            start = time.time()
            output = func(*args, **kwargs)
            end = time.time()
            print(f'{func.__name__} took {end - start}s')
            return output
        return wrapper
    return decorator


class Model:
    def __init__(self):
        pass

    @profile(enabled=trainer.profile)
    def training_step(self, batch, batch_idx):
        pass

    @profile(enabled=trainer.profile)
    def validation_step(self, batch, batch_idx):
        pass

Some issues with this approach:

  • We want to define whether or not a profile is enabled when constructing the trainer, so it's not easy to access this when decorating functions within the model.
  • Users are expected to override these methods, so we won't actually be able to reliably decorate them anyway.

@williamFalcon suggested (and please correct me if I misstate something) that we could construct a profiler object if profile=True in the Trainer initialization. We could then inject self.profiler.profile() in the training loop code for every major event (get train data, get val data, training step, backward, optimizer step, val step, etc).

As I understand it, the profiler would be something along the lines of:

import time


class Profiler:
    def __init__(self):
        self.last_recorded_time = None
        self.last_recorded_action = None

    def profile(self, action=''):
        current_time = time.time()
        if self.last_recorded_time is not None and self.last_recorded_action is not None:
            print(f'{self.last_recorded_action} took {current_time - self.last_recorded_time}s')
        self.last_recorded_time = current_time
        self.last_recorded_action = action

My concern with this approach (as I currently understand it) is that we define the "scope" of an action as simply everything until the next invocation of profile. Especially given that functions call other functions during training (e.g. run_training_epoch -> run_training_batch), it can be very hard to keep track of the scope of an action.

Alternatives

One way we can address this is by invoking the profile within a context handler, but this might get messy by effectively introducing another level of indentation for most of the code.

with profile():
    # do action here
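
For illustration, a minimal sketch of what such a context-handler based profiler could look like (the class and method names here are assumptions, not a settled API):

import time
from contextlib import contextmanager


class Profiler:
    def __init__(self):
        self.durations = {}

    @contextmanager
    def profile(self, action):
        # time the body of the `with` block and record it under the action name
        start = time.time()
        try:
            yield
        finally:
            self.durations[action] = time.time() - start

Each timed region then picks up exactly one extra level of indentation, which is the messiness mentioned above.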

@jeremyjordan added the feature and help wanted labels on Jan 26, 2020
@jeremyjordan
Contributor Author

jeremyjordan commented Jan 26, 2020

Thanks to GitHub's suggestions, I found this similar issue: #251

@williamFalcon
Contributor

Great suggestion. I still prefer the logger approach over the decorator.

API

I imagine it like:

Trainer(profile=True|False|Profiler)
  • If True, we use the default Profiler: self.profiler = Profiler(...)
  • If False, no profiler
  • Otherwise, use the given profiler (a rough sketch of this resolution follows the list)
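
A rough sketch of how that resolution could look inside the Trainer constructor (PassThroughProfiler is a hypothetical no-op profiler used only for illustration):

class Trainer:
    def __init__(self, profile=False, **kwargs):
        if profile is True:
            # default profiler
            self.profiler = Profiler()
        elif not profile:
            # hypothetical no-op profiler so call sites don't need to check a flag
            self.profiler = PassThroughProfiler()
        else:
            # the user passed in their own profiler instance
            self.profiler = profile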

Trainer

In the trainer, we'd call it at key places

tng_data = get_train_data()
self.profile('train data')

for epoch in epochs:
  for batch in dataset:
      training_step()
      self.profile('training step')

      backward()
      self.profile('backward')

      #...  etc

Usability

We also allow users to call it whenever they want

def training_step(...):
    x, y = batch
    x = self.forward(x)
    self.profile('fwd')

    x = self.second_nn(x)
    self.profile('projection a')
    
    x = self.loss(x)
    self.profile('loss')

@Borda
Member

Borda commented Jan 26, 2020

for the record, more discussion is here:
https://pytorch-lightning.slack.com/archives/CRBLFHY79/p1579997800007700

@jeremyjordan
Contributor Author

What if we did the logger approach but made the scope more explicit?

Eg.

self.profile.start('fetch train data')
tng_data = get_train_data()
self.profile.stop('fetch train data')

for epoch in epochs:
  for batch in dataset:
      self.profile.start('training step')
      training_step()
      self.profile.stop('training step')

      self.profile.start('backward')
      backward()
      self.profile.stop('backward')

      #...  etc

And then we could expose an alternative interface using context handlers if the user prefers.

with self.profile('fetch train data'):
  tng_data = get_train_data()

for epoch in epochs:
  for batch in dataset:
      with self.profile('training step'):
          training_step()

      with self.profile('backward'):
          backward()

      #...  etc

Internally, we can store the values as a defaultdict(list) and append each (stop-start) duration, so that at the end of the run we can report mean times for each action as well as standard deviations.
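
A minimal sketch of that bookkeeping, assuming one list of durations per action name:

import time
from collections import defaultdict
from statistics import mean, pstdev


class Profiler:
    def __init__(self):
        self.start_times = {}
        self.recorded_durations = defaultdict(list)

    def start(self, action):
        self.start_times[action] = time.time()

    def stop(self, action):
        # one duration appended per start/stop pair
        start = self.start_times.pop(action)
        self.recorded_durations[action].append(time.time() - start)

    def describe(self):
        # report mean and standard deviation for each action at the end of the run
        for action, durations in self.recorded_durations.items():
            print(f'{action}: mean {mean(durations):.4f}s, std dev {pstdev(durations):.4f}s')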

@jeremyjordan
Contributor Author

I'm also curious what design @jeffling decided on for their profiling tool

@williamFalcon
Contributor

@jeremyjordan I like the context handler suggestion. That way, I can decide on my own contexts.

Want to take a stab at implementing this?

@jeffling
Contributor

@jeremyjordan

We use several, but the most useful one was really just using a lot of @timeits. We have a small edit that forces GPU sync so we can get GPU time. Unfortunately, it's still clunky to use, but it works. We expose both a decorator and a logger API, and the logger one is definitely more popular with the team.
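
For anyone following along: CUDA kernels launch asynchronously, so a plain wall-clock timer can stop before the GPU has actually finished its work; synchronizing before reading the clock makes the measurement include the GPU time. A hedged sketch of such a decorator (not the exact edit described above):

import time
from functools import wraps

import torch


def timeit(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # drain pending GPU work before starting the clock
        start = time.time()
        output = func(*args, **kwargs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # make sure GPU work launched by func is included in the timing
        print(f'{func.__name__} took {time.time() - start:.4f}s')
        return output
    return wrapper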

@jeremyjordan
Contributor Author

Took an initial stab at implementing this yesterday evening (see branch here).

I wrote a simple profiler which just records durations of actions:

>>> p = Profiler()
>>> 
>>> with p.profile("context handler"):
...     time.sleep(5)
...     a = np.random.randn(3000, 2)
...     b = a + 2
...     c = b / 3
... 
>>> p.start('manual')
>>> time.sleep(5)
>>> a = np.random.randn(3000, 2)
>>> b = a + 2
>>> c = b / 3
>>> p.stop('manual')
>>> 
>>> p.describe()
Action                  |  Mean duration (s)    |  Std dev.       
------------------------------------------------------------
context handler         |  5.0022               |  0.0            
manual                  |  5.0036               |  0.0 

And a more advanced profiler which leverages Python's cProfile to record a more detailed view:

>>> ap = AdvancedProfiler()
>>> with ap.profile("test"):
...     time.sleep(5)
...     a = 5*3
...     b = a + 2
...     c = b / 3
... 
>>> ap.start('ok')
>>> a = 3*2
>>> ap.stop('ok')
>>> ap.describe()
Profile stats for: test
         7 function calls in 5.000 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.000    5.000    5.000    5.000 {built-in method time.sleep}
        1    0.000    0.000    0.000    0.000 contextlib.py:116(__exit__)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.next}
        1    0.000    0.000    0.000    0.000 profiler.py:37(profile)
        1    0.000    0.000    0.000    0.000 profiler.py:109(stop)
        1    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



Profile stats for: ok
         4 function calls in 0.000 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 profiler.py:109(stop)
        1    0.000    0.000    0.000    0.000 <stdin>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
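
Roughly, the advanced version can be built by keeping one cProfile.Profile per action and dumping pstats output in describe(); a sketch under that assumption (not the exact code on the branch):

import cProfile
import io
import pstats
from contextlib import contextmanager


class AdvancedProfiler:
    def __init__(self):
        self.profilers = {}

    @contextmanager
    def profile(self, action):
        # one cProfile.Profile per action so stats accumulate across repeated calls
        profiler = self.profilers.setdefault(action, cProfile.Profile())
        profiler.enable()
        try:
            yield
        finally:
            profiler.disable()

    def describe(self):
        for action, profiler in self.profilers.items():
            stream = io.StringIO()
            pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats()
            print(f'Profile stats for: {action}')
            print(stream.getvalue())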

Still need to validate that everything works properly but figured I'd share the branch in case anyone wants to follow along with the progress.

A couple things I might need help on:

  • I haven't done much distributed training in PyTorch so I might want to get some review and make sure this will work properly with that.
  • I'm not sure I understand the motivation behind a GPU sync to get GPU time

@jeremyjordan
Contributor Author

Also, at what point do we want to print the results from the training profile?

@Borda
Member

Borda commented Feb 2, 2020

@jeremyjordan could you please make it a PR and add "wip" to its name so it is easier to check the changes...

@jeremyjordan
Contributor Author

jeremyjordan commented Feb 2, 2020

@Borda sure thing! I wasn't sure if this repo preferred WIP PRs or not. Is there a contributors guide? I'm also not sure about things like code formatting (I usually run Black to format).

Edit: never mind, I found it. The style guide is pretty light though :)

@Borda
Member

Borda commented Feb 2, 2020

@Borda removed the help wanted label on Feb 3, 2020
@Borda added this to the 0.6.1 milestone on Feb 7, 2020