
Memory Leak #639

Closed
tcapelle opened this issue Mar 30, 2022 · 9 comments
Labels: high priority (These issues are at the top of mind for us.)

Comments

tcapelle commented Mar 30, 2022

Hello!
I am thrilled with the functorch package, and have been playing with it lately.

Together with @soumik12345, we found a memory leak when training a NN. We documented our findings here:

http://wandb.me/functorch-intro

We are probably doing something wrong, but the memory increases after each epoch.

[image: GPU memory usage increasing steadily over the course of training]

As the GPU is pretty monstrous we didn't notice this straight away, but the memory clearly fills up progressively. The stateful PyTorch training loop does not produce this behavior.

zou3519 added the high priority label on Mar 30, 2022

samdow commented Mar 30, 2022

@tcapelle Thanks for the issue and the detailed repro! We're definitely looking at it now, but we don't see anything immediately wrong. You are using the API as intended.

One question for you two, since the repro uses the wandb library: do you know whether it samples GPU utilization on a timer or at certain calls?

I was able to repro, so I don't need an answer anymore.


samdow commented Mar 30, 2022

@tcapelle Thanks for the issue! Basically what's happening is that because the params in the model have requires_grad=True, they hold onto the backprop graph and keep adding to it instead of releasing that memory. Since your example doesn't need higher-order gradients, we don't need to do this. There are a couple of options here:

(1) make the params not require grad:

functional_model, params = make_functional(model)
for param in params:
    param.requires_grad_(False)

OR

(2) detach the gradients in the SGD step:

# naive sgd optimizer implementation
def sgd_optimizer(weights, gradients, learning_rate):
    def step(weight, gradient):
        return weight - learning_rate * gradient

    assert len(weights) == len(gradients)
    return [step(weight, gradient.detach()) for weight, gradient in zip(weights, gradients)]
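
For context, here is a minimal sketch (assumed toy model, data, and hyperparameters; not the notebook's code) showing how these fixes slot into a functorch training loop, reusing the sgd_optimizer above:

import torch
import torch.nn as nn
from functorch import make_functional, grad

# placeholder model and data standing in for the notebook's setup
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
func_model, params = make_functional(model)

# fix (1): functorch's grad transform does not rely on requires_grad,
# so turning it off keeps the backprop graph from accumulating
for p in params:
    p.requires_grad_(False)

def compute_loss(params, x, y):
    return nn.functional.mse_loss(func_model(params, x), y)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(100):
    grads = grad(compute_loss)(params, x, y)
    # sgd_optimizer is the naive implementation from option (2) above;
    # its gradient.detach() keeps each loss's backprop graph from being
    # carried into the next iteration
    params = tuple(sgd_optimizer(params, grads, learning_rate=1e-2))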

@tcapelle (Author)

I was looking for something like this (zero_grad or detach) but didn't know where to apply it.

A training tutorial should showcase this, then. I can work out an example to add with @soumik12345.


vfdev-5 commented Apr 4, 2022

@samdow I was wondering if solution (1) could be an optional behaviour for make_functional(model), e.g. make_functional(model, params_require_grad=False), so that users won't run into the memory issue.


tcapelle commented Apr 4, 2022

Ohh, I would also prefer this as the default.


samdow commented Apr 5, 2022

> @samdow I was wondering if solution (1) could be an optional behaviour for make_functional(model), e.g. make_functional(model, params_require_grad=False), so that users won't run into the memory issue.

@vfdev-5 thanks for the idea! I really like that and it shouldn't be difficult to do if we plumb it through the _named_parameters function. Feel free to take it on or let me know if you'd want me to look at it instead!

> Ohh, I would also prefer this as the default.

To @tcapelle's point, I think I'm personally on the side of making this the default, but here's what I'm wrestling with:
On the one hand, since make_functional is functorch-specific, it feels reasonable to make this the default. If a user tries to use vanilla autograd (either .backward or autograd.grad) with a default of requires_grad=False, it will fail. This is nice because it nudges the user towards the functorch versions, which are better tested and more robust under composition.

However, if a user does use PyTorch's autograd, they'll get a hard-to-decipher error message that "____ does not require grad and does not have a grad_fn" when they believe it should work just like PyTorch. Either way leads to a debugging situation that isn't ideal: either the memory leak if we default to True, or a hard-to-decipher error message if we default to False.
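
A minimal sketch of that failure mode, with an assumed toy model (not from the issue):

import torch
import torch.nn as nn
from functorch import make_functional

model = nn.Linear(4, 1)
func_model, params = make_functional(model)
for p in params:
    p.requires_grad_(False)  # what a requires_grad=False default would mean

x = torch.randn(8, 4)
loss = func_model(params, x).sum()
# vanilla autograd has no graph to walk here, so this raises the
# "does not require grad and does not have a grad_fn" RuntimeError
loss.backward()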

Thoughts on this are welcome!

cc @zou3519


zou3519 commented Apr 5, 2022

Having an argument to toggle this would be great. I'm not sure what the default should be, but I would lean towards not changing it yet. Right now make_functional preserves requires_grad because that was helpful in some other use cases (e.g. maml, lennard-jones) where users wanted to use functorch transforms in the middle of their model and then backprop through the entire thing using regular PyTorch autograd.
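
A rough sketch of that kind of use case (toy model and data assumed, loosely in the spirit of the MAML example rather than copied from it):

import torch
import torch.nn as nn
from functorch import make_functional, grad

model = nn.Linear(4, 1)
func_model, params = make_functional(model)  # params keep requires_grad=True

x, y = torch.randn(8, 4), torch.randn(8, 1)

def loss_fn(p, x, y):
    return nn.functional.mse_loss(func_model(p, x), y)

# a functorch transform in the middle of the computation...
inner_grads = grad(loss_fn)(params, x, y)
adapted = [p - 0.1 * g for p, g in zip(params, inner_grads)]

# ...then regular PyTorch autograd backprops through the whole thing,
# which only works because the original params still require grad
outer_loss = loss_fn(adapted, x, y)
outer_loss.backward()
print(all(p.grad is not None for p in params))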


vfdev-5 commented Apr 12, 2022

> Feel free to take it on or let me know if you'd want me to look at it instead!

@samdow sorry for the delay. Yes, I can send a PR for that. I agree that the default value should be True so that we do not alter the current behaviour and just add an option.

@tcapelle (Author)

I will close this now.
