
Memory Leak #639

Closed
tcapelle opened this issue Mar 30, 2022 · 9 comments
Labels: high priority (These issues are at the top of mind for us.)

Comments

tcapelle commented Mar 30, 2022

Hello!
I am thrilled with the functorch package, and have been playing with it lately.

Together with @soumik12345, we found a memory leak when training a NN. We documented our findings here:

http://wandb.me/functorch-intro

We are probably doing something wrong, but the memory increases after each epoch.

[image: GPU memory usage increasing steadily over the course of training]

As the GPU is pretty monstrous we didn't notice this straight away, but the memory clearly fills up progressively. The stateful PyTorch training loop does not produce this behavior.

zou3519 added the high priority label on Mar 30, 2022

samdow commented Mar 30, 2022

@tcapelle Thanks for the issue and the detailed repro! We're definitely looking at it now, but we don't see anything immediately wrong. You are using the API as intended.

One question for you two, since the repro uses the wandb library: do you know whether it samples GPU utilization on a timer or at certain calls?

I was able to repro, so I don't need an answer anymore.


samdow commented Mar 30, 2022

@tcapelle Thanks for the issue! Basically what's happening is that because the params in the model have requires_grad=True, they hold onto the backprop graph and keep adding to it instead of releasing that memory. Since your example doesn't need higher-order gradients, we don't need to do this. There are a couple of options here:

(1) make the params not require grad:

functional_model, params = make_functional(model)
for param in params:
    param.requires_grad_(False)

OR

(2) detach the gradients in the SGD step:

# naive sgd optimizer implementation
def sgd_optimizer(weights, gradients, learning_rate):
    def step(weight, gradient):
        return weight - learning_rate * gradient

    assert len(weights) == len(gradients)
    return [step(weight, gradient.detach()) for weight, gradient in zip(weights, gradients)]
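
For context, here is a minimal sketch (assumed toy model, data, and hyperparameters; not the notebook's code) showing how these fixes slot into a functorch training loop, reusing the sgd_optimizer above:

import torch
import torch.nn as nn
from functorch import make_functional, grad

# placeholder model and data standing in for the notebook's setup
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
func_model, params = make_functional(model)

# fix (1): functorch's grad transform does not rely on requires_grad,
# so turning it off keeps the backprop graph from accumulating
for p in params:
    p.requires_grad_(False)

def compute_loss(params, x, y):
    return nn.functional.mse_loss(func_model(params, x), y)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(100):
    grads = grad(compute_loss)(params, x, y)
    # sgd_optimizer is the naive implementation from option (2) above;
    # its gradient.detach() keeps each loss's backprop graph from being
    # carried into the next iteration
    params = tuple(sgd_optimizer(params, grads, learning_rate=1e-2))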

@tcapelle (Author)

I was looking for something like this (zero_grad or detach) but didn't know where to apply it.

A training tutorial should showcase this, then. I can work out an example to add with @soumik12345.


vfdev-5 commented Apr 4, 2022

@samdow I was wondering if solution (1) could be an optional behaviour for make_functional(model), e.g. make_functional(model, params_require_grad=False), so that users won't run into the memory issue.


tcapelle commented Apr 4, 2022

Ohh, I would also prefer this as the default.


samdow commented Apr 5, 2022

> @samdow I was wondering if solution (1) could be an optional behaviour for make_functional(model), e.g. make_functional(model, params_require_grad=False), so that users won't run into the memory issue.

@vfdev-5 thanks for the idea! I really like that and it shouldn't be difficult to do if we plumb it through the _named_parameters function. Feel free to take it on or let me know if you'd want me to look at it instead!

> Ohh, I would also prefer this as the default.

To @tcapelle's point, I think I'm personally on the side of making this the default, but here's what I'm wrestling with:
On the one hand, since make_functional is functorch-specific, it feels reasonable to make this the default. If a user tries to use vanilla autograd (either .backward or autograd.grad) with a default of requires_grad=False, it will fail. This is nice because it nudges the user towards the functorch versions, which are better tested and more robust under composition.

However, if a user does use PyTorch's autograd, they'll get a hard-to-decipher error message that "____ does not require grad and does not have a grad_fn" when they believe it should work just like PyTorch. Either way leads to a debugging situation that isn't ideal: either the memory leak if we default to True, or a hard-to-decipher error message if we default to False.
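
A minimal sketch of that failure mode, with an assumed toy model (not from the issue):

import torch
import torch.nn as nn
from functorch import make_functional

model = nn.Linear(4, 1)
func_model, params = make_functional(model)
for p in params:
    p.requires_grad_(False)  # what a requires_grad=False default would mean

x = torch.randn(8, 4)
loss = func_model(params, x).sum()
# vanilla autograd has no graph to walk here, so this raises the
# "does not require grad and does not have a grad_fn" RuntimeError
loss.backward()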

Thoughts on this are welcome!

cc @zou3519


zou3519 commented Apr 5, 2022

Having an argument to toggle this would be great. I'm not sure what the default should be, but I would lean towards not changing it yet. Right now make_functional preserves requires_grad because that was helpful in some other use cases (e.g. maml, lennard-jones) where users wanted to use functorch transforms in the middle of their model and then backprop through the entire thing using regular PyTorch autograd.
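
A rough sketch of that kind of use case (toy model and data assumed, loosely in the spirit of the MAML example rather than copied from it):

import torch
import torch.nn as nn
from functorch import make_functional, grad

model = nn.Linear(4, 1)
func_model, params = make_functional(model)  # params keep requires_grad=True

x, y = torch.randn(8, 4), torch.randn(8, 1)

def loss_fn(p, x, y):
    return nn.functional.mse_loss(func_model(p, x), y)

# a functorch transform in the middle of the computation...
inner_grads = grad(loss_fn)(params, x, y)
adapted = [p - 0.1 * g for p, g in zip(params, inner_grads)]

# ...then regular PyTorch autograd backprops through the whole thing,
# which only works because the original params still require grad
outer_loss = loss_fn(adapted, x, y)
outer_loss.backward()
print(all(p.grad is not None for p in params))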


vfdev-5 commented Apr 12, 2022

> Feel free to take it on or let me know if you'd want me to look at it instead!

@samdow sorry for the delay. Yes, I can send a PR for that. I agree that the default value should be True so that we do not alter the current behaviour and just add an option.

@tcapelle (Author)

I will close this now.
