Memory 'leak' issue with MapTable and clearState #1141
Update: This issue is fixed if the modules in Line 80 in acc6b8c
achalddave added a commit to achalddave/nn that referenced this issue on Feb 20, 2017
achalddave added a commit to achalddave/predictive-corrective that referenced this issue on Jul 20, 2017
This should fix the same issue as in torch/nn#1141
achalddave added a commit to achalddave/predictive-corrective that referenced this issue on Jul 20, 2017
Calling clearState() seems to cause issues that, after 4-5 days of debugging, I haven't been able to fix. See, for example: torch/nn#1141, torch/cunn#441. Further, it's unclear to me whether `getParameters` and memory management in general work well when a call to `clearState` can destroy modules (and therefore weight tensors). The easiest solution to all of this is simply to never call clearState on the model while it is training. When saving the model, we create a copy of it on the CPU and call clearState on this CPU copy, which we then save to disk.
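The workaround described in that commit message can be sketched as follows. This is a minimal sketch, not the commit's actual code; `model` and the checkpoint filename are placeholders:

```lua
-- Sketch of the save-time workaround: never call clearState() on the live
-- (possibly GPU-resident) model; instead clear and save a CPU copy.
local cpu_copy = model:clone():float()  -- deep copy, converted to CPU floats
cpu_copy:clearState()                   -- buffers cleared on the copy only
torch.save('checkpoint.t7', cpu_copy)
-- `model` itself is untouched and keeps training safely.
```

The key design point is that clearState only ever runs on a throwaway copy, so any module destruction it triggers cannot invalidate the tensors that `getParameters` flattened for the optimizer.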
Calling clearState() on MapTable seems to lead to a significant increase in memory usage for future iterations. I was unable to find the source, but the following script demonstrates the bug. (Of course, the exact memory amounts may not match across systems, but the relative amounts should be correct.) The script is at: https://gist.github.com/achalddave/6ac8390e06a23ecc6d67e3fa22ef0f04

A few notes:
- The memory increase does not occur if nn.MapTable is removed (and replaced with just a single SpatialConvolution operating on a single input tensor).
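For reference, a minimal sketch of the pattern the gist exercises (this is not the gist itself; module sizes and iteration counts are illustrative):

```lua
-- A MapTable applies one shared inner module to every tensor in an input
-- table. Per the report, calling clearState() between iterations is what
-- triggers the memory growth on subsequent forward passes.
require 'nn'

local inner = nn.SpatialConvolution(3, 16, 3, 3)
local model = nn.MapTable():add(inner)

local inputs = {torch.randn(3, 32, 32), torch.randn(3, 32, 32)}
for i = 1, 10 do
  local out = model:forward(inputs)
  model:clearState()  -- reported culprit: later iterations use more memory
end
```

Replacing the MapTable with a bare `nn.SpatialConvolution` on a single tensor, as noted above, makes the growth disappear, which points at how MapTable recreates its internal clones after clearState.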