Adding some Untrained Models #17
Conversation
This looks great! I love it. We should verify some of these implementations by building a training example and training e.g. ResNet34 on ImageNet, both to verify our implementations and to show users a worked example, with the understanding that "if you set up ImageNet like this, you can train your own models".
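A hedged sketch of what such a worked example might look like, assuming the Flux API of the time; `imagenet_batches` is a hypothetical iterator of (image batch, one-hot label) pairs, and real ImageNet loading is elided:

```julia
using Flux
using Metalhead

model = ResNet34()                        # untrained model from this PR
loss(x, y) = Flux.crossentropy(model(x), y)
opt = ADAM()

# `imagenet_batches` is a stand-in for a real ImageNet data pipeline.
for (x, y) in imagenet_batches
    Flux.train!(loss, params(model), [(x, y)], opt)
end
```

This is a sketch under stated assumptions, not a definitive implementation; the exact constructor and loss names would need to match whatever this PR settles on.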
Whoops, accidental approval. ;)
I have a few comments, and the tests should be fixed. :)
Should I consider removing some of the tests, since Travis is getting an OutOfMemory error? I will fix the other conv error; that test follows the old API.
It's weird that it's running out of memory so soon; that's literally just constructing the very first model, and it's smaller than the models we've been training on in the past. That makes me think there's something extremely memory-inefficient in how you're creating the models or something?!
Yeah, it's quite strange. The test fails for the batchnorm models. I am not quite sure how to load them in a more memory-efficient way, as the code simply constructs an array of the layers and makes a Chain out of it. Even weirder, the test passes for ResNet152 and DenseNet264, which are much more memory-consuming.
Looking at the tests, it seems that the previous model stays in memory while I'm trying to generate a new one. I think we should deallocate it before building the next model.
Ah, I thought it was failing on the untrained model tests, but it's the trained model tests. Yes, that makes much more sense. Try putting a call in to
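A minimal sketch of the deallocation idea: drop the reference to the previous model and force a garbage collection before constructing the next one. Here `build_model` is a hypothetical stand-in for the Metalhead model constructors:

```julia
# Hypothetical stand-in for a large Metalhead model constructor.
build_model() = [rand(Float32, 500, 500) for _ in 1:4]

function run_model_tests(n)
    ok = true
    for i in 1:n
        model = build_model()
        ok &= length(model) == 4   # stand-in for the real test assertions
        model = nothing            # release the only reference to this model
        GC.gc()                    # collect it before the next construction
    end
    return ok
end

println(run_model_tests(3))  # prints: true
```

Whether an explicit `GC.gc()` between constructions is what the reviewer had in mind here is an assumption; the original comment is truncated.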
Interesting. The OSX builder got through, but the Linux builder didn't.
Okay, I think what's going on here is that when we call
Also, we should postpone the merge until squeezenet and googlenet have the same updated API.
Now all the models have a common interface.
Just as an update on this: I am currently attempting to hack together an allocation profiler so that we can see where the memory is going within Julia. This test suite really should not be using as much memory as it is, and combining that with our GPU memory problems, I think we just need better tools to see what's going on. I hope to (in a few days) have something hooked up with the
@staticfloat any update on the status of the allocation profiler?
Amazingly enough, YES! I don't have time to apply the memory profiler to this branch immediately, but I hope to in the next week or two. It may shed some light on what's going on here. If someone else wants to do this, I suggest telling the memory profiler to only pay attention to
I have added the VGG nets (with and without BN). The current API update is: `VGG11()` fetches VGG11 without pretrained weights, with the model in train mode. Calling `trained(VGG19)` will give the pretrained VGG19 model in test mode. All models that lack pretrained weights will throw an error if called with `trained`. @MikeInnes, could you review these new changes? I will add the remaining models if these changes are OK.
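The interface described above, as a short usage sketch; the `VGG11`/`VGG19` constructors and the `trained` function are taken from this PR, so their exact forms may still change:

```julia
using Metalhead

m = VGG11()          # untrained VGG11, in train mode
t = trained(VGG19)   # pretrained VGG19, in test mode

# Models that lack pretrained weights throw when requested trained:
# trained(VGG11)     # throws an error
```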