Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding some Untrained Models #17

Closed
wants to merge 42 commits into from
Closed

Conversation

avik-pal
Copy link
Member

I have added the vgg nets (with and without BN). The current API update is
VGG11() fetches VGG11 but without pretrained weights and the model is in train mode.
calling trained(VGG19) will give the pretrained VGG19 model in the test mode. All models that lack pretrained weights will throw an error if called with trained.
@MikeInnes could you review this new changes. I will add the remaining models if these changes are ok.

src/model.jl Outdated Show resolved Hide resolved
src/vgg.jl Outdated Show resolved Hide resolved
src/vgg.jl Outdated Show resolved Hide resolved
Copy link
Contributor

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I love it. We should verify some of these implementations by building a training example and training e.g. ResNet34 on ImageNet, both to verify our implementations but also to show users a worked example with the understanding that "if you setup ImageNet like this, you can train your own models"

src/densenet.jl Outdated Show resolved Hide resolved
src/resnet.jl Show resolved Hide resolved
src/resnet.jl Show resolved Hide resolved
Copy link
Contributor

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, accidental approval. ;)

I have a few comments, and the tests should be fixed. :)

@avik-pal
Copy link
Member Author

avik-pal commented Oct 4, 2018

Should I consider removing some of the tests, since travis is getting OutOfMemory Error?. I will fix the other conv error, the test follows the old API

@staticfloat
Copy link
Contributor

It's weird that it's running out of memory so soon; that's literally just constructing the very first model, and it's smaller than the models we've been training on in the past. That makes me think there's something extremely memory-inefficient in how you're creating the models or something?!

@avik-pal
Copy link
Member Author

avik-pal commented Oct 4, 2018

Yeah, it's quite strange. The test fails for the batchnorm models. I am not quite sure how to loading in a more memory efficient way as it is simply constructing an array of the layers and making a Chain out of it. And even weird the test is passing for the ResNet152 and Densenet264 which are much more memory consuming.
I am not sure if batchnorm is somehow at fault, because the densenet and resnets employ batchnorm as well.

@avik-pal
Copy link
Member Author

avik-pal commented Oct 4, 2018

Looking at the tests I am using, it seems that the previous model stays in memory while I'm trying to generate a new one. I think we should deallocate that before building the next model.

@staticfloat
Copy link
Contributor

Ah, I thought it was failing on the untrained model tests. It's the trained model tests. Yes, that makes much more sense. Try putting a call in to GC.gc() at the end of every model run?

@staticfloat
Copy link
Contributor

Interesting. The OSX builder got through, but the Linux builder didn't.

@staticfloat
Copy link
Contributor

Okay I think what's going on here is that when we call GC.gc() at the end of the loop, technically we still have model available in scope, so DenseNet 264 can't be freed, then we immediately head off to build VGG11, and DenseNet 264 + VGG11 is too much. Try putting a GC.gc() between the two loops, and also put a whos() in to see what else is using memory.

@avik-pal
Copy link
Member Author

avik-pal commented Oct 9, 2018

Also we should postpone merge till the squeezenet and the googlenet have the same updated API.

@avik-pal avik-pal changed the title [WIP] Adding some Untrained Models Adding some Untrained Models Oct 16, 2018
@avik-pal
Copy link
Member Author

Now all the models have a common interface.

@staticfloat
Copy link
Contributor

Just as an update on this, I am currently attempting to hack together an allocation profiler so that we can see where the memory is going within Julia. This test suite should really not be using as much memory as it is, and combining that with our GPU memory problems, I think we really just need better tools to be able to see what's going on, so I hope to (in a view days) have something hooked up with the Profile module that will allow us to see something similar to the flame graphs that ProfileView spits out, but showing where large objects are being allocated, and how long they survive.

src/resnet.jl Outdated Show resolved Hide resolved
@avik-pal
Copy link
Member Author

avik-pal commented Apr 1, 2019

@staticfloat any update on the status of the allocation profiler?

@staticfloat
Copy link
Contributor

Amazingly enough, YES! I don't have time to apply the memory profiler to this branch immediately, but in the next week or two I hope to. It may shed some light on what's going on here. If someone else want to do this, I suggest telling the memory profiler to only pay attention to big and std allocations, as anything small enough to fit within the pool is likely not useful to our analysis here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants