
Speed issues in code #6

Closed
yamins81 opened this issue Jul 15, 2016 · 2 comments

Comments


yamins81 commented Jul 15, 2016

originally written by @marzCS

I finished making a rough version of the bypass model using tf.rnn and would appreciate your advice on how to proceed. On GitHub (bypass repo), the model is in bypass_rnn.py, and the convolutional RNN cells it uses, as well as an FC layer cell, are defined in ConvRNN.py.
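For concreteness, here is a minimal sketch of what a single step of one of these convolutional cells could look like, with weights shared across time via variable scopes. The names, the 3x3 filter size, and the exact recurrence are assumptions for illustration, not the actual code in ConvRNN.py:

```python
import tensorflow as tf

def conv_rnn_step(inputs, state, num_filters, scope, reuse):
    """One timestep of a simple convolutional RNN cell: convolve the
    concatenation of the input and the previous state, then apply ReLU."""
    with tf.variable_scope(scope, reuse=reuse):
        x = tf.concat(3, [inputs, state])  # concatenate along channels (TF 0.x argument order)
        in_channels = x.get_shape()[-1].value
        w = tf.get_variable('weights', [3, 3, in_channels, num_filters],
                            initializer=tf.truncated_normal_initializer(stddev=0.01))
        b = tf.get_variable('bias', [num_filters],
                            initializer=tf.constant_initializer(0.0))
        return tf.nn.relu(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'SAME') + b)

# Unrolling in time: reuse the scope after the first step so every timestep
# picks up the same 'weights' and 'bias' variables. 'state' starts as zeros
# with num_filters channels.
# for t in range(T):
#     state = conv_rnn_step(images, state, 64, 'conv1',
#                           reuse=None if t == 0 else True)
```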

I've attached the TensorBoard visualization of the graph for reference. It appears small, but you can zoom in a lot. The model doesn't follow any existing base architecture like VGG; it consists of several convolutional/pooling cells and an FC layer cell, followed by a fully connected softmax layer. You can see different nodes representing the same cell at different time points, but I checked that the weight variables are indeed being shared.
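(One quick way to check the sharing, in case it's useful: list the trainable variables after building the graph; each cell's weights should appear exactly once, not once per timestep. The variable names below are placeholders, not the ones in the repo.)

```python
import tensorflow as tf

# After the graph is built, every cell should contribute one weight/bias
# pair regardless of how many timesteps it was applied for.
for v in tf.trainable_variables():
    print(v.name, v.get_shape())
# e.g. conv1/weights, conv1/bias, conv2/weights, ... each listed once.
```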

Some issues:
Training is faster than with the unrolled version of the network I made earlier, but still quite slow: a single training iteration on a batch of 32 (256x256x3 ImageNet images) takes ~3 seconds, and increasing the total number of time steps T increases this time.
If I set the RNN to run for too many time steps, e.g. T > 6 (the exact number depends on the specific architecture), I quickly get an OOM error.
I hypothesize that this is because: 1. Even though the RNN model has a single cell per layer, the activations of every cell at every time step must be kept in TensorFlow's memory for backpropagation through time (see the rough estimate after this list). 2. TensorFlow is slightly slower than other frameworks on most models (see the benchmark data in the Jan 5 comment: soumith/convnet-benchmarks#66). 3. It is much slower than regular RNNs because we are using convolutional layers instead of 1-D hidden units. What do you think?
Also, all our different runs (adding or removing an extra FC layer at the end, adding or removing bypass, etc.) converge to a loss of ~6.9 after 15 steps, which is chance level (-ln(1/1000) ≈ 6.9 for 1000 ImageNet classes). A search online suggests that, if the model is initialized correctly, this should start going down after ~8K iterations.
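To put a rough number on hypothesis 1, here is a back-of-the-envelope estimate of the activations BPTT has to retain, assuming a single stride-1 'SAME' conv layer at full 256x256 resolution (the channel count is just an example):

```python
# Rough activation-memory estimate for backprop through time.
batch, height, width, channels, T = 32, 256, 256, 64, 6
bytes_per_float = 4

per_step = batch * height * width * channels * bytes_per_float  # one layer, one timestep
total = per_step * T                                             # kept across all timesteps
print('%.0f MB per timestep, %.1f GB over T=%d steps'
      % (per_step / 2.0**20, total / 2.0**30, T))
# -> 512 MB per timestep and 3.0 GB over T=6 steps for just one stride-1 layer;
#    stacking several such layers quickly explains the OOM.
```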

How do you suggest we proceed? Training seems too slow right now. What should we check performance-wise in the model, and how can we speed up training? For example, Jonas and I were thinking about using pre-trained weights for at least the first layer and distributing training across multiple GPUs with TensorFlow.
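For the pre-trained-weights idea, a minimal sketch of one way to do it (the .npy filename and filter shape are hypothetical, not files that exist in the repo):

```python
import numpy as np
import tensorflow as tf

# Seed the first conv layer from filters saved by another model and keep
# them fixed, at least initially.
pretrained = np.load('conv1_weights.npy').astype(np.float32)  # e.g. shape [7, 7, 3, 64]
conv1_weights = tf.Variable(pretrained, name='conv1_weights', trainable=False)
```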

For starters, I will try to: 1. time a single pass through a given layer at a given time step, as well as how long it takes to load a batch of training images, to find the bottleneck (a timing sketch is below); 2. vary parameters like the learning rate (currently 0.05); 3. run a base case with a 100% (non-trainable) decay variable, so that no memory of the previous state is carried through.
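A minimal version of the timing harness I have in mind (a sketch with a dummy conv layer; in practice the sess.run targets would be the actual layer outputs and the image-loading op from bypass_rnn.py):

```python
import time
import numpy as np
import tensorflow as tf

images = tf.placeholder(tf.float32, [32, 256, 256, 3])
w = tf.Variable(tf.truncated_normal([7, 7, 3, 64], stddev=0.01))
layer1 = tf.nn.relu(tf.nn.conv2d(images, w, [1, 2, 2, 1], 'SAME'))

feed = {images: np.zeros((32, 256, 256, 3), dtype=np.float32)}
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(layer1, feed_dict=feed)          # warm-up run, not timed
    start = time.time()
    for _ in range(10):
        sess.run(layer1, feed_dict=feed)      # forward pass through one layer
    print('layer1 forward: %.3f s/run' % ((time.time() - start) / 10))
```

Running the same loop with the image-loading op as the sess.run target should separate input-pipeline time from compute time.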

yamins81 commented

@marzCS great -- this is beautiful work.

I agree the key thing is to speed stuff up. The things you suggested doing, especially starting from a pre-trained network, seem reasonable. But first, can you get away with a smaller model? E.g.:

  1. I notice that you're assuming the stride in the conv operation is always 1 (e.g. line 45 of _conv in ConvRNN.py). This is leading to huge models you probably don't need. A stride of at least 2, and possibly even 3 or 4, in the conv operation of the first layer is probably OK and will lead to a much smaller model that will still work (see the shape comparison below). Conv stride of 1 after layer 1 is probably a good idea.

  2. Why not start testing your training algorithms with a model with smaller filterbanks, e.g. a max of 64 filters rather than 256? This may not matter that much, but it's potentially worth trying.
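A quick shape comparison of what point 1 buys (the 7x7x3x64 filter is just an example, not the sizes in ConvRNN.py):

```python
import tensorflow as tf

images = tf.placeholder(tf.float32, [32, 256, 256, 3])
w = tf.Variable(tf.truncated_normal([7, 7, 3, 64], stddev=0.01))

stride1 = tf.nn.conv2d(images, w, strides=[1, 1, 1, 1], padding='SAME')
stride2 = tf.nn.conv2d(images, w, strides=[1, 2, 2, 1], padding='SAME')

print(stride1.get_shape())  # (32, 256, 256, 64)
print(stride2.get_shape())  # (32, 128, 128, 64) -- 4x fewer activations per timestep
```

Since those layer-1 activations are held for every timestep during BPTT, a stride of 2 (or 4) there cuts memory and compute by the same factor.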

marzCS closed this as completed Aug 31, 2016

qbilius commented Aug 31, 2016

Reopening, as speed is probably not quite optimal yet.

qbilius reopened this Aug 31, 2016