Running Over Whole Sets/Computing Epochs Instead of Iterations #1094
The test batch size needs to be a divisor of the size of the test set. You could pick 672 / 7 = 96, with 7 test iterations. The solver will run every test iteration, each with a full batch of the given test batch size, so that when it hits the end of the test set it merely loops around (which double-counts inputs unless the batch size divides the set evenly).
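The divisor arithmetic above can be sanity-checked in plain Python (no Caffe involved; the numbers 672 and 96 come from this thread):

```python
# test_iter * test_batch_size should cover each test image exactly once,
# so the batch size must divide the test set size evenly.
test_set_size = 672  # from this thread

def divisors(n):
    """All batch sizes that divide the test set evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]

# The only divisor of 672 near 100 is 96: 7 test iterations of 96 images.
print([d for d in divisors(test_set_size) if 90 <= d <= 110])  # [96]

batch_size = 96
test_iter = test_set_size // batch_size
print(test_iter)               # 7
print(test_iter * batch_size)  # 672 -- exactly one pass, no wrap-around
```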
This seems like a bug: suppose my test set size is a large prime number; then I have no good options.
Fair enough -- perhaps the fix is to add a solver proto field for the total size of the test set, and then rewrite the solver's test net routine to run a final one-off batch, with whatever mini-batch size is needed, to complete the set. This could be a nice ease-of-use PR. Thanks for raising the issue.
No problem. Is it a problem that the issue is marked as closed?
Right, I've re-opened it for now to keep it on the radar, but this issue will be replaced by the PR once it is opened.
So there is no 'epoch' in Caffe? In pylearn2, a set of mini-batches is chosen, sequentially or randomly, from the dataset in each epoch, and the solver loops over epochs up to max_iter times. Does Caffe simply generate batches sequentially from the dataset?
It depends on the type of data layer and its configuration, but that's essentially right: Caffe is configured in mini-batches, not epochs. If there are particular features in pylearn2 for handling data that you find helpful, please post an issue with a clear description and even a development plan, or better yet start a PR ☕
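Since Caffe counts iterations rather than epochs, converting between the two is simple arithmetic over the dataset size and batch size. A sketch with illustrative numbers (60,000 training images is an assumption for the example, not from this thread):

```python
import math

def iters_per_epoch(dataset_size, batch_size):
    # One epoch = enough mini-batches to see every example once.
    return math.ceil(dataset_size / batch_size)

def max_iter_for_epochs(dataset_size, batch_size, epochs):
    # The solver max_iter needed to train for a given number of epochs.
    return epochs * iters_per_epoch(dataset_size, batch_size)

print(iters_per_epoch(60000, 64))          # 938 mini-batches per epoch
print(max_iter_for_epochs(60000, 64, 10))  # 9380 iterations ~= 10 epochs
```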
One could just use a batch size of 1 for the test set and iterate through all of the images, right? Would that be significantly slower? I noticed doing this frees up memory, so I'm able to get away with a larger training batch size. Using gradient accumulation (#1977) is probably a better way to get a larger effective training batch size, though.
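With a test batch size of 1, test_iter can simply equal the number of test images and each image is scored exactly once. A plain-Python check of that coverage, assuming a sequential data layer (this does not model the speed cost, which usually makes many tiny forward passes slower than a few large ones):

```python
test_set_size = 672
batch_size = 1
test_iter = test_set_size  # one test iteration per image

# Simulate a sequential data layer: each iteration consumes one image.
seen = [(it * batch_size) % test_set_size for it in range(test_iter)]
assert sorted(seen) == list(range(test_set_size))  # each image exactly once
```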
@shelhamer what happens if someone, e.g. with 672 images, chooses test_iter = 7 with a batch size of 100? What would go wrong?
It loops around, so some inputs will be double-counted.
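A small simulation of that wrap-around (plain Python, modeling a hypothetical sequential data layer): with 672 images, batch size 100, and test_iter = 7, the solver reads 700 inputs, so the first 28 images are counted twice, which skews any averaged test metric:

```python
from collections import Counter

test_set_size, batch_size, test_iter = 672, 100, 7

# Sequential reads with wrap-around, as the data layer loops the set.
reads = [i % test_set_size for i in range(test_iter * batch_size)]
counts = Counter(reads)

double_counted = sorted(i for i, c in counts.items() if c > 1)
print(len(reads))           # 700 total reads
print(len(double_counted))  # 28 images (indices 0..27) seen twice
```

This is also why different test_iter choices give different results, as noted below: each choice double-counts a different slice of the set.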
I have an awkward number of test images (specifically, 672). If I want a batch size of 100, how many test iterations should I choose? If I pick 6, we only iterate through 600 of the 672 test images, but if I pick 7 (iterating through 700 images) we go off the end of the database (though I still get a result, not a segfault). For the record, picking 7 vs. 20 iterations gives different results, so it seems the solver does not simply stop once it reaches the end of the test set. Any help / advice?