This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

F660 iter size parameter #675

Closed
wants to merge 2 commits into from

Conversation

drozdvadym
Contributor

Added iter_size parameter, which was requested in #660

@gheinrich
Contributor

@lukeyeager I think the Travis test failed because we're using nv-caffe 0.13 on Travis. Any reason not to upgrade to 0.14?

@lukeyeager
Member

> Any reason not to upgrade to 0.14?

Nope, let's do it.

But I'm glad for the failed Travis test. We don't want to break backwards compatibility for something like this. Just wrap the new feature in a version check like this:

https://github.com/NVIDIA/DIGITS/blob/v3.2.0/digits/frameworks/caffe_framework.py#L37-L41
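
A minimal sketch of such a version gate, for illustration only. It is not the code at the link above: the helper names (`parse_version`, `supports_iter_size`, `solver_settings`) and the 0.14 threshold are assumptions made for this example, based on the Travis failure under nv-caffe 0.13.

```python
import re

# Assumption for this sketch: iter_size is usable from nv-caffe 0.14 onward.
MIN_ITER_SIZE_VERSION = (0, 14, 0)


def parse_version(text):
    """Turn a string such as '0.14.0-rc.3' into a tuple of ints, e.g. (0, 14, 0)."""
    # Drop any pre-release suffix, then keep at most three numeric fields.
    numbers = re.findall(r"\d+", text.split("-")[0])
    parts = [int(n) for n in numbers[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_iter_size(caffe_version):
    """Hypothetical check: is this Caffe build new enough for iter_size?"""
    return parse_version(caffe_version) >= MIN_ITER_SIZE_VERSION


def solver_settings(caffe_version, iter_size=1):
    """Build solver options, adding iter_size only when it is requested and supported."""
    settings = {}
    if iter_size > 1:
        if not supports_iter_size(caffe_version):
            raise ValueError(
                "iter_size requires a newer Caffe than %s" % caffe_version)
        settings["iter_size"] = iter_size
    return settings
```

With a gate like this, older builds keep working as before (no `iter_size` is written when it is left at 1), and the new option is only emitted where it is understood.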

@gheinrich
Contributor

Hi @drozdvadym, can you make the change suggested by @lukeyeager?
We may also make the same change in Torch (possibly in a subsequent commit) to keep feature parity between the two frameworks.

I would have thought one would never want to set iter_size>1. Batched training is useful to improve compute utilization but large batches tend to slow learning down as the number of iterations per epoch shrinks. I don't see how accumulating gradients over several iterations would help. I must be missing something. Can you explain how you intend to use this feature?

@drozdvadym
Contributor Author

drozdvadym commented Apr 18, 2016

Hi, I'll try to handle this as @lukeyeager has proposed.

Purpose of this:

  • On older GPUs, or even on newer GPUs with little memory, we cannot train models with a large batch size because of memory limits. Some authors recommend a larger batch size for training, for example: https://github.com/DeepScale/SqueezeNet. Setting iter_size > 1 accumulates gradients over several smaller batches, which gives the effect of a larger batch without the extra memory.
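
To illustrate the idea, here is a small sketch in plain Python, not Caffe code. It assumes a toy one-parameter model `y_hat = w * x` with a mean squared error loss; the function names and the data are made up for the example. It shows that averaging the gradients of `iter_size` equal-sized mini-batches gives the same gradient as one batch of `batch_size * iter_size` samples.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y_hat = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, batch_size, iter_size):
    """Accumulate gradients over iter_size mini-batches, then normalize."""
    total = 0.0
    for i in range(iter_size):
        # Only batch_size samples need to be held in memory at a time.
        chunk = data[i * batch_size:(i + 1) * batch_size]
        total += grad_mse(w, chunk)
    return total / iter_size


# Made-up (x, y) pairs, roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 13.9), (8.0, 16.0)]
w = 0.5

full = grad_mse(w, data)                                      # one batch of 8
accum = accumulated_grad(w, data, batch_size=2, iter_size=4)  # 4 batches of 2

# The two gradients agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

The equivalence holds here because all mini-batches have the same size; the weight update then happens once per `iter_size` mini-batches rather than once per mini-batch.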

@lukeyeager
Member

Closing due to inactivity. I've rebased your changes and cleaned them up a bit at #744.

@drozdvadym would you like to review the changes before I submit them?

@lukeyeager lukeyeager closed this May 16, 2016