Experiment with what happens when internal covariate shift (ICS) is eliminated entirely. After backpropagating the gradients, first update the weights of the initial layer, then recompute the gradients for the subsequent layers and update each layer in turn, recomputing gradients before each update.
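A minimal sketch of that update rule, assuming a PyTorch `nn.Sequential` model and plain SGD (both are my assumptions; the actual implementation in the repo may differ):

```python
import torch
import torch.nn as nn

def back_forward_update(model, x, y, loss_fn, lr=1.0):
    """Update layers front-to-back, recomputing gradients after each
    layer's step so that later layers are trained against the already
    shifted upstream activations (i.e. no stale gradients, no ICS).

    Assumes `model` is an nn.Sequential; parameter-free modules such as
    activations are skipped.
    """
    for layer in model:
        params = [p for p in layer.parameters() if p.requires_grad]
        if not params:
            continue  # e.g. ReLU, Flatten

        # Fresh forward/backward pass with the current weights, so this
        # layer's gradient already accounts for all earlier updates.
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()

        # Plain SGD step on this layer only.
        with torch.no_grad():
            for p in params:
                p -= lr * p.grad

    # Loss on the same batch after all layers have been updated.
    with torch.no_grad():
        return loss_fn(model(x), y).item()
```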
To test this quickly, take an easy problem (such as MNIST) and find a sufficiently wide single-layer network such that, after one weight update with lr=1.0 on a batch, re-running that same batch gives a loss of 0. Use this search to pick a good layer size.
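One way this width search could look, assuming a one-hidden-layer MLP on flattened MNIST, cross-entropy loss, and a hand-picked list of candidate widths (all of these are assumptions, not prescribed above):

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# A single MNIST batch; the data path and batch size are arbitrary choices.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
x, y = next(iter(torch.utils.data.DataLoader(train_set, batch_size=128,
                                             shuffle=True)))
x = x.view(x.size(0), -1)  # flatten 28x28 images to 784-d vectors

loss_fn = nn.CrossEntropyLoss()

def loss_after_one_step(width, lr=1.0):
    """Build a one-hidden-layer net of the given width, take one lr=1.0
    SGD step on the batch, then re-evaluate the loss on that same batch."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(784, width), nn.ReLU(),
                          nn.Linear(width, 10))
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
        return loss_fn(model(x), y).item()

# Sweep candidate widths until the post-update loss on the same batch is ~0.
for width in (256, 1024, 4096, 16384):
    print(width, loss_after_one_step(width))
```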
Then add another layer. Because of ICS, repeating the same process of updating with lr=1.0 on the same batch should now leave an increased loss (this residual can be regarded as the loss caused by ICS). Finally, apply back-forward propagation as described in the repo: if our implementation is correct, the loss should go back to 0.
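Continuing from the two sketches above (reusing `x`, `y`, `loss_fn`, and `back_forward_update`), the two-layer comparison could look like this; the width is a placeholder for whatever the single-layer sweep found:

```python
import torch
import torch.nn as nn

width = 4096  # placeholder: substitute the width found by the sweep above

def two_layer_model(width, seed=0):
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(784, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, 10))

# (a) Standard simultaneous update: every layer is stepped with gradients
#     computed against the pre-update weights, so later layers end up
#     mismatched with their shifted inputs. The residual loss on the same
#     batch is the quantity this experiment attributes to ICS.
model = two_layer_model(width)
loss = loss_fn(model(x), y)
model.zero_grad()
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= 1.0 * p.grad
    loss_ics = loss_fn(model(x), y).item()

# (b) Back-forward update from the same initialization: layers are updated
#     front-to-back with gradients recomputed after every step.
model = two_layer_model(width)
loss_bf = back_forward_update(model, x, y, loss_fn, lr=1.0)

print(f"loss after simultaneous update (attributed to ICS): {loss_ics:.4f}")
print(f"loss after back-forward update (should approach 0): {loss_bf:.4f}")
```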
Reuse the Back-Forward Propagation project
After that, scale up to bigger models and datasets and study the effects of reducing ICS. My intuition is that ICS might play an important role in weight exploration and have a regularizing effect against overfitting/memorization. If so, we might come up with a training procedure that uses back-forward propagation at the beginning of training to quickly reach a good starting point, after which exploration/regularization becomes more important and ICS is desired (or it may be the complete opposite, and back-forward propagation might be most useful for fine-tuning).