Experiment with what happens when internal covariate shift (ICS) is eliminated entirely. After backpropagating the gradients, first update the weights of the initial layer, then recompute the gradients for the subsequent layers and update each layer in turn, recomputing gradients before each update.
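A minimal sketch of that update rule, assuming a PyTorch `nn.Sequential` model and plain SGD (both are my assumptions; the actual implementation in the repo may differ):

```python
import torch
import torch.nn as nn

def back_forward_update(model, x, y, loss_fn, lr=1.0):
    """Update layers front-to-back, recomputing gradients after each
    layer's step so that later layers are trained against the already
    shifted upstream activations (i.e. no stale gradients, no ICS).

    Assumes `model` is an nn.Sequential; parameter-free modules such as
    activations are skipped.
    """
    for layer in model:
        params = [p for p in layer.parameters() if p.requires_grad]
        if not params:
            continue  # e.g. ReLU, Flatten

        # Fresh forward/backward pass with the current weights, so this
        # layer's gradient already accounts for all earlier updates.
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()

        # Plain SGD step on this layer only.
        with torch.no_grad():
            for p in params:
                p -= lr * p.grad

    # Loss on the same batch after all layers have been updated.
    with torch.no_grad():
        return loss_fn(model(x), y).item()
```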
To test this quickly, take an easy problem (such as MNIST) and find a sufficiently wide single-layer network such that, after one weight update with lr=1.0 on a batch, re-running that same batch gives a loss of 0. Use this search to pick a good layer size.
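One way this width search could look, assuming a one-hidden-layer MLP on flattened MNIST, cross-entropy loss, and a hand-picked list of candidate widths (all of these are assumptions, not prescribed above):

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# A single MNIST batch; the data path and batch size are arbitrary choices.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
x, y = next(iter(torch.utils.data.DataLoader(train_set, batch_size=128,
                                             shuffle=True)))
x = x.view(x.size(0), -1)  # flatten 28x28 images to 784-d vectors

loss_fn = nn.CrossEntropyLoss()

def loss_after_one_step(width, lr=1.0):
    """Build a one-hidden-layer net of the given width, take one lr=1.0
    SGD step on the batch, then re-evaluate the loss on that same batch."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(784, width), nn.ReLU(),
                          nn.Linear(width, 10))
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
        return loss_fn(model(x), y).item()

# Sweep candidate widths until the post-update loss on the same batch is ~0.
for width in (256, 1024, 4096, 16384):
    print(width, loss_after_one_step(width))
```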
Then add another layer. Because of ICS, repeating the same process of updating with lr=1.0 on the same batch should now leave an increased loss (this residual can be regarded as the loss caused by ICS). Finally, apply back-forward propagation as described in the repo: if our implementation is correct, the loss should go back to 0.
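Continuing from the two sketches above (reusing `x`, `y`, `loss_fn`, and `back_forward_update`), the two-layer comparison could look like this; the width is a placeholder for whatever the single-layer sweep found:

```python
import torch
import torch.nn as nn

width = 4096  # placeholder: substitute the width found by the sweep above

def two_layer_model(width, seed=0):
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(784, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, 10))

# (a) Standard simultaneous update: every layer is stepped with gradients
#     computed against the pre-update weights, so later layers end up
#     mismatched with their shifted inputs. The residual loss on the same
#     batch is the quantity this experiment attributes to ICS.
model = two_layer_model(width)
loss = loss_fn(model(x), y)
model.zero_grad()
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= 1.0 * p.grad
    loss_ics = loss_fn(model(x), y).item()

# (b) Back-forward update from the same initialization: layers are updated
#     front-to-back with gradients recomputed after every step.
model = two_layer_model(width)
loss_bf = back_forward_update(model, x, y, loss_fn, lr=1.0)

print(f"loss after simultaneous update (attributed to ICS): {loss_ics:.4f}")
print(f"loss after back-forward update (should approach 0): {loss_bf:.4f}")
```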
Reuse the Back-Forward Propagation project
After that, scale up to bigger models and datasets and study the effects of reducing ICS. My intuition is that ICS might play an important role in weight exploration and have a regularizing effect against overfitting/memorization. If so, we might come up with a training procedure that uses back-forward propagation at the beginning of training to quickly reach a good starting point, after which exploration/regularization becomes more important and ICS is desired (or it may be the complete opposite, and back-forward propagation might be most useful for fine-tuning).