
Backforward easy revision #1

Open
simi2525 opened this issue May 10, 2022 · 0 comments
simi2525 commented May 10, 2022

Experiment with what happens when internal covariate shift (ICS) is completely eliminated. After backpropagating the gradients, first update the weights of the initial layer only, then recompute the gradients for the subsequent layers and update each remaining layer in turn, recomputing the gradients after every per-layer update.
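A minimal sketch of what this per-layer update could look like, assuming a plain `nn.Sequential` PyTorch model trained with vanilla SGD on cross-entropy; the name `back_forward_step` and the `lr` argument are just for illustration, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def back_forward_step(model, x, y, lr=1.0):
    """Update an nn.Sequential front-to-back, recomputing gradients
    after every per-layer update so no layer sees stale inputs."""
    for layer in model:
        if not list(layer.parameters()):
            continue  # skip parameter-free layers such as activations
        # Fresh forward/backward pass with the layers updated so far.
        loss = F.cross_entropy(model(x), y)
        model.zero_grad()
        loss.backward()
        # Plain SGD step applied to the current layer only.
        with torch.no_grad():
            for p in layer.parameters():
                p -= lr * p.grad
    # Loss on the same batch after all layers have been updated.
    with torch.no_grad():
        return F.cross_entropy(model(x), y).item()
```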

To test this quickly, take an easy problem (such as MNIST) and find a single-layer neural network wide enough that, after one weight update with lr=1.0 on a batch, retrying the same batch gives a loss of 0. Use this procedure to find a good layer size, as in the sketch below.
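A rough sketch of that width search, assuming "single layer" means one hidden layer on MNIST (784 inputs, 10 classes); `reaches_zero_loss` and the tolerance value are illustrative choices, not part of the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reaches_zero_loss(width, x, y, lr=1.0, tol=1e-3):
    """Return True if one lr=1.0 update on this batch drives the loss
    to (near) zero for a single-hidden-layer MNIST network of `width`."""
    torch.manual_seed(0)  # make the width search repeatable
    model = nn.Sequential(nn.Flatten(),
                          nn.Linear(28 * 28, width), nn.ReLU(),
                          nn.Linear(width, 10))
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
        return F.cross_entropy(model(x), y).item() < tol

# Hypothetical search: double the width until the criterion is met.
# width = 64
# while not reaches_zero_loss(width, x_batch, y_batch):
#     width *= 2
```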

After that, add another layer. Because of ICS, repeating the same process of updating with lr=1.0 on the same batch should now leave an increased loss (this residual can be considered the loss caused by ICS). Then apply back-forward propagation as described in the repo and check whether the loss returns to 0, which it should if our implementation is correct.
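A sketch of that comparison, reusing the `back_forward_step` sketch above on a deeper copy of the same model; `ics_loss_gap` is a hypothetical helper, not an existing function in the repo:

```python
import copy
import torch
import torch.nn.functional as F

def ics_loss_gap(model, x, y, lr=1.0):
    """Compare a plain simultaneous update with a back-forward update
    on the same batch; the difference is the loss attributable to ICS."""
    plain, backfwd = copy.deepcopy(model), copy.deepcopy(model)

    # Plain update: one backward pass, then update every layer at once.
    loss = F.cross_entropy(plain(x), y)
    plain.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in plain.parameters():
            p -= lr * p.grad
        plain_loss = F.cross_entropy(plain(x), y).item()

    # Back-forward update: layers updated front-to-back with recomputed grads.
    backfwd_loss = back_forward_step(backfwd, x, y, lr=lr)
    return plain_loss, backfwd_loss
```

If the implementation is correct, the plain loss should stay above 0 on the two-layer model while the back-forward loss should drop back to (near) 0.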

Reuse the Back-Forward Propagation project

After that, scale up to bigger models and datasets and study the effects of reducing ICS. My intuition is that ICS might play an important role in weight exploration and have a regularizing effect against overfitting/memorization. Because of this, we might end up with a training procedure that uses back-forward propagation at the beginning of training to quickly reach a good starting point, after which exploration/regularization might become important and ICS is desired (or it may be completely the opposite, and back-forward might be most useful for fine-tuning).

@simi2525 simi2525 self-assigned this May 10, 2022