GSoC Optimizers: Example program to fit a quadratic function #134
Conversation
Thanks @Spnetic-5, looks like a good start. You already have the pure SGD example. Do you need any help going forward? To allow batch and mini-batch GDs, I suggest defining the dataset as 1-dimensional arrays.
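A minimal sketch of what such a 1-d dataset could look like in Fortran; the coefficient values, noise level, and variable names are illustrative assumptions, not taken from the PR:

```fortran
program make_quadratic_data
  implicit none
  integer, parameter :: n = 100
  real :: x(n), y(n), noise(n)
  ! Assumed coefficients of the target quadratic y = a*x**2 + b*x + c
  real, parameter :: a = 1.0, b = -2.0, c = 0.5

  ! Sample x uniformly in [0, 1) and add small uniform noise to y
  call random_number(x)
  call random_number(noise)
  y = a*x**2 + b*x + c + 0.1*(noise - 0.5)

  print *, 'First sample: x =', x(1), ' y =', y(1)
end program make_quadratic_data
```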
Sure, thank you for suggesting the approach of using 1-dimensional arrays for the dataset. I'm working on the optimizer code and will push the changes soon.
Thanks @Spnetic-5 for the work so far. Please study the changes in bda1968; there were a few important fixes to the code.
I don't know if the results are correct yet, but the code compiles and produces lower errors with increasing epoch count. On my computer the minibatch GD produces very different results between the debug and release profiles, so something is still not quite correct there. We're getting close!
You're welcome! I apologize for the errors in the code pushed earlier, and thank you for pointing out the fixes you made. I have carefully studied the changes and understand the modifications you introduced. It's good to hear that the code now compiles and produces lower errors with increasing epoch count. I will continue reviewing the code and evaluating the results to ensure correctness, and I will investigate the discrepancies in the minibatch GD results to identify and fix the underlying cause. These are the results on my PC:
Here, BatchGD shows a slight increase in MSE. I think this is because it updates the weights using the entire training dataset in each epoch; as the number of epochs increases, the model starts overfitting the training data, which leads to a higher MSE on the test data.
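For context, a hedged sketch of what a full-batch update looks like, with the gradient of the mean squared error averaged over the entire dataset before a single per-epoch weight update; the subroutine and variable names are assumptions for illustration, not code from the PR:

```fortran
! One full-batch gradient descent step for the model
! y_pred = w(1)*x**2 + w(2)*x + w(3), minimizing mean squared error.
subroutine batch_gd_step(x, y, w, lr)
  implicit none
  real, intent(in) :: x(:), y(:), lr
  real, intent(inout) :: w(3)
  real :: residual(size(x)), grad(3)
  integer :: n

  n = size(x)
  residual = (w(1)*x**2 + w(2)*x + w(3)) - y   ! prediction error per sample

  ! Gradient of the MSE, averaged over the whole dataset
  grad(1) = 2.0 * sum(residual * x**2) / n
  grad(2) = 2.0 * sum(residual * x) / n
  grad(3) = 2.0 * sum(residual) / n

  ! Single weight update per epoch, using all samples at once
  w = w - lr * grad
end subroutine batch_gd_step
```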
@Spnetic-5, in the SGD subroutine, can you shuffle the mini-batches so that it's truly stochastic? Currently it loops over the mini-batches in the same order every time. The outcome of the suggested approach (a sketch follows below) should be that in each epoch the order of mini-batches is random and different. You can take inspiration from `neural-fortran/src/nf/nf_network_submodule.f90`, lines 532 to 535 at commit 8293118, but even there the mini-batches are not truly shuffled; rather, the start index is randomly selected, so in each epoch some data samples may be unused while others are used more than once.
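One way to get a true shuffle, as an illustrative sketch rather than the approach necessarily adopted in the PR, is a Fisher-Yates shuffle of an index array that records the order in which the mini-batches are visited:

```fortran
! Fisher-Yates shuffle of mini-batch indices; the name batch_order
! is an assumption for this sketch.
subroutine shuffle(batch_order)
  implicit none
  integer, intent(inout) :: batch_order(:)
  integer :: i, j, tmp
  real :: r

  do i = size(batch_order), 2, -1
    call random_number(r)
    j = int(r * i) + 1          ! random position in 1..i
    tmp = batch_order(i)
    batch_order(i) = batch_order(j)
    batch_order(j) = tmp
  end do
end subroutine shuffle
```

In each epoch you would reset `batch_order = [(i, i = 1, num_batches)]`, call `shuffle(batch_order)`, and then loop over the mini-batches in that order, so every sample is used exactly once per epoch while the batch order differs between epochs.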
@milancurcic I have updated the weekly progress on Discourse; should I now proceed to the next steps?
Thank you @Spnetic-5 for this PR. Currently this optimizer is implemented only in an example and is not available to other users through the library, so my advice would be to look at how to integrate this optimizer into the library. @milancurcic, what should be the next step?
This is in good shape, thanks @Spnetic-5. I'll open an issue later today for next steps.
Solving #133 @milancurcic
Optimizers to be implemented: