
GSoC Optimizers: Example program to fit a quadratic function #134

Merged (9 commits, Jun 12, 2023)

Conversation

@Spnetic-5 (Collaborator) commented May 26, 2023

Solves #133. cc @milancurcic

Optimizers to be implemented:

  • Batch gradient descent
  • Mini-batch gradient descent
  • Stochastic gradient descent

@Spnetic-5 Spnetic-5 requested a review from milancurcic May 26, 2023 16:34
@Spnetic-5 Spnetic-5 marked this pull request as draft May 26, 2023 16:35
@milancurcic (Member)

Thanks @Spnetic-5, looks like a good start. You already have the pure SGD example. Do you need any help going forward? To allow batch and mini-batch GD, I suggest defining the x and y data as 1-d arrays that hold your entire dataset. Then for SGD, feed the x and y elements one at a time; for mini-batch GD, feed subsets (mini-batches) of the arrays; and for batch GD, pass the entire arrays.
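A minimal sketch of the three feeding patterns described above, not this PR's actual code. It assumes the nf constructors (network, input, dense), per-sample forward/backward methods, and a net % update method that applies the accumulated gradients with a given learning rate; the real neural-fortran API may differ, and all sizes are illustrative.

program feeding_patterns
  use nf, only: dense, input, network
  implicit none
  integer, parameter :: train_size = 100, batch_size = 10
  real, parameter :: learning_rate = 0.01
  real :: x(train_size), y(train_size)
  type(network) :: net
  integer :: i, j

  call random_number(x)
  y = x**2  ! quadratic target

  net = network([input(1), dense(3), dense(1)])

  ! Stochastic GD: update after every sample
  do i = 1, train_size
    call net % forward([x(i)])
    call net % backward([y(i)])
    call net % update(learning_rate)
  end do

  ! Mini-batch GD: accumulate gradients over a slice, then update
  do i = 1, train_size, batch_size
    do j = i, min(i + batch_size - 1, train_size)
      call net % forward([x(j)])
      call net % backward([y(j)])
    end do
    call net % update(learning_rate)
  end do

  ! Batch GD: accumulate over the entire dataset, then update once
  do i = 1, train_size
    call net % forward([x(i)])
    call net % backward([y(i)])
  end do
  call net % update(learning_rate)

end program feeding_patterns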

@Spnetic-5 (Collaborator, Author)

> Thanks @Spnetic-5, looks like a good start. [...]

Sure, thank you for suggesting the approach of using 1-d arrays for the dataset. I'm working on the optimizer code and will push the changes soon.

(4 review comments on example/quadratic.f90, now outdated and resolved)
@Spnetic-5 Spnetic-5 requested a review from milancurcic June 5, 2023 16:21
@milancurcic milancurcic marked this pull request as ready for review June 6, 2023 19:05
@milancurcic (Member)

Thanks @Spnetic-5 for the work so far. Please study the changes in bda1968. There were a few important fixes to the code (see the sketch after this list):

  • The training dataset was allocated to the size of the number of epochs (iterations), but the epoch loop is really an outer loop over the samples; I introduced a train_size parameter which I use to allocate the training x and y.
  • The same network was reused across the 3 optimization approaches; each needs its own network instance.
  • The net % forward and net % backward methods need one sample at a time as input rather than whole arrays; this is something we can improve later by allowing a batch of data to be passed at once.
  • ypred is now evaluated for each optimization method.
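A short sketch of the first two fixes, assuming the nf constructors and illustrative sizes and layer shapes; the actual example code may differ.

program separate_networks
  use nf, only: dense, input, network
  implicit none
  integer, parameter :: num_epochs = 1000   ! outer loop count only
  integer, parameter :: train_size = 100    ! sizes the dataset, not num_epochs
  real :: x(train_size), y(train_size)      ! allocated by train_size
  type(network) :: net_sgd, net_batch, net_minibatch

  ! one network instance per optimization approach
  net_sgd       = network([input(1), dense(3), dense(1)])
  net_batch     = network([input(1), dense(3), dense(1)])
  net_minibatch = network([input(1), dense(3), dense(1)])
end program separate_networks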

I don't know if the results are correct yet, but the code compiles and produces lower errors with increasing epoch count. On my computer the minibatch GD produces very different results between the debug and release profiles, so something is still not quite right there.

We're getting close!

@Spnetic-5 (Collaborator, Author) commented Jun 7, 2023

Thank you for pointing out the changes and fixes you made, and I apologize for the errors in the code I pushed earlier. I have studied the changes carefully and understand the modifications you introduced. It's good to hear that the code now compiles and produces lower errors with increasing epoch count; I will continue to review the code and evaluate the results to ensure correctness.

I will also investigate the discrepancy in the minibatch GD results to identify and fix the underlying cause.

These are the results on my PC:

  • For 1000 epochs:
    Stochastic gradient descent MSE: 0.001104
         Batch gradient descent MSE: 0.062504
     Minibatch gradient descent MSE: 0.088675
  • For 5000 epochs:
    Stochastic gradient descent MSE: 0.000449
         Batch gradient descent MSE: 0.071504
     Minibatch gradient descent MSE: 0.000996
Here, batch GD shows a slight increase in MSE. I think this is because it updates the weights using the entire training dataset in each epoch; as the number of epochs increases, the model starts overfitting the training data, which leads to a higher MSE on the test data.

@milancurcic (Member)

@Spnetic-5, in the SGD subroutine, can you shuffle the mini-batches so that it's truly stochastic? Currently it loops over the mini-batches in the same order every time. Here's my suggested approach:

  1. Split the dataset into mini-batches (you already have this);
  2. Shuffle the start indices of each mini-batch;
  3. Loop over the shuffled start and end indices to extract the desired mini-batch.

The outcome should be that in each epoch the order of the mini-batches is random and different. You can take inspiration from:

! Pull a random mini-batch from the dataset
call random_number(pos)
batch_start = int(pos * (dataset_size - batch_size + 1)) + 1
batch_end = batch_start + batch_size - 1

but even there the mini-batches are not truly shuffled; rather, the start index is randomly selected, so in each epoch some data samples may go unused and some may be used more than once.
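A minimal sketch of steps 2 and 3, assuming equal-size mini-batches: a Fisher-Yates shuffle of the start indices so that every epoch visits each mini-batch exactly once, in a different random order. All names and sizes here are illustrative, not this PR's actual code.

program shuffle_batches
  implicit none
  integer, parameter :: train_size = 100, batch_size = 10
  integer, parameter :: num_batches = train_size / batch_size
  integer :: starts(num_batches), i, k, tmp, batch_start, batch_end
  real :: u

  ! step 1: start index of each mini-batch
  starts = [(1 + (i - 1) * batch_size, i = 1, num_batches)]

  ! step 2: Fisher-Yates shuffle of the start indices
  do i = num_batches, 2, -1
    call random_number(u)
    k = 1 + int(u * i)                ! random position in 1..i
    tmp = starts(i); starts(i) = starts(k); starts(k) = tmp
  end do

  ! step 3: extract mini-batches in the shuffled order
  do i = 1, num_batches
    batch_start = starts(i)
    batch_end = batch_start + batch_size - 1
    ! ... feed x(batch_start:batch_end) and y(batch_start:batch_end) ...
  end do
end program shuffle_batches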

@Spnetic-5 Spnetic-5 requested a review from milancurcic June 7, 2023 19:05
@Spnetic-5 (Collaborator, Author)

@milancurcic I have posted the weekly progress update on Discourse. Should I now proceed to the next optimizer (RMSProp or Adam), or are there any more changes required in the current optimizers?

@jvdp1 (Collaborator) commented Jun 11, 2023

Thank you @Spnetic-5 for this PR. Currently this optimizer is implemented only in an example and is not available to other users through the library. My advice would therefore be to look at how to integrate this optimizer into the library. @milancurcic, what should be the next step?

@milancurcic (Member) left a review comment

This is in good shape, thanks @Spnetic-5. I'll open an issue later today for next steps.
