
GSoC Optimizers: Example program to fit a quadratic function #134

Merged (9 commits, Jun 12, 2023)

Conversation

@Spnetic-5 (Collaborator) commented May 26, 2023

Solves #133. cc @milancurcic

Optimizers to be implemented:

  • Batch gradient descent
  • Mini-batch gradient descent
  • Stochastic gradient descent

@Spnetic-5 Spnetic-5 requested a review from milancurcic May 26, 2023 16:34
@Spnetic-5 Spnetic-5 marked this pull request as draft May 26, 2023 16:35
@milancurcic (Member)

Thanks @Spnetic-5, looks like a good start. You already have the pure SGD example. Do you need any help going forward? To allow batch and mini-batch GD, I suggest defining the x and y data as 1-d arrays that hold your entire dataset. Then for SGD, feed the x and y elements one at a time; for mini-batch GD, feed subsets (mini-batches) of the arrays; and for batch GD, pass the entire arrays.
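A minimal sketch of the three feeding patterns described above, not this PR's actual code. It assumes the nf constructors (network, input, dense), per-sample forward/backward methods, and a net % update method that applies the accumulated gradients with a given learning rate; the real neural-fortran API may differ, and all sizes are illustrative.

program feeding_patterns
  use nf, only: dense, input, network
  implicit none
  integer, parameter :: train_size = 100, batch_size = 10
  real, parameter :: learning_rate = 0.01
  real :: x(train_size), y(train_size)
  type(network) :: net
  integer :: i, j

  call random_number(x)
  y = x**2  ! quadratic target

  net = network([input(1), dense(3), dense(1)])

  ! Stochastic GD: update after every sample
  do i = 1, train_size
    call net % forward([x(i)])
    call net % backward([y(i)])
    call net % update(learning_rate)
  end do

  ! Mini-batch GD: accumulate gradients over a slice, then update
  do i = 1, train_size, batch_size
    do j = i, min(i + batch_size - 1, train_size)
      call net % forward([x(j)])
      call net % backward([y(j)])
    end do
    call net % update(learning_rate)
  end do

  ! Batch GD: accumulate over the entire dataset, then update once
  do i = 1, train_size
    call net % forward([x(i)])
    call net % backward([y(i)])
  end do
  call net % update(learning_rate)

end program feeding_patterns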

@Spnetic-5 (Collaborator, Author)

> Thanks @Spnetic-5, looks like a good start. [...]

Sure, thank you for suggesting the approach of using 1-d arrays for the dataset. I'm working on the optimizer code and will push the changes soon.

(4 review comments on example/quadratic.f90, now outdated and resolved)
@Spnetic-5 Spnetic-5 requested a review from milancurcic June 5, 2023 16:21
@milancurcic milancurcic marked this pull request as ready for review June 6, 2023 19:05
@milancurcic (Member)

Thanks @Spnetic-5 for the work so far. Please study the changes in bda1968. There were a few important fixes to the code (see the sketch after this list):

  • The training dataset was allocated to the size of the number of epochs (iterations), but the epoch loop is really an outer loop over the samples; I introduced a train_size parameter which I use to allocate the training x and y.
  • The same network was reused across the 3 optimization approaches; each needs its own network instance.
  • The net % forward and net % backward methods need one sample at a time as input rather than whole arrays; this is something we can improve later by allowing a batch of data to be passed at once.
  • ypred is now evaluated for each optimization method.
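A short sketch of the first two fixes, assuming the nf constructors and illustrative sizes and layer shapes; the actual example code may differ.

program separate_networks
  use nf, only: dense, input, network
  implicit none
  integer, parameter :: num_epochs = 1000   ! outer loop count only
  integer, parameter :: train_size = 100    ! sizes the dataset, not num_epochs
  real :: x(train_size), y(train_size)      ! allocated by train_size
  type(network) :: net_sgd, net_batch, net_minibatch

  ! one network instance per optimization approach
  net_sgd       = network([input(1), dense(3), dense(1)])
  net_batch     = network([input(1), dense(3), dense(1)])
  net_minibatch = network([input(1), dense(3), dense(1)])
end program separate_networks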

I don't know if the results are correct yet, but the code compiles and produces lower errors with increasing epoch count. On my computer the minibatch GD produces very different results between the debug and release profiles, so something is still not quite right there.

We're getting close!

@Spnetic-5 (Collaborator, Author) commented Jun 7, 2023

Thank you for pointing out the changes and fixes you made, and I apologize for the errors in the code I pushed earlier. I have studied the changes carefully and understand the modifications you introduced. It's good to hear that the code now compiles and produces lower errors with increasing epoch count; I will continue to review the code and evaluate the results to ensure correctness.

I will also investigate the discrepancy in the minibatch GD results to identify and fix the underlying cause.

These are the results on my PC:

  • For 1000 epochs:
    Stochastic gradient descent MSE: 0.001104
         Batch gradient descent MSE: 0.062504
     Minibatch gradient descent MSE: 0.088675
  • For 5000 epochs:
    Stochastic gradient descent MSE: 0.000449
         Batch gradient descent MSE: 0.071504
     Minibatch gradient descent MSE: 0.000996
Here, batch GD shows a slight increase in MSE. I think this is because it updates the weights using the entire training dataset in each epoch; as the number of epochs increases, the model starts overfitting the training data, which leads to a higher MSE on the test data.

@milancurcic (Member)

@Spnetic-5, in the SGD subroutine, can you shuffle the mini-batches so that it's truly stochastic? Currently it loops over the mini-batches in the same order every time. Here's my suggested approach:

  1. Split the dataset into mini-batches (you already have this);
  2. Shuffle the start indices of each mini-batch;
  3. Loop over the shuffled start and end indices to extract the desired mini-batch.

The outcome should be that in each epoch the order of the mini-batches is random and different. You can take inspiration from:

! Pull a random mini-batch from the dataset
call random_number(pos)
batch_start = int(pos * (dataset_size - batch_size + 1)) + 1
batch_end = batch_start + batch_size - 1

but even there the mini-batches are not truly shuffled; rather, the start index is randomly selected, so in each epoch some data samples may go unused and some may be used more than once.
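A minimal sketch of steps 2 and 3, assuming equal-size mini-batches: a Fisher-Yates shuffle of the start indices so that every epoch visits each mini-batch exactly once, in a different random order. All names and sizes here are illustrative, not this PR's actual code.

program shuffle_batches
  implicit none
  integer, parameter :: train_size = 100, batch_size = 10
  integer, parameter :: num_batches = train_size / batch_size
  integer :: starts(num_batches), i, k, tmp, batch_start, batch_end
  real :: u

  ! step 1: start index of each mini-batch
  starts = [(1 + (i - 1) * batch_size, i = 1, num_batches)]

  ! step 2: Fisher-Yates shuffle of the start indices
  do i = num_batches, 2, -1
    call random_number(u)
    k = 1 + int(u * i)                ! random position in 1..i
    tmp = starts(i); starts(i) = starts(k); starts(k) = tmp
  end do

  ! step 3: extract mini-batches in the shuffled order
  do i = 1, num_batches
    batch_start = starts(i)
    batch_end = batch_start + batch_size - 1
    ! ... feed x(batch_start:batch_end) and y(batch_start:batch_end) ...
  end do
end program shuffle_batches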

@Spnetic-5 Spnetic-5 requested a review from milancurcic June 7, 2023 19:05
@Spnetic-5 (Collaborator, Author)

@milancurcic I have posted the weekly progress update on Discourse. Should I now proceed to the next optimizer (RMSProp or Adam), or are there any more changes required in the current optimizers?

@jvdp1 (Collaborator) commented Jun 11, 2023

Thank you @Spnetic-5 for this PR. Currently this optimizer is implemented only in an example and is not available to other users through the library. My advice would therefore be to look at how to integrate this optimizer into the library. @milancurcic, what should be the next step?

@milancurcic (Member) left a review comment

This is in good shape, thanks @Spnetic-5. I'll open an issue later today for next steps.
