
Track gradients #15

Merged (6 commits into main, Feb 9, 2024)

Conversation

@MatteoRobbiati (Collaborator)

With this PR we enable tracking of the gradients of the loss function $L$ w.r.t. the trainable parameters $\vec{\theta}$ during the training process (added to the callbacks).
This feature can be used to detect the barren plateau (BP) regime, or to trigger the DBI when some average-magnitude threshold is crossed.

In the function plotscripts.plot_gradients the values $$g = \frac{1}{N_{\rm params}}\sum_{i=1}^{N_{\rm params}}\left|\frac{\partial L}{\partial \theta_i}\right|$$ are plotted as a function of the optimization iteration.
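As a rough illustration (not the actual plotscripts code), the plotted quantity is just the mean absolute gradient over the trainable parameters:

```python
import numpy as np

def average_gradient_magnitude(grad):
    """Mean of |dL/dtheta_i| over the trainable parameters."""
    grad = np.asarray(grad, dtype=float)
    return np.mean(np.abs(grad))

# Example: a nearly flat ("barren") gradient vs. a structured one.
flat = average_gradient_magnitude([1e-8, -2e-8, 5e-9])
steep = average_gradient_magnitude([0.3, -0.1, 0.2])  # (0.3 + 0.1 + 0.2) / 3 = 0.2
```

Tracking this scalar per iteration is what makes the BP regime (uniformly tiny $g$) visible in the plot.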

An example of output follows (10 qubits, 1 layer, BFGS).

[Loss plot]

[Gradients plot]

@andrea-pasquale (Collaborator)

Thanks @MatteoRobbiati.
The black line corresponds to the end of the first VQE training, correct?
Perhaps, instead of that, when you draw the line connecting the two VQE runs you could use a different color/style and state in the legend that you are just connecting the points while the DBI is performed.

@MatteoRobbiati (Collaborator, Author)

> The black line corresponds to the end of the first VQE training, correct?
> Perhaps you could use a different color/style and state in the legend that you are just connecting the points while the DBI is performed.

You mean, taking the figure as reference, the line which connects the jump in the gradient values from 1e-3 to 1e-1?
Yes, I like that more. Thanks!

@andrea-pasquale (Collaborator)

> You mean, taking the figure as reference, the line which connects the jump in the gradient values from 1e-3 to 1e-1? Yes, I like that more. Thanks!

Yes, just to make clear that the gradient is not increasing by itself but is due to the DBI.

@MatteoRobbiati (Collaborator, Author)

> Yes, just to make clear that the gradient is not increasing by itself but is due to the DBI.

Yep, makes sense. It was my idea!
Then, from a few tests, it seems the gradients increase only on the first optimization iteration after the DBI, with a subsequent decrease.
We should run more tests, especially with big models on GPUs.

@andrea-pasquale (Collaborator) commented Feb 5, 2024

> Yep, makes sense. It was my idea! Then, from a few tests, it seems the gradients increase only on the first optimization iteration after the DBI, with a subsequent decrease. We should run more tests, especially with big models on GPUs.

Regarding GPU, I might need to check whether the code works properly, but I think it does; let me know if you see any errors.

@andrea-pasquale (Collaborator)

@MatteoRobbiati I would wait until this PR is merged before starting to run multiple jobs for the BP, just so the gradients are stored as well.

@MatteoRobbiati (Collaborator, Author)

I fixed the plot. Now we see something like:

[Updated gradients plot]

@andrea-pasquale (Collaborator)

Thanks @MatteoRobbiati

@andrea-pasquale andrea-pasquale merged commit 1c9757c into main Feb 9, 2024
@andrea-pasquale andrea-pasquale deleted the track_gradients branch February 9, 2024 11:25
@marekgluza (Contributor) commented Mar 4, 2024

Zoe:

  • We need a scaling analysis for BPs in the number of qubits.
  • Initialization strategy -> ending up in BP 'regions'.
  • There can be two issues: either a BP, where the gradients are small in all directions, or a 'glassy' landscape with very many local minima, so that it is hard to find the global minimum.
  • For the physics see https://www.nature.com/articles/s41467-022-35364-5, which should be easier to follow; https://arxiv.org/abs/2109.06957 has similar content.

Test initialization BP:

  • For the ansatz, compute the variance of the loss landscape over random initializations $\theta \in [-\pi,\pi]$ (uniform distribution over the parameter space). This tests whether there is a BP at initialization. Do this for various n (aim for a minimum of 12, which is often enough to see the exponential decay; for now this is good, though we might need to go further).
  • Repeat this.
  • This follows the expressibility-vs-BP PRX Quantum paper by Zoe and the first BP paper.
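A minimal sketch of this variance scan, assuming a hypothetical `toy_loss` as a stand-in for the real VQE loss (replace it with the actual cost function):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(theta):
    # Hypothetical placeholder for the VQE loss; a product of cosines
    # whose variance decays exponentially in the number of parameters.
    return np.prod(np.cos(theta / 2))

def loss_variance(loss, n_params, n_samples=200):
    """Variance of the loss over uniform initializations in [-pi, pi]."""
    samples = [loss(rng.uniform(-np.pi, np.pi, size=n_params))
               for _ in range(n_samples)]
    return np.var(samples)

# Scan parameter counts: an exponentially shrinking variance signals a BP.
variances = {n: loss_variance(toy_loss, n) for n in (2, 4, 8, 12)}
```

Plotting `variances` on a log scale against n is the standard way to read off the exponential decay described above.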

When you're stuck, are the parameters jumping around or just making a small wiggle?

  • This diagnoses whether you're in a local minimum or a BP.
  • [ ] This can be repeated after DBI layers.
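One way to quantify "jumping around vs. small wiggle" is the norm of the parameter update at each iteration; a hypothetical sketch (the optimizer's parameter history is assumed to be available):

```python
import numpy as np

def step_sizes(theta_history):
    """Norm of the parameter update between consecutive iterations."""
    thetas = np.asarray(theta_history, dtype=float)
    return np.linalg.norm(np.diff(thetas, axis=0), axis=1)

# Small, shrinking steps suggest a local minimum (or a BP with tiny
# gradients); large erratic steps suggest the optimizer is jumping around.
wiggle = step_sizes([[0.0, 0.0], [0.01, 0.0], [0.011, 0.001]])
```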

Test landscape after training #1:

  • Train until you get stuck at some fixed-point parameter $\theta^*$.
  • For the ansatz, compute the variance of the loss landscape over random initializations with distribution $\theta \in \theta^* + [-\epsilon,\epsilon]$.
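A sketch of this local variance test, with `loss` and `theta_star` as placeholders for the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_loss_variance(loss, theta_star, eps, n_samples=200):
    """Variance of the loss in the box theta* + [-eps, eps]^d."""
    theta_star = np.asarray(theta_star, dtype=float)
    samples = [loss(theta_star + rng.uniform(-eps, eps, theta_star.size))
               for _ in range(n_samples)]
    return np.var(samples)
```

A sizeable local variance indicates curvature around $\theta^*$ (local minimum), while a vanishing one is consistent with a flat BP region.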

Test landscape after training #2:

  • Add a random stochastic jump to $\theta^*$ to see if you escape the local minimum. Play around with the strength of the noise. For a BP there would be a region of noise strengths where nothing happens, while for a local minimum the noise would have an effect: either moving within the basin of attraction of the same local minimum or moving to another local minimum. For a local minimum the noise 'helps' at first, but then you quickly get stuck again.
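A sketch of the stochastic-jump test, with `loss` again a placeholder; the fraction of kicks that lower the loss, scanned over the noise strength, distinguishes the two cases:

```python
import numpy as np

rng = np.random.default_rng(2)

def kick_and_compare(loss, theta_star, sigma, n_trials=50):
    """Fraction of Gaussian kicks of strength sigma that lower the loss."""
    theta_star = np.asarray(theta_star, dtype=float)
    base = loss(theta_star)
    improved = sum(
        loss(theta_star + sigma * rng.normal(size=theta_star.size)) < base
        for _ in range(n_trials)
    )
    return improved / n_trials
```

In a BP this fraction stays flat (and uninformative) across sigma; near a local minimum there is a window of sigma where kicks help before the optimizer gets stuck again.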

Q: What happens if one does a quantum natural gradient step instead of a DBI step?
I.e. a gradient step in the geometry of the cost function; this is closer to imaginary-time evolution, gradient descent, and the Rayleigh quotient.

BPs show up when sampling shots, but for these system sizes there should always be enough visibility to still take a gradient step (even if it's a small update). This means we only have a reduction of the gradients; they are not extremely small yet.

Quantum imaginary-time evolution:
https://arxiv.org/abs/2102.01544
This runs on certain examples which we can also try and compare against each other.

What is the relation to this?
https://arxiv.org/abs/2202.06976

See the lower bounds in
https://arxiv.org/abs/2210.06796

@marekgluza (Contributor)

Running this with shot noise could be an interesting way to showcase the method. When the cost function is estimated from measurements, evaluating which direction improves becomes noisy, so the VQE update can be really hard to determine. In this case the advantage of adding DBI might be amplified.

@marekgluza marekgluza mentioned this pull request Mar 4, 2024
@marekgluza marekgluza mentioned this pull request Jul 24, 2024