
Track gradients #15

Merged (6 commits into main, Feb 9, 2024)

Conversation

@MatteoRobbiati (Collaborator)

With this PR we enable tracking of the gradients of the loss function $L$ w.r.t. the trainable parameters $\vec{\theta}$ during the training process (added to the callbacks).
This feature can be used to detect the barren plateau (BP) regime, or to trigger the DBI when some average-magnitude threshold is crossed.

In the function plotscripts.plot_gradients the values $$g = \frac{1}{N_{\rm params}}\sum_{i=1}^{N_{\rm params}}\left|\frac{\partial L}{\partial \theta_i}\right|$$ are plotted as a function of the optimization iteration.
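As a rough illustration (not the actual plotscripts code), the plotted quantity is just the mean absolute gradient over the trainable parameters:

```python
import numpy as np

def average_gradient_magnitude(grad):
    """Mean of |dL/dtheta_i| over the trainable parameters."""
    grad = np.asarray(grad, dtype=float)
    return np.mean(np.abs(grad))

# Example: a nearly flat ("barren") gradient vs. a structured one.
flat = average_gradient_magnitude([1e-8, -2e-8, 5e-9])
steep = average_gradient_magnitude([0.3, -0.1, 0.2])  # (0.3 + 0.1 + 0.2) / 3 = 0.2
```

Tracking this scalar per iteration is what makes the BP regime (uniformly tiny $g$) visible in the plot.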

An example of output follows (10 qubits, 1 layer, BFGS).

[Loss plot]

[Gradients plot]

@andrea-pasquale (Collaborator)

Thanks @MatteoRobbiati.
The black line corresponds to the end of the first VQE training, correct?
Perhaps, instead of that, when you draw the line connecting the two VQE runs you could use a different color/style and state in the legend that you are just connecting the points while the DBI is performed.

@MatteoRobbiati (Collaborator, Author)

> The black line corresponds to the end of the first VQE training, correct?
> Perhaps you could use a different color/style and state in the legend that you are just connecting the points while the DBI is performed.

You mean, taking the figure as reference, the line which connects the jump in the gradient values from 1e-3 to 1e-1?
Yes, I like that more. Thanks!

@andrea-pasquale (Collaborator)

> You mean, taking the figure as reference, the line which connects the jump in the gradient values from 1e-3 to 1e-1? Yes, I like that more. Thanks!

Yes, just to make clear that the gradient is not increasing by itself but is due to the DBI.

@MatteoRobbiati (Collaborator, Author)

> Yes, just to make clear that the gradient is not increasing by itself but is due to the DBI.

Yep, makes sense. It was my idea!
Then, from a few tests, it seems the gradients increase only on the first optimization iteration after the DBI, with a subsequent decrease.
We should run more tests, especially with big models on GPUs.

@andrea-pasquale (Collaborator) commented Feb 5, 2024

> Yep, makes sense. It was my idea! Then, from a few tests, it seems the gradients increase only on the first optimization iteration after the DBI, with a subsequent decrease. We should run more tests, especially with big models on GPUs.

Regarding GPU, I might need to check whether the code works properly, but I think it does; let me know if you see any errors.

@andrea-pasquale (Collaborator)

@MatteoRobbiati I would wait until this PR is merged before starting to run multiple jobs for the BP, just so the gradients are stored as well.

@MatteoRobbiati (Collaborator, Author)

I fixed the plot. Now we see something like:

[Updated gradients plot]

@andrea-pasquale (Collaborator)

Thanks @MatteoRobbiati

@andrea-pasquale andrea-pasquale merged commit 1c9757c into main Feb 9, 2024
@andrea-pasquale andrea-pasquale deleted the track_gradients branch February 9, 2024 11:25
@marekgluza (Contributor) commented Mar 4, 2024

Zoe:

  • We need a scaling analysis for BPs in the number of qubits.
  • Initialization strategy -> ending up in BP 'regions'.
  • There can be two issues: either a BP, where the gradients are small in all directions, or a 'glassy' landscape with very many local minima, so that it is hard to find the global minimum.
  • For the physics see https://www.nature.com/articles/s41467-022-35364-5, which should be easier to follow; https://arxiv.org/abs/2109.06957 has similar content.

Test initialization BP:

  • For the ansatz, compute the variance of the loss landscape over random initializations $\theta \in [-\pi,\pi]$ (uniform distribution over the parameter space). This tests whether there is a BP at initialization. Do this for various n (aim for a minimum of 12, which is often enough to see the exponential decay; for now this is good, though we might need to go further).
  • Repeat this.
  • This follows the expressibility-vs-BP PRX Quantum paper by Zoe and the first BP paper.
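A minimal sketch of this variance scan, assuming a hypothetical `toy_loss` as a stand-in for the real VQE loss (replace it with the actual cost function):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(theta):
    # Hypothetical placeholder for the VQE loss; a product of cosines
    # whose variance decays exponentially in the number of parameters.
    return np.prod(np.cos(theta / 2))

def loss_variance(loss, n_params, n_samples=200):
    """Variance of the loss over uniform initializations in [-pi, pi]."""
    samples = [loss(rng.uniform(-np.pi, np.pi, size=n_params))
               for _ in range(n_samples)]
    return np.var(samples)

# Scan parameter counts: an exponentially shrinking variance signals a BP.
variances = {n: loss_variance(toy_loss, n) for n in (2, 4, 8, 12)}
```

Plotting `variances` on a log scale against n is the standard way to read off the exponential decay described above.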

When you're stuck, are the parameters jumping around or just making a small wiggle?

  • This diagnoses whether you're in a local minimum or a BP.
  • [ ] This can be repeated after DBI layers.
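One way to quantify "jumping around vs. small wiggle" is the norm of the parameter update at each iteration; a hypothetical sketch (the optimizer's parameter history is assumed to be available):

```python
import numpy as np

def step_sizes(theta_history):
    """Norm of the parameter update between consecutive iterations."""
    thetas = np.asarray(theta_history, dtype=float)
    return np.linalg.norm(np.diff(thetas, axis=0), axis=1)

# Small, shrinking steps suggest a local minimum (or a BP with tiny
# gradients); large erratic steps suggest the optimizer is jumping around.
wiggle = step_sizes([[0.0, 0.0], [0.01, 0.0], [0.011, 0.001]])
```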

Test landscape after training #1:

  • Train until you get stuck at some fixed-point parameter $\theta^*$.
  • For the ansatz, compute the variance of the loss landscape over random initializations with distribution $\theta \in \theta^* + [-\epsilon,\epsilon]$.
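A sketch of this local variance test, with `loss` and `theta_star` as placeholders for the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_loss_variance(loss, theta_star, eps, n_samples=200):
    """Variance of the loss in the box theta* + [-eps, eps]^d."""
    theta_star = np.asarray(theta_star, dtype=float)
    samples = [loss(theta_star + rng.uniform(-eps, eps, theta_star.size))
               for _ in range(n_samples)]
    return np.var(samples)
```

A sizeable local variance indicates curvature around $\theta^*$ (local minimum), while a vanishing one is consistent with a flat BP region.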

Test landscape after training #2:

  • Add a random stochastic jump to $\theta^*$ to see if you escape the local minimum. Play around with the strength of the noise. For a BP there would be a region of noise strengths where nothing happens, while for a local minimum the noise would have an effect: either moving within the basin of attraction of the same local minimum or moving to another local minimum. For a local minimum the noise 'helps' at first, but then you quickly get stuck again.
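A sketch of the stochastic-jump test, with `loss` again a placeholder; the fraction of kicks that lower the loss, scanned over the noise strength, distinguishes the two cases:

```python
import numpy as np

rng = np.random.default_rng(2)

def kick_and_compare(loss, theta_star, sigma, n_trials=50):
    """Fraction of Gaussian kicks of strength sigma that lower the loss."""
    theta_star = np.asarray(theta_star, dtype=float)
    base = loss(theta_star)
    improved = sum(
        loss(theta_star + sigma * rng.normal(size=theta_star.size)) < base
        for _ in range(n_trials)
    )
    return improved / n_trials
```

In a BP this fraction stays flat (and uninformative) across sigma; near a local minimum there is a window of sigma where kicks help before the optimizer gets stuck again.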

Q: What happens if one does a quantum natural gradient step instead of a DBI step?
I.e. a gradient step in the geometry of the cost function; this is closer to imaginary-time evolution, gradient descent, and the Rayleigh quotient.

BPs show up when sampling shots, but for these system sizes there should always be enough visibility to still take a gradient step (even if it's a small update). This means we only have a reduction of the gradients; they are not extremely small yet.

Quantum imaginary-time evolution:
https://arxiv.org/abs/2102.01544
This runs on certain examples which we can also try and compare against each other.

What is the relation to this?
https://arxiv.org/abs/2202.06976

See the lower bounds in
https://arxiv.org/abs/2210.06796

@marekgluza (Contributor)

Running this with shot noise could be an interesting way to showcase the method. When the cost function is estimated from measurements, evaluating which direction improves becomes noisy, so the VQE update can be really hard to determine. In this case the advantage of adding DBI might be amplified.

@marekgluza marekgluza mentioned this pull request Mar 4, 2024
@marekgluza marekgluza mentioned this pull request Jul 24, 2024