Turn caching off by default when max_diff==1
#5243
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##           master    #5243      +/-   ##
==========================================
- Coverage   99.68%   99.67%   -0.01%
==========================================
  Files         399      399
  Lines       36853    36575     -278
==========================================
- Hits        36736    36457     -279
- Misses        117      118       +1

☔ View full report in Codecov by Sentry.
Thanks @albi3ro!
Out of curiosity, I wonder if the caching affected any first-order gradient workflows that used qml.jacobian()? That is, any workflows where multiple identical tapes would be executed to build up the Jacobian?
We add an additional mandatory level of caching for autograd, so
I'm not sure what the current state of the benchmarking suite is, but perhaps this PR would be a good opportunity to test it out?
[sc-57404]
Good idea! Thank you for that!
Current benchmarks use a GitHub runner and are too noisy to extract reliable data.
Context:
Recent benchmarks (see #5211 (comment)) have shown that caching adds massive classical overheads, but often does not actually reduce the number of executions in normal workflows.
Because of this, we want to make smart choices about when to use caching. Higher-order derivatives often result in duplicate circuits, so we need to keep caching when calculating higher-order derivatives. But we can make caching opt-in for normal workflows. This will lead to reduced overheads in the vast majority of workflows.
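The trade-off described above can be illustrated with a toy execution cache: every circuit pays a lookup cost, but an execution is only saved when an identical circuit repeats. This is purely illustrative code, not PennyLane's implementation; the function and variable names are made up.

```python
# Toy illustration of the caching trade-off. Every circuit pays a
# hash-and-lookup cost, but an execution is saved only when an
# identical circuit repeats. Not PennyLane's actual implementation.

def execute_with_cache(circuits, run, cache=None):
    """Run each circuit, reusing results for identical circuits when caching."""
    results, executions = [], 0
    for circuit in circuits:
        if cache is not None and circuit in cache:  # overhead on every call
            results.append(cache[circuit])
            continue
        res = run(circuit)
        executions += 1
        if cache is not None:
            cache[circuit] = res
        results.append(res)
    return results, executions

# In a typical first-order workflow, all circuits are distinct, so the
# cache adds overhead without saving any executions:
_, n_distinct = execute_with_cache(["c0", "c1", "c2"], run=len, cache={})
assert n_distinct == 3  # no savings

# Higher-order derivatives can generate duplicate circuits, where caching pays off:
_, n_dup = execute_with_cache(["c0", "c1", "c0"], run=len, cache={})
assert n_dup == 2  # one execution saved
```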
Description of the Change:
The `QNode` keyword argument `cache` defaults to `None`. This is interpreted as `True` if `max_diff > 1` and `False` otherwise.

Benefits:
Vastly reduced classical overheads in most cases.
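The `None`-handling described above can be sketched as a small resolution step. This is a hypothetical helper for illustration only, not PennyLane's actual code:

```python
# Hypothetical sketch of resolving the cache=None default.
# Illustrative only; not PennyLane's implementation.

def resolve_cache(cache, max_diff):
    """Interpret cache=None as True only for higher-order derivatives."""
    if cache is None:
        return max_diff > 1
    return cache

# Caching stays off for first-order workflows by default...
assert resolve_cache(None, 1) is False
# ...turns on automatically for higher-order derivatives...
assert resolve_cache(None, 2) is True
# ...and an explicit user setting always wins.
assert resolve_cache(True, 1) is True
assert resolve_cache(False, 2) is False
```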
Possible Drawbacks:
Increased number of executions in a few edge cases, but these edge cases would be fairly convoluted: a transform would somehow have to turn the starting tape into two identical tapes.
Related GitHub Issues:
Performance Numbers:
For `n_wires = 20`:

[benchmark plot not captured]

But for `n_wires = 10`:

[benchmark plot not captured]

For `n_wires = 20, n_layers = 5`, we have:

[benchmark plot not captured]

While the cached version does seem to be faster here, that appears to be statistical fluctuation.

For `n_wires = 10, n_layers = 20`:

[benchmark plot not captured]