-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalidate PackedInstruction
cache upon modification
#13543
base: main
Are you sure you want to change the base?
Conversation
- Make all attributes of the `PackedInstruction` private to allow for control of the underlying cached `PyObject` reference stored in Python. This ensures that no dead references are left as soon as the gate is modified from the Rust side. - Reimplement `Clone` trait for `PackedInstruction`s to clear the cached gate whenever we decide make calls to `clone()`. - Re-organize rest of the code to adapt to the new changes.
PackedInstruction
cached upon modificationPackedInstruction
cache upon modification
While I'm not sure how to test the original bug in CI (since we turn Qiskit in parallel off) we can test @jakelishman's reproducer from: #13532 (comment) 🙂 |
Pull Request Test Coverage Report for Build 12280844236Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
These changes do fix things but there is a big performance regression in circuit construction. Since I removed a lot of the pre-caching that was being done prior to introducing these changes, I thought adding those back would fix the performance regression, however, local testing proved that including those made things even slower. I will post those numbers later, but slowdowns were as high as 2x slower. This is the current regression:
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
|
It seems to work well with this based on local tests. |
Other than the circuit instruction benchmarks, not much seems to have changed in the utility scale, although a couple benchmarks seem to have slowed down. Utility scale benchmarks:
BENCHMARKS NOT SIGNIFICANTLY CHANGED.Transpiler levels
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. |
- Add views for `py_op`. - Remove stale saching being done in `apply_operation_*` in `DAGCircuit`. - Adapt the code to new changes. - Add missing docstring
There seems to be a strange effect in which some benchmarks improve while others get worse, which is strange:
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. |
One or more of the following people are relevant to this code:
|
PackedInstruction
cache upon modificationPackedInstruction
cache upon modification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good so far, but I've noted a few concerns inline.
Specifically, it isn't entirely clear to me when we can preserve the py_op
, if at all. In its current state, it looks like this PR always throws it out, even when building new instances of PackedInstruction
via ::new
, and even when cloning a singleton gate. If we're doing so unnecessarily, that is certainly performance we're leaving (/putting back) on the table. Do you know if that's the case @raynelfss?
It'd be great if @jakelishman or @mtreinish can confirm prior to the potential inclusion of this PR in 1.3.1 this week, though they're both on holiday.
#[cfg(feature = "cache_pygates")] | ||
{ | ||
inst.py_op = py_op.unbind().into(); | ||
*inst.py_op_mut() = py_op.unbind().into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider constructing a new PackedInstruction
with a user-provided Python op for the cache as unsafe, then we should also consider this to be unsafe, correct? Since the instance we get by calling func
is something the user may have a reference to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your reasoning, I'm okay with making the py_op
attribute fully invisible since it is still unsafe (even when immutably borrowed).
params: (!py_op.params.is_empty()).then(|| Box::new(py_op.params)), | ||
extra_attrs: py_op.extra_attrs, | ||
#[cfg(feature = "cache_pygates")] | ||
py_op: op.unbind().into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to throw away the py_op
when creating new PackedInstruction
instances?
Surely we anticipated that it would be an issue for Python-side code to modify the cached objects at the time we introduced the py_op
cache. That makes me think we should preserve py_op
as part of PackedOperation::new
. Isn't the issue only via the default Clone
for PackedInstruction
?
I realize that @jakelishman is on holiday, but perhaps if he sees this he can clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but also think of it this way.
If you look at the PackedInstruction::unpack_py_op()
method, in which a reference is built to a python object (basically what we do when the cache_pygate
flag is disabled), in the case of a standard gate, we build a new instance of py_op based on the current parameters or the extra attributes of our gate, and not only that, but the instance we get from this function whenever dealing with any other type of gate comes directly from the OperationRef
type if it contains a python object within and isn't really affected by changes unless our op
type is StandardGate
.
Why do I mention this? Well, whenever a gate is initialized without a cached pygate, this is the function in charge of building that gate for us. Since this gate is disconnected and changes won't propagate, we'd need to update it by creating a new one whenever our params or attributes are modified (or at least that's what I understand from here).
We could technically optimize further by only re-building if we have an instance of Standard
gate, since what we do in other cases is a call to clone_ref(py)
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python doesn't enforce ownership in the way Rust does, but we can still think of it in similar terms. DAGCircuit.apply_operation_back
has always behaved as if calling it transferred ownership of the gate object, as opposed to QuantumCircuit.append
, which copied the object if it detected that you might want to modify it (i.e. it was parametric).
Since the Python object coming in is logically "owned" by the function now, I think it's fine to put it in the cache.
With all the Python stuff, we can't compile-time (or even, in most cases, run-time) forbid modifications to Python objects, so there's always going to be the possibility for a user to break us a bit, and some of our Python-based will be pragmatics. I think in this case, saving the incoming object in the cache is no more dangerous than having the cache at all, so it should be fine to keep.
// Setters | ||
|
||
/// Retrieves an immutable reference to the instruction's underlying operation. | ||
pub fn op_mut(&mut self) -> &mut PackedOperation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that all of these _mut
methods are only used within the qiskit-circuit
crate. Should we introduce these with visibility pub(crate)
instead?
Originally PackedInstruction
was more of a Rust POD, but now it feels like we should lock down its internals where possible since it is managing its own cache invalidation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making the methods pub(crate)
is a bit needlessly restrictive. Anywhere you can get a &mut PackedInstruction
, you can effectively change any fields you need to, because you can *instruction = my_new_packed_instruction
, so the actual external access restriction for the circuit needs to be purely managed by when you can get a &mut PackedInstruction
.
Otherwise, it's just needlessly restrictive - if you can create a new PackedInstruction
with pub
visibility, then making these methods pub(crate)
doesn't actually restrict any capability.
params: self.params.clone(), | ||
extra_attrs: self.extra_attrs.clone(), | ||
#[cfg(feature = "cache_pygates")] | ||
py_op: OnceLock::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Toward the beginning of the discussion here, @jakelishman says:
the right solution is to manually impl Clone for PackedInstruction to not propagate the cache for gates that aren't Python-space singletons.
Should we be preserving the py_op
in the case of singleton gates to avoid a more significant performance regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we be able to identify which ones are singletons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking mostly about what's necessary, rather than what's easy. Technically it's safe to propagate the cache if the Python object itself is immutable. If it's too hard to tell, or we deem it too risky, we can just zero them all out.
Fwiw, we'd basically just apply a Rust-only heuristic: check the PackedOperation
is a Standard
and it's a 0-parameter gate with extra_attrs == None
. That won't catch every single singleton, but it'll do the vast majority and fails safe.
@kevinhartman I'd honestly be okay with not keeping an initialized As for what happens if we bring back the deleted assigned
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY. It's very puzzling as to why this is happening, but if we could make the attribute fully private to the |
releasenotes/notes/fix-packed-instruction-f2620d5aff61269d.yaml
Outdated
Show resolved
Hide resolved
Co-authored-by: Kevin Hartman <[email protected]>
I'm very surprised to see performance dropping if some of the caches are being propagated but not all - it might be a good idea to try and dig heavily into that and really understand why it's happened, especially since this is being considered for a bugfix. Fwiw, this feels like a very major change to be targetted at a bugfix to me, especially a release that might go out tomorrow. It might worth considering squashing the |
We're all in agreement that this PR should not merge until after 1.3.1. @raynelfss is working on a separate more localized hotfix for #13504 for this release, and we'll revisit this one in the future when we have time to understand the performance characteristics. |
Summary
This should supercede #13532, and should correctly fix #13504
A previous mis-implementation allowed shared references between cloned instances of
PackedInstruction
when it came to their Python counterparts. The following commits re-implement theClone
trait and makesPackedInstruction
attributes accessible only through views. This is done to allow the cachedPyGate
to be discarded upon modification ofop
,params
orextra_attrs
attributes of said instance ofPackedInstruction
.Details and comments
As referenced by @Cryoris in #13504, we were finding unknown parameters when performing translations in parallel. This was due to modifications coming from
BasisTranslator
helper functioncompose_transforms
which would modify certain parameters from a circuit. The reason for this was that we had been unintentionally copying references to the cached python gates stored within these operation instances.The following commits perform a couple of changes on the
PackedInstruction
class to prevent this from happening:Clone
trait forPackedInstruction
such that the cachedpy_gate
reference is discarded.cached_pygate
by discarding the cache whenever these attributes are borrowed mutably. This was done by:PackedInstruction
private.These changes had a propagated effect in the codebase and a lot of other pieces of code had to be modified for it to work with it.
Notes:
Co-authored-by: Julien Gacon [email protected]