Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow immutable borrow to access QuantumCircuit.parameters #12918

Merged
merged 4 commits into from
Aug 14, 2024

Conversation

jakelishman
Copy link
Member

Summary

QuantumCircuit.parameters is logically a read-only operation on QuantumCircuit. For efficiency in multiple calls to assign_parameters, we actually cache the sort order of the internal ParameterTable on access. This is purely a caching effect, and should not leak out to users.

The previous implementation took a Rust-space mutable borrow out in order to (potentially) mutate the cache. This caused problems if multiple Python threads attempted to call assign_parameters simultaneously; it was possible for one thread to give up the GIL during its initial call to CircuitData::copy (so an immutable borrow was still live), allowing another thread to continue on to the getter CircuitData::get_parameters, which required a mutable borrow, which failed due to the paused thread in copy.

This moves the cache into a RefCell, allowing the parameter getters to take an immutable borrow as the receiver. We now write the cache out only if we can take the mutable borrow out necessary. This can mean that other threads will have to repeat the work of re-sorting the parameters, because their borrows were blocking the saving of the cache, but this will not cause failures.

The methods on ParameterTable that invalidate the cache all require a mutable borrow on the table itself. This makes it impossible for an immutable borrow to exist simultaneously on the cache, so these methods should always succeed to acquire the cache lock to invalidate it.

Details and comments

This won't be the last time we have problems with Python threading letting people see Rust reject conditions that allow data races. This particular case isn't the user's fault, though - QuantumCircuit.parameters shouldn't need a mutable borrow, and should be thread-safe to be combined with other thread-safe methods on QuantumCircuit. The issue was an internal caching detail, and that's something Qiskit should help.

An example script that produced errors before:

from concurrent.futures import ThreadPoolExecutor
from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1, 1)
for _ in [None]*100_000:
    qc.measure(0, 0)

workers = 2

with ThreadPoolExecutor(max_workers=workers) as executor:
    fs = [executor.submit(qc.assign_parameters, []) for _ in [None]*workers]
    for future in fs:
        future.result()

The thread workers in this example aren't doing anything wrong; assign_parameters(inplace=False) should be safe to run in multiple threads. Before the Rust-space work, Python would silently ignore the potential data race, and the GIL would have prevented anything catastrophic happening (though multiple threads might have separately calculated the sort order).

In that example, the QuantumCircuit.copy call within QuantumCircuit.assign_parameters takes an immutable borrow in Rust space, but the Rust space method can become interrupted part way through, while copying the Measure instructions (since this requires yielding control to the Python interpreter to run Measure.copy). This allows another thread to progress to QuantumCircuit.parameters, which needed a mutable borrow, which could not be taken out due to the other paused thread.

`QuantumCircuit.parameters` is logically a read-only operation on
`QuantumCircuit`.  For efficiency in multiple calls to
`assign_parameters`, we actually cache the sort order of the internal
`ParameterTable` on access.  This is purely a caching effect, and should
not leak out to users.

The previous implementation took a Rust-space mutable borrow out in
order to (potentially) mutate the cache.  This caused problems if
multiple Python threads attempted to call `assign_parameters`
simultaneously; it was possible for one thread to give up the GIL during
its initial call to `CircuitData::copy` (so an immutable borrow was
still live), allowing another thread to continue on to the getter
`CircuitData::get_parameters`, which required a mutable borrow, which
failed due to the paused thread in `copy`.

This moves the cache into a `RefCell`, allowing the parameter getters to
take an immutable borrow as the receiver.  We now write the cache out
only if we *can* take the mutable borrow out necessary.  This can mean
that other threads will have to repeat the work of re-sorting the
parameters, because their borrows were blocking the saving of the cache,
but this will not cause failures.

The methods on `ParameterTable` that invalidate the cache all require a
mutable borrow on the table itself.  This makes it impossible for an
immutable borrow to exist simultaneously on the cache, so these methods
should always succeed to acquire the cache lock to invalidate it.
@jakelishman jakelishman added Changelog: None Do not include in changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library labels Aug 7, 2024
@jakelishman jakelishman added this to the 1.2.0 milestone Aug 7, 2024
@jakelishman jakelishman requested a review from a team as a code owner August 7, 2024 23:55
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@jakelishman jakelishman added the stable backport potential The bug might be minimal and/or import enough to be port to stable label Aug 8, 2024
@coveralls
Copy link

coveralls commented Aug 8, 2024

Pull Request Test Coverage Report for Build 10316796986

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 49 of 52 (94.23%) changed or added relevant lines in 1 file are covered.
  • 169 unchanged lines in 7 files lost coverage.
  • Overall coverage decreased (-0.004%) to 89.739%

Changes Missing Coverage Covered Lines Changed/Added Lines %
crates/circuit/src/parameter_table.rs 49 52 94.23%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/expr.rs 1 94.02%
qiskit/primitives/statevector_estimator.py 3 94.44%
qiskit/primitives/estimator.py 3 96.2%
crates/qasm2/src/lex.rs 5 91.98%
crates/qasm2/src/parse.rs 6 97.15%
crates/circuit/src/circuit_data.rs 42 94.44%
crates/accelerate/src/two_qubit_decompose.rs 109 90.69%
Totals Coverage Status
Change from base Build 10300985902: -0.004%
Covered Lines: 67375
Relevant Lines: 75079

💛 - Coveralls

In several cases, the previous code was using the runtime-checked
`RefCell::borrow_mut` in locations that can be statically proven to be
safe to take the mutable reference.  Using the correct function for this
makes the logic clearer (as well as technically removing a small amount
of runtime overhead).
`OnceCell` has less runtime checking than `RefCell` (only whether it is
initialised or not, which is an `Option` check), and better represents
the dynamic extensions to the borrow checker that we actually need for
the caching in this method.

All methods that can invalidate the cache all necessarily take `&mut
ParameterTable` already, since they will modify Rust-space data.  A
`OnceCell` can be deinitialised through a mutable reference, so this is
fine.  The only reason a `&ParameterTable` method would need to mutate
the cache is to create it, which is the allowed set of `OnceCell`
operations.
Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

I'm much happier now with the OnceCell being used instead of RefCell, which had felt like it required a little bit too much manual coordination when managing the cache.

As you mentioned in the PR description, this won't be the last time we encounter borrowing issues between threads. In this case, our caching was the reason we needed a mutable borrow. But, in cases where we actually have these semantics (where it is the user's fault, in that they're doing something that requires multiple threads to borrow the same thing concurrently), are we planning to replicate the behavior of the old Python code (doing extra work but providing a correct result), or to produce a runtime failure?

) -> impl Iterator<Item = (Py<PyAny>, HashSet<ParameterUse>)> + '_ {
self.ensure_sorted();
&mut self,
) -> impl ExactSizeIterator<Item = (Py<PyAny>, HashSet<ParameterUse>)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to use an abstract return type here, or can we just return ParameterTableDrain?

In the case of Vec::drain, a public Drain struct gets returned explicitly. That seems like a useful pattern for the general case since we can then implement additional traits for the iterator if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I'd done it because I didn't want ParameterTableDrain to exist at all, so I kept the type private. The original version of this function had the iterator type unnameable, because I could borrow everything out of the struct without needing a manual implementation. The Rust stdlib exports all its iterator type objects possibly in part because anonymous impl Trait in return position only turned up in 1.26, so there was no alternative before that.

I can make it a public type if you think it's better that way, but I'm not sure I agree with the justification "can implementation additional traits [...] if needed" in this case - if we need them, then we can name the type and make it public, but at the moment, we've no need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's fine as is, given that we can swap in an explicit struct if needed.

(Though I'd be curious if Rust would have used abstract return types for this kind of thing in the standard lib if they'd be around from the beginning.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know if they would have either. The anonymous types don't make for the prettiest reading of documentation, tbf, though equally, all the structs named Iter just mean you have to go to a mostly empty separate docs page to find which of the Iterator extension traits it implements.

@jakelishman
Copy link
Member Author

jakelishman commented Aug 12, 2024

I'm much happier now with the OnceCell being used instead of RefCell, which had felt like it required a little bit too much manual coordination when managing the cache.

Oddly, I'm unhappy with this PR because I think this actually requires more manual co-ordination when managing the cache, because now you have to take a lot of care (that the compiler won't protect you from) not to yield control to the Python interpreter during the initialiser, which makes get_or_init unsafe for us in most cases. Hopefully a swap to GILOnceCell or something similar is possible for us in the future, though, which would remove my concern.

In the meantime, I'm happy though because it uses less memory, the point-of-use semantics are clearer, and it requires less runtime checking haha.

@kevinhartman
Copy link
Contributor

kevinhartman commented Aug 12, 2024

you have to take a lot of care (that the compiler won't protect you from) not to yield control to the Python interpreter during the initialiser, which makes get_or_init unsafe for us in most cases.

Can we add a code comment that warns devs of this?

With RefCell, the coordination I was mostly referring to was what I'd remembered from the original version of this PR, where there were uses of try_borrow_mut and uses of borrow(_mut), which I see now wasn't the case in the final RefCell-based approach.

@jakelishman
Copy link
Member Author

Oh, I got confused with the two PRs sorry - in this particular PR, the creation of the cache object doesn't require yielding to the Python interpreter. The operations in #12942 do, though, and they include big explanatory code comments about the problem.

Copy link
Contributor

@kevinhartman kevinhartman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jakelishman!

@kevinhartman kevinhartman added this pull request to the merge queue Aug 14, 2024
Merged via the queue into Qiskit:main with commit 2d3db9a Aug 14, 2024
15 checks passed
mergify bot pushed a commit that referenced this pull request Aug 14, 2024
* Allow immutable borrow to access `QuantumCircuit.parameters`

`QuantumCircuit.parameters` is logically a read-only operation on
`QuantumCircuit`.  For efficiency in multiple calls to
`assign_parameters`, we actually cache the sort order of the internal
`ParameterTable` on access.  This is purely a caching effect, and should
not leak out to users.

The previous implementation took a Rust-space mutable borrow out in
order to (potentially) mutate the cache.  This caused problems if
multiple Python threads attempted to call `assign_parameters`
simultaneously; it was possible for one thread to give up the GIL during
its initial call to `CircuitData::copy` (so an immutable borrow was
still live), allowing another thread to continue on to the getter
`CircuitData::get_parameters`, which required a mutable borrow, which
failed due to the paused thread in `copy`.

This moves the cache into a `RefCell`, allowing the parameter getters to
take an immutable borrow as the receiver.  We now write the cache out
only if we *can* take the mutable borrow out necessary.  This can mean
that other threads will have to repeat the work of re-sorting the
parameters, because their borrows were blocking the saving of the cache,
but this will not cause failures.

The methods on `ParameterTable` that invalidate the cache all require a
mutable borrow on the table itself.  This makes it impossible for an
immutable borrow to exist simultaneously on the cache, so these methods
should always succeed to acquire the cache lock to invalidate it.

* Use `RefCell::get_mut` where possible

In several cases, the previous code was using the runtime-checked
`RefCell::borrow_mut` in locations that can be statically proven to be
safe to take the mutable reference.  Using the correct function for this
makes the logic clearer (as well as technically removing a small amount
of runtime overhead).

* Use `OnceCell` instead of `RefCell`

`OnceCell` has less runtime checking than `RefCell` (only whether it is
initialised or not, which is an `Option` check), and better represents
the dynamic extensions to the borrow checker that we actually need for
the caching in this method.

All methods that can invalidate the cache all necessarily take `&mut
ParameterTable` already, since they will modify Rust-space data.  A
`OnceCell` can be deinitialised through a mutable reference, so this is
fine.  The only reason a `&ParameterTable` method would need to mutate
the cache is to create it, which is the allowed set of `OnceCell`
operations.

(cherry picked from commit 2d3db9a)
@jakelishman jakelishman deleted the concurrent-assign-parameters branch August 14, 2024 16:51
github-merge-queue bot pushed a commit that referenced this pull request Aug 14, 2024
…#12958)

* Allow immutable borrow to access `QuantumCircuit.parameters`

`QuantumCircuit.parameters` is logically a read-only operation on
`QuantumCircuit`.  For efficiency in multiple calls to
`assign_parameters`, we actually cache the sort order of the internal
`ParameterTable` on access.  This is purely a caching effect, and should
not leak out to users.

The previous implementation took a Rust-space mutable borrow out in
order to (potentially) mutate the cache.  This caused problems if
multiple Python threads attempted to call `assign_parameters`
simultaneously; it was possible for one thread to give up the GIL during
its initial call to `CircuitData::copy` (so an immutable borrow was
still live), allowing another thread to continue on to the getter
`CircuitData::get_parameters`, which required a mutable borrow, which
failed due to the paused thread in `copy`.

This moves the cache into a `RefCell`, allowing the parameter getters to
take an immutable borrow as the receiver.  We now write the cache out
only if we *can* take the mutable borrow out necessary.  This can mean
that other threads will have to repeat the work of re-sorting the
parameters, because their borrows were blocking the saving of the cache,
but this will not cause failures.

The methods on `ParameterTable` that invalidate the cache all require a
mutable borrow on the table itself.  This makes it impossible for an
immutable borrow to exist simultaneously on the cache, so these methods
should always succeed to acquire the cache lock to invalidate it.

* Use `RefCell::get_mut` where possible

In several cases, the previous code was using the runtime-checked
`RefCell::borrow_mut` in locations that can be statically proven to be
safe to take the mutable reference.  Using the correct function for this
makes the logic clearer (as well as technically removing a small amount
of runtime overhead).

* Use `OnceCell` instead of `RefCell`

`OnceCell` has less runtime checking than `RefCell` (only whether it is
initialised or not, which is an `Option` check), and better represents
the dynamic extensions to the borrow checker that we actually need for
the caching in this method.

All methods that can invalidate the cache all necessarily take `&mut
ParameterTable` already, since they will modify Rust-space data.  A
`OnceCell` can be deinitialised through a mutable reference, so this is
fine.  The only reason a `&ParameterTable` method would need to mutate
the cache is to create it, which is the allowed set of `OnceCell`
operations.

(cherry picked from commit 2d3db9a)

Co-authored-by: Jake Lishman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: circuit Related to the core of the `QuantumCircuit` class or the circuit library stable backport potential The bug might be minimal and/or import enough to be port to stable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants