Port standard instructions to Rust. #13486

kevinhartman · 2024-11-25T05:21:32Z

Summary

Adds a new Rust enum, StandardInstruction, to represent standard instructions (i.e. Barrier, Delay, Measure and Reset), and updates the encoding of PackedOperation to support storing it compactly at rest.

The size of a PackedOperation has now been anchored to always be 64 bits, even on 32-bit systems. This is necessary to encode the data payload of standard instructions like Barrier and Delay. See the docstrings within packed_instruction.rs for details.

Details and comments

The implementation works very similarly to what we currently do for StandardGate, but with a bit of extra consideration for an additional data payload used by variadic/template instructions like Barrier and Delay.

Similarly to StandardGate, the newly added StandardInstruction enum serves as the first-class Rust interface for working with standard instructions. The existing OperationRef enum returned by PackedOperation::view has a new variant StandardInstruction which exposes this.

Unlike StandardGate, StandardInstruction is not a pyclass because it is not directly used to tag instructions as standard in our Python class definitions. Instead, the simple integral enum StandardInstructionType is used as the tag, and OperationFromPython::extract_bound further queries the incoming Python object for variadic/template data when building the complete StandardInstruction.
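
As a rough illustration of the tag/instruction split described above, here is a minimal sketch. The payload types (a `u32` qubit count for Barrier, a `DelayUnit` for Delay), the `DelayUnit` variants and the `build` helper are assumptions made for the example, not the PR's actual definitions; `build` only stands in for the step where `extract_bound` queries the Python object for extra data.

```rust
/// Payload-free tag: one value per kind of standard instruction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum StandardInstructionType {
    Barrier = 0,
    Delay = 1,
    Measure = 2,
    Reset = 3,
}

/// Hypothetical set of delay units; the real list may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DelayUnit {
    Ns,
    Us,
    Ms,
    S,
    Dt,
}

/// Complete instruction: the tag plus any per-instance data (assumed payloads).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StandardInstruction {
    Barrier(u32), // number of qubits
    Delay(DelayUnit),
    Measure,
    Reset,
}

impl StandardInstruction {
    /// Recover the payload-free tag from a complete instruction.
    pub fn instruction_type(&self) -> StandardInstructionType {
        match self {
            StandardInstruction::Barrier(_) => StandardInstructionType::Barrier,
            StandardInstruction::Delay(_) => StandardInstructionType::Delay,
            StandardInstruction::Measure => StandardInstructionType::Measure,
            StandardInstruction::Reset => StandardInstructionType::Reset,
        }
    }
}

/// Hypothetical stand-in for the extraction step: the tag says which extra
/// data is required, and the build fails if that data is missing.
pub fn build(
    ty: StandardInstructionType,
    num_qubits: Option<u32>,
    unit: Option<DelayUnit>,
) -> Option<StandardInstruction> {
    match ty {
        StandardInstructionType::Barrier => num_qubits.map(StandardInstruction::Barrier),
        StandardInstructionType::Delay => unit.map(StandardInstruction::Delay),
        StandardInstructionType::Measure => Some(StandardInstruction::Measure),
        StandardInstructionType::Reset => Some(StandardInstruction::Reset),
    }
}
```

For instance, `build(StandardInstructionType::Barrier, Some(3), None)` yields `Some(StandardInstruction::Barrier(3))`, while a Delay tag without a unit yields `None`.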

Encoding

The PackedOperation encoding has been updated as follows:

A PackedOperation is now always 64 bits wide, even on 32-bit systems, and is now a wrapper around a custom union type, BitField. This union contains three bitfield!(u64) types, StandardGateBits, StandardInstructionBits, and PointerBits, defined using the bitfield-struct crate, which gives us compile-time validation of the bitfield layout and eliminates the need for manual masking and bit manipulation. Similar to a CPU ISA, the discriminant identifies which bitfield / layout should be used when decoding the rest of the operation. See the inline docstrings for more detail.
The discriminant has been widened from 2 to 3 bits (it is now at its maximum width, but we still have room for 3 more variants).
The discriminant now has additional variant StandardInstruction.
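
To make the discriminant arithmetic concrete, the sketch below decodes a 3-bit discriminant from the low bits of a 64-bit word. The variant names other than StandardGate and StandardInstruction, and all of the numeric values, are assumptions for illustration; the only facts taken from the description are the 3-bit width and that 3 of the 8 possible values remain unused.

```rust
/// Which layout the rest of the 64-bit word uses.
/// Names of the pointer variants and all numeric values are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Discriminant {
    StandardGate = 0,
    StandardInstruction = 1,
    PyGate = 2,
    PyInstruction = 3,
    PyOperation = 4,
}

pub const DISCRIMINANT_BITS: u32 = 3;
pub const DISCRIMINANT_MASK: u64 = (1 << DISCRIMINANT_BITS) - 1;
pub const DISCRIMINANTS_IN_USE: u64 = 5;

/// Read the low three bits and map them to a layout, if one is assigned.
pub fn discriminant(word: u64) -> Option<Discriminant> {
    match word & DISCRIMINANT_MASK {
        0 => Some(Discriminant::StandardGate),
        1 => Some(Discriminant::StandardInstruction),
        2 => Some(Discriminant::PyGate),
        3 => Some(Discriminant::PyInstruction),
        4 => Some(Discriminant::PyOperation),
        _ => None, // 5, 6 and 7 are not assigned in this sketch
    }
}

/// Number of bit patterns still free for future variants.
pub fn unused_discriminants() -> u64 {
    (1u64 << DISCRIMINANT_BITS) - DISCRIMINANTS_IN_USE
}
```

With three bits there are eight patterns, so five assigned variants leave three free, matching the count given above.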

Todo:

Add more detail to PR description.
Actually test this on 32-bit systems and a big-endian 64-bit architecture.

This way, we can also use this enum to tag the Python classes.

jakelishman · 2024-11-25T07:42:36Z

I'm concerned that the LoHi struct and PackedOperation union are making it easier to make mistakes on BE and/or 32-bit systems, because the encodings now change between them.

I feel like the bitshift and masking operations could be extended over the whole u64, and that'll all automatically work on BE and 32-bit systems, especially with the size of PackedOperation now fixed to 64 bits - we can even have everything be aligned. The shifts of the payload of instructions can be set such that a u32 payload is always in the high bits, with the padding bits between it and the rest of the discriminants if you're concerned about aligned access, because then the bitshifts will be compiled out into a single register load (just read the u32 from the right place) and the compiler will handle endianness for us. The pointer doesn't need to be handled any differently to any other payload - everything's just shifts and masks anyway (LoHi can't avoid that), and I think introducing the union makes that harder to follow.

The LoHi struct to me seems to be forcing every method of PackedOperation to think about the partial register reads/loads. If we use shifts and masks on a u64 with const masks and shifts, there's no logic split where some of it is done with shifts and some with endianness-dependent partial register access in our own code.
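
A small sketch of the point about endianness, under the assumption that a u32 payload sits in the high 32 bits of the word: the shift-based accessor has no target-specific code, whereas an accessor that picks a half out of the in-memory bytes has to branch on byte order. All function names here are made up for the example.

```rust
pub const PAYLOAD_SHIFT: u32 = 32;

/// Pack a u32 payload into the high half and some low bits into the low half.
pub fn with_payload(low_bits: u32, payload: u32) -> u64 {
    (u64::from(payload) << PAYLOAD_SHIFT) | u64::from(low_bits)
}

/// Shift-based read: identical source on little- and big-endian targets,
/// because shifts operate on the integer's value, not its byte layout.
pub fn payload(word: u64) -> u32 {
    (word >> PAYLOAD_SHIFT) as u32
}

/// Byte-based read: which four bytes hold the high half depends on the
/// target's endianness, so the code has to handle both cases itself.
pub fn payload_from_bytes(word: u64) -> u32 {
    let bytes = word.to_ne_bytes();
    let half: [u8; 4] = if cfg!(target_endian = "little") {
        [bytes[4], bytes[5], bytes[6], bytes[7]]
    } else {
        [bytes[0], bytes[1], bytes[2], bytes[3]]
    };
    u32::from_ne_bytes(half)
}
```

Both accessors return the same value, but only the second one needs to know about byte order.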

jakelishman · 2024-11-25T09:03:08Z

I guess my main point is this: the 64-bit PackedOperation can always be thought about as a collection of payloads each of which is stored in a particular mask and needs a particular shift. These are:

the PackedOperation discriminant at (0b111, 0)
a pointer at (usize::MAX as u64 & !0b111, 0) (regardless of 32-bit or 64-bit)
a StandardGate discriminant at (0xff << 3, 3)
a StandardInstruction discriminant at (0xf << 3, 3)
a DelayUnit payload at (for example) (0x7 << 32, 32)
a u32 payload at (u32::MAX as u64 << 32, 32)

Introducing LoHi doesn't remove the need to bitshift and mask for most items, it just means that some of my above list are done one way, some are done another, and the union means that the pointer now has more ways to access it. LoHi also restricts the payload size to u32, when we already have payloads that exceed that (the pointers).

If you want a shade more encapsulation than my loose const shift and mask associated items, would something like this look better to you?

struct PayloadLocation<T: SomeBytemuckOrCustomTrait> {
    mask: u64,
    shift: u32,
    datatype: PhantomData<T>,
}
impl<T: SomeBytemuckOrCustomTrait> PayloadLocation<T> {
    // Note we _mustn't_ accept `PackedOperation` until we've got a valid one,
    // because holding a `PackedOperation` implies ownership over its stored
    // pointer, and it mustn't `Drop` or attempt to interpret partial data.

    #[inline]
    fn store(&self, src: T, target: u64) -> u64 {
        let src: u64 = bytemuck::cast(src);
        target | ((src << self.shift) & self.mask)
    }
    #[inline]
    fn load(&self, target: u64) -> T {
        bytemuck::cast((target & self.mask) >> self.shift)
    }
}

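// The type parameters below are placeholders; pick whatever each payload decodes to.
const PACKED_OPERATION_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation { mask: 0b111, shift: 0, datatype: PhantomData };
const STANDARD_GATE_DISCRIMINANT: PayloadLocation<StandardGate> = PayloadLocation { mask: 0xff << 3, shift: 3, datatype: PhantomData };
const POINTER_PAYLOAD: PayloadLocation<usize> = PayloadLocation { mask: usize::MAX as u64 & !0b111, shift: 0, datatype: PhantomData };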
// and so on

(Bear in mind I just typed that raw without trying it, and I didn't think all that hard about what the trait bound should be.)

If everything's inlined and constant, the compiler absolutely should be able to compile out 32 bit shifts and all-ones masks into partial register reads/loads itself, so there oughtn't to be any overhead.
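
For reference, a self-contained variant of the sketch above that compiles with only the standard library. It swaps the bytemuck cast for a small hypothetical `Payload` trait (since `bytemuck::cast` requires source and target to have the same size), clears the field before storing, and uses a `const fn` constructor. Treat it as one possible shape under those assumptions, not as the code that was proposed.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the trait bound: converts a payload to and
/// from the low bits of a u64.
pub trait Payload: Sized {
    fn to_bits(self) -> u64;
    fn from_bits(bits: u64) -> Self;
}

impl Payload for u8 {
    fn to_bits(self) -> u64 {
        u64::from(self)
    }
    fn from_bits(bits: u64) -> Self {
        bits as u8
    }
}

impl Payload for u32 {
    fn to_bits(self) -> u64 {
        u64::from(self)
    }
    fn from_bits(bits: u64) -> Self {
        bits as u32
    }
}

pub struct PayloadLocation<T: Payload> {
    mask: u64,
    shift: u32,
    datatype: PhantomData<T>,
}

impl<T: Payload> PayloadLocation<T> {
    pub const fn new(mask: u64, shift: u32) -> Self {
        Self { mask, shift, datatype: PhantomData }
    }

    /// Write `src` into its field of `target`, leaving other bits untouched.
    #[inline]
    pub fn store(&self, src: T, target: u64) -> u64 {
        (target & !self.mask) | ((src.to_bits() << self.shift) & self.mask)
    }

    /// Read this field back out of `target`.
    #[inline]
    pub fn load(&self, target: u64) -> T {
        T::from_bits((target & self.mask) >> self.shift)
    }
}

pub const PACKED_OPERATION_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation::new(0b111, 0);
pub const STANDARD_GATE_DISCRIMINANT: PayloadLocation<u8> = PayloadLocation::new(0xff << 3, 3);
pub const U32_PAYLOAD: PayloadLocation<u32> = PayloadLocation::new((u32::MAX as u64) << 32, 32);
```

Building a word is then a chain of `store` calls on a plain `u64`, for example `U32_PAYLOAD.store(7, PACKED_OPERATION_DISCRIMINANT.store(1, 0))`, and nothing here touches a `PackedOperation` until the word is complete.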

This reverts commit b6d8f92.

…hs." This reverts commit 6049498.

Trying out a neat crate for Rust bitfields. The caveats are:

Custom enums used in the bitfield must specify const funcs for bit conversion, and bytemuck's cast functions aren't const.
The bitfield crate implements Clone for you, but does not seem to have a way to disable this. We can't rely on their clone, since for pointer types we need to allocate a new Box. To get around this, PackedOperation is a wrapper around an internal bitfield struct (rather than being a bitfield struct itself).

(Note: I'm not yet happy with this. Specifically, I think the abstraction may be cleaner if OpBitField is defined entirely in terms of raw native types and any reinterpretation / transmutation is done from PackedOperation. Consider this a first pass.)
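
The Clone caveat can be illustrated without the bitfield crate. In the sketch below, a plain `Copy` word (`Bits`) plays the role of the generated bitfield, and the wrapper (`Packed`) supplies its own Clone and Drop so that a boxed payload is deep-copied rather than aliased. All names, the tag values and the `Heap` payload type are hypothetical.

```rust
const TAG_MASK: u64 = 0b111;
const TAG_INLINE: u64 = 0;
const TAG_BOXED: u64 = 2;

/// Heap payload; the forced 8-byte alignment keeps the low 3 pointer bits free.
#[repr(align(8))]
struct Heap(String);

/// Plain 64-bit word. Copying it is harmless because it claims no ownership.
#[derive(Clone, Copy)]
struct Bits(u64);

/// Owning wrapper: when the tag says BOXED, it owns the allocation behind the pointer.
pub struct Packed {
    bits: Bits,
}

impl Packed {
    pub fn inline(value: u32) -> Self {
        Packed { bits: Bits((u64::from(value) << 32) | TAG_INLINE) }
    }

    pub fn boxed(value: &str) -> Self {
        let ptr = Box::into_raw(Box::new(Heap(value.to_owned()))) as usize as u64;
        debug_assert_eq!(ptr & TAG_MASK, 0);
        Packed { bits: Bits(ptr | TAG_BOXED) }
    }

    pub fn raw(&self) -> u64 {
        self.bits.0
    }

    fn pointer(&self) -> Option<*mut Heap> {
        if (self.bits.0 & TAG_MASK) == TAG_BOXED {
            Some((self.bits.0 & !TAG_MASK) as usize as *mut Heap)
        } else {
            None
        }
    }

    pub fn boxed_value(&self) -> Option<&str> {
        let ptr = self.pointer()?;
        // SAFETY: the BOXED tag is only written by `boxed`, which stored a
        // pointer to a live `Heap` that `self` owns until it is dropped.
        let heap: &Heap = unsafe { &*ptr };
        Some(heap.0.as_str())
    }
}

impl Clone for Packed {
    fn clone(&self) -> Self {
        match self.boxed_value() {
            // Pointer case: allocate a fresh Box instead of copying the bits.
            Some(value) => Packed::boxed(value),
            // Inline case: a bitwise copy is correct.
            None => Packed { bits: self.bits },
        }
    }
}

impl Drop for Packed {
    fn drop(&mut self) {
        if let Some(ptr) = self.pointer() {
            // SAFETY: the pointer came from `Box::into_raw` in `boxed` and
            // this is the only owner, so it is freed exactly once.
            drop(unsafe { Box::from_raw(ptr) });
        }
    }
}
```

A derived bitwise Clone on the wrapper would leave two owners of the same allocation, which is why the ownership logic lives in `Packed` and not in `Bits`.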

I can't easily test this on my own machine, but I think it'll work.

qiskit-bot · 2024-12-17T20:10:25Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

jakelishman

I haven't thought hugely about all the implications of this all the way through, but here are comments on the implementation.

The rough summary is:

this does look like a successful way of doing it
I agree that you can kind of view these as bitfield structs
I'm very nervous that bitfield-struct is causing us to invoke undefined behaviour and introduce more unsafe, and I'm not certain it's making the code a huge amount clearer.

My biggest worry about bitfield-struct as written here is about accessing the PackedOperation discriminant - getting it from the StandardGateBits representation is, I believe, UB. We can swap that to the pointer form, but the fact that it's so easy to produce UB like that makes me nervous.

My other hesitation is that generally these aren't entirely bitfields, because of the overlap. All the extra code needed to support the masking out of the low bits of the pointer feels more complex to me than it was before, and I'm not convinced that bitfield_struct is offering us enough in return. If anything, it feels like we might have spread the dangerous assumptions further around across more types and more (type-level) indirection than there was before.

That said, I'm not really strongly against it. I worry that it's using a pneumatic drill to solve a hammer-and-nail problem here since there really are not many fields at all in our structs, and these structs should not need much manipulation other than construction. My perception of the complexity of manual bitshift-and-mask operations compared to this might be different to other people's.

jakelishman · 2024-12-18T00:16:21Z

crates/circuit/src/circuit_instruction.rs

@@ -695,6 +701,37 @@ impl<'py> FromPyObject<'py> for OperationFromPython {
                extra_attrs: extract_extra()?,
            });
        }
+        'standard_instr: {


If you're adding this label as a separate one, you probably want to rename the 'standard label above to 'standard_gate.
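
For readers unfamiliar with the construct, a minimal sketch of two consecutive labelled blocks with the suggested naming. Each block either returns a result or breaks out to fall through to the next attempt. The `classify` function and its string matching are purely illustrative and unrelated to the real extraction logic.

```rust
/// Try each kind of "standard" operation in turn, falling through on a miss.
pub fn classify(name: &str, num_qubits: Option<u32>) -> String {
    'standard_gate: {
        let Some(gate) = ["h", "x", "cx"].iter().find(|g| **g == name) else {
            break 'standard_gate;
        };
        return format!("standard gate {gate}");
    }
    'standard_instr: {
        if name != "barrier" {
            break 'standard_instr;
        }
        let Some(n) = num_qubits else {
            break 'standard_instr;
        };
        return format!("barrier on {n} qubits");
    }
    "custom operation".to_string()
}
```

With both labels named after what they try to extract, it is clearer at each `break` which attempt is being abandoned.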

jakelishman · 2024-12-18T00:17:03Z

crates/circuit/src/circuit_instruction.rs

+            // Our Python standard instructions have a `_standard_instruction_type` field at the
+            // class level so we can quickly identify them here without an `isinstance` check.
+            // Once we know the type, we query the object for any type-specific fields we need to
+            // read (e.g. a Barrier's number of qubits) to build the Rust representation.
+            let Some(standard_type) = ob_type
+                .getattr(intern!(py, "_standard_instruction_type"))


Why _standard_instruction_type and not _standard_instruction, which is consistent with _standard_gate?

I can change this to _standard_instruction, if you prefer. I did this because there's also a Rust-side StandardInstruction enum, and even though it isn't exposed to Python, my thinking was that this would be more consistent to readers of the Qiskit codebase if the name here aligned to the StandardInstructionType Rust enum that backs it.

I can only speak to my interpretation, but to me, the mismatched attribute names jumped out as odd. I get that it's not super pleasant with needing the StandardInstruction/StandardInstructionType split anyway, and I don't feel strongly about which way you want to keep it.

jakelishman · 2024-12-18T00:21:39Z

crates/circuit/src/operations.rs

+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+#[pyclass(module = "qiskit._accelerate.circuit", eq, eq_int)]
+#[repr(u8)]
+pub(crate) enum StandardInstructionType {


Personally, I don't think I've seen any compelling reasons to use pub(crate) yet. Either it should be entirely private to this module, or it's useful outside the module, in which case it's likely to be useful outside the crate too. Feels to me that almost every time we've introduced a pub(crate), we've had a PR shortly after making it a pub, and in this case, it's logically exposed publicly to Python space anyway.

Should this type live here at all, or is it just an implementation detail of PackedOperation, and could move there?

(Though see other questions about the new PackedOperation implementation - maybe other stuff should move here.)

It is pub(crate) here only so that we can expose it to Python from lib.rs, IIRC. Perhaps that is not necessary and I was mistaken.

If it's needed to export for Python, it's fine (though if it's public to Python, is there no chance that it's useful elsewhere in Rust too?). I mostly just knee-jerk against pub(crate), I think, because it's caused us a fair amount of churn in the last few months.

jakelishman · 2024-12-18T00:24:47Z

crates/circuit/src/operations.rs

+        if label.is_some() || unit.is_some() || duration.is_some() || condition.is_some() {
+            let mut mutable = false;
+            if let Some(condition) = condition {
+                if !mutable {
+                    out = out.call_method0("to_mutable")?;
+                    mutable = true;
+                }
+                out.setattr("condition", condition)?;
+            }
+            if let Some(duration) = duration {
+                if !mutable {
+                    out = out.call_method0("to_mutable")?;
+                    mutable = true;
+                }
+                out.setattr("_duration", duration)?;
+            }
+            if let Some(unit) = unit {
+                if !mutable {
+                    out = out.call_method0("to_mutable")?;
+                }
+                out.setattr("_unit", unit)?;
+            }
+        }


How does the unit/duration stuff here interact with Delay, which expects those fields already?

jakelishman · 2024-12-18T00:25:38Z

crates/circuit/src/operations.rs

+    // Remember to update StandardGate::is_valid_bit_pattern below
+    // if you add or remove this enum's variants!


Weird that you added a comment here, but not on the StandardInstructionType introduced in this PR with the same consideration, but thanks for catching it!

This is more an artifact of my workflow, which tends to be fairly iterative. I noticed these were out of sync, so I left a comment and assumed I'd fix up the rest of the docs / comments for the PR in accordance once everything else was fleshed out.
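
One way to reduce reliance on such reminder comments is a compile-time check that ties the validity bound to the last variant. The sketch below assumes a `repr(u8)` enum with contiguous values starting at zero; `STANDARD_INSTRUCTION_SIZE`, the free function `is_valid_bit_pattern` and `from_bits` are hypothetical names for the example, not the PR's API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum StandardInstructionType {
    Barrier = 0,
    Delay = 1,
    Measure = 2,
    Reset = 3,
}

/// Number of variants; must change whenever the enum does.
pub const STANDARD_INSTRUCTION_SIZE: u8 = 4;

// Fails to compile if the constant no longer matches the last variant.
// It still has to be updated by hand to name the new last variant.
const _: () = assert!(StandardInstructionType::Reset as u8 + 1 == STANDARD_INSTRUCTION_SIZE);

/// True if `bits` is the value of some variant.
pub fn is_valid_bit_pattern(bits: u8) -> bool {
    bits < STANDARD_INSTRUCTION_SIZE
}

/// Safe decoding that agrees with `is_valid_bit_pattern`.
pub fn from_bits(bits: u8) -> Option<StandardInstructionType> {
    match bits {
        0 => Some(StandardInstructionType::Barrier),
        1 => Some(StandardInstructionType::Delay),
        2 => Some(StandardInstructionType::Measure),
        3 => Some(StandardInstructionType::Reset),
        _ => None,
    }
}
```

The assertion catches the case where a variant is appended and the assertion is updated but the constant is not, or vice versa; it does not remove the need to touch it when the enum changes.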

jakelishman · 2024-12-18T00:59:58Z