Unified Primitive Container Types #53

Merged
merged 11 commits into from
Nov 7, 2023
145 changes: 145 additions & 0 deletions 0016-base-primitive-unification.md
@@ -0,0 +1,145 @@
# Base Primitive and Units of Primitive Work

| **Status** | **Proposed/Accepted/Deprecated** |
|:------------------|:---------------------------------------------|
| **RFC #** | 0016 |
| **Authors** | Lev Bishop ([email protected]) |
| | Ian Hincks ([email protected]) |
| | Blake Johnson ([email protected]) |
| | Chris Wood ([email protected]) |
| **Submitted** | 2023-10-26 |
| **Updated**       | YYYY-MM-DD                                   |

## Summary

The primitives form an execution framework for QPUs and their simulators.
Each primitive specializes in some computational task.
For example, the `Estimator` specializes in computing expectation values of user-provided observables.

Different types of primitives are implemented as different abstract subclasses of `BasePrimitive` (e.g. `BaseEstimator`), and different primitive implementations as further subclasses (e.g. `qiskit_aer.primitives.Estimator`).
This RFC proposes the structure and format of various container types that can be used by base primitives to store input and output values.

## Motivation

The central motivation of this RFC is to more clearly emphasize that primitives are an execution framework on top of which applications (not just algorithms) can be built.
We aim to help each type of primitive to clearly state what units of quantum work it is able to perform.
We do this by proposing a set of containers that the various `run()` methods can use (possibly subclassed) as input and output types.
In doing so, we are also proposing a set of names that can help us standardize the way we talk about execution in documentation, and throughout the stack.

## User Benefit

We expect that users will generally benefit from a more homogeneous experience across the primitives, both in how they submit jobs and in how they collect the data out of their results.
These containers would also make it possible for base primitives to have multiple, named output types.
This would enable, for example, primitives to have separate output fields for each output variable of a particular circuit, or to take standard errors/confidence intervals out of the metadata and put them right alongside the first moment data.

## Design Proposal

We propose to encourage base primitives, like `BaseSampler` and `BaseEstimator`, to have a `run` method typed as

```python
class BaseFoo:
    def run(self, tasks: FooTaskLike | Iterable[FooTaskLike]) -> Job[PrimitiveResult[FooTaskResult]]:
        """Run one or more input tasks and return a result for each one."""
```

where `FooTask` is equal to or derives from `Task`, `FooTaskLike` is a union type that is easily coercible into a `FooTask` via the static method `FooTask.coerce`, and where `FooTaskResult` derives from `TaskResult`.

The base containers `PrimitiveResult`, `Task`, and `TaskResult` are described in the next sections.
`Job[T]` is any `qiskit.providers.JobV1` whose result method returns type `T`.
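
As a minimal sketch of the coercion step, the following shows what `FooTask.coerce` and the input normalization inside a `run` method might look like. Everything here is illustrative and not part of the proposal: `Circuit` is a stand-in for `QuantumCircuit`, the `extra` field is a hypothetical piece of auxiliary data, `coerce_tasks` is a hypothetical helper name, and the rule that a tuple is always read as a single task is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, Tuple, Union


class Circuit:
    """Stand-in for QuantumCircuit so that the sketch has no dependencies."""


@dataclass(frozen=True)
class FooTask:
    circuit: Circuit
    extra: Any = None  # hypothetical auxiliary data

    @staticmethod
    def coerce(task: FooTaskLike) -> FooTask:
        """Turn any task-like object into a FooTask."""
        if isinstance(task, FooTask):
            return task
        if isinstance(task, Circuit):
            return FooTask(task)
        if isinstance(task, tuple) and 1 <= len(task) <= 2:
            return FooTask(*task)
        raise TypeError(f"cannot coerce {type(task).__name__} into a FooTask")


# Assumed union type: a task, a bare circuit, or a (circuit, extra) tuple.
FooTaskLike = Union[FooTask, Circuit, Tuple[Any, ...]]


def coerce_tasks(tasks: Union[FooTaskLike, Iterable[FooTaskLike]]) -> list[FooTask]:
    """Accept one task-like object or an iterable of them, as run() would."""
    if isinstance(tasks, (FooTask, Circuit, tuple)):
        tasks = [tasks]
    return [FooTask.coerce(task) for task in tasks]


circuit = Circuit()
tasks = coerce_tasks([circuit, (circuit, 0.5), FooTask(circuit, 1.0)])
single = coerce_tasks(circuit)
```

With this kind of normalization, every downstream code path in the primitive only ever sees a list of `FooTask` instances.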

Any primitive following the above pattern could be used as follows:

```python
# instantiate the primitive
foo = Foo()

# run the primitive with three tasks
job = foo.run([task0, task1, task2])

# block for results
result = job.result()

# get data from second task
result1 = result[1]

# get particular data from this result (the available fields depend on the primitive type and task,
# and this example is not proposing any specific names)
alpha_data = result1.data.alpha
beta_data = result1.data.beta
expectation_values = result1.data.expectation_values
```

## Detailed Design

### Task

We propose the concept of a _task_, which we define as _a single circuit along with auxiliary data required to execute the circuit relative to the primitive in question_. This concept is general enough that it can be used for all primitive types, current and future, where we stress that what the “auxiliary data” is can vary between primitive types.

```python
# prototype implementation for the RFC
@dataclass(frozen=True)
class Task:
    circuit: QuantumCircuit
```

Different primitive types (such as `Estimator`) are intended to subclass this class, adding auxiliary fields as
required.
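
To illustrate the subclassing pattern, here is a small sketch of how an estimator-style task might add auxiliary fields. The field names `observables` and `parameter_values` are hypothetical and chosen only for illustration, and `Circuit` is a stand-in for `QuantumCircuit` so that the snippet runs without Qiskit installed.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Optional


class Circuit:
    """Stand-in for QuantumCircuit."""


@dataclass(frozen=True)
class Task:
    circuit: Circuit


@dataclass(frozen=True)
class EstimatorTask(Task):
    # Hypothetical auxiliary data for an estimator-style primitive.
    observables: Any = None
    parameter_values: Optional[Any] = None


task = EstimatorTask(Circuit(), observables=["ZZ", "XX"])

# Frozen dataclasses reject attribute assignment after construction.
try:
    task.observables = ["YY"]
    mutated = True
except FrozenInstanceError:
    mutated = False
```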

### DataBin

A data bin is a namespace for storing data.
The fields in the namespace, in general, depend on the task that was executed (not the `type` of the task).
For example, a `Sampler` will have a field for each output of the task's circuit, where outputs are as defined by the OpenQASM 3 spec (in Qiskit, you can currently take "output" to mean the names of the classical registers).
The value of each field will store the corresponding data.

All primitives will store their data in `DataBin`s.
There will not be subclassing to the effect of `SamplerDataBin < DataBin`; look instead to `TaskResult` for such needs.

```python
# Make a new DataBin class. make_data_bin() is analogous to dataclasses.make_dataclass().
# The shape is an optional argument which indicates that all values are to share the same leading shape.
# To put the following generic example into context, if it helps, imagine that this code lives in the
# BaseSampler implementation, and a data bin class is being created to store the results from a
# particular SamplerTask instance whose circuit has two output registers named alpha and beta, and
# that the task itself has shape (5, 4).
data_bin_cls = make_data_bin({"alpha": NDArray[np.float64], "beta": NDArray[np.uint8]}, shape=(5, 4))

# make an instance with particular data
data_bin = data_bin_cls(
    alpha=np.empty((5, 4, 1024), dtype=np.float64),
    beta=np.empty((5, 4, 1024, 127), dtype=np.uint8)
)

# access items as attributes
alpha_data = data_bin.alpha

# access items with subscripts
alpha_data = data_bin["alpha"]
```
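
One possible way to build such a factory on top of `dataclasses.make_dataclass` is sketched below. This is not the proposed implementation: the validation rule (compare the leading dimensions of any value that exposes a `.shape` attribute) is an assumption, and `SimpleNamespace` objects stand in for NumPy arrays so that the snippet needs only the standard library.

```python
from dataclasses import make_dataclass
from types import SimpleNamespace
from typing import Optional


def make_data_bin(field_types: dict, shape: Optional[tuple] = None) -> type:
    """Return a frozen class with one attribute per named field (sketch only)."""

    def __post_init__(self):
        # Check the shared leading shape on every value that has a ``shape``.
        if shape is None:
            return
        for name in field_types:
            value_shape = getattr(getattr(self, name), "shape", None)
            if value_shape is not None and tuple(value_shape[: len(shape)]) != tuple(shape):
                raise ValueError(
                    f"field {name!r} has shape {value_shape}, expected leading shape {shape}"
                )

    def __getitem__(self, key):
        # Subscript access mirrors attribute access for declared fields.
        if key not in field_types:
            raise KeyError(key)
        return getattr(self, key)

    return make_dataclass(
        "DataBin",
        list(field_types.items()),
        namespace={"__post_init__": __post_init__, "__getitem__": __getitem__},
        frozen=True,
    )


# Stand-ins for arrays: only the ``shape`` attribute matters for this sketch.
alpha = SimpleNamespace(shape=(5, 4, 1024))
beta = SimpleNamespace(shape=(5, 4, 1024, 127))

sketch_bin_cls = make_data_bin({"alpha": object, "beta": object}, shape=(5, 4))
sketch_bin = sketch_bin_cls(alpha=alpha, beta=beta)

# A value whose leading shape disagrees with (5, 4) is rejected.
try:
    sketch_bin_cls(alpha=SimpleNamespace(shape=(3, 4, 1024)), beta=beta)
    rejected = False
except ValueError:
    rejected = True
```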

### TaskResult

A `TaskResult` is the result of running a single `Task` and does three things:

* Stores a `DataBin` instance containing the data from execution.
* Stores metadata, which is always specific to the executed task and may also be implementation-specific.
* (subclasses) Contains methods to help transform the data into standard formats, including migration helpers.

We generally expect each primitive type to define its own subclass of `TaskResult` to accommodate the third item, though `TaskResult` has no methods that need to be abstract.

We elect to have a special container for the data (`DataBin`) so as not to pollute the `TaskResult` namespace with task-specific names, and to keep data quite distinct from metadata.

```python
# return a DataBin
task_result.data

# return a metadata dictionary
task_result.metadata
```
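
To make the third role concrete, the sketch below shows a subclass that adds a convenience method on top of the two stored attributes. The subclass name `SamplerTaskResult`, the signature of `get_counts`, and the use of a list of bitstrings as data are all assumptions made for illustration. A `SimpleNamespace` stands in for a `DataBin`.

```python
from collections import Counter
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class TaskResult:
    data: Any  # a DataBin in the RFC; any attribute namespace works here
    metadata: dict = field(default_factory=dict)


class SamplerTaskResult(TaskResult):
    """Hypothetical subclass that adds a helper and no new state."""

    def get_counts(self, register: str) -> dict:
        # Tally the values stored under one named field of the data bin.
        return dict(Counter(getattr(self.data, register)))


task_result = SamplerTaskResult(
    data=SimpleNamespace(alpha=["00", "11", "00"]),
    metadata={"shots": 3},
)
counts = task_result.get_counts("alpha")
```

Because the helper only reads from `data`, the task-specific field names stay inside the data bin and never appear as attributes of the result itself.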

Comment on lines +120 to +139
Member

Is this implying that all subclasses of TaskResult must have exactly two stateful attributes: data and metadata? I think the implication of the above paragraphs is "yes", but it might be worth spelling out in the interface. It leads to some tricks, though - if the intent is to leave the class open to later expansion, the allowance of subclasses to define arbitrary methods gets in the way; without explicitly reserving other names, there's no safe point for expansion.

If the answer is instead "no", what's the intended purpose of forcibly putting metadata in a separate namespace?

Contributor Author

I'm having trouble understanding. This is not how I'd actually do it, but close enough for discussion:

@dataclass(<appropriate choices>)
class TaskResult:
    data: DataBin
    metadata: dict[str, Any] = field(default_factory=dict)

> the allowance of subclasses to define arbitrary methods gets in the way

I'm not sure how having a get_counts() method, for example, would get in the way.

> what's the intended purpose of forcibly putting metadata in a separate namespace?

Is the alternative being considered here just putting the contents of metadata as attributes on TaskResult? This would mean that every implementation of Estimator would need its own EstimatorTaskResult in order to configure its own possible metadata values. It would also be quite annoying for the workflow of the IBM primitives, because if you wanted to modify the allowed metadata, you'd have to get the change into a tagged release of qiskit_ibm_runtime, then you'd need to get the tagged release to be the default version on the server side, and only then could you do the thing you wanted to.


### PrimitiveResult

`PrimitiveResult` is the type returned by `Job.result()` and is primarily a `Sequence[TaskResult]`, with one entry for each `Task` input into the primitive's run method.
Member

Super super minor, but Python's Sequence requires a few extra methods that we maybe don't want to actually imply - index and count.

Contributor Author

That's a good point, I don't want cluttering methods in the classes if we can avoid it. We can chat later about protocols.

Contributor Author

It's kind of annoying how Sequence is something close but not equal to the thing everyone wants it to be.


Secondarily, it has a metadata field for metadata that is not specific to any single task.
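
One way to get sequence-like behavior without inheriting the extra `index` and `count` methods of `collections.abc.Sequence` is to implement only the needed dunder methods directly. The sketch below is an assumption about how that could look, not the agreed design: the constructor signature and the private attribute names are hypothetical.

```python
from typing import Generic, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class PrimitiveResult(Generic[T]):
    """Ordered container of task results plus job-level metadata (sketch)."""

    def __init__(self, task_results: Iterable[T], metadata: Optional[dict] = None):
        self._task_results = list(task_results)
        self._metadata = {} if metadata is None else metadata

    @property
    def metadata(self) -> dict:
        # Metadata that is not specific to any single task.
        return self._metadata

    def __getitem__(self, index: int) -> T:
        return self._task_results[index]

    def __len__(self) -> int:
        return len(self._task_results)

    def __iter__(self) -> Iterator[T]:
        return iter(self._task_results)


# Strings stand in for TaskResult instances in this usage example.
primitive_result = PrimitiveResult(["r0", "r1", "r2"], metadata={"version": 1})
second = primitive_result[1]
```

Indexing, `len()`, and iteration all work on such a class, while `index` and `count` are simply absent.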