Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified Primitive Container Types #53

Merged
merged 11 commits into from
Nov 7, 2023

Conversation

ihincks
Copy link
Contributor

@ihincks ihincks commented Oct 26, 2023

No description provided.

@blakejohnson
Copy link
Contributor

Comments due on this by EOD today.

@1ucian0 1ucian0 added the RFC proposal New RFC is proposed label Oct 27, 2023
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm kind of confused by this proposal. To me it feels like extra class heirarchy and inheritance for things that don't share much in common. Like besides the name run() the interface in BasePrimitive is so generic I feel like you define any function as long as you wrapped it's input in a Task and output in a Job and call it a primitive. I think defining common container classes for the I/O to primitives is reasonable but I'm unsure of what such a generic base class provides us in practice. Like specifically if you had the container format definitions in a shared module and just defined BaseEstimatorV2 and BaseSamplerV2 using those in the type signature wouldn't that work just as well with less code and complexity?

I guess maybe what my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that. They're too unique in their usage to facilitate that in a generic way.


The central motivation of this RFC is to more clearly emphasize that primitives are a well-defined execution framework on top of which applications can be built.
We aim to force each category of primitive to clearly state what units of quantum work (tasks) it is able to perform, and what outputs can be expected.
It is understood that having the proposed abstract base method alone will not necessarily improve the day-to-day life of a typical user of the primitives, and neither will it necessarily improve the ease of implementing new primitive types or implementations: this change is mainly about clarifying the shared nature of the primitives, and reflecting this view in abstractions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand this. The motivation here is just to have a common subclass and to say that all primitives have a run() method via that common parent?

As an example, `qiskit_experiments` defines a suite of common diagnostic, calibration, and characterization experiments.
Currently, it is designed around the `Backend.run` execution interface.
One could imagine that, for example, instead of each `Experiment` providing a `circuits` attribute specifying the circuits to run through `Backend.run`, they could rather each specify a `tasks` attribute indicating which primitive units of work need performed.
The `qiskit_experiments` execution and analysis machinery could then rely on the abstraction of `BasePrimitive.run` when reasoning about dispatching and collecting results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the abstract classes defined below are so broadly defined I can't imagine anything ever saying anything accepted BasePrimitive as a type Specifically the estimator and sampler behave quite differently both in the allowed inputs and the data returned. Like for qiskit-experiments do you have a specific case where you think this applies?


class BasePrimitive(ABC, Generic[In, Out]):
@abstractmethod
def run(self, tasks: In | Iterable[In]) -> Job[PrimitiveResult[Out]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which Job class is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qiskit.provider.JobV1, where I am pretending it has a Generic[T]. Aside: any reason it couldn't?

@blakejohnson
Copy link
Contributor

I guess maybe what my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that. They're too unique in their usage to facilitate that in a generic way.

Ian's example of qiskit-experiments is already a case where we'd like to accept user workloads that consist of either sampler or estimator calls. With this proposal, we at least have some very basic assumptions of how to execute those workloads. The basic promises are:

  • primitives have a run method
  • primitives operate on Tasks.
  • primitives return Sequence[TaskResult].

The proposal here seems to provide nice functionality to facilitate dispatch on the specific types of TaskResult in order to write generic code.

@ihincks
Copy link
Contributor Author

ihincks commented Oct 30, 2023

To me it feels like extra class heirarchy and inheritance

Note that this RFC doesn't actually call for any new classes or inheritance, it just calls for 3 new methods on the BasePrimitive, which already exists and is inherited from. I've heard the argument before, I forget from whom, that BasePrimitive doesn't actually exist to be a common structural abstraction, but rather to avoid code duplication of common features. I'm not sure I totally understand the distinction being made in that perspective, but in any case, I imagine it could be argued that result_type(cls, task: In) is doing the same thing.

@ihincks
Copy link
Contributor Author

ihincks commented Oct 30, 2023

my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that

In case it helps, any particular algorithm or experiment certainly can't swap between the two of them without, if even possible at all, a re-implementation. However, application frameworks built on top of the primitives may wish to reason about the primitives more abstractly without explicitly enumerating them, or talking about their inputs as union types of various signatures.

@mtreinish
Copy link
Member

To me it feels like extra class heirarchy and inheritance

Note that this RFC doesn't actually call for any new classes or inheritance, it just calls for 3 new methods on the BasePrimitive, which already exists and is inherited from. I've heard the argument before, I forget from whom, that BasePrimitive doesn't actually exist to be a common structural abstraction, but rather to avoid code duplication of common features. I'm not sure I totally understand the distinction being made in that perspective, but in any case, I imagine it could be argued that result_type(cls, task: In) is doing the same thing.

It does implicitly mean new classes, because of #51 going to a new v2 version and also the proposed BasePrimitive here also would need to be done in a new v2 base class. The changes proposed here are incompatible with the existing BasePrimitive class and would cause issues with the current primitives interface. That being said from my perspective the existing BasePrimitive class doesn't actually provide any value, for the same reasons I outlined above. The methods that exist on it could easily have been handled as standalone validation functions, and IMO that would have been a cleaner interface.

@jakelishman
Copy link
Member

jakelishman commented Oct 30, 2023

TaskResult doesn't actually contain anything concrete in its interface after unwrapping the first abstract level - "DataBundle, metadata and any helper methods" have no concrete items or usable behaviour defined - so at the level of BasePrimitive, that's an opaque object. Similarly, Task is nearly completely opaque (and requiring that there's a circuits field typed QuantumCircuit feels like it could cause problems for interface expansions, like accepting OQ3 / a handle to a manually saved compiled circuit). Both give the impression of applying structure, but on those fronts, this RFC is only splitting one opaque object into three for TaskResult; the semantics of the inner objects remain opaque for abstract use. The result_type and _output_fields methods give a way to programmatically find the names of defined fields, but cannot specify what those fields actually mean and how they can be used, so any user of them must already be dispatching to something that is subclass aware. It's not clear to me that they have an abstract use, so I'd be concerned about including them in an interface.

The abstract part that Experiments could actually use from this interface (if I understand correctly) is that they want to do something like:

def run_experiment(primitive: BasePrimitive[TaskT, ResultT], tasks: Iterable[TaskT]) -> list[ResultT]:
    return primitive.run(tasks)

If the goal of this RFC is purely to ensure that the innermost "call" operation just has the same name between each primitive, that's ok by me, though personally I'd err on the side of being less constraining. It's not clear to me that this run method will certainly have exactly the same semantics for any new primitives that might be defined; could there be a future primitive that wants "reduction" semantics on its inputs rather than "broadcast", that might mean that only TaskResult and not Sequence[TaskResult] is appropriate as a return?

Given that Experiments will already be needing to dispatch on "is input Sampler or Estimator?", is it still worth having this abstraction to enable:

if is_sampler(primitive):
    tasks = generate_sampler_tasks(inputs)
else:
    tasks = generate_estimator_tasks(inputs)
# Typing is the abstract `Primitive.run`
results = primitive.run(tasks)
if is_sampler(primitive):
    # Required for strong typing, because the above call was abstract.
    results = typing.cast(results, SamplerResults)
    process_sampler_results(results)
else:
    results = typing.cast(results, EstimatorResults)
    process_estimator_results(results)

when the alternative would be:

if is_sampler(primitive):
    tasks = generate_sampler_tasks(inputs)
    # Now constrained to `Sampler.run`.
    results = primitive.run(tasks)
    process_sampler_results(results)
else:
    tasks = generate_estimator_tasks(inputs)
    # Constrained to `Estimator.run`
    results = primitive.run(tasks)
    process_estimator_results(results)

My feeling is that in this example, the former is not (sufficiently) better than the latter to motivate inclusion in a base interface.

@mtreinish
Copy link
Member

my disconnect is that this is treating both the sampler and estimator as something you would want to potentially swap between, but in practice I don't think there is a use case for that

In case it helps, any particular algorithm or experiment certainly can't swap between the two of them without, if even possible at all, a re-implementation. However, application frameworks built on top of the primitives may wish to reason about the primitives more abstractly without explicitly enumerating them, or talking about their inputs as union types of various signatures.

This is where I'm struggling with the value here TBH, because I'm not sure there is a use case where you would not want to use a union type. Lets use a concrete example. If you look at what exists in qiskit-algorithms in the abstract (not for specifics) there are two VQE implementations SamplingVQE and VQE which take either a sampler or estimator respectively. You could argue that one could wrap both of those in an OmniVQE that would take either a sampler or estimator and then internally dispatch to an implementation based on the input primitive (although there are enough caveats between the two that it's not necessarily practical). But it would never be safe to use BasePrimitive as an input type, only the union of types as the input type because it only works with BaseSampler or BaseEstimator and they're the only valid choices for an input. If someone were to locally create a FourierTransform primitive class that subclassed BasePrimitive and defined run using Task and TaskResult it would not be a valid input to OmniVQE. I believe this pattern would be true for anything that would potential want either sampler or estimator.

I'm struggling to think of a case where someone would create something that could handle any BasePrimitive subclass where the only definition is that it has a run() method which take Task as an input and a sequence of TaskResult as an output with the interface being so generic. I feel like it's a safer and cleaner interface to just declare a common interface around Task and TaskResult on each concrete primitive interface definition and document that as a requirement for any potential future new primitives instead of trying to enforce it via subclassing.

@blakejohnson
Copy link
Contributor

Based upon the feedback here, this RFC will be updated to focus on new primitive-centric container types instead of introducing a run(...) method to BasePrimitive.

@ihincks
Copy link
Contributor Author

ihincks commented Nov 2, 2023

Thanks @mtreinish and @jakelishman for your thoughtful comments. @jakelishman, I think you are exactly right to focus specifically on the mechanics of dispatching. While it might intuitively feel like, unifying to a common BasePrimitive.run would help with this problem, when you actually sit down and try it, it's not so obvious. Given that it also constrains what we can do in the future, it would be best not to make this unification until there's a concrete need for it.

@ihincks ihincks changed the title BasePrimitive.run Abstraction Unified Primitive Container Types Nov 6, 2023
Copy link
Contributor

@blakejohnson blakejohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, @ihincks

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this now in the reduced scope to just defining common data containers. Thanks for updating this.

0016-base-primitive-unification.md Outdated Show resolved Hide resolved
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Viewing this as a sort of "advisory" for new primitives interfaces, I'm fine with it bar the two minor comments I had, and that I don't feel really strongly about.

I think it's very good that this design document explicitly spells out that it's about the look and feel of the objects, rather than trying to enforce abstractions that don't permit any concrete use, and I can see the value in providing the helper classes.

I'm fine for this to merge as-is - neither of my comments are likely worth holding up the merge.

Comment on lines +120 to +139
### TaskResult

A `TaskResult` is the result of running a single `Task` and does three things:

* Stores a `DataBin` instance containing the data from execution.
* Stores the metadata that is possibly implementation-specific, and always specific to the executed task.
* (subclasses) Contains methods to help transform the data into standard formats, including migration helpers.

We generally expect each primitive type to define its own subclass of `TaskResult` to accomodate the third item, though `TaskResult` has no methods that need to be abstract.

We elect to have a special container for the data (`DataBin`) so as not to pollute the `TaskResult` namespace with task-specific names, and to keep data quite distinct from metadata.

```python
# return a DataBin
task_result.data

# return a metadata dictionary
task_result.metadata
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this implying that all subclasses of TaskResult must have exactly two stateful attributes: data and metadata? I think the implication of the above paragraphs is "yes", but it might be worth spelling out in the interface. It leads to some tricks, though - if the intent is to leave the class open to later expansion, the allowance of subclasses to define arbitrary methods gets in the way; without explicitly reserving other names, there's no safe point for expansion.

If the answer is instead "no", what's the intended purpose of forcibly putting metadata in a separate namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding. This is not how I'd actually do it, but close enough for discussion:

@dataclass(<appropriate choices>)
class TaskResult:
    data: DataBin
    metadata: dict[str, Any] = field(default_factory=dict)

the allowance of subclasses to define arbitrary methods gets in the way

I'm not sure how having a get_counts() method, for example, would get in the way.

what's the intended purpose of forcibly putting metadata in a separate namespace?

Is the alternative being considered here just putting the contents of metadata as attributes on TaskResult? This would mean that every implementation of Estimator would need its own EstimatorTaskResult in order to configure its own possible metadata values. It would also be quite annoying for workflow for IBM primitives, because if you wanted to modify the allowed metadata, you'd have to get the change into a tagged release of qiskit_ibm_runtime, then you'd need to get the tagged release to be the default version on the server-side, then you could do the thing you wanted to.


### PrimitiveResult

`PrimitiveResult` is the type returned by `Job.result()` and is primarily a `Sequence[TaskResult]`, with one entry for each `Task` input into the primitive's run method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super super minor, but Python's Sequence requires a few extra methods that we maybe don't want to actually imply - index and count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I don't want cluttering methods in the classes if we can avoid it. We can chat later about protocols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of annoying how Sequence is something close but not equal to the thing everyone wants it to be.

@blakejohnson blakejohnson merged commit 0561dbf into Qiskit:master Nov 7, 2023
@blakejohnson
Copy link
Contributor

Further discussion may continue in issues, PRs, and Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC proposal New RFC is proposed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants