Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary circuit metadata #1315

Merged

Conversation

nkanazawa1989
Copy link
Collaborator

@nkanazawa1989 nkanazawa1989 commented Nov 9, 2023

Summary

As the number of qubits in the parallel experiment increases, the memory footprint of job payload communicated through wire becomes more serious issue. Some built-in experiment generates circuits with unnecessary metadata, which is

  • duplicated information existing in experiment metadata
  • duplicated information existing in every circuit metadata
  • un-used information in analysis

This PR removes such unnecessary fields from the circuit metadata to reduce payload size. For example, if an experiment is paired with a curve analysis, it only requires xval and other minimum keys necessary for model matching.

Details and comments

Tutorial is also updated.

Copy link
Collaborator

@coruscating coruscating left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up all the experiments and tests. This looks good overall, just some minor points.

qiskit_experiments/curve_analysis/curve_analysis.py Outdated Show resolved Hide resolved
qiskit_experiments/curve_analysis/curve_analysis.py Outdated Show resolved Hide resolved
docs/tutorials/custom_experiment.rst Outdated Show resolved Hide resolved
@coruscating coruscating added this to the Release 0.6 milestone Nov 14, 2023
github-merge-queue bot pushed a commit that referenced this pull request Dec 22, 2023
### Summary

This PR modifies `ScatterTable` which is introduced in #1253.

This change resolves some code issues in #1315 and #1243.

### Details and comments

In the original design `ScatterTable` is tied to the fit models, and the
columns contains `model_name` (str) and `model_id` (int). Also the fit
module only allows to have three categorical data; "processed",
"formatted", "fitted". However, #1243 breaks this assumption, namely,
the `StarkRamseyXYAmpScanAnalysis` fitter defines two fit models which
are not directly mapped to the results data. The data fed into the
models is synthesized by consuming the input results data. The fitter
needs to manage four categorical data; "raw", "ramsey" (raw results),
"phase" (synthesized data for fit), and "fitted".

This PR relaxes the tight coupling of data to the fit model. In above
example, "raw" and "ramsey" category data can fill new fields `name`
(formally model_name) and `class_id` (model_id) without indicating a
particular fit model. Usually, raw category data is just classified
according to the `data_subfit_map` definition, and the map doesn't need
to match with the fit models. The connection to fit models is only
introduced in a particular category defined by new option value
`fit_category`. This option defaults to "formatted", but
`StarkRamseyXYAmpScanAnalysis` fitter would set "phase" instead. Thus
fit model assignment is effectively delayed until the formatter
function.

Also the original scatter table is designed to store all circuit
metadata which causes some problem in data formatting, especially when
it tries to average the data over the same x value in the group.
Non-numeric data is averaged by builtin set operation, but this assumes
the metadata value is hashable object, which is not generally true. This
PR also drops all metadata from the scatter table. Note that important
metadata fields for the curve analysis are one used for model
classification (classifier fields), and other fields just decorate the
table with unnecessary memory footprint requirements. The classifier
fields and `name` (`class_id`) are sort of duplicated information. This
implies the `name` and `class_id` fields are enough for end-users to
reuse the table data for further analysis once after it's saved as an
artifact.

---------

Co-authored-by: Will Shanks <[email protected]>
nkanazawa1989 and others added 2 commits January 10, 2024 11:35
@nkanazawa1989 nkanazawa1989 force-pushed the cleanup/circuit_metadata branch from 6b98f05 to e0129d2 Compare January 10, 2024 02:39
Copy link
Collaborator

@coruscating coruscating left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

@coruscating coruscating added this pull request to the merge queue Jan 10, 2024
Merged via the queue into qiskit-community:main with commit 79e0a69 Jan 11, 2024
11 checks passed
nkanazawa1989 added a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024
### Summary

This PR modifies `ScatterTable` which is introduced in qiskit-community#1253.

This change resolves some code issues in qiskit-community#1315 and qiskit-community#1243.

### Details and comments

In the original design `ScatterTable` is tied to the fit models, and the
columns contains `model_name` (str) and `model_id` (int). Also the fit
module only allows to have three categorical data; "processed",
"formatted", "fitted". However, qiskit-community#1243 breaks this assumption, namely,
the `StarkRamseyXYAmpScanAnalysis` fitter defines two fit models which
are not directly mapped to the results data. The data fed into the
models is synthesized by consuming the input results data. The fitter
needs to manage four categorical data; "raw", "ramsey" (raw results),
"phase" (synthesized data for fit), and "fitted".

This PR relaxes the tight coupling of data to the fit model. In above
example, "raw" and "ramsey" category data can fill new fields `name`
(formally model_name) and `class_id` (model_id) without indicating a
particular fit model. Usually, raw category data is just classified
according to the `data_subfit_map` definition, and the map doesn't need
to match with the fit models. The connection to fit models is only
introduced in a particular category defined by new option value
`fit_category`. This option defaults to "formatted", but
`StarkRamseyXYAmpScanAnalysis` fitter would set "phase" instead. Thus
fit model assignment is effectively delayed until the formatter
function.

Also the original scatter table is designed to store all circuit
metadata which causes some problem in data formatting, especially when
it tries to average the data over the same x value in the group.
Non-numeric data is averaged by builtin set operation, but this assumes
the metadata value is hashable object, which is not generally true. This
PR also drops all metadata from the scatter table. Note that important
metadata fields for the curve analysis are one used for model
classification (classifier fields), and other fields just decorate the
table with unnecessary memory footprint requirements. The classifier
fields and `name` (`class_id`) are sort of duplicated information. This
implies the `name` and `class_id` fields are enough for end-users to
reuse the table data for further analysis once after it's saved as an
artifact.

---------

Co-authored-by: Will Shanks <[email protected]>
nkanazawa1989 added a commit to nkanazawa1989/qiskit-experiments that referenced this pull request Jan 17, 2024
### Summary

As the number of qubits in the parallel experiment increases, the memory
footprint of job payload communicated through wire becomes more serious
issue. Some built-in experiment generates circuits with unnecessary
metadata, which is
- duplicated information existing in experiment metadata
- duplicated information existing in every circuit metadata
- un-used information in analysis

This PR removes such unnecessary fields from the circuit metadata to
reduce payload size. For example, if an experiment is paired with a
curve analysis, it only requires `xval` and other minimum keys necessary
for model matching.

### Details and comments

Tutorial is also updated.

---------

Co-authored-by: Will Shanks <[email protected]>
Co-authored-by: Helena Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants