Remove unnecessary circuit metadata #1315

nkanazawa1989 · 2023-11-09T03:03:23Z

Summary

As the number of qubits in a parallel experiment increases, the memory footprint of the job payload sent over the wire becomes a more serious issue. Some built-in experiments generate circuits with unnecessary metadata, which falls into one of these groups:

- duplicated information that already exists in the experiment metadata
- duplicated information repeated in every circuit's metadata
- information that is unused in the analysis

This PR removes such unnecessary fields from the circuit metadata to reduce payload size. For example, if an experiment is paired with a curve analysis, it only requires xval and other minimum keys necessary for model matching.
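
As a rough illustration of the payload saving, the sketch below compares the serialized size of a verbose per-circuit metadata dictionary with a trimmed one. Apart from `xval`, the field names (`experiment_type`, `qubits`, `unit`) and the helper `trim_metadata` are hypothetical, chosen only for illustration; they are not taken from this PR's diff.

```python
import json

# Hypothetical verbose metadata attached to every circuit of one experiment.
verbose = [
    {
        "experiment_type": "T1",  # duplicates experiment-level metadata
        "qubits": [0],            # identical for every circuit
        "unit": "s",              # not used by the analysis
        "xval": delay,            # the value a curve analysis needs
    }
    for delay in (1e-6, 2e-6, 3e-6)
]


def trim_metadata(metadata, keep=("xval",)):
    """Return a copy of ``metadata`` holding only the keys listed in ``keep``."""
    return {key: value for key, value in metadata.items() if key in keep}


trimmed = [trim_metadata(m) for m in verbose]

# Compare the JSON-serialized size before and after trimming.
size_before = len(json.dumps(verbose))
size_after = len(json.dumps(trimmed))
print(size_before, size_after)
```

The saving grows with the number of circuits, since every dropped key is removed once per circuit.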

Details and comments

The tutorial is also updated.

coruscating

Thanks for cleaning up all the experiments and tests. This looks good overall, just some minor points.

### Summary

This PR modifies `ScatterTable`, which was introduced in #1253. This change resolves some code issues in #1315 and #1243.

### Details and comments

In the original design, `ScatterTable` is tied to the fit models, and its columns contain `model_name` (str) and `model_id` (int). The fit module also allows only three data categories: "processed", "formatted", and "fitted". However, #1243 breaks this assumption: the `StarkRamseyXYAmpScanAnalysis` fitter defines two fit models which are not directly mapped to the results data. The data fed into the models is synthesized by consuming the input results data. The fitter needs to manage four data categories: "raw", "ramsey" (raw results), "phase" (synthesized data for fit), and "fitted".

This PR relaxes the tight coupling of data to the fit model. In the above example, "raw" and "ramsey" category data can fill the new fields `name` (formerly `model_name`) and `class_id` (formerly `model_id`) without indicating a particular fit model. Usually, raw category data is just classified according to the `data_subfit_map` definition, and the map doesn't need to match the fit models. The connection to fit models is only introduced in a particular category defined by the new option `fit_category`. This option defaults to "formatted", but the `StarkRamseyXYAmpScanAnalysis` fitter would set "phase" instead. Thus fit model assignment is effectively delayed until the formatter function.

Also, the original scatter table is designed to store all circuit metadata, which causes problems in data formatting, especially when it tries to average the data over the same x value in a group. Non-numeric data is averaged by a builtin set operation, but this assumes the metadata value is a hashable object, which is not generally true. This PR also drops all metadata from the scatter table.

Note that the important metadata fields for the curve analysis are the ones used for model classification (classifier fields); the other fields just decorate the table and add unnecessary memory footprint. The classifier fields and `name` (`class_id`) carry largely duplicated information. This implies that the `name` and `class_id` fields are enough for end-users to reuse the table data for further analysis once it is saved as an artifact.

---------

Co-authored-by: Will Shanks <[email protected]>
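
The two points above (classification into `name` and `class_id`, and the hashability assumption behind set-based averaging) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `ScatterTable` implementation: the `classify` helper, the `series` and `qubits` keys, and the dictionary standing in for `data_subfit_map` are all hypothetical.

```python
# Hypothetical stand-in for a ``data_subfit_map``: class name -> metadata filter.
data_subfit_map = {
    "X": {"series": "X"},
    "Y": {"series": "Y"},
}


def classify(metadata, subfit_map):
    """Return (name, class_id) of the first filter that ``metadata`` matches."""
    for class_id, (name, filt) in enumerate(subfit_map.items()):
        if all(metadata.get(key) == value for key, value in filt.items()):
            return name, class_id
    return None, None


# Hypothetical per-circuit metadata rows.
rows = [
    {"xval": 0.1, "series": "X", "qubits": [0]},
    {"xval": 0.1, "series": "Y", "qubits": [0]},
]

# Once classified, the filter keys are no longer needed to identify a row.
labels = [classify(row, data_subfit_map) for row in rows]

# Collapsing metadata with a set works while the values are hashable ...
unique_series = set(row["series"] for row in rows)

# ... but raises TypeError as soon as a value is unhashable, such as a list.
try:
    set(row["qubits"] for row in rows)
    set_failed = False
except TypeError:
    set_failed = True
```

Because `name` and `class_id` already encode which filter a row matched, carrying the original metadata alongside them adds nothing that the analysis needs.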

coruscating

LGTM👍

nkanazawa1989 added 4 commits November 9, 2023 11:50

Remove unnecessary circuit metadata from builtin experiments and add …

34c7bff

…tips to tutorial.

Better handling of union of metadata in curve analysis.

7cf91f1

reno

64e8186

Remove metadata dependency from experiment helper

8d8c4ba

coruscating reviewed Nov 13, 2023

qiskit_experiments/curve_analysis/curve_analysis.py (outdated, resolved)

qiskit_experiments/curve_analysis/curve_analysis.py (outdated, resolved)

docs/tutorials/custom_experiment.rst (outdated, resolved)

coruscating added this to the Release 0.6 milestone Nov 14, 2023

nkanazawa1989 mentioned this pull request Nov 14, 2023

Scatter table refactoring #1319

Merged

wshanks reviewed Nov 15, 2023

releasenotes/notes/upgrade-remove-circuit-metadata-ec7d3c6b08781184.yaml (outdated, resolved)

nkanazawa1989 and others added 2 commits January 10, 2024 11:35

Merge branch 'main' of github.com:Qiskit/qiskit-experiments into clea…

38f04d1

…nup/circuit_metadata

Wording suggestion

e0129d2

Co-authored-by: Will Shanks <[email protected]> Co-authored-by: Helena Zhang <[email protected]>

nkanazawa1989 force-pushed the cleanup/circuit_metadata branch from 6b98f05 to e0129d2 January 10, 2024 02:39

Fix tutorial code

dcedd4f

coruscating approved these changes Jan 10, 2024

coruscating added this pull request to the merge queue Jan 10, 2024

Merged via the queue into qiskit-community:main with commit 79e0a69 Jan 11, 2024
11 checks passed
