-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove marshmallow from Qobj #3383
Conversation
After a cursory glance, this is looking good @mtreinish, thank you for taking this on! In my mind, this really opens up the questions what is a |
That is basically the next step that I alluded to in the PR summary/commit message. My working plan here is for step one to be remove marshmallow from the existing structure (which this PR is the start of) so it is easier to maintain (and faster) and then keep that simplified interface around as a providers interface v1. We keep that v1 around for the foreseeable future for backwards compat (maybe deprecate it eventually) because that was our declared interface for defining different backends for some time. Then we start working on a new v2 providers interface that we can start encouraging people to switch to that basically just passes the |
That sounds excellent! 💯 |
Do you have some speedup comparisons here compared to marshmallow? i.e. do we get much further than what #3370 provided? Also have you looked at an example where validation fails and tried to debug it? My experience with
This is no longer the case with this PR: Qiskit/qiskit-aer#399 |
With or without validation? I started writing this before before #3370 so I'll have to regenerate numbers but I was testing this locally with 2 cases a 100x experiments of the same 53x2048 circuit with conditionals and 100x experiments of the same 1024x15 circuits with no conditionals. For these I was getting these number before: 100x 1024x15 (no conditionals):This PR (with validate=False):~33 seconds This PR (with validate=True):~1min 10 seconds Without this PR prior to #3370:~1mins 30seconds 100x 53x2048 (w/ conditionals):This PR (with validate=False):~2 min 10 seconds This PR (with validate=True):~7 min Without this PR prior to #3370:~8min 45seconds I'll run more benchmarks of things to get updated numbers and leave a comment here when I do. But from some quick testing the speed advantage is there but much less significant without validation on marshmallow, since the validation is by far the slowest thing. At least in the CI job for the benchmarks which are super noisy asv has confidence that two of the benchmarks are faster than noise: https://travis-ci.com/Qiskit/qiskit-terra/jobs/252812817#L1051
They're still harder to debug, I had to debug failures writing this by just calling
Actually it's still true even with that PR. You can see this in: https://github.com/Qiskit/qiskit-aer/pull/399/files#diff-48d2d11d8e8a070df52bfd3ba674c65bR122 the issue is that c aligned types like numpy arrays are not something the default json encoder can serialize (although its easy to work around as I documented in the PR). The to_dict() function with marshmallow is converting to the numpy arrays to python lists in all cases (both here and on the return where it's more important). The numpy arrays are much more efficient to pass back and forth. That's the big change of the output format in this PR is that it stops build pseudo-serialization output (it's not actually serialized it just an input for a bare |
Just finished running some numbers with my worst case assemble test case:
Running this on my desktop yielded these results over 10 runs each: Without this PRAverage assemble time of 3min 40.6 seconds With this PRAverage assemble time of 1min 58.8 seconds |
I'm actually a bit surprised how much quicker this PR is. I assumed marshmallow would be faster than that without validation enabled. I also made some recent changes to the PR which I thought would make it marginally slower. But, I guess other optimizations I've made to the branch since the last time I ran numbers countered that. |
Also I spent some time looking at the jsonschema error messages a bit. It turns out the jsonschema library has a method on raised exceptions It doesn't look like fastjsonschema has an equivalent option (https://horejsek.github.io/python-fastjsonschema/#fastjsonschema.JsonSchemaException ) and given how it generates code to execute the validation as fast as it can in python it doesn't seem like something we could likely to easily add there (but I haven't really investigated what it would involve). So the tradeoff becomes validation speed versus easier debugging. I chose fastjsonschema here because it's significantly faster than the jsonschema library and was trying to optimize when we do validate for speed. But since we only use the schema in testing and debugging I can change validation option here to use jsonschema if people prefer that. Another option is to use fastjsonschema and if/when we hit an error we fallback to use jsonschema to get a better error message. In the normal case it's fast but then in the failure mode it's extra slow. :) |
Does this PR fix #2900 too? |
It would for qobj by virtue of not using the marshmallow validators at all. But, it would still be valid for any of the other types that are still using marshmallow after this (backends, result, job status, etc). But once this merges I'll start working on similar conversion patches to those as well. This is the start of trying to remove marshmallow from all of those as a first step to decouple the serialization format from the interface between terra and providers/backends. |
From a bird's eye perspective, I think the change has some strong implications: if both Which can be solvable and/or desirable, but I would suggest not underestimating the maintenance/compatibility/updating factors if going through that route. Specs and stability in general are two problems we have been historically struggling with, and I think there is a risk of making the situation worse instead of better. For example, by ending up with a "v1.5" version of Qobj that does not really follow the finer points of the spec as "v1", but at the same time does not have the benefits of "v2" ... and in the process, couple the updates of each component more tightly (ie. having to maintain a terra spec, an agnostic spec, and a conversion spec, and keep them all in sync while they evolve, passing the burden to all providers, as the v1 specs are not really for ibmq - they are meant to be generic). My suggestion would be placing the focus in defining the "v2" specs (ie. the one that uses objects) if that has been decided it would be the way to go - and try to get them as stable as possible before replacing, without breaking backwards compatibility in the meantime if possible (or at least giving a very ample window for full compatibility). |
This commit removes Marshmallow from qobj. Marshamllow is used primarily to abstract away the conversion from serialization format from the python objects. However, this abstraction has a large cost. Primarily it makes it very difficult to use or figure out exactly how to construct these objects. This is mostly because of how marshmallow is used in terra which separates the schema definition from the object type. It means that there is no way to actually discover what fields (if any) are required. In many cases the marshmallow objects are just essential wrappers around dicts and only used to coerce types into the serialization format. But this is actually undesireable because qobj is not strictly a serialization format as it's used by terra. It is used as the transport object passed between different backends to describe circuits. When the backend is not remote (such as a local simulator) or not IBMQ and doesn't use qobj schema as the wire protocol this makes it more difficult to use. Especially for local backends because it means we have to do an expensive serialization for c native types and then convert it back just to run experiments in memory. This is also overkill because there are only really 2 types that need to be coerced from the output dict and that's not difficult enough to warrant an entire abstraction layer. This commit removes marshmallow and replaces Qobj with a class built using native python types. This is a drop in replacement for the marshmallow types with the exception that it doesn't json serialize the output dict (which as described before is an advantage). The actual serialization should be handled in the providers and in the case of qobj there are only 2 types which need to be coearced numpy arrays and complex numbers which can be accomplished with 6 lines defining a custom json encoder (this was needed for only 1 test). Also, by default validation is disabled everywhere since this is exceedingly slow and we should only need validation in testing to verify that things work as expected. There is flag on the to_dict() method for Qobj 'validate' which when set true will run it through fastjsonschema which is the fastest way we can validate the output. Follow on work from here is to look at using this model in all places marshmallow is used. We also need to revisit the class heirarchy because there are too many unecessary layers to this model which makes it harder to debug or use. But for backwards compatibility this was left alone for now.
Co-Authored-By: Lauren Capelluto <[email protected]>
This commit updates the __eq__ methods to simplify the logic slightly and make it easier to read. Additionally there were several classes in qiskit/qobj/qasm_qobj.py that were all essentially the same implementing a generic qobj/marsmallow wrapper to an arbitrary dictionary. This commit creates a single class, QobjDictField, that has all that logic and makes all those duplicate classes a subclass with overloaded __eq__ methods to have a stricter type check.
The 2 instruction clasess PulseQobjInstruction and QasmQobjInstruction were previously built around a dictionary attribute and all the attributes were either added or removed from this dictionary. This was done to simplifying the to_dict() method to basically be returning that dict. However, because of the normal attribute based access pattern for instructions and that the all the fields are defined up front (as opposed to the config and header classes which allow free form attributes) this makes the code needlessly complex. This commit switches both classes to use normal instance attributes to store the various fields and changes the to_dict() method to just build a dict by looping over the allowed attributes. While marginally slower this way it should not be noticeable.
The previous commit adding exception messages to the raised AttributeErrors had a small copy and paste issue where one __getattr__ method used the arg attr instead of the arg name. This caused an error because the variable name was not defined. This commit fixes this by updating the variable name used in the error message.
46b49cd
to
130be5d
Compare
In Qiskit#3383 we moved away from using marshmallow for Qobj objects to flat classes with normal attributes. However, in that migration we missed one parameter for the qasm qobj instruction class which had no test coverage in terra (and is only used by Aer), snapshot_type. This commit corrects the oversight and adds test coverage to ensure we don't miss it in the future.
* Add missing parameter from QasmQobjInstruction In #3383 we moved away from using marshmallow for Qobj objects to flat classes with normal attributes. However, in that migration we missed one parameter for the qasm qobj instruction class which had no test coverage in terra (and is only used by Aer), snapshot_type. This commit corrects the oversight and adds test coverage to ensure we don't miss it in the future. * Fix lint
After Qiskit/qiskit#3383 the conversion of types outside the standard set of types understood by the standard json encoder is the responsibility of the ibmq provider. However, there was one type missing from the custom encoder class added to handle this, ParameterExpression. For bound parameterized circuit the ParameterExpression class will be castable to a float to get the bound numeric value that should be sent on the wire with the job. However, the encoder doesn't have handling for this. Qiskit/qiskit#3959 is adding conversion for the to_dict() qobj method so that floats are output, but this a stop gap workaround while a better solution is being developed. We should have handling in qiskit-ibmq-provider if/when ParameterExpression objects get passed in via qobj object.
In Qiskit#3383 we removed marshmallow from the qobj objects a side effect of this was the type coercion to something that was serializable with a bare json.dumps() call was removed. In the majority of cases this wasn't necessary as this was either something easily handled in a json serializer class, or for aer (and other local simulators) are native types that casting adds unnecessary overhead. However, there was one edge case missed, the ParameterExpression class which is used for parameterized circuits. This is a terra class and while it's easy to fix in the ibmq provider for json serialization (although it hasn't been yet, a PR will be pushed for this shortly) Aer doesn't have native handling of this class. As a workaround until this is fixed the QasmQobjInstruction's to_dict() method will loop over gate parameters to check if there are any ParameterExpression objects and convert them to a float. There is likely a performance penalty for doing this, and when a more longstanding fix is built we should remove this.
…#3959) In #3383 we removed marshmallow from the qobj objects a side effect of this was the type coercion to something that was serializable with a bare json.dumps() call was removed. In the majority of cases this wasn't necessary as this was either something easily handled in a json serializer class, or for aer (and other local simulators) are native types that casting adds unnecessary overhead. However, there was one edge case missed, the ParameterExpression class which is used for parameterized circuits. This is a terra class and while it's easy to fix in the ibmq provider for json serialization (although it hasn't been yet, a PR will be pushed for this shortly) Aer doesn't have native handling of this class. As a workaround until this is fixed the QasmQobjInstruction's to_dict() method will loop over gate parameters to check if there are any ParameterExpression objects and convert them to a float. There is likely a performance penalty for doing this, and when a more longstanding fix is built we should remove this.
This commit adds a printable string representation for Qobj objects. In Qiskit#3383 we rebuilt the Qobj objects without using marshmallow for performance and ease of working with. However, as part of that no human readable representation for the objects was added. This commit corrects that oversight by adding a __str__ parameter to these new classes.
After Qiskit/qiskit#3383 the conversion of types outside the standard set of types understood by the standard json encoder is the responsibility of the ibmq provider. However, there was one type missing from the custom encoder class added to handle this, ParameterExpression. For bound parameterized circuit the ParameterExpression class will be castable to a float to get the bound numeric value that should be sent on the wire with the job. However, the encoder doesn't have handling for this. Qiskit/qiskit#3959 is adding conversion for the to_dict() qobj method so that floats are output, but this a stop gap workaround while a better solution is being developed. We should have handling in qiskit-ibmq-provider if/when ParameterExpression objects get passed in via qobj object.
As a follow on to Qiskit#3383 this continues the process of removing marshmallow from the provider interface in an effort to stabilize it before freezing it and introducing a new version. This commit removes all the usage of marshmallow schemas and models from the providers/ directory and replaces all the classes using it with flat classes and/or SimpleNamespace classes (if the models allowed arbitrary key value pairs).
* Add printable string representation for new Qobj objects This commit adds a printable string representation for Qobj objects. In #3383 we rebuilt the Qobj objects without using marshmallow for performance and ease of working with. However, as part of that no human readable representation for the objects was added. This commit corrects that oversight by adding a __str__ parameter to these new classes. * Add __repr__ in addition to the __str__ * Add missing kwarg key Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Remove marshmallow from providers/ As a follow on to #3383 this continues the process of removing marshmallow from the provider interface in an effort to stabilize it before freezing it and introducing a new version. This commit removes all the usage of marshmallow schemas and models from the providers/ directory and replaces all the classes using it with flat classes and/or SimpleNamespace classes (if the models allowed arbitrary key value pairs). * Handle parameters correctly in GateConfig.to_dict() The parameters attribute of the GateConfig class is a list of Nduv objects not a single Nduv. So when running to_dict() on a GateConfig object it's necessary to run to_dict() on each Nduv, not the list. This commit corrects this oversight, and adds a test for it. * Fix nesting for properties.qubits.to_dict() * Fix docs * Add release notes * Fix new docs issues * Correct oversights in new classes * Handle wire protocol input for PulseLibraryItems PulseLibraryItems are normally constructed either manually or by a provider with a pulse backend. In the case of ibmq and the mock backends these are constructed from the deserialized json dicts of the response (or the stored response) from the iqx api. While the pulselibraryitem class expects a list of complex numbers, the json format for iqx sends complex numbers as a list of the form [real, imag]. To ease the transition to being able to just directly passing a list of complex numbers this commit handles the case where a PulseLibraryItem is attempted to be created with samples in the json format. When a list of lists is receieved for the samples parameter the class will convert that to an array of complex numbers. Ideally in the longer term this will not be necessary and providers (both ibmq and the fake provider) will do this conversion for us. * Update qiskit/providers/models/backendproperties.py Co-Authored-By: Lauren Capelluto <[email protected]> * Don't force the 5 base job states Job statuses are often outside the 5 status that were previously defined in the marshmallow schema. This commit removes this restriction so providers can just set the status instead of overloading the class to change which status are allowed. * Fix lint Co-authored-by: Lauren Capelluto <[email protected]>
* Remove marshmallow from Qobj This commit removes Marshmallow from qobj. Marshamllow is used primarily to abstract away the conversion from serialization format from the python objects. However, this abstraction has a large cost. Primarily it makes it very difficult to use or figure out exactly how to construct these objects. This is mostly because of how marshmallow is used in terra which separates the schema definition from the object type. It means that there is no way to actually discover what fields (if any) are required. In many cases the marshmallow objects are just essential wrappers around dicts and only used to coerce types into the serialization format. But this is actually undesireable because qobj is not strictly a serialization format as it's used by terra. It is used as the transport object passed between different backends to describe circuits. When the backend is not remote (such as a local simulator) or not IBMQ and doesn't use qobj schema as the wire protocol this makes it more difficult to use. Especially for local backends because it means we have to do an expensive serialization for c native types and then convert it back just to run experiments in memory. This is also overkill because there are only really 2 types that need to be coerced from the output dict and that's not difficult enough to warrant an entire abstraction layer. This commit removes marshmallow and replaces Qobj with a class built using native python types. This is a drop in replacement for the marshmallow types with the exception that it doesn't json serialize the output dict (which as described before is an advantage). The actual serialization should be handled in the providers and in the case of qobj there are only 2 types which need to be coearced numpy arrays and complex numbers which can be accomplished with 6 lines defining a custom json encoder (this was needed for only 1 test). Also, by default validation is disabled everywhere since this is exceedingly slow and we should only need validation in testing to verify that things work as expected. There is flag on the to_dict() method for Qobj 'validate' which when set true will run it through fastjsonschema which is the fastest way we can validate the output. Follow on work from here is to look at using this model in all places marshmallow is used. We also need to revisit the class heirarchy because there are too many unecessary layers to this model which makes it harder to debug or use. But for backwards compatibility this was left alone for now. * Apply suggestions from code review Co-Authored-By: Lauren Capelluto <[email protected]> * Simplify __eq__ methods and dedup dict classes This commit updates the __eq__ methods to simplify the logic slightly and make it easier to read. Additionally there were several classes in qiskit/qobj/qasm_qobj.py that were all essentially the same implementing a generic qobj/marsmallow wrapper to an arbitrary dictionary. This commit creates a single class, QobjDictField, that has all that logic and makes all those duplicate classes a subclass with overloaded __eq__ methods to have a stricter type check. * Refactor instruction classes to be attribute based The 2 instruction clasess PulseQobjInstruction and QasmQobjInstruction were previously built around a dictionary attribute and all the attributes were either added or removed from this dictionary. This was done to simplifying the to_dict() method to basically be returning that dict. However, because of the normal attribute based access pattern for instructions and that the all the fields are defined up front (as opposed to the config and header classes which allow free form attributes) this makes the code needlessly complex. This commit switches both classes to use normal instance attributes to store the various fields and changes the to_dict() method to just build a dict by looping over the allowed attributes. While marginally slower this way it should not be noticeable. * Add error messages to raised attribute errors * Fix copy paste error The previous commit adding exception messages to the raised AttributeErrors had a small copy and paste issue where one __getattr__ method used the arg attr instead of the arg name. This caused an error because the variable name was not defined. This commit fixes this by updating the variable name used in the error message. * Rebase fixes * Add backwards compat shim for Qobj class * Fix lint * Fix lint again * Fix lint failure * Add support for parametric pulses * Add back missing type hint * Fix tests * Make PulseQobjInstruction.discriminators a QobjMeasurementOption list * Preserve required fields in PulseQobjConfig * Remove unnecessary super calls * Fix config type on PulsqQobj docstring * Add missing attributes from PulseQobj * Update discriminators from_dict() too * Fix docstring typo in QasmQobjInstruction * Update input checks for QasmQobjInstruction.__init__ * Deduplicate eq check for QobjDictField subclasses * Make qobj_id and config required for pulse * Use attributes instead of dict for QobjMeasurementOption This commit updates the structure of the QobjMeasurementOption class so that it uses attributes instead of a private dict. The dict usage only is only needed for classes that take arbitrary key value input (which is typically only config and header classes). We should be using standard attributes if the class doesn't need to handle arbitrary fields. * Make experiments a required field on PulseQobj * Make instructions a required param in PulseQobjExperiment The instructions parameter is required for PulseQobjExperiment, it also was in the wrong order for backwards compat (it should have been the first arg, not the last). This commit corrects the oversight. * Add missing kwargs from PulseQobjExperimentConfig * Move pulse jsonschema validation into a separate method * Fix lint again * Use QobjDictField for pulse qobj classes This commit makes the pulse qobj classes that take in arbitrary key value fields (mainly config) inherit from QobjDictField. QobjDictField contains the definitions needed for getattr, eq etc. The pulse qobj objects that inherit override things as necessary to ensure functionality but now rely on the base class where possible. * Pivot arbitrary key value classes to SimpleNamespace The arbitrary key value classes (namely headers and some config) were using a custom built class which basically just duplicated why the SimpleNamespace class offers from stdlib. This commit pivots those classes to just reuse SimpleNamespace class as the base instead of reinventing it. * Fix lint again * Apply suggestions from code review Co-Authored-By: Luciano Bello <[email protected]> * Fix docs warnings * Update SimpleNamespace usage * Fix pulse qobj docstring * Update qasm qobj based on review comments * Fix docs builds Co-authored-by: Lauren Capelluto <[email protected]> Co-authored-by: Luciano Bello <[email protected]> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Add missing parameter from QasmQobjInstruction In Qiskit#3383 we moved away from using marshmallow for Qobj objects to flat classes with normal attributes. However, in that migration we missed one parameter for the qasm qobj instruction class which had no test coverage in terra (and is only used by Aer), snapshot_type. This commit corrects the oversight and adds test coverage to ensure we don't miss it in the future. * Fix lint
…Qiskit#3959) In Qiskit#3383 we removed marshmallow from the qobj objects a side effect of this was the type coercion to something that was serializable with a bare json.dumps() call was removed. In the majority of cases this wasn't necessary as this was either something easily handled in a json serializer class, or for aer (and other local simulators) are native types that casting adds unnecessary overhead. However, there was one edge case missed, the ParameterExpression class which is used for parameterized circuits. This is a terra class and while it's easy to fix in the ibmq provider for json serialization (although it hasn't been yet, a PR will be pushed for this shortly) Aer doesn't have native handling of this class. As a workaround until this is fixed the QasmQobjInstruction's to_dict() method will loop over gate parameters to check if there are any ParameterExpression objects and convert them to a float. There is likely a performance penalty for doing this, and when a more longstanding fix is built we should remove this.
* Add printable string representation for new Qobj objects This commit adds a printable string representation for Qobj objects. In Qiskit#3383 we rebuilt the Qobj objects without using marshmallow for performance and ease of working with. However, as part of that no human readable representation for the objects was added. This commit corrects that oversight by adding a __str__ parameter to these new classes. * Add __repr__ in addition to the __str__ * Add missing kwarg key Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Remove marshmallow from providers/ As a follow on to Qiskit#3383 this continues the process of removing marshmallow from the provider interface in an effort to stabilize it before freezing it and introducing a new version. This commit removes all the usage of marshmallow schemas and models from the providers/ directory and replaces all the classes using it with flat classes and/or SimpleNamespace classes (if the models allowed arbitrary key value pairs). * Handle parameters correctly in GateConfig.to_dict() The parameters attribute of the GateConfig class is a list of Nduv objects not a single Nduv. So when running to_dict() on a GateConfig object it's necessary to run to_dict() on each Nduv, not the list. This commit corrects this oversight, and adds a test for it. * Fix nesting for properties.qubits.to_dict() * Fix docs * Add release notes * Fix new docs issues * Correct oversights in new classes * Handle wire protocol input for PulseLibraryItems PulseLibraryItems are normally constructed either manually or by a provider with a pulse backend. In the case of ibmq and the mock backends these are constructed from the deserialized json dicts of the response (or the stored response) from the iqx api. While the pulselibraryitem class expects a list of complex numbers, the json format for iqx sends complex numbers as a list of the form [real, imag]. To ease the transition to being able to just directly passing a list of complex numbers this commit handles the case where a PulseLibraryItem is attempted to be created with samples in the json format. When a list of lists is receieved for the samples parameter the class will convert that to an array of complex numbers. Ideally in the longer term this will not be necessary and providers (both ibmq and the fake provider) will do this conversion for us. * Update qiskit/providers/models/backendproperties.py Co-Authored-By: Lauren Capelluto <[email protected]> * Don't force the 5 base job states Job statuses are often outside the 5 status that were previously defined in the marshmallow schema. This commit removes this restriction so providers can just set the status instead of overloading the class to change which status are allowed. * Fix lint Co-authored-by: Lauren Capelluto <[email protected]>
* Remove marshmallow from providers/ As a follow on to Qiskit/qiskit#3383 this continues the process of removing marshmallow from the provider interface in an effort to stabilize it before freezing it and introducing a new version. This commit removes all the usage of marshmallow schemas and models from the providers/ directory and replaces all the classes using it with flat classes and/or SimpleNamespace classes (if the models allowed arbitrary key value pairs). * Handle parameters correctly in GateConfig.to_dict() The parameters attribute of the GateConfig class is a list of Nduv objects not a single Nduv. So when running to_dict() on a GateConfig object it's necessary to run to_dict() on each Nduv, not the list. This commit corrects this oversight, and adds a test for it. * Fix nesting for properties.qubits.to_dict() * Fix docs * Add release notes * Fix new docs issues * Correct oversights in new classes * Handle wire protocol input for PulseLibraryItems PulseLibraryItems are normally constructed either manually or by a provider with a pulse backend. In the case of ibmq and the mock backends these are constructed from the deserialized json dicts of the response (or the stored response) from the iqx api. While the pulselibraryitem class expects a list of complex numbers, the json format for iqx sends complex numbers as a list of the form [real, imag]. To ease the transition to being able to just directly passing a list of complex numbers this commit handles the case where a PulseLibraryItem is attempted to be created with samples in the json format. When a list of lists is receieved for the samples parameter the class will convert that to an array of complex numbers. Ideally in the longer term this will not be necessary and providers (both ibmq and the fake provider) will do this conversion for us. * Update qiskit/providers/models/backendproperties.py Co-Authored-By: Lauren Capelluto <[email protected]> * Don't force the 5 base job states Job statuses are often outside the 5 status that were previously defined in the marshmallow schema. This commit removes this restriction so providers can just set the status instead of overloading the class to change which status are allowed. * Fix lint Co-authored-by: Lauren Capelluto <[email protected]>
Summary
This commit removes Marshmallow from qobj. Marshamllow is used
primarily to abstract away the conversion from serialization format from
the python objects. However, this abstraction has a large cost.
Primarily it makes it very difficult to use or figure out exactly how
to construct these objects. This is mostly because of how marshmallow is
used in terra which separates the schema definition from the object
type. It means that there is no way to actually discover what fields (if
any) are required. In many cases the marshmallow objects are just
essential wrappers around dicts and only used to coerce types into the
serialization format. But this is actually undesirable because qobj is
not strictly a serialization format as it's used by terra. It is used as
the transport object passed between different backends to describe
circuits. When the backend is not remote (such as a local simulator) or
not IBMQ and doesn't use qobj schema as the wire protocol this makes it
more difficult to use. Especially for local backends because it means we
have to do an expensive serialization for c native types and then
convert it back just to run experiments in memory. This is also overkill
because there are only really 2 types that need to be coerced from the
output dict and that's not difficult enough to warrant an entire
abstraction layer.
This commit removes marshmallow and replaces Qobj with a class built
using native python types. This is a drop in replacement for the
marshmallow types with the exception that it doesn't json serialize the
output dict (which as described before is an advantage). The actual
serialization should be handled in the providers and in the case of qobj
there are only 2 types which need to be coerced numpy arrays and
complex numbers which can be accomplished with 6 lines defining a custom
json encoder (this was needed for only 1 test). Also, by default
validation is disabled everywhere since this is exceedingly slow and we
should only need validation in testing to verify that things work as
expected. There is flag on the to_dict() method for Qobj 'validate'
which when set true will run it through fastjsonschema which is the
fastest way we can validate the output.
Follow on work from here is to look at using this model in all places
marshmallow is used. After that we also need to revisit the class hierarchy
because there are too many unnecessary layers to this model which makes it
harder to debug or use. As part of this we should stop coupling the ibmq
payload format from what we pass between providers and terra. But for
backwards compatibility this was left alone for now.
Details and comments