-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rpcv2 cbor protocol support #3391
base: rpcv2-cbor-protocol-support
Are you sure you want to change the base?
Rpcv2 cbor protocol support #3391
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial thoughts, I'll let @jonathan343 do a full pass.
@@ -750,6 +752,164 @@ def _parse_body_as_json(self, body_contents): | |||
# the literal string as the message | |||
return {'message': body} | |||
|
|||
class BaseCBORParser(ResponseParser): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a docstring covering what this is for and intended usage like we have for the other parsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, absolutely :)
Just to make sure we are on the same page- I don't see doc strings on most of the parser classes. There's one on the parent ResponseParser
class, and a low-value one on the JSON parser:
"""Response parser for the "json" protocol."""
The JSON one is likely there to clarify that this is the json
protocol as opposed to the rest-json
or the BaseJsonParser
which implements shared json logic.
Am I looking in the right place for the docstrings? Should I include more than just "Base class for protocols which use CBOR encoding"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON as a concept is pretty straight forward and ubiquitous. CBOR is a bit more involved and the code in general here is opaque to anyone that isn't at least moderately familiar with the spec. If we put this in front of Alessandra or Ujjwal, do you think they would be able to read through it without you there and without the spec?
That's what I'd like to try to get covered with comment depth. It doesn't need to be every line, but the reader also shouldn't need to go look up byte sequences to understand the line either.
botocore/parsers.py
Outdated
elif major_type == 7: | ||
return self._parse_simple_and_float(stream, additional_info) | ||
else: | ||
raise ResponseParserError(f"Unsupported major type: {major_type}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this all the info we need to debug in this case? Should we be providing more of the stream?
botocore/parsers.py
Outdated
if tag == 1: # Epoch-based date/time | ||
return self._parse_datetime(value) | ||
else: | ||
raise ResponseParserError(f"Unknown CBOR tag: {tag}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we capture this anywhere, what is the expected user experience for this? If I get this error, how do I understand what happened or debug it?
botocore/parsers.py
Outdated
# TODO test this | ||
code = self._do_query_compatible_error_parse( | ||
code, headers, error | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we testing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we can't test this against a production service yet. No services support this yet. I've removed the TODO.
I brought this up to the author of the specification who said that this will be flagged and we will test this before a service goes live with support for the x-amzn-query-error
header using RPCv2/CBOR
# We ignore all the tag tests since none of them are supported tags in AWS. | ||
# The majority of these aren't even defined tags in CBOR and just test that we | ||
# can properly parse the number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does support within AWS matter for these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests were written with generic language support in mind and are meant for SDKs where users are likely to take a dependency on their CBOR encoder/decoder without using AWS. These tests make more sense in modular or Smithy SDKs where using their encoder/decoder without AWS will be supported.
We can add support for more of these tags if necessary, but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.
As an example, it's hard to see a good reason for us to differentiate tag 257 (binary MIME messages) from major type 2 (byte strings).
These tests are checking that we can properly consume arbitrary tags like tag 257, but do not actually check any of the parsing logic. IMO we should fail if we encounter a tag we don't recognize - it means we won't be able to properly parse the data contained in the tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.
Is this reflected in the spec that SDKs are not implementing the full spec and exactly what is/isn't supported? I'm not sure I understand what's stopping a service from using a new tag within the CBOR spec and having all old clients break. Did we already discuss this from a forwards compatibility standpoint? What do we do in that case?
What is the amount of code/effort to support a broader set of tags and what kinds of things do we need to handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The spec also mentions tags 2 through 4, so it may make sense to add support for those either in this PR or a follow up.
In an earlier (never published) version of this code, there was support for 2 and 3, but it was pretty hard to read code so I removed it since it's optional. It involves moderately intense math and is not considered required by the spec, so it may make more sense to leave them out for now. It was complex enough that I wasn't fully confident in what I wrote, but it passed some initial testing.
Forward compatibility was something I hadn't considered, but I'm not sure what the use case is for tags 2-4 and how likely it is an AWS service will use them. It seems unlikely that we will support any other tags than those four for the moment.
If you have a strong opinion on if we should add support for these, please share it. I'm pretty split.
for expected_key, value in expected.items(): | ||
if expected_key in ['null', 'undefined']: | ||
assert actual is None | ||
elif expected_key == 'bytestring': | ||
assert actual == bytes(value) | ||
elif expected_key == 'list': | ||
assert isinstance(actual, list) | ||
for act_val, exp_val in zip(actual, value): | ||
_assert_expected_value(act_val, exp_val) | ||
elif expected_key == 'map': | ||
assert isinstance(actual, dict) | ||
for key, val in value.items(): | ||
assert key in actual | ||
_assert_expected_value(actual[key], val) | ||
elif expected_key == 'tag': | ||
assert actual.tag == value['id'] | ||
assert actual.value == value['value']['uint'] | ||
elif expected_key in ['float32', 'float64']: | ||
struct_format = '<f' if expected_key == 'float32' else '<d' | ||
packed_value = struct.pack( | ||
'<I' if expected_key == 'float32' else '<Q', value | ||
) | ||
unpacked_value = struct.unpack(struct_format, packed_value)[0] | ||
assert actual == unpacked_value | ||
else: | ||
assert actual == value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can, we really want to avoid test cases like this. I becomes very difficult to reason about what should fire when and you may end up with failing cases that are silent. We should have discrete assertions for each kind of test that don't rely on conditionals and are static for a given type of input.
Also a high level comment, we have a considerable amount of magic numbers/values in the PR. It may be good to create some data structure defining what everything is and referencing it there instead of repeating the numbers in each spot. |
These will fail until #boto#3247 is merged and this is properly rebased. This is intentional - because there's this parallel work going on, this is an easy way to make sure we don't forget the step of merging the protocol tests
These tests are specific to CBOR and the CBOR parser and prescribed upstream as part of the spec.
- whitespace changes - removed a bunch of copy/pasted comments - removed some code that was also copy/pasted and not relevant to the new classes - removed old TODOS, added some new ones
Add the removed tests back in and add an ignore list to skip the tests which are for generic, non AWS specific CBOR behavior that is not supported by us
* Add smithy generated response parsing protocol tests * Implement granular protocol tests ignore list * Resolve some parser issues
5cff2bd
to
ed6a74b
Compare
ed6a74b
to
51f1d36
Compare
Co-authored-by: Nate Prewitt <[email protected]>
Co-authored-by: Nate Prewitt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments after skimming through the PR a bit in parallel to reading the CBOR spec for the first time. I think I have a general understanding and will be able to give a more thorough review tomorrow.
botocore/parsers.py
Outdated
+----------+----------+ +------+-------+ +-------+------+ +------+-------+ +------+--------+ | ||
|BaseXMLResponseParser| |BaseRestParser| |BaseJSONParser| |BaseCBORParser| |BaseRpcV2Parser| | ||
+---------------------+ +--------------+ +--------------+ +----------+---+ +-+-------------+ | ||
^ ^ ^ ^ ^ ^ | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - the arrows pointing away from RpcV2CBORParser
are missing arrows.
|
||
The diagram above shows that there is a base class, ``ResponseParser`` that | ||
contains logic that is similar amongst all the different protocols (``query``, | ||
``json``, ``rest-json``, ``rest-xml``). Amongst the various services there | ||
is shared logic that can be grouped several ways: | ||
``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services | |
``json``, ``rest-json``, ``rest-xml``, ``smithy-rpc-v2-cbor``). Amongst the various services |
if additional_info < 24: | ||
return additional_info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the additional information is < 24, the parsed value is the value of the additional information:
Less than 24: The argument's value is the value of the additional information
The RFC linked below also provides more information for the other additional values in this method.
https://www.rfc-editor.org/rfc/rfc8949.html#section-3-3.2
botocore/parsers.py
Outdated
) | ||
|
||
def _parse_byte_string(self, stream, additional_info): | ||
if additional_info == 31: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indefinite-length strings are represented by a byte containing the major type for byte string or text string with an additional information value of 31, followed by a series of zero or more strings of the specified type ("chunks") that have definite lengths, and finished by the "break" stop code (Section 3.2.1).
Source: https://www.rfc-editor.org/rfc/rfc8949.html#name-indefinite-length-byte-stri
This check is handling indefinite-length byte strings. We should have some comments here to make this more obvious in the code.
botocore/parsers.py
Outdated
items = [] | ||
while True: | ||
initial_byte = self._read_bytes_as_int(stream, 1) | ||
if initial_byte == 0xFF: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should make 0xFF
into a constant to make it more obvious this is a break/terminal byte
Note that it is currently expected that a number of tests will fail as this needs to be rebased/updated after #3247 is merged. The protocol tests are non-functional and will continue to be until this rebase.
This also will fail some of the checks from the recently merged
protocols
trait support- we will need to add the Serializer in another PR as well as RPCv2CBOR to the priority ordered protocols list before this passes.