Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rpcv2 cbor protocol support #3391

Open
wants to merge 20 commits into
base: rpcv2-cbor-protocol-support
Choose a base branch
from

Conversation

SamRemis
Copy link
Contributor

@SamRemis SamRemis commented Feb 13, 2025

Note that it is currently expected that a number of tests will fail as this needs to be rebased/updated after #3247 is merged. The protocol tests are non-functional and will continue to be until this rebase.

This also will fail some of the checks from the recently merged protocols trait support- we will need to add the Serializer in another PR as well as RPCv2CBOR to the priority ordered protocols list before this passes.

@SamRemis SamRemis marked this pull request as ready for review February 17, 2025 18:28
Copy link
Contributor

@nateprewitt nateprewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts, I'll let @jonathan343 do a full pass.

@@ -750,6 +752,164 @@ def _parse_body_as_json(self, body_contents):
# the literal string as the message
return {'message': body}

class BaseCBORParser(ResponseParser):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a docstring covering what this is for and intended usage like we have for the other parsers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, absolutely :)

Just to make sure we are on the same page- I don't see doc strings on most of the parser classes. There's one on the parent ResponseParser class, and a low-value one on the JSON parser:

    """Response parser for the "json" protocol."""

The JSON one is likely there to clarify that this is the json protocol as opposed to the rest-json or the BaseJsonParser which implements shared json logic.

Am I looking in the right place for the docstrings? Should I include more than just "Base class for protocols which use CBOR encoding"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON as a concept is pretty straight forward and ubiquitous. CBOR is a bit more involved and the code in general here is opaque to anyone that isn't at least moderately familiar with the spec. If we put this in front of Alessandra or Ujjwal, do you think they would be able to read through it without you there and without the spec?

That's what I'd like to try to get covered with comment depth. It doesn't need to be every line, but the reader also shouldn't need to go look up byte sequences to understand the line either.

botocore/parsers.py Show resolved Hide resolved
elif major_type == 7:
return self._parse_simple_and_float(stream, additional_info)
else:
raise ResponseParserError(f"Unsupported major type: {major_type}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this all the info we need to debug in this case? Should we be providing more of the stream?

botocore/parsers.py Outdated Show resolved Hide resolved
if tag == 1: # Epoch-based date/time
return self._parse_datetime(value)
else:
raise ResponseParserError(f"Unknown CBOR tag: {tag}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like we capture this anywhere, what is the expected user experience for this? If I get this error, how do I understand what happened or debug it?

botocore/parsers.py Outdated Show resolved Hide resolved
Comment on lines 1275 to 1326
# TODO test this
code = self._do_query_compatible_error_parse(
code, headers, error
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we testing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't test this against a production service yet. No services support this yet. I've removed the TODO.

I brought this up to the author of the specification who said that this will be flagged and we will test this before a service goes live with support for the x-amzn-query-error header using RPCv2/CBOR

Comment on lines +10 to +12
# We ignore all the tag tests since none of them are supported tags in AWS.
# The majority of these aren't even defined tags in CBOR and just test that we
# can properly parse the number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does support within AWS matter for these cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests were written with generic language support in mind and are meant for SDKs where users are likely to take a dependency on their CBOR encoder/decoder without using AWS. These tests make more sense in modular or Smithy SDKs where using their encoder/decoder without AWS will be supported.

We can add support for more of these tags if necessary, but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.

As an example, it's hard to see a good reason for us to differentiate tag 257 (binary MIME messages) from major type 2 (byte strings).

These tests are checking that we can properly consume arbitrary tags like tag 257, but do not actually check any of the parsing logic. IMO we should fail if we encounter a tag we don't recognize - it means we won't be able to properly parse the data contained in the tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.

Is this reflected in the spec that SDKs are not implementing the full spec and exactly what is/isn't supported? I'm not sure I understand what's stopping a service from using a new tag within the CBOR spec and having all old clients break. Did we already discuss this from a forwards compatibility standpoint? What do we do in that case?

What is the amount of code/effort to support a broader set of tags and what kinds of things do we need to handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The spec also mentions tags 2 through 4, so it may make sense to add support for those either in this PR or a follow up.

In an earlier (never published) version of this code, there was support for 2 and 3, but it was pretty hard to read code so I removed it since it's optional. It involves moderately intense math and is not considered required by the spec, so it may make more sense to leave them out for now. It was complex enough that I wasn't fully confident in what I wrote, but it passed some initial testing.

Forward compatibility was something I hadn't considered, but I'm not sure what the use case is for tags 2-4 and how likely it is an AWS service will use them. It seems unlikely that we will support any other tags than those four for the moment.

If you have a strong opinion on if we should add support for these, please share it. I'm pretty split.

tests/unit/cbor/test_cbor_decoding.py Outdated Show resolved Hide resolved
Comment on lines 68 to 93
for expected_key, value in expected.items():
if expected_key in ['null', 'undefined']:
assert actual is None
elif expected_key == 'bytestring':
assert actual == bytes(value)
elif expected_key == 'list':
assert isinstance(actual, list)
for act_val, exp_val in zip(actual, value):
_assert_expected_value(act_val, exp_val)
elif expected_key == 'map':
assert isinstance(actual, dict)
for key, val in value.items():
assert key in actual
_assert_expected_value(actual[key], val)
elif expected_key == 'tag':
assert actual.tag == value['id']
assert actual.value == value['value']['uint']
elif expected_key in ['float32', 'float64']:
struct_format = '<f' if expected_key == 'float32' else '<d'
packed_value = struct.pack(
'<I' if expected_key == 'float32' else '<Q', value
)
unpacked_value = struct.unpack(struct_format, packed_value)[0]
assert actual == unpacked_value
else:
assert actual == value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can, we really want to avoid test cases like this. I becomes very difficult to reason about what should fire when and you may end up with failing cases that are silent. We should have discrete assertions for each kind of test that don't rely on conditionals and are static for a given type of input.

@nateprewitt
Copy link
Contributor

Also a high level comment, we have a considerable amount of magic numbers/values in the PR. It may be good to create some data structure defining what everything is and referencing it there instead of repeating the numbers in each spot.

SamRemis and others added 14 commits February 17, 2025 20:06
These will fail until #boto#3247 is merged and this is properly rebased.  This is intentional - because there's this parallel work going on, this is an easy way to make sure we don't forget the step of merging the protocol tests
These tests are specific to CBOR and the CBOR parser and prescribed upstream as part of the spec.
- whitespace changes
- removed a bunch of copy/pasted comments
- removed some code that was also copy/pasted and not relevant to the new classes
- removed old TODOS, added some new ones
Add the removed tests back in and add an ignore list to skip the tests which are for generic, non AWS specific CBOR behavior that is not supported by us
* Add smithy generated response parsing protocol tests

* Implement granular protocol tests ignore list

* Resolve some parser issues
@SamRemis SamRemis force-pushed the rpcv2-cbor-protocol-support branch from 5cff2bd to ed6a74b Compare February 18, 2025 01:12
@SamRemis SamRemis closed this Feb 18, 2025
@SamRemis SamRemis force-pushed the rpcv2-cbor-protocol-support branch from ed6a74b to 51f1d36 Compare February 18, 2025 01:25
@SamRemis SamRemis reopened this Feb 18, 2025
Copy link
Contributor

@jonathan343 jonathan343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments after skimming through the PR a bit in parallel to reading the CBOR spec for the first time. I think I have a general understanding and will be able to give a more thorough review tomorrow.

+----------+----------+ +------+-------+ +-------+------+ +------+-------+ +------+--------+
|BaseXMLResponseParser| |BaseRestParser| |BaseJSONParser| |BaseCBORParser| |BaseRpcV2Parser|
+---------------------+ +--------------+ +--------------+ +----------+---+ +-+-------------+
^ ^ ^ ^ ^ ^ | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - the arrows pointing away from RpcV2CBORParser are missing arrows.


The diagram above shows that there is a base class, ``ResponseParser`` that
contains logic that is similar amongst all the different protocols (``query``,
``json``, ``rest-json``, ``rest-xml``). Amongst the various services there
is shared logic that can be grouped several ways:
``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services
``json``, ``rest-json``, ``rest-xml``, ``smithy-rpc-v2-cbor``). Amongst the various services

Comment on lines +795 to +811
if additional_info < 24:
return additional_info
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the additional information is < 24, the parsed value is the value of the additional information:

Less than 24: The argument's value is the value of the additional information

The RFC linked below also provides more information for the other additional values in this method.
https://www.rfc-editor.org/rfc/rfc8949.html#section-3-3.2

)

def _parse_byte_string(self, stream, additional_info):
if additional_info == 31:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indefinite-length strings are represented by a byte containing the major type for byte string or text string with an additional information value of 31, followed by a series of zero or more strings of the specified type ("chunks") that have definite lengths, and finished by the "break" stop code (Section 3.2.1).

Source: https://www.rfc-editor.org/rfc/rfc8949.html#name-indefinite-length-byte-stri

This check is handling indefinite-length byte strings. We should have some comments here to make this more obvious in the code.

items = []
while True:
initial_byte = self._read_bytes_as_int(stream, 1)
if initial_byte == 0xFF:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make 0xFF into a constant to make it more obvious this is a break/terminal byte

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants