Rpcv2 cbor protocol support #3391

SamRemis · 2025-02-13T02:32:52Z

Note that it is currently expected that a number of tests will fail as this needs to be rebased/updated after #3247 is merged. The protocol tests are non-functional and will continue to be until this rebase.

This also will fail some of the checks from the recently merged protocols trait support- we will need to add the Serializer in another PR as well as RPCv2CBOR to the priority ordered protocols list before this passes.

nateprewitt

Some initial thoughts, I'll let @jonathan343 do a full pass.

nateprewitt · 2025-02-17T20:46:08Z

botocore/parsers.py

@@ -750,6 +752,164 @@ def _parse_body_as_json(self, body_contents):
            # the literal string as the message
            return {'message': body}

+class BaseCBORParser(ResponseParser):


Can we get a docstring covering what this is for and intended usage like we have for the other parsers?

Yeah, absolutely :)

Just to make sure we are on the same page- I don't see doc strings on most of the parser classes. There's one on the parent ResponseParser class, and a low-value one on the JSON parser:

"""Response parser for the "json" protocol."""

The JSON one is likely there to clarify that this is the json protocol as opposed to the rest-json or the BaseJsonParser which implements shared json logic.

Am I looking in the right place for the docstrings? Should I include more than just "Base class for protocols which use CBOR encoding"?

JSON as a concept is pretty straight forward and ubiquitous. CBOR is a bit more involved and the code in general here is opaque to anyone that isn't at least moderately familiar with the spec. If we put this in front of Alessandra or Ujjwal, do you think they would be able to read through it without you there and without the spec?

That's what I'd like to try to get covered with comment depth. It doesn't need to be every line, but the reader also shouldn't need to go look up byte sequences to understand the line either.

botocore/parsers.py

nateprewitt · 2025-02-17T20:50:12Z

botocore/parsers.py

+        elif major_type == 7:
+            return self._parse_simple_and_float(stream, additional_info)
+        else:
+            raise ResponseParserError(f"Unsupported major type: {major_type}")


Is this all the info we need to debug in this case? Should we be providing more of the stream?

botocore/parsers.py

nateprewitt · 2025-02-17T20:53:40Z

botocore/parsers.py

+        if tag == 1:  # Epoch-based date/time
+            return self._parse_datetime(value)
+        else:
+            raise ResponseParserError(f"Unknown CBOR tag: {tag}")


It doesn't look like we capture this anywhere, what is the expected user experience for this? If I get this error, how do I understand what happened or debug it?

botocore/parsers.py

nateprewitt · 2025-02-17T21:22:54Z

botocore/parsers.py

+                # TODO test this
+                code = self._do_query_compatible_error_parse(
+                    code, headers, error
+                )


How are we testing this?

Unfortunately we can't test this against a production service yet. No services support this yet. I've removed the TODO.

I brought this up to the author of the specification who said that this will be flagged and we will test this before a service goes live with support for the x-amzn-query-error header using RPCv2/CBOR

nateprewitt · 2025-02-17T21:24:38Z

tests/unit/cbor/test_cbor_decoding.py

+    # We ignore all the tag tests since none of them are supported tags in AWS.
+    # The majority of these aren't even defined tags in CBOR and just test that we
+    # can properly parse the number


Why does support within AWS matter for these cases?

These tests were written with generic language support in mind and are meant for SDKs where users are likely to take a dependency on their CBOR encoder/decoder without using AWS. These tests make more sense in modular or Smithy SDKs where using their encoder/decoder without AWS will be supported.

We can add support for more of these tags if necessary, but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.

As an example, it's hard to see a good reason for us to differentiate tag 257 (binary MIME messages) from major type 2 (byte strings).

These tests are checking that we can properly consume arbitrary tags like tag 257, but do not actually check any of the parsing logic. IMO we should fail if we encounter a tag we don't recognize - it means we won't be able to properly parse the data contained in the tag.

but in our specification, only tag 1 is supported for the moment. Each tag requires its own parsing logic; there's not really value in us maintaining support for most tags.

Is this reflected in the spec that SDKs are not implementing the full spec and exactly what is/isn't supported? I'm not sure I understand what's stopping a service from using a new tag within the CBOR spec and having all old clients break. Did we already discuss this from a forwards compatibility standpoint? What do we do in that case?

What is the amount of code/effort to support a broader set of tags and what kinds of things do we need to handle?

Good question. The spec also mentions tags 2 through 4, so it may make sense to add support for those either in this PR or a follow up.

In an earlier (never published) version of this code, there was support for 2 and 3, but it was pretty hard to read code so I removed it since it's optional. It involves moderately intense math and is not considered required by the spec, so it may make more sense to leave them out for now. It was complex enough that I wasn't fully confident in what I wrote, but it passed some initial testing.

Forward compatibility was something I hadn't considered, but I'm not sure what the use case is for tags 2-4 and how likely it is an AWS service will use them. It seems unlikely that we will support any other tags than those four for the moment.

If you have a strong opinion on if we should add support for these, please share it. I'm pretty split.

tests/unit/cbor/test_cbor_decoding.py

nateprewitt · 2025-02-17T22:11:30Z

tests/unit/cbor/test_cbor_decoding.py

+    for expected_key, value in expected.items():
+        if expected_key in ['null', 'undefined']:
+            assert actual is None
+        elif expected_key == 'bytestring':
+            assert actual == bytes(value)
+        elif expected_key == 'list':
+            assert isinstance(actual, list)
+            for act_val, exp_val in zip(actual, value):
+                _assert_expected_value(act_val, exp_val)
+        elif expected_key == 'map':
+            assert isinstance(actual, dict)
+            for key, val in value.items():
+                assert key in actual
+                _assert_expected_value(actual[key], val)
+        elif expected_key == 'tag':
+            assert actual.tag == value['id']
+            assert actual.value == value['value']['uint']
+        elif expected_key in ['float32', 'float64']:
+            struct_format = '<f' if expected_key == 'float32' else '<d'
+            packed_value = struct.pack(
+                '<I' if expected_key == 'float32' else '<Q', value
+            )
+            unpacked_value = struct.unpack(struct_format, packed_value)[0]
+            assert actual == unpacked_value
+        else:
+            assert actual == value


If we can, we really want to avoid test cases like this. I becomes very difficult to reason about what should fire when and you may end up with failing cases that are silent. We should have discrete assertions for each kind of test that don't rely on conditionals and are static for a given type of input.

nateprewitt · 2025-02-17T22:13:07Z

Also a high level comment, we have a considerable amount of magic numbers/values in the PR. It may be good to create some data structure defining what everything is and referencing it there instead of repeating the numbers in each spot.

These will fail until #boto#3247 is merged and this is properly rebased. This is intentional - because there's this parallel work going on, this is an easy way to make sure we don't forget the step of merging the protocol tests

These tests are specific to CBOR and the CBOR parser and prescribed upstream as part of the spec.

- whitespace changes - removed a bunch of copy/pasted comments - removed some code that was also copy/pasted and not relevant to the new classes - removed old TODOS, added some new ones

Add the removed tests back in and add an ignore list to skip the tests which are for generic, non AWS specific CBOR behavior that is not supported by us

* Add smithy generated response parsing protocol tests * Implement granular protocol tests ignore list * Resolve some parser issues

Co-authored-by: Nate Prewitt <[email protected]>

jonathan343

Left a few comments after skimming through the PR a bit in parallel to reading the CBOR spec for the first time. I think I have a general understanding and will be able to give a more thorough review tomorrow.

jonathan343 · 2025-02-18T21:29:48Z

botocore/parsers.py

+----------+----------+       +------+-------+     +-------+------+  +------+-------+   +------+--------+
+|BaseXMLResponseParser|       |BaseRestParser|     |BaseJSONParser|  |BaseCBORParser|   |BaseRpcV2Parser|
+---------------------+       +--------------+     +--------------+  +----------+---+   +-+-------------+
+          ^         ^          ^           ^        ^        ^                  |         |


nit - the arrows pointing away from RpcV2CBORParser are missing arrows.

jonathan343 · 2025-02-18T21:30:39Z

botocore/parsers.py


 The diagram above shows that there is a base class, ``ResponseParser`` that
 contains logic that is similar amongst all the different protocols (``query``,
-``json``, ``rest-json``, ``rest-xml``).  Amongst the various services there
-is shared logic that can be grouped several ways:
+``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``).  Amongst the various services


Suggested change

``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services

``json``, ``rest-json``, ``rest-xml``, ``smithy-rpc-v2-cbor``). Amongst the various services

jonathan343 · 2025-02-19T01:09:12Z

botocore/parsers.py

+        if additional_info < 24:
+            return additional_info


When the additional information is < 24, the parsed value is the value of the additional information:

Less than 24: The argument's value is the value of the additional information

The RFC linked below also provides more information for the other additional values in this method.
https://www.rfc-editor.org/rfc/rfc8949.html#section-3-3.2

jonathan343 · 2025-02-19T01:15:43Z

botocore/parsers.py

+            )
+
+    def _parse_byte_string(self, stream, additional_info):
+        if additional_info == 31:


Indefinite-length strings are represented by a byte containing the major type for byte string or text string with an additional information value of 31, followed by a series of zero or more strings of the specified type ("chunks") that have definite lengths, and finished by the "break" stop code (Section 3.2.1).

Source: https://www.rfc-editor.org/rfc/rfc8949.html#name-indefinite-length-byte-stri

This check is handling indefinite-length byte strings. We should have some comments here to make this more obvious in the code.

jonathan343 · 2025-02-19T01:19:38Z

botocore/parsers.py

+            items = []
+            while True:
+                initial_byte = self._read_bytes_as_int(stream, 1)
+                if initial_byte == 0xFF:


Maybe we should make 0xFF into a constant to make it more obvious this is a break/terminal byte

SamRemis marked this pull request as ready for review February 17, 2025 18:28

nateprewitt reviewed Feb 17, 2025

View reviewed changes

SamRemis and others added 14 commits February 17, 2025 20:06

Adds the protocol tests

ffa3932

These will fail until #boto#3247 is merged and this is properly rebased. This is intentional - because there's this parallel work going on, this is an easy way to make sure we don't forget the step of merging the protocol tests

Add cbor decoding tests

0c1b19a

These tests are specific to CBOR and the CBOR parser and prescribed upstream as part of the spec.

Add initial draft of parsers.py

1d102bc

Run pre-commit and delete outdated TODOs

db3d8b0

general cleanup

5e64220

- whitespace changes - removed a bunch of copy/pasted comments - removed some code that was also copy/pasted and not relevant to the new classes - removed old TODOS, added some new ones

Update decoder tests

06c1152

Add the removed tests back in and add an ignore list to skip the tests which are for generic, non AWS specific CBOR behavior that is not supported by us

Properly handle event streams

f120ef3

Remove unnecessary method

cb427e2

minor readability improvement for integer parsing

80a15fa

refactor int parsing to use int.from_bytes

c75b1bb

Update Response Parsing Protocol Tests (boto#3247)

f08f9bf

* Add smithy generated response parsing protocol tests * Implement granular protocol tests ignore list * Resolve some parser issues

Remove unnecessary method

2e28e23

Add protocol tests support, add support for tagged unions

5688c4b

git rebase

ed6a74b

SamRemis force-pushed the rpcv2-cbor-protocol-support branch from 5cff2bd to ed6a74b Compare February 18, 2025 01:12

SamRemis closed this Feb 18, 2025

SamRemis force-pushed the rpcv2-cbor-protocol-support branch from ed6a74b to 51f1d36 Compare February 18, 2025 01:25

SamRemis reopened this Feb 18, 2025

SamRemis and others added 4 commits February 17, 2025 20:55

Delete enhancement-Parsers-84233.json

5789660

Apply suggestions from code review

ce7a58a

Co-authored-by: Nate Prewitt <[email protected]>

First round of updates based on feedback

8d455de

Update botocore/parsers.py

7221ab1

Co-authored-by: Nate Prewitt <[email protected]>

jonathan343 reviewed Feb 19, 2025

View reviewed changes

SamRemis added 2 commits February 18, 2025 21:34

More updates based on feedback

8f3fadf

readability improvements

0a8ee5d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rpcv2 cbor protocol support #3391

Rpcv2 cbor protocol support #3391

SamRemis commented Feb 13, 2025 •

edited

Loading

nateprewitt left a comment

nateprewitt Feb 17, 2025

SamRemis Feb 18, 2025

nateprewitt Feb 18, 2025

nateprewitt Feb 17, 2025

nateprewitt Feb 17, 2025

nateprewitt Feb 17, 2025

SamRemis Feb 18, 2025

nateprewitt Feb 17, 2025

SamRemis Feb 18, 2025

nateprewitt Feb 18, 2025

SamRemis Feb 19, 2025

nateprewitt Feb 17, 2025

nateprewitt commented Feb 17, 2025

jonathan343 left a comment

jonathan343 Feb 18, 2025

jonathan343 Feb 18, 2025

jonathan343 Feb 19, 2025

jonathan343 Feb 19, 2025

jonathan343 Feb 19, 2025

	``json``, ``rest-json``, ``rest-xml``, ``RPCv2CBOR``). Amongst the various services
	``json``, ``rest-json``, ``rest-xml``, ``smithy-rpc-v2-cbor``). Amongst the various services

Rpcv2 cbor protocol support #3391

Are you sure you want to change the base?

Rpcv2 cbor protocol support #3391

Conversation

SamRemis commented Feb 13, 2025 • edited Loading

nateprewitt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nateprewitt commented Feb 17, 2025

jonathan343 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SamRemis commented Feb 13, 2025 •

edited

Loading