-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a valid command id in TC_IDM_1_4.py #33556
Conversation
The test TC_IDM_1_4.py used 0xffff_ffff as test command id. In fact this is not valid MEI according to specification. While chip seems to validate this fact later the test works. In matter.js such datatype validations are handled earlier and thats why we throw an contraint error because the value is wrong data type wise. I did not found any place in the specification where it is defined when such "semantic" data value checks should be done (unless some very specific error cases in some adapters). I think this is a theoretical topic in this case because it will hopefully never happen in real live devices 8and when it does the exact error should be irrelevant) for this test it matters because the test fails. With the change of the command ID to be a still unknown, but valid, value the issue is gone. I also think that tests should still use basically "correct" values. ;-) Thank you for considering this change.
PR #33556: Size comparison from eb515e1 to 58a14b0 Decreases (1 build for efr32)
Full report (83 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, mbed, nrfconnect, nxp, psoc6, qpg, stm32, telink)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding Request changes for the moment. If my understanding of section "8.8.2.3. Incoming Invoke Request Action" is correct, the order of operations are intended to be followed in that order. Adjusting the test here because a device is failing in an unexpected way, I don't think is the right way to fix this.
From my perspective we should really be able to fuzz any sort of invalid command path for the values and we should expect the same non-path specific error. It just so happens that this value we are sending here is tripping up a device, but some other invalid path is not.
Before I remove my request for change I would like confirmation that 8.8.2.3 is not intended to be followed in that order otherwise I think this is catching a real out of order spec violating issue.
Note: Lots of the parts of this test are intentionally sending out spec violating thing making sure the DUT behaves appropriately
@tehampson Hi and thank you for looking at my change. Your understanding of the specification as you reference relies on an iterative Tlv decoding/determination, so first you decode just the "action field" listed in 8.8.2.1 (matter spec 1.3) and not looking into the list content at all ... then you look into the list content when handling the values. That leads to the followup up question how an "semantically invalid MEI" should be handled afterwards? Falls it into the "Valid Command path" check (spec do not tell anything here about invalid values, but field combinations) or should it be "just" a non-existing path (so completely ignoring that the value is invalid semantically regarding the datamodel definition)? On a pure "data type" level it is correct: eg command-id is just defined as a uint32, not mentioning any semantics like MEI value range limitations which lead to the question above: Where and when such validation should be done and with which results/consequences? One example: ACL test TC_ACL_2_4 step 39 requires to decline an invalid cluster id (0xFFFFFFFF) with a Constraint error" ... and spec just tells very generically:
So in this case "invalid contents" is also considered an invalid MEI value which is like an invalid value. Thank you, Ingo |
Let me see if I am understanding this correctly: You are saying that the spec is unspecified with respect to decode order vs processing order for interactions. That in this case for InvokeRequestMessage we should fail with a decode error, and not even start processing it as an incoming request. If that is what you are saying then I somewhat agree that as of today this is unspecified in the spec. I can open a ticket about that and there might be specification added. Where I still don't know I fully have come around to is that you are returning a A Based on that description I do NOT think you should be sending a From my outside persective it looks like you have decode logic catering to passing a cluster related test for the values there, but re-using that decoder for something lower in the stack to return error codes that are different. Essentially I am coming around to your change, because I think because of the ambiguity between decode and processing currently is unspecified, and so we might be catching the wrong error here for some implementation. But I still don't think you are returning the right error code when the command_id is 0xFFFF_FFFF. |
No longer want to be blocking, but not yet approved as I want to see a comment added to test
Thank you very much and yes regarding contrains error vs invalid action you have a valid point. let me dive into spec for this. I come back :-) |
@tehampson I looked into it. In fact it would be awesome to have options to handle such validations generically, but in fact it's not. ACL cluster, as already told, expects a contraint error when any invalid cluster/group/deviceType id is provided in acl data on a write interaction. I adjusted my side already to be like the spec seems to be interpreted: The decoding mainly checks only data types, so decoding is not checking values on a semantic level. All more deep checks need to be done later in the code flows. So with this topic clarified I would mainly misss in the specs how an "semantically invalid command id (or clusterid)" should be handled fpor invokes (or other for read/write/subscribe. Here, again as I interpret the specs because it is not defined, this is simply irrelevant because for device side such invalid values should never be used for cluster-ids or such and so this is simply an "unsupported cluster" (in that example) error which would happen. means: no explicit validation required in this case. Sure, anything just assumptions on how I think the specs want to get interpreted ;-) If you do not see that different we could close here because in fact it is just another unusual edge case checked here which could be relevant to just make sure noone else stumbles over it with a differnt understanding. if the spec could use an enhancement to clarify such things is not on me to decide. WDYT? Thank you very much for the open and constructive chat about the topic! |
Let me start off by saying I am not an expert in the spec at all. I simply have worked heavily on batch commands, and am more familiar with that specific section of the spec
The ACL cluster is already at the application level. So it giving a
Like I said, I am okay with this change going in as is because of the unspecified nature of the spec between when decode should happen and when processing the invoke interaction. This test is trying to catch an error in the processing portion of spec, and because of the ambiguity some implementation might give a decode error of I just want to see my comment about adding a comment about why we are selecting 0xfff4_00ff addressed before I approve. |
Alright, before we iterate in code commits ... would that comment fulfill your requirements:
WDYT? Any better ideas? |
Looks good to me. Upload that and I will approve |
The test TC_IDM_1_4.py used 0xffff_ffff as test command id. In fact this is not valid MEI according to specification.
While chip seems to validate this fact later the test works.
In matter.js such datatype validations are handled earlier and thats why we throw an constraint error because the value is wrong data type wise.
I did not found any place in the specification where it is defined when such "semantic" data value checks should be done (unless some very specific error cases in some adapters).
I think this is a theoretical topic in this case because it will hopefully never happen in real live devices 8and when it does the exact error should be irrelevant) for this test it matters because the test fails.
With the change of the command ID to be a still unknown, but valid, value the issue is gone. I also think that tests should still use basically "correct" values. ;-)
Thank you for considering this change.