Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Platform] kUnknownEnumValue grabs first unused enum value which need to be enforced in perpetuity #24368

Closed
tehampson opened this issue Jan 11, 2023 · 13 comments · Fixed by #26299

Comments

@tehampson
Copy link
Contributor

Reproduction steps

Introduced in: #20907

We are starting to see structs like this, where author adding to the enum to the xml needs to know that kUnknownEnumValue was already taken in the second spot so they intentionally skip over it:

enum class DlAlarmCode : uint8_t
{
    kLockJammed              = 0x00,
    kLockFactoryReset        = 0x01,
    kLockRadioPowerCycled    = 0x03,
    kWrongCodeEntryLimit     = 0x04,
    kFrontEsceutcheonRemoved = 0x05,
    kDoorForcedOpen          = 0x06,
    kDoorAjar                = 0x07,
    kForcedUser              = 0x08,
    kUnknownEnumValue        = 2,
};

The concern is what happens when there is a mismatch between client and server. How are we enforcing that once an enum has claimed a kUnknownEnumValue as a particular value that it never changes? How are people aware that when they add a new value to the end of an enum list for 1.y version that an older 1.x doesn't interpret that value as kUnknownEnumValue

Platform

core (please add to version below)

Platform Version(s)

1.0

Type

Manually tested with SDK

(Optional) If manually tested please explain why this is only manually tested

No response

Anything else?

No response

@tehampson tehampson self-assigned this Jan 11, 2023
@bzbarsky-apple
Copy link
Contributor

We discussed this back when we introduced this, in fact, and it's fine. There are four possible scenarios when a new value is added to the spec that collides with the generated unknown value:

  1. Client and server are both on the old SDK revision. There is no problem.
  2. Client and server are both on the new SDK revision. There is no problem.
  3. Client is on the old SDK revision and server is on the new one. Server will select a different unknown value, client never sends the old unknown value (since it does not know about it), and there is no problem.
  4. Client is on the new SDK revision, server is on the old one. If the client sends the old unknown (now standard) value, it should in fact be treated as unknown by the server, because the server does not know about it. So the fact that it collides with "unknown value" is not a problem: that's what it should be treated as anyway.

This all works as long as the server does not store the unknown value (which it should not, since it's not a spec-defined value).

@tehampson
Copy link
Contributor Author

Not sure I agree with #3:

Client is on the old SDK revision and server is on the new one. Server will select a different unknown value, client never sends the old unknown value (since it does not know about it), and there is no problem.

Lets say old client has enum:

enum class Foobar : uint8_t
{
    kFirst = 0x00,
    kUnknownEnumValue = 1,
};

New server has:

enum class Foobar : uint8_t
{
    kFirst = 0x00,
    kSecond = 0x01,
    kThird = 0x02,
    kUnknownEnumValue = 3,
};

If the server sets the value as kThird = 0x02 the client might assign it the value of kUnknownEnumValue = 1, when server goes to read back the value it thinks the client has a value of kSecond = 0x01

@bzbarsky-apple
Copy link
Contributor

How would the server "go to read back the value"? The server can't read from the client.

You're right that there is an issue if the client does a read from the server, gets "unknown value", does not check for that, and then echoes it back to the server. But the client should not be doing that, because then from its point of view it's sending a value not allowed per spec to the server....

@bzbarsky-apple
Copy link
Contributor

I suppose one thing we could do to mitigate this is instead of looking for the first (lowest) available value pick the highest one. But just to be clear this would be mitigation only, would only help with cases where clients and/or servers are in fact violating the spec, and would not help with 100% of those cases, because enum values can get added near the end too, not just near the beginning.

@tehampson
Copy link
Contributor Author

Sorry I flipped the terminology there. Swap client and server in my previous comment

@bzbarsky-apple
Copy link
Contributor

the client might assign it the value of kUnknownEnumValue = 1,

If I flip the terminology, then the server is "assigning it the value of kUnknownEnumValue". This is something servers must not do, per spec.

An unknown value must either result in an error return or be treated as one of the known values if the spec defines that behavior instead.

We might have some existing clusters that are buggy in that regard, and if so we should fix those ASAP.

@tehampson
Copy link
Contributor Author

Let me try to outline the issue again. Because I used incorrect terminology my issue is actually with point number 4.

Lets say old server has enum:

enum class Foobar : uint8_t
{
    kFirst = 0x00,
    kUnknownEnumValue = 1,
};

New client has:

enum class Foobar : uint8_t
{
    kFirst = 0x00,
    kSecond = 0x01,
    kThird = 0x02,
    kUnknownEnumValue = 3,
};

If the client sets the value as kThird = 0x02. The server might assign it the value of kUnknownEnumValue = 1, because it assigns to this value when it gets something out of range. When client goes to read back the value it thinks the server has a value of kSecond = 0x01

@bzbarsky-apple
Copy link
Contributor

The server might assign it the value of kUnknownEnumValue = 1, because it assigns to this value when it gets something out of range

Again: a spec-compliant server must not do this. It must either treat unknown values as kFirst or as error, depending on what the spec says.

@tehampson
Copy link
Contributor Author

tehampson commented Jan 11, 2023

Again: a spec-compliant server must not do this.

Unfortunately it looks like the current 1.0 version of the spec in SDK is doing exactly that.

Interestingly enough the PR that introduce this was aware of the requirements:

When version 2 client sends those to version 1 server, the server will treat both 2 and 3 as 2, which it considers unknown.
The unknown value must not ever be sent on the wire, so the version 1 thing will never send it to a version 2 thing.

Yet the PR itself added this test case expecting exactly the opposite of that stated requirement. Note the test expects a response where we are getting a return value of 4 which is kUnknownEnumValue in enum class SimpleEnum .


Moving forward:

  • We should actually enforce "The unknown value must not ever be sent on the wire, so the version 1 thing will never send it to a version 2 thing"
    • Right now I am not exactly sure how. When a cluster does have the kUnknownEnumValue set, what should it be returning? What should the response value that TestCluster.yaml should actually be validating here?
  • How do we enforce issues like this don't creep up in the future?

Edit more info:

@bzbarsky-apple
Copy link
Contributor

Right now I am not exactly sure how.

We could have Encode fail for that value?

When a cluster does have the kUnknownEnumValue set, what should it be returning?

The cluster spec needs to say whether unknown values are treated as one of the known values. If it does not say anything to that effect, unknown values should be treated as invalid in incoming messages.

@bzbarsky-apple
Copy link
Contributor

but perhaps we can leverage some checks that seem to be happening there

The problem is that decoding 255 into a nullable enum8 is blanket disallowed by the spec (the range is defined to be 1-254), but an out-of-range enum value might need to be mapped to a valid one depending on what the cluster spec says to do. And it might say different things in different places where the enum is used.

Which means that decoding cannot blanket enforce valid-only enum values. It can only collapse all invalid values into an easy-to-check-for one (which we do) and then the cluster impl needs to do the relevant checks as needed. We might be able to auto-generate checks in some cases if we express in the XML somehow what should be going on, but it might be pretty complex to do that (think enum inside struct, and two commands that take the struct but want different behavior).

@tehampson
Copy link
Contributor Author

We could have Encode fail for that value?

Okay we are on the same page here and this will be one of that action items for this issue.

The problem is that decoding 255 into a nullable enum8 is blanket disallowed by the spec (the range is defined to be 1-254), but an out-of-range enum value might need to be mapped to a valid one depending on what the cluster spec says to do. And it might say different things in different places where the enum is used.

Perhaps I should have been a little more verbose here (and also tested value 254 with the nullable enum8). What I have found in my quick testing is that I was able to write attribute value of 255 to enumAttr which is a the non-nullable version of the nullableEnumAttr. Both of these are SimpleEnum which is enum8. It looks like I am brushing up against some other issue that should be a constraint error. I am going to put this issue aside for now because I feel like I am yak shaving quite a bit already.

Which means that decoding cannot blanket enforce valid-only enum values

I see what you are saying here. But I do think the SDK needs a little more guarding for enum attributes attributes that seem to be essentially a cluster variable. It looks like NumericAttributeTraits::CanRepresentValue seems like a good spot potentially. Currently the comments of CanRepresentValue seems to insist that this is more of a checks around null value, but if there is no objection we can extend for this to potentially check for out-of-range enum value.

@bzbarsky-apple
Copy link
Contributor

but if there is no objection we can extend for this to potentially check for out-of-range enum value.

Again: some cluster specs have text like "if an out of range value is written to this enum-typed attribute, treat it as if this in-range value had been written"... We need to make sure that can be supported by clusters and/or applications.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 28, 2023
…nums.

This should help prevent those values leaking out on the wire.

Fixes project-chip#24368
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 28, 2023
…nums.

This should help prevent those values leaking out on the wire.

Fixes project-chip#24368
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 28, 2023
…nums.

This should help prevent those values leaking out on the wire.

Fixes project-chip#24368
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 28, 2023
…nums.

This should help prevent those values leaking out on the wire.

Fixes project-chip#24368
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 29, 2023
…nums.

This should help prevent those values leaking out on the wire.

Fixes project-chip#24368
bzbarsky-apple added a commit that referenced this issue Apr 29, 2023
…nums. (#26299)

This should help prevent those values leaking out on the wire.

Fixes #24368
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants