Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include an Unspecified Value in Enums #171

Closed
gmaclennan opened this issue Feb 19, 2024 · 9 comments · Fixed by #198
Closed

Include an Unspecified Value in Enums #171

gmaclennan opened this issue Feb 19, 2024 · 9 comments · Fixed by #198
Assignees
Labels

Comments

@gmaclennan
Copy link
Member

gmaclennan commented Feb 19, 2024

For each enum in our protobuf definitions, add UNSPECIFIED as the 0 enum, and increment the values for other enum values by 1.

e.g.

enum AttachmentType {
    ATTACHMENT_TYPE_UNSPECIFIED = 0;
    photo = 1;
    video = 2;
    audio = 3;
}

We either need to map this to a default enum during decode, or we need to include this in the JSON schema / decoded type, and then add handling to the frontend.

@gmaclennan gmaclennan added the mvp label Feb 19, 2024
@EvanHahn
Copy link
Contributor

I think it depends on the use case, but I generally feel we should have some "unknown" default value for enums.

I agree that our decoder code can treat unknown as some other default. For example, we might decide that unrecognized invite decisions are equivalent to "REJECT".

@gmaclennan
Copy link
Member Author

Just to be clear, because "unknown" is possibly unspecific, there are two different cases:

  • "unrecognized" - this is an enum value that is not recognized, e.g. a newer version of a proto def adds a new enum value, and an older client would not recognize this value. We already handle this (and decode to UNRECOGNIZED)
  • "unspecified" - this is when a newer proto def adds a new enum field, and decodes protobufs from the older def. These enum values would be missing, and protobuf defaults to decode the enum value to the first enum defined (=0).

It's the latter that's the focus of this issue.

@gmaclennan
Copy link
Member Author

It seems like this slipped through the cracks, and I think it's valuable to include this in MVP. We would not have any enums that are actually unspecified in the initial release, because it only happens when a new enum field is added and you read a previous version of the proto, however it seems like consistency is good, rather than having some enums whose first value is "unspecified", and other enums don't have that.

I'm going to update the issue description to be specific.

@tomasciccola
Copy link
Contributor

Trying to understand this issue, I made a couple of tests using deviceInfo as an example:

Test 1 (Loading a new doc with added enum from old build)

  1. Add a enum to deviceInfo and re-build everything
  2. Encode a deviceInfo protobuf with this new enum, save it to disk
  3. remove the enum, re-build every thing
  4. Load the file, the enum is not there

Takeaway -> forwards compatibility regarding enum themselves -> you don't get anything

Test 2 (Loading a new doc with added enum field from old build)

  1. Add a enum field to deviceInfo and re-build everything
  2. Encode a deviceInfo protobuf with this new enum field, save it to disk
  3. remove the enum field, re-build every thing
  4. Load the file, the enum is there with value "UNRECOGNIZED"

Takeaway -> forwards compatibility regarding enum fields -> you get "UNRECOGNIZED"

Test 3 (Loading a old doc from new build with new enum field on last position)

  1. encode a deviceInfo and save it to disk
  2. add an enum field on the last position and re-build everything
  3. Load and decode the older protobuf
  4. the enum field get correctly decoded

Takeway -> adding enum fields at the last position -> correctly decoded

Test 4 (Loading an old doc from new build with new enum field on arbitrary position)

  1. encode a deviceInfo and save it to disk
  2. add an enum field but NOT on the last position and re-build everything
  3. Load and decode the older protobuf
  4. the enum field get incorrectly decoded (is offseted by one)

Takeway -> adding enum fields at an arbitrary last position -> incorrectly decoded (offseted)

I'm still struggling to reproduce the case you are talking about @gmaclennan, mainly:

  • this is when a newer proto def adds a new enum field, and decodes protobufs from the older def. These enum values would be missing, and protobuf defaults to decode the enum value to the first enum defined (=0).

It seems that it doesn't default to decoding it with value 0. It decodes the document with the original enum index, which can be correct if we only add values at the end of an enum and incorrect if we insert a field in an arbitrary position

@EvanHahn
Copy link
Contributor

tl;dr: I don't think we need to do this (ever). Our existing behavior seems correct.

I don't think we should ever address this issue. Let's run through the worrisome cases and see how they aren't a problem. (A lot of this is redundant with Tomás's tests.)

Case 1: adding a new enum value

Let's say we add a new DeviceType to DeviceInfo. For example:

 enum DeviceType {
   mobile = 0;
   tablet = 1;
   desktop = 2;
+  server = 3;
 }
  • Old versions will parse mobile, tablet, and desktop correctly. If they receive server (set by new versions), the value will be UNRECOGNIZED.
  • New versions will be able to parse all values from old versions just fine.

✅ This is all good! Unrecognized enum values are parsed as unrecognized.

Case 2: adding a new optional enum field

Let's say we add a new optional enum field, Color, to DeviceInfo. For example:

 message DeviceInfo_1 {
   // ...
+  enum Color {
+    red = 0;
+    blue = 1;
+  }
+  optional Color color = 7;
 }
  • Old versions will not see the color field at all.
  • New versions will parse messages from old versions and set color to undefined. To be clear, it won't be UNRECOGNIZED or a default value.

✅ This is all good! There's nothing special about enums here; unknown fields are ignored.

Case 3: adding a new non-optional enum field

Let's say we add a new non-optional enum field, Status, to DeviceInfo. For example:

 message DeviceInfo_1 {
   // ...
+  enum Status {
+    active = 0;
+    inactive = 1;
+  }
+  Status status = 8;
 }
  • Old versions will not see the status field at all.
  • New versions will parse messages from old versions and set status to active.

✅ This is all good! This is the expected behavior for default values in protobuf.


Unless there's a requirement I'm unaware of, I don't think this is necessary.

Does that seem right?

@gmaclennan
Copy link
Member Author

gmaclennan commented Jul 30, 2024

Thanks for the detailed research into this, this is really helpful. It's into the nitty-gritty, but valuable! I initially thought this was unnecessary, but then re-considered.

  1. Using unspecified is recommended as best practice.

  2. Case 2 (optional enum) I believe is behavior specific to the parser we are using (ts-proto), which is behavior we want, but not guaranteed for protobuf parsing.

  3. Case 3 is the key reason for using unspecified. I believe it's valuable at the protobuf level to know the difference between an old version that did not have this enum field, and new versions that do have this set. I think the "decision" about whether there should be a default, and what that default should be, belongs in the decoding code, rather than in the protobuf itself. e.g. in the decode function:

    if (deviceInfo.status === 'device_info_unspecified') deviceInfo.status = 'active'

    However if we do add enums, we may well want to leave the unspecified value there, so that the front-end can display it as such, and distinguish it from an "unrecognized" value.

TL;DR, I do think it is valuable to adopt this. It is not necessary for enums that already exist in our "v1", however I think it makese sense to do for all enums for the sake of consistency.

@gmaclennan
Copy link
Member Author

Thoughts on always calling this value UNSPECIFIED (vs. device_type_unspecified), to simplify checks in the front-end in the future? E.g. both UNRECOGNIZED and UNSPECIFIED are "special" values.

@tomasciccola
Copy link
Contributor

Ok, so I understand the case for this now.
Regarding calling always UNSPECIFIED, the issue is that you can't have two enum fields (of different enums, on the same proto) with the same identifier (see protocolbuffers/protobuf#5425), that's why I went with the more specific version (which is a drag for the frontend, I know...)

@gmaclennan
Copy link
Member Author

Ok, so I understand the case for this now. Regarding calling always UNSPECIFIED, the issue is that you can't have two enum fields (of different enums, on the same proto) with the same identifier (see protocolbuffers/protobuf#5425), that's why I went with the more specific version (which is a drag for the frontend, I know...)

ah yes, I see. What you have is fine then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants