[feature request] scoped enums with an option #1079
Comments
👍 I would love to have something like this! Even if I have to specify it as an option to the |
I like this idea. The C98 enum scoping rule is super annoying when you want to use short enum values. Now you have to add a prefix, which makes the enum very ugly in managed languages like Java/C#. |
One possible constraint is that some platforms do not have scoped enum support. For example, most embedded compilers do not support C++11 well. Just out of curiosity, besides C98 and Objective-C, are there any other languages that do not support scoped enums? I googled some popular languages, and it seems all of them have scoped syntax. |
For languages that don't support scoped enums, the solution is to allow a customizable prefix for enum values: |
👍 I would love to have something like this as well! |
Sorry, I deleted my comment, because I think I will try to implement option scoped instead. But I think the opposite is true. "option scoped" will break existing generators that expect enum values to be unique at the enum's parent's level whereas "option auto_strip_enum_prefixes" can be safely ignored by all generators and then only implemented later. This principle of least breakage was why I had started going that route in the first place. I've changed my mind only because I think that C++ will be the only generator that breaks with option scoped and I think that I can probably fix it, but I'm worried that the PR won't be accepted. The question is what the C++ code generation constraints are. Is enum class allowed? Or should the C++ generator automatically add a prefix (the enum's name for example) when option scoped is encountered? |
For some embedded systems, C98 is still widely used, so I think protoc should generate C98-compatible code. One possible choice is that the |
Turns out that current design makes
// Convert children.
BUILD_ARRAY(proto, result, message_type, BuildMessage , NULL);
BUILD_ARRAY(proto, result, enum_type , BuildEnum , NULL);
BUILD_ARRAY(proto, result, service , BuildService , NULL);
BUILD_ARRAY(proto, result, extension , BuildExtension, NULL);
The scope validation happens in the BUILD_ARRAY for enum_type above and throws the error
// Copy options.
if (!proto.has_options()) {
  result->options_ = NULL;  // Will set to default_instance later.
} else {
  AllocateOptions(proto.options(), result);
}
// Note that the following steps must occur in exactly the specified order.
The above comment seems to rule out any re-ordering of operations.
// Cross-link.
CrossLinkFile(result, proto);
// Interpret any remaining uninterpreted options gathered into
// options_to_interpret_ during descriptor building. Cross-linking has made
// extension options known, so all interpretations should now succeed.
if (!had_errors_) {
  OptionInterpreter option_interpreter(this);
  for (vector<OptionsToInterpret>::iterator iter =
           options_to_interpret_.begin();
       iter != options_to_interpret_.end(); ++iter) {
    option_interpreter.InterpretOptions(&(*iter));
  }
  options_to_interpret_.clear();
}
We don't have options until this point. I'm still going to give it a shot, but my C++ knowledge has atrophied over many years of disuse. |
I think that "option scoped" should only signify that the .proto writer means the enum value names to be scoped as children of the enum rather than siblings. Then it would be up to each language generator to behave in whatever way is appropriate for its language given this option. For all generators, except C++, this means just using the enum values as they are because most languages already scope the values that way. For C++ we could generate |
The way to generate C98-compatible code for C++ is to add another option: So it looks like this:
People can choose #2 for newer code with C++11 support, #3 to stay compatible with C98 C++ compilers while enjoying the short enum names in other languages, and #1 for maximum compatibility with proto2. |
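The actual option syntax for these three choices is not preserved in this thread, so the following C++ sketch uses hypothetical names (`CppEnumStyle`, `EmitEnum`) to show roughly what a generator might emit for each style. It is an illustration under those assumptions, not protoc's generator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generator styles matching the three choices above.
enum class CppEnumStyle {
  kAsWritten,  // #1: plain enum, value names exactly as in the .proto (proto2 behavior)
  kEnumClass,  // #2: C++11 enum class, short value names scoped by the enum
  kPrefixed,   // #3: plain C++98 enum, short names prefixed with "EnumName_"
};

// Emits a C++ declaration for one enum in the requested style.
std::string EmitEnum(const std::string& enum_name,
                     const std::vector<std::pair<std::string, int>>& values,
                     CppEnumStyle style) {
  assert(!enum_name.empty());
  std::string out = style == CppEnumStyle::kEnumClass ? "enum class " : "enum ";
  out += enum_name + " {\n";
  for (std::size_t i = 0; i < values.size(); ++i) {
    std::string name = values[i].first;
    if (style == CppEnumStyle::kPrefixed) name = enum_name + "_" + name;
    out += "  " + name + " = " + std::to_string(values[i].second);
    // No comma after the last value: C++98 does not allow a trailing comma.
    out += (i + 1 < values.size()) ? ",\n" : "\n";
  }
  out += "};\n";
  return out;
}
```

For values START and STOP in an enum Status, style #2 yields `Status::START`, while style #3 yields the C98-friendly `Status_START` in the enclosing scope.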
Options that are non-language-specific, like The meaning of A C# or Java generator can just ignore this option as their semantics already have this behavior. A C++ generator that is limited to C98 just has to find a way to achieve the same purpose within the constraints of C98. A C++ generator that is not limited to C98 (perhaps in the future by default or by option) could use In other words, a Java developer that adds The idea of a |
How about |
Would it be hard to do something akin to what In other words, would it be easier to add something like |
Agree with @philsc: this option would be better available at both the enum level and the proto-file level. |
I agree with @goldenbull to have an option for C98-semantic languages to accommodate and others to ignore, but I still agree with @goldenbull's original name of I also like a proto-file-level option in addition to the enum option but would suggest the name @philsc's idea of cc_enum_classes is a great idea, but I think separate. Cleaner C++ code is only half of the issue. The other half is the .proto C98 semantics which forces C# and Java code that looks like this: Message.Value = MyEnumWithALongName.MY_ENUM_WITH_A_LONG_NAME_VALUE; And so I think that 1 The problem is that the system is rigged to look for enums by name where the name is |
Hi, thanks for these thoughtful discussions, and sorry for not chiming in sooner. As @warrenfalk already discovered, in the protobuf implementation an enum value is referenced as "namespace.VALUE" rather than "namespace.Enum.VALUE". When we discuss languages with/without scoped enum support, you probably don't think of proto itself as a language, but it kind of is, and most unfortunately it doesn't support scoped enums. Every element defined in a .proto file has a globally unique fully qualified name, and for enum values, the fully qualified name doesn't include the enum type's name. That's why enum values in two different enum types in the same scope cannot have the same short name, regardless of which programming language you are generating code for or which options are specified in the .proto file. Changing this would be infeasible, because the protocol compiler is not the only tool that parses .proto files, and we actually have persistent data containing things like "namespace.VALUE". It's simple to change the protobuf implementation in this GitHub repository to do whatever we want, but we won't be able to push this change to Google's internal code base. Anyone who tries to use the same short name for two enum values within the same scope is likely to cause breakages in various parts of Google's systems. It's not uncommon for a new proto option to cause breakages and have to be rolled back, let alone a more fundamental and incompatible change to the proto language. I am sorry to say this, but "option scoped = true" as proposed in @goldenbull's first post (and similar ideas to make protoc allow the same short name in different enums) cannot be supported. Other ideas like "option auto_strip_enum_prefixes" are more feasible, but I guess they're also far less appealing. |
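A simplified C++ sketch of the naming rule described above, assuming a flat list of enums declared directly in one package scope. `EnumDef`, `ValueFullName` and `FindConflicts` are hypothetical helpers, not protobuf's descriptor API; they only show why two enums in the same scope cannot both declare `NONE`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of an enum defined in one .proto scope.
struct EnumDef {
  std::string name;
  std::vector<std::string> values;
};

// Full name of an enum value: the enum's *parent* scope plus the value's short
// name. The enum type's own name does not appear ("pkg.VALUE", not "pkg.Enum.VALUE").
std::string ValueFullName(const std::string& parent_scope, const std::string& value) {
  assert(!value.empty());
  return parent_scope.empty() ? value : parent_scope + "." + value;
}

// Lists every value full name that is already taken when it is declared.
std::vector<std::string> FindConflicts(const std::string& parent_scope,
                                       const std::vector<EnumDef>& enums) {
  std::set<std::string> seen;
  std::vector<std::string> conflicts;
  for (const EnumDef& e : enums) {
    for (const std::string& v : e.values) {
      std::string full = ValueFullName(parent_scope, v);
      if (!seen.insert(full).second) conflicts.push_back(full);
    }
  }
  return conflicts;
}
```

Enums Color {NONE, RED} and Shape {NONE, CIRCLE} in package pkg both claim "pkg.NONE", which is exactly the clash that forces prefixes such as COLOR_NONE and SHAPE_NONE.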
So basically an This is most unfortunate, indeed, as in retrospect this C98 scoping behavior might be the biggest flaw in the design. It's a shame this wasn't addressed as part of proto3. Is something like this possible in any future version, such as proto4? By the way, for anyone that cares, I was able to get Also by the way, the idea to have a corresponding @xfxyjwf , you suggested in #67 that you did not like the idea of giving the C++ generator permission to use C++11 features via an argument to protoc. Can you provide more insight into that? It would be very convenient for a language generator to know that the user is OK with some C++11 features. Having this would allow the generator to emit |
How about working around it this way:
|
Thanks to the heated discussion, I realized that what I actually want is just prettier, more readable code. Any solution should not change the existing semantics, but should provide something like syntactic sugar. Will the |
But I'm not against allowing either option, or even both at the same time. When both are present, I think What I know is that
Creating a scenario in which Status.STOP in Java means the same thing as Status_START in C++. But I'm assuming that if a developer wanted to do something like that, protobuf wouldn't stop him. What I don't yet know is how this affects reflection. I know that C++ generated code and reflection would just use the original value name ignoring And in the interest of thoroughness, I see in @goldenbull's example, he also made his aliases Pascal-case (which is C# convention that I'd also like to support in the C# generator). I have been thinking about a global option to try to allow that automatically, but I expect such an option to be language specific. I also prefixed the values with the exact name of the enum |
In most cases |
@goldenbull, that makes sense to me. I am planning to add |
@warrenfalk great, so I can be lazy and wait for you 😄 |
@goldenbull, I have implemented Eventually I will add js support, too, and maybe some others. |
@warrenfalk great, I just can not wait to test 😄 |
@warrenfalk Here is my test case:
syntax = "proto3";
package TestEnumPrefix;
option csharp_namespace = "NsTestEnumPrefix";
option auto_strip_enum_prefixes = true;
enum EType {
etYpe_name1 = 0; // should be stripped to "name1"
eT_y_pe_name2 = 1; // should be stripped to "name2"
Name3 = 10; // should be "Name3"
eTypeName3 = 11; // should not strip prefix, conflict with above line
EType_Name3 = 12; // should not strip neither
etype_Name4 = 20; // should not strip, conflict with following line
Name4 = 21; // should be "Name4"
this_is_a_very_very_long_enum_member_name = 30 [scoped_alias="ShortName"]; // alias = ShortName
this_is_another_very_very_long_enum_member_name = 31 [scoped_alias="ShortName2"]; // alias = ShortName2
}
enum enum_type_test
{
EnumTypeTest_v1 = 0; // should be "v1"
Enum_Type_Test_v2 = 1; // should be "v2"
enum_type_test_v3 = 2; // should be "v3"
}
message MyClass
{
EType v1 = 1;
enum_type_test v2 = 2;
}
And this is the generated code:
public enum EType {
/// <summary>
/// should be stripped to "name1"
/// </summary>
name1 = 0,
/// <summary>
/// should be stripped to "name2"
/// </summary>
name2 = 1,
/// <summary>
/// should be "Name3"
/// </summary>
Name3 = 10,
/// <summary>
/// should not strip prefix, conflict with above line
/// </summary>
eTypeName3 = 11,
/// <summary>
/// should not strip neither
/// </summary>
EType_Name3 = 12,
/// <summary>
/// should not strip, conflict with following line
/// </summary>
etype_Name4 = 20,
/// <summary>
/// should be "Name4"
/// </summary>
Name4 = 21,
/// <summary>
/// alias = ShortName
/// </summary>
ShortName = 30,
/// <summary>
/// alias = ShortName2
/// </summary>
ShortName2 = 31,
}
public enum enum_type_test {
/// <summary>
/// should be "v1"
/// </summary>
v1 = 0,
/// <summary>
/// should be "v2"
/// </summary>
v2 = 1,
/// <summary>
/// should be "v3"
/// </summary>
v3 = 2,
}
...
/// <summary>Field number for the "v1" field.</summary>
public const int V1FieldNumber = 1;
private global::NsTestEnumPrefix.EType v1_ = global::NsTestEnumPrefix.EType.name1;
public global::NsTestEnumPrefix.EType V1 {
get { return v1_; }
set {
v1_ = value;
}
}
/// <summary>Field number for the "v2" field.</summary>
public const int V2FieldNumber = 2;
private global::NsTestEnumPrefix.enum_type_test v2_ = global::NsTestEnumPrefix.enum_type_test.v1;
public global::NsTestEnumPrefix.enum_type_test V2 {
get { return v2_; }
set {
v2_ = value;
}
}
...
Seems everything works fine! Great job! 👍 |
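The rules this test case exercises can be summarized in a small C++ sketch. It is a hypothetical reconstruction inferred only from the expected outputs above, not @warrenfalk's actual generator code: `EnumValue`, `StripPrefix` and `GeneratedNames` are made-up names, and refusing to strip when the remainder would start with a digit is an added assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical generator-side view of one enum value.
struct EnumValue {
  std::string name;   // name as written in the .proto file
  std::string alias;  // scoped_alias option value, or empty if none
};

// Strips the enum's name from the front of a value name, comparing
// case-insensitively and ignoring underscores (so "eT_y_pe_name2" in enum
// "EType" becomes "name2"). Returns the name unchanged if the prefix does not
// match, or if the remainder would be empty or start with a digit (assumption).
std::string StripPrefix(const std::string& enum_name, const std::string& value) {
  assert(!enum_name.empty());
  std::size_t i = 0;  // read position in value
  for (char c : enum_name) {
    if (c == '_') continue;
    while (i < value.size() && value[i] == '_') ++i;
    if (i == value.size() ||
        std::tolower(static_cast<unsigned char>(value[i])) !=
            std::tolower(static_cast<unsigned char>(c))) {
      return value;  // prefix does not match
    }
    ++i;
  }
  while (i < value.size() && value[i] == '_') ++i;
  if (i == value.size() || std::isdigit(static_cast<unsigned char>(value[i]))) {
    return value;
  }
  return value.substr(i);
}

// Picks the generated name for every value of one enum. An alias always wins;
// a stripped name that more than one value would end up with is abandoned in
// favor of the original name (e.g. eTypeName3 vs. Name3 above).
std::vector<std::string> GeneratedNames(const std::string& enum_name,
                                        const std::vector<EnumValue>& values) {
  std::vector<std::string> names;
  std::map<std::string, int> uses;  // how many values map to each candidate
  for (const EnumValue& v : values) {
    names.push_back(v.alias.empty() ? StripPrefix(enum_name, v.name) : v.alias);
    ++uses[names.back()];
  }
  for (std::size_t k = 0; k < values.size(); ++k) {
    bool stripped = values[k].alias.empty() && names[k] != values[k].name;
    if (stripped && uses[names[k]] > 1) names[k] = values[k].name;
  }
  return names;
}
```

This sketch does not check whether an alias itself collides with another value's name; a real generator would presumably need to.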
Awesome, thanks, I'll submit the PR |
@warrenfalk |
@goldenbull, do you mean make the
Generates Java code like this:
public enum MyOtherEnum
implements com.google.protobuf.ProtocolMessageEnum {
/**
* <code>未指定 = 0</code>
*/
未指定(0, 0),
/**
* <code>One = 1</code>
*/
One(1, 1),
/**
* <code>Two = 2</code>
*/
Two(2, 2),
UNRECOGNIZED(-1, -1),
;
/* ... */
} |
@warrenfalk yes, |
Hi @warrenfalk, could you tell me how to generate |
Oh yeah, I got it.
|
To whoever might be interested: I just created a new repository for C++11 feature support in Protobuf: The first feature is based on @warrenfalk's change to support scoped enums. |
see workaround here: #67 (comment) |
So did anything come out of this and get merged?
Any updates on this? It's 2022, and embedded compilers most likely support C++11 by now.
Currently pb3 applies the C98 enum scoping rule to enums. This is inconvenient when we need the same identifier in different enums. I hope there will be a new option for enums as follows:
With the annotation
option scoped = true
the compiler can use enum class
for C++11, producing better code, and we users will feel more comfortable using proper identifiers for enums. And this is totally compatible with the current implementation; no existing code will be disturbed.
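To illustrate the C++ difference this request relies on, here is a minimal sketch with hand-written enums. `Color`, `Shape` and `NONE` are illustrative names, and neither namespace reflects what protoc actually emits today.

```cpp
// Hand-written illustration only; this is not protoc output.

// C++98-style unscoped enums: enumerators are injected into the enclosing
// namespace. Declaring NONE in both Color and Shape here would be a
// redefinition error, so each value carries its enum's name as a prefix.
namespace unscoped {
enum Color { Color_NONE = 0, Color_RED = 1 };
enum Shape { Shape_NONE = 0, Shape_CIRCLE = 1 };
}  // namespace unscoped

// C++11 scoped enums: enumerators live inside their enum, so both enums can
// use the short name NONE. Conversion to int must be explicit.
namespace scoped {
enum class Color { NONE = 0, RED = 1 };
enum class Shape { NONE = 0, CIRCLE = 1 };
}  // namespace scoped
```

Call sites then read `scoped::Shape::NONE` rather than `unscoped::Shape_NONE`.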