Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.proto convention for enum value names conflicts with C# convention #1334

Closed
warrenfalk opened this issue Mar 19, 2016 · 7 comments
Closed
Assignees

Comments

@warrenfalk
Copy link
Contributor

C# convention is to use pascal case while .proto convention is to use upper case with underscores. If you then mix your protoc-generated code with other conventional code, it is inconsistent and can also trips up style checkers.

Note: this is only about case - there's a separate issue for scope, #1079, for which a PR has already been submitted.

I propose a file level option csharp_enum_value_case = true to enable automatic conversion from UPPER_CASE_WITH_UNDERSCORES to PascalCase. (In fact, I suggest it also convert from camelCase or lower_case_with_underscores also).

I am happy to submit a PR for this, but would be interested in any feedback. Is this better implemented as an argument to protoc, or an enum-level option?

@gvaish
Copy link
Contributor

gvaish commented Mar 24, 2016

+1 to the suggestion.

I think it may be a good idea to generalize this to support for all languages.

option enum_field_name_conversion = true
  1. Java: All caps with underscores
  2. Objective C: Pascal naming prefixed with enum-type-name (can be a secondary option objectivec_enum_field_name_prefix=true or something to that effect)
  3. C-Sharp: Pascal naming

What do you think?

@warrenfalk
Copy link
Contributor Author

I agree.

Except that I actually think this should be the default behavior and any option should be to turn it off not on.

But you should be aware that currently there is no plan to allow anything like this. See @haberman's argument against PR #1330 to see why. Then see my argument below it. Note that PR #1330 is actually about Issue #1079, not this issue, but all of the same arguments apply, so if that can't be accepted then i can't see how this can either.

Note that @haberman's argument against it isn't that it's hard to do (I'm happy to do it). His argument is that the small bit of complexity it adds to the code is not worth the benefit because the benefit is cosmetic. He hasn't responded to my response.

So unfortunately, if we want this behavior, we're going to have to write or fork our own protoc compiler, or make a better argument than I did, or maybe get an army of people to +1 these issues.

The following is what I said in #1330:

If you look at language generators as function which translates the meaning of a .proto definition into its native language, then preserving the idioms and conventions of the native language would ideally be default behavior. E.g. when someone does this in a .proto:

enum Color {
    COLOR_BLACK = 0;
    COLOR_WHITE = 1;
    COLOR_BLUE = 2;
}

The same person would, in C# write this code:

enum Color {
    Black = 0,
    White = 1,
    Blue = 2,
}

And so if the translator is doing a faithful translation, it should do the same. I realize the problem with backward compatibility at this point, which is why I suggested it as an option in the first place. I've also asked for another option to fix the case-disparity (C# only) between the two code snippets above, but now I'm guessing that hasn't a chance. However, what I'm asking is to at least have a way to have language generators generate according to their language.

@jskeet
Copy link
Contributor

jskeet commented Mar 24, 2016

There are definite problems with this, in terms of the possibilities of collisions - values FooBar and FOO_BAR would collide, for example. That exists for fields too, of course...

I'm not entirely against the proposal, but I certainly wouldn't want C# to be the only platform to do this.

As an implementation note, I'd need to check anywhere that we use Enum.ToString, e.g. for JSON serialization - it would need to fetch the original name instead.

Will ask internally whether there are other examples...

(It's also very late in the release cycle to be making this change, which would break everyone's code unless we made it optional with a default of "as it was"... and at that point, you could easily end up with massive inconsistency between protos depending on how the author decided to specify them.)

@warrenfalk
Copy link
Contributor Author

@jskeet, the collision issue did occur to me and so I've spent some time thinking about that already.

Two things:

  1. Modifying a name to suite a particular language isn't technically new to protoc; the C++ generator does this already to avoid keyword collision.
  2. This C++ behavior causes the same type of conflict you mentioned, but I'm aware of no reports of the problem in the wild. Take the following for example which generates uncompilable C++ code:
enum {
   for = 1;
   for_ = 2;
}

The example is contrived, but no more contrived than FOO_BAR and FooBar in a single enum. Both would seem to require a developer to be trying to cause himself pain.

Of course there are multiple strategies for mitigating both scenarios (e.g. not renaming if it would result in collision, renaming but further mangling, etc). The problem is that none can be perfect because they run into the problem that a previously defined enum value name might be effectively renamed by adding a new enum value name. But at this point it becomes a question of when to stop worrying about masochistic developers.

Also note that if this is to be considered, can we also consider, for the same reason, PR #1330 (at least the suggested modification). In terms of "generate code that follows the language convention" these could both be merged into a single option.

@jskeet
Copy link
Contributor

jskeet commented Mar 24, 2016

@warrenfalk I definitely agree that it's only widening the existing set of possible collisions. Not something I would like to do lightly, but it's not a total blocker for me.

The compatibility issue is a bigger one for me at the moment - it's very hard to know how many C# developers are currently using 3.0.0-beta2, how many enums they have, and whether they'd prefer a bit of breakage now or permanent non-ideal enum values.

I'm not trying to be coy here - I really don't know what the best course of action would be.

If it's any consolation at all, I'm kicking myself for not thinking of it before. (Somehow I managed to go through the proto2 version without ever considering this a big problem, too...)

@warrenfalk
Copy link
Contributor Author

@jskeet no problem, I totally understand both hesitations.

If opt-out can't be risked, does that mean opt-in behavior still could (possibly as an argument to protoc)?

Let me just throw out this thought on backward-compatibility: when git changed default behavior, their strategy was to employ a warning when the option was left unconfigured. Perhaps there is an analogous temporary solution to protoc, something like the following (only when affected languages are specified):

$ protoc --csharp_out=. test.proto
Warning: enum value blah blah is off by default but will be on by default in the future.
  Use --enum_option_here=false to disable this warning and use old behavior.
  Or --enum_option_here=true to use the new behavior

@jskeet
Copy link
Contributor

jskeet commented Apr 23, 2016

This has now been implemented in #1401. There is a flag to enable the old behaviour for the moment (legacy_enum_values) for the moment, but that will be removed in the final release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants