Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"None" as protobuf type names breaks mypy codegen #221

Closed
alkasm opened this issue Jul 1, 2021 · 8 comments · Fixed by #226
Closed

"None" as protobuf type names breaks mypy codegen #221

alkasm opened this issue Jul 1, 2021 · 8 comments · Fixed by #226

Comments

@alkasm
Copy link

alkasm commented Jul 1, 2021

A protobuf type message None { ... } won't appear in the .pyi file at all, and other messages which use the type will reference a non-existing global__None

If the message type is an inner message, it generates syntactically invalid code, e.g.

message Message {
  message None { }
...

will generate a file with global__Message.None which is not valid Python syntax. Can update with a repo showing this behavior soon. I hypothesize this happens with all keywords as well, not just None.

@alkasm
Copy link
Author

alkasm commented Jul 1, 2021

See also: #80

@alkasm
Copy link
Author

alkasm commented Jul 1, 2021

Here's a repo showing None used in various places inside proto files, and the generated code so you can easily see the issues: https://github.com/alkasm/mypy-protobuf-none

I split into many files so that I could reuse the name None multiple times in different contexts (which otherwise produces errors)

nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
@nipunn1313
Copy link
Owner

Great find and thank you for reproing! I was able to repro, add tests and whip up a fix (coming soon)

nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
@nipunn1313
Copy link
Owner

Added you as a contributor in that diff! Really good bugreport.
Thank you and appreciated!

nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
@alkasm
Copy link
Author

alkasm commented Jul 7, 2021

@nipunn1313 great, and no problem. For the attempted solution, I agree that typing it as Any and not giving it a class will be an immediate fix to the syntactically invalid code, so that's cool (and I'm happy with the PR).

What do you think about generating the types with the usual underscore suffix? E.g. from_, to_, None_, and so on? They can still be more helpful than Any in these contexts.


On a more tangential note, and thinking out loud...

The protobuf compiler handles these cases somewhat gracefully---I generated the normal Python code for this and committed it back to the repo above if you want to take a look. For top-level messages, protoc compiler inserts the types by their string name into the globals() dict, and for nested message types, uses getattr/hasattr. I don't think these sort of runtime shenanigans are currently supported by mypy. It is quite hard to google for, but I found no mention of Python reserved keywords on the mypy or typing repo issues.

FWIW I tried some naive things to combine my suffix suggestion with the runtime business in the .pyi file and indeed mypy did not like it. For example:

class Message(google.protobuf.message.Message):
    DESCRIPTOR: google.protobuf.descriptor.Descriptor = ...
    class None_(google.protobuf.message.Message):
        DESCRIPTOR: google.protobuf.descriptor.Descriptor = ...
        FIELD_FIELD_NUMBER: builtins.int
        field: typing.Text = ...

        def __init__(self,
            *,
            field : typing.Text = ...,
            ) -> None: ...
        def ClearField(self, field_name: typing_extensions.Literal[u"field",b"field"]) -> None: ...

    NONE_FIELD_NUMBER: builtins.int

    @property
    def none(self) -> global___Message.None_: ...

    def __init__(self,
        *,
        none : typing.Optional[global___Message.None_] = ...,
        ) -> None: ...
    def HasField(self, field_name: typing_extensions.Literal[u"none",b"none"]) -> builtins.bool: ...
    def ClearField(self, field_name: typing_extensions.Literal[u"none",b"none"]) -> None: ...
setattr(Message, "None", Message.None_)
global___Message = Message

This is runnable/syntactically valid, but there's no way to reference Message.None in a typing context that I could figure out. msg: "none_pb2.Message.None" yields "error: Invalid type comment or annotation", same with msg: getattr(none_pb2.Message, "None"). This doesn't even work with non-reserved names, though the error is different: "Name "none_pb2.Message.A" is not defined". All of these things are supported by the annotations syntax in general (they are valid Python expressions) but mypy is not smart enough to utilize that. Now that I'm thinking about this, I realize I have no idea how mypy figures out what variables are on a class or module (in my example, the runtime work does add them to the class dictionary). I guess .pyi files don't need to be runnable (due to forward refs, maybe other reasons too) so I guess to mypy the setattr() calls won't be executed.

@nipunn1313
Copy link
Owner

Yep - per https://developers.google.com/protocol-buffers/docs/reference/python-generated#keyword-conflicts - if the field name conflicts with a python attr - you end up with a field only accessible with getattr/setattr.
The docs are a bit sparse on what happens with message names, but from my testing, you have to do something like

from v1 import None # doesn't work
import v1
getattr(v1, "None")

This means that you won't be able to access this type from a mypy typing context.
I'll add a test of the above variety to the PR

mypy doesn't run code in the way that you describe - so lines like setattr(Message, "None", Message.None_) aren't going to be executed in mypy. We could generate types like None_ where needed, but it wouldn't be particularly useful because there wouldn't be a natural way to get ahold of the type None_ since the only way to access such a field is via getattr which is inherently not typed as such.

I could potentially imagine using overload + literal to get better typing on getattr - which might make reserved fields with the underscore strategy more useful for nested fields, but still wouldn't be helpful for module-level message names.
https://mypy.readthedocs.io/en/stable/literal_types.html

Something like

class Message(google.protobuf.message.Message):
  @overload
  def getattr(attr: Literal["None"]) -> None_: ...
  @overload
  def getattr(attr: Literal["Foo"]) -> Foo: ...

It seems like it might clutter the stubs for limited value, so I'm hesitant to really jump in with such a strategy.

nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
@alkasm
Copy link
Author

alkasm commented Jul 7, 2021

We could generate types like None_ where needed, but it wouldn't be particularly useful because there wouldn't be a natural way to get ahold of the type None_ since the only way to access such a field is via getattr which is inherently not typed as such.

Hmm, well you can access the field by name when the type name is what is reserved, rather than the field name: message.none which could be of type Message.None_ . But maybe since other access methods (namely, the sanctioned getattr) would return Any, it's not a particularly useful annotation (i.e. mypy would complain about sending the getattr'd one into a function call).

The getattr overloads would probably be acceptable if they covered all the use-cases, but since they don't, I agree with your assessment. Valiant attempt though!

nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
nipunn1313 added a commit that referenced this issue Jul 7, 2021
Unfortunately, our tests missed this, because all our tests with
reserved names for message types also had reserved names for the
identifiers. Added a couple tests for the new scenario, reproed,
and fixed by replacing such instances with `Any` (best we can do)

Fixes #221
@nipunn1313
Copy link
Owner

Hmm, well you can access the field by name when the type name is what is reserved, rather than the field name: message.none which could be of type Message.None_ .

This is a good point. I'll file an issue for supporting messages with python reserved keyword names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants