
python: use relative imports in generated modules #1491

Closed
little-dude opened this issue May 5, 2016 · 164 comments

@little-dude

I have a package foo that looks like this:

.
├── data
│   ├── a.proto
│   └── b.proto
└── generated
    ├── a_pb2.py
    ├── b_pb2.py
    └── __init__.py
# a.proto
package foo;
# b.proto
import "a.proto";

package foo;

Generate the code: protoc -I ./data --python_out=generated data/a.proto data/b.proto.
Here is the failure:

Python 3.5.1 (default, Mar  3 2016, 09:29:07) 
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from generated import b_pb2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/corentih/repro/generated/b_pb2.py", line 16, in <module>
    import a_pb2
ImportError: No module named 'a_pb2'

This is because the generated code looks like this:

import a_pb2

If the import was relative it would actually work:

from . import a_pb2
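A common workaround discussed in this thread is to post-process the generated file and rewrite the absolute import into a relative one. A minimal sketch of that idea, using a made-up file name and assuming GNU sed:

```shell
# Post-processing workaround (illustrative only): rewrite the absolute
# "import a_pb2" that protoc emits into a package-relative import.
printf 'import a_pb2 as a__pb2\n' > b_pb2_demo.py
sed -E -i 's/^import ([A-Za-z0-9_]+_pb2)/from . import \1/' b_pb2_demo.py
cat b_pb2_demo.py   # from . import a_pb2 as a__pb2
```

In a real project the same sed expression would be applied to every `*_pb2.py` file under the output directory.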
@little-dude changed the title from "use relative imports in generated modules" to "python: use relative imports in generated modules" on May 6, 2016
@goldenbull

I have exactly the same problem and hope it gets fixed.

little-dude added a commit to little-dude/protobuf that referenced this issue May 10, 2016
@little-dude
Author

@goldenbull I submitted a fix, let's see if it makes it through. I'm just not sure: are there cases where we don't want relative imports?

@goldenbull

@little-dude what if a_pb2.py is generated into a different folder than b_pb2.py?

@little-dude
Author

Could you provide a small example of what you're thinking about, so that I can try it with my change?

@goldenbull

.
├── proto
│   ├── a.proto
│   └── b.proto
├── pkg_a
│   ├── a_pb2.py
│   └── __init__.py
└── pkg_b
     ├── b_pb2.py
     └── __init__.py

Maybe this is not a good example; I don't have enough knowledge about how protobuf and Python handle imports.

@little-dude
Author

little-dude commented May 11, 2016

I don't think this is actually possible because the generated modules follow the hierarchy of the proto files.
However we could imagine that we have the following:

.
└── data
    ├── a.proto
    ├── b.proto
    └── sub
        ├── c.proto
        └── sub
             └── d.proto

with the following:

# a.proto
package foo;
import "b.proto";
import "sub/c.proto";
import "sub/sub/d.proto";

# b.proto
package foo;
import "sub/c.proto";
import "sub/sub/d.proto";

# sub/c.proto
package foo;
import "sub/d.proto";

# sub/sub/d.proto
package foo;

We generate the code with:

protoc -I data -I data/sub -I data/sub/sub --python_out=generated data/a.proto data/b.proto data/sub/c.proto data/sub/sub/d.proto

which generated the following:

.
└── generated
    ├── a_pb2.py
    ├── b_pb2.py
    └── sub
        ├── c_pb2.py
        └── sub
            └── d_pb2.py

But this is a more complex case than what I am trying to fix.

Edit: I'm not even sure this is a valid case but here is the error I'm getting with the master branch (4c6259b):

In [1]: from generated import a_pb2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-f28bccc761b6> in <module>()
----> 1 from generated import a_pb2

/home/corentih/repro/generated/a_pb2.py in <module>()
     14 
     15 
---> 16 import b_pb2 as b__pb2
     17 from sub import c_pb2 as sub_dot_c__pb2
     18 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2

/home/corentih/repro/generated/b_pb2.py in <module>()
     14 
     15 
---> 16 from sub import c_pb2 as sub_dot_c__pb2
     17 from sub.sub import d_pb2 as sub_dot_sub_dot_d__pb2
     18 

/home/corentih/repro/generated/sub/c_pb2.py in <module>()
     14 
     15 
---> 16 from sub import d_pb2 as sub_dot_d__pb2
     17 
     18 

/home/corentih/repro/generated/sub/sub/d_pb2.py in <module>()
     20   package='foo',
     21   syntax='proto2',
---> 22   serialized_pb=_b('\n\x0fsub/sub/d.proto\x12\x03\x66oo')
     23 )
     24 _sym_db.RegisterFileDescriptor(DESCRIPTOR)

TypeError: __init__() got an unexpected keyword argument 'syntax'

@little-dude
Author

@haberman is there any chance for this to be fixed before the next release? It's quite limiting for Python 3.

@drats

drats commented Jun 17, 2016

I have exactly the same problem (using protobuf v3beta3): the generated imports do not conform to PEP 328 (finalized in 2004), restated in the Python docs: https://docs.python.org/3/tutorial/modules.html#intra-package-references. The 12-year-old specification is enforced in Python 3, so generated protobufs are unusable without further modification.

@asgoel

asgoel commented Jul 15, 2016

any updates on this?

@haberman
Member

Yikes, sorry for the slow reply on this.

Wouldn't relative imports break the case that you are importing protos from a different pip package?

For example, the well-known types come from the google-protobuf pip package. If we merge a change to use relative imports, imports of google/protobuf/timestamp.proto (for example) would be broken.

@drats

drats commented Aug 8, 2016

@haberman This bug has to do with protos importing protos in the same package, even in the same directory. The compiler converts this into relative imports with defective syntax under Python 3, so the generated code cannot execute at all. I don't see how you can get away without using relative imports in this case. I've had to manually edit the compiler generated pb2.py files to get them to work at all.

@oc243

oc243 commented Oct 22, 2016

+1 for fixing this bug. It's stopping me migrating from python2 to python3.

@brynmathias

+1 for fixing as well

@27359794

27359794 commented Nov 18, 2016

+1, as far as I can tell this completely prevents proto imports in Python 3. Seems extremely worrying that this isn't fixed.

EDIT: this is not quite right, see my comment below.

@ylwu

ylwu commented Nov 18, 2016

+1 for fixing

@xfxyjwf
Contributor

xfxyjwf commented Nov 18, 2016

I believe protobuf is working as intended in this case. The python package generated for a .proto file mirrors exactly the relative path of the .proto file itself. For example, if you have a .proto file "proto/a.proto", the generated python code must be "proto/a_pb2.py" and it must be in the "proto" package. In @little-dude 's example, if you want the generated code in the "generated" package, the .proto files themselves must be put in the "generated" directory:

└── generated
    ├── a.proto
    ├── b.proto

with protoc invoked as:

$ protoc --python_out=. generated/a.proto generated/b.proto

This way, the output will have the correct import statements (it will be "import generated.a_pb2" rather than "import a_pb2").

Using relative imports only solves the problem when all generated py code is put in the same directory. That's not the case when you import protos from other projects though (e.g., use protobuf's well-known types). It will likely break more than it fixes.

@haberman
Member

I am confused by the claims that this is totally broken in Python 3. We have Python 3 tests (and have for a while) that are passing AFAIK. Why would Python 3 require relative imports?

@27359794

27359794 commented Nov 21, 2016

The issue that I'm having, and that I believe others are having, is that the proto statement import "foo.proto" compiles into the Python 3 code import foo_pb2. However, implicit relative imports were removed in Python 3, so relative imports must be of the form from . import foo_pb2. Manually changing the generated proto code to this form after proto compilation fixes the issue.

There are already multiple existing issues concerning this problem, and it first seems to have been recognised in 2014 (!!!): #90, #762, #881, #957

@27359794

27359794 commented Nov 25, 2016

I read a bit more about Python 3's import rules and I think I can give a better explanation.

In Python 3 the syntax import foo imports from the interpreter's current working directory, from $PYTHONPATH, or from an installation-dependent default. So if you compile proto/foo.proto to gen/foo_pb2.py, the syntax import foo_pb2 works only if the current working directory is gen/ or if you placed gen/ on your python path.

If you are compiling protos as part of a Python package (which is the case in most non-trivial Python projects), the interpreter's current working directory is the directory of your main module (suppose the directory is mypackage/), and modules in the package must either use fully-qualified absolute imports (e.g. import mypackage.gen.foo_pb2) or relative imports (e.g. from .gen import foo_pb2).

In Python 2, a module inside gen/ could do import foo_pb2 and this would import mypackage.gen.foo_pb2 into its namespace, regardless of the current working directory. This is an implicit relative import.

In Python 3, implicit relative imports don't exist and import foo_pb2 will not find foo_pb2.py, even if the module importing foo_pb2 is inside gen/. This is the issue that people are complaining about in the thread.


The root of this problem seems to be that import "foo.proto"; needs to compile into from <absolute or relative package path> import foo_pb2 when the proto is inside a package, and import foo_pb2 otherwise. Neither syntax will work in both scenarios. The proto compiler ignores the package name in the proto file and only observes the directory structure of the proto files, so if you want the from <path> import foo_pb2 output you need to place your protos in a directory structure mirroring the Python structure. For instance, if you have the following directory structure and you set the proto path to proto_files/ and python_out to mypackage/proto/, the correct import line is generated, but the compiled python is put in the wrong directory.

Pre-compilation:

proto_files/
  mypackage/
    proto/
      foo.proto  # import "mypackage/proto/bar.proto";
      bar.proto
mypackage/
  qux/
    mymodule.py  # import mypackage.proto.foo_pb2
  proto/

Post-compilation:

proto_files/
  mypackage/
    proto/
      foo.proto  # import "mypackage/proto/bar.proto";
      bar.proto
mypackage/
  qux/
    mymodule.py  # import mypackage.proto.foo_pb2
  proto/
    mypackage/
      proto/
        foo_pb2.py  # from mypackage.proto import bar_pb2 (the import we want! but file should be in ../../)
        bar_pb2.py

This is close to the desired result, but not quite it, because now the absolute reference to the compiled file is mypackage.proto.my_package.proto.foo_pb2 rather than mypackage.proto.foo_pb2.

In this instance you can actually get it to produce the right output by specifying the python output path mypackage/. Here, the compiler detects that it doesn't need to create mypackage/proto because it already exists, and it just plops the generated files in that directory. However, this doesn't play nicely when the project directory structure makes use of symlinks. e.g. if mypackage/proto is a symlink to somewhere else and you actually want to dump the compiled protos there instead.

I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.

@haberman
Member

@DanGoldbach Thanks very much for all of the detail. I think a lot of the confusion here has been a result of not fully explaining all of the background and assumptions we are making. The more full description really helps clarify things.

Let me first respond to this:

I think the 'correct' fix is to make use of the proto package rather than the location of the proto in the directory structure.

Can you be more specific about exactly what fix you are proposing? An example would help.

One thing people seem to want, but that doesn't seem to work in practice, is that a message like this:

package foo.bar;

message M {}

...can be imported like this in Python:

from foo.bar import M

That is a very natural thing to want, but doesn't work out, as I described here: grpc/grpc#2010 (comment)

Overall, your directory structure appears to be more complicated than what we generally do at Google (which is the environment where all this behavior was designed/evolved). At Google we generally have a single directory structure for all source files, including protos. So we would anticipate something more like this:

Pre-compilation:

mypackage/
  foo.proto  # import "mypackage/bar.proto";
  bar.proto
  qux/
    mymodule.py  # import mypackage.foo_pb2

Post-compilation:

mypackage/
  foo.proto  # import "mypackage/proto/bar.proto";
  foo_pb2.py # import mypackage.proto.bar_pb2
  bar.proto
  bar_pb2.py
  qux/
    mymodule.py  # import mypackage.proto.foo_pb2

Because protobuf thinks in terms of this single, flat namespace, that's why we get a little confused when people talk about needing relative imports. I haven't wrapped my head around why this is necessary. Why doesn't the scheme I outlined above work for your use case?

@27359794

Thanks, I understand much better now.

Can you be more specific about exactly what fix you are proposing?

I meant that it would be nice if the compiled proto module hierarchy mirrored the package hierarchy specified in the proto source file. As you pointed out in the grpc thread, this isn't feasible right now. Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.

It sounds like protos work best when the generated files compile to the same directory as the source files, as per your example. Our directory structure has a separate build/ directory for generated code which isn't indexed by source control.

/build/  # generated code directory
  proto/
    # compiled protos go here
/python/  # parent directory for python projects
  my_python_pkg/  # root of this python package
    proto -> /build/proto/  # symlink to compiled proto dir
    main.py  # import my_python_pkg.proto.compiled_proto_pb2

We explicitly keep generated and source files separate, so your scheme doesn't suit our current repo layout.

We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal. At Google this isn't an issue because IIRC the entire repo acts like one massive Python package and blaze provides you with the bits of the repo that you need.

I think we'll get around this by either adding the compiled proto directory to our Python path or by writing a build command to manually edit the imports in the generated protos to be package-relative imports.

Hopefully this helps other people reading the thread.

@haberman
Member

Cool, glad we're getting closer to understanding the problems.

Maybe in the future, the one-to-one restriction between proto sources and gens can be relaxed.

I think this would be difficult to do. Right now we guarantee that:

$ protoc --python_out=. foo.proto bar.proto

...is equivalent to:

$ protoc --python_out=. foo.proto
$ protoc --python_out=. bar.proto

This is important because it's what allows the build to be parallelized. At Google we have thousands of .proto files (maybe even tens or hundreds of thousands of files, haven't checked lately) that all need to be compiled for a given build. It's not practical to do one big protoc run for all of them.

It's also not practical to try and ensure that all .proto files with a given (protobuf) package get compiled together. Protobuf doesn't require the file/directory to match the package, so .proto files for package foo could exist literally anywhere in the whole repo. So we have to allow that two different protoc runs will both contain messages for the same package.

So with these constraints we're a bit stuck. It leads to the conclusion that we can't have more than one .proto file put symbols into the same Python module, because the two protoc runs would overwrite the same output file.

We would also like the option of using those protos in multiple distinct Python packages in the future, so generating compiled protos into one particular Python package isn't ideal.

Usually for this case we would put the generated code for those protos into a package that multiple other packages can use. Isn't that usually the solution when you want to share code?

If you have foo/bar.proto that you want to share across multiple packages, can't you put it in a package such that anyone from any package can import it as foo.bar_pb2?

@27359794

I hadn't considered the constraints placed on the proto compiler by Google's scale and parallelism requirements, but that makes sense.

I guess I can compile the protos into their own proto-only package in build/ and then import that package from wherever I need it. I think you still need to add the parent of that package to the Python path.

@hindman

hindman commented Jan 10, 2017

@DanGoldbach Thanks for your example -- it helped me solve a problem.

I think your example works as desired if you run protoc like this:

protoc --python_out . --proto_path proto_files proto_files/mypackage/proto/*.proto

It generates correct import lines and places the _pb2.py files in the correct location.

@nemith

nemith commented May 17, 2023

I am not sure why this is closed or what the official solution is. There are a lot of external tools and sed hacks in this thread.

Official documentation states:

The Python code generated by the protocol buffer compiler is completely unaffected by the package name defined in the .proto file. Instead, Python packages are identified by directory structure.

But if that is the case how do we utilize it with imports?

@ElDavoo

ElDavoo commented May 27, 2023

I'd be curious to know whether the tool I wrote, protoletariat, to convert protobuf absolute imports into relative imports is helpful

It surely does! It still feels like a "hack" but there is no better option.

@jscheel

jscheel commented May 30, 2023

I'm fairly new to this, but if the point of protobufs is that they can be shared across projects, the protobuf's file structure should not be the determining factor in setting the module structure. Ideally, each tool that generates code from the proto files should be able to define the output, right?

@vcozzolino

I'm on Python 3.8.8 and facing the exact same problem.

The generated code with import elb_pb2 as elb__pb2 doesn't work.

If I use relative import like from . import elb_pb2 as elb__pb2 then the error is gone.

@BatmanAoD

BatmanAoD commented Jul 13, 2023

@haberman I'm struggling to understand the connection between the parallel compilation issue you described regarding Betterproto (this comment) and the suggestion of using relative paths for imports. You said in this later comment that:

there is no way of improving it -- at least not one that I have heard -- that addresses the user complaints without creating other problems.

The Betterproto example you gave shows the problem with generating Python module names based on Proto package names. But surely generating relative paths wouldn't break parallel code generation?

Edit: The approach used by protoletariat (a tool mentioned above) is indeed just converting proto file imports to use relative paths, as originally suggested. What "other problems" are introduced this way?


More concretely, I am currently working on a project that generates both protoc and Betterproto Python packages based on the same protobuf files. We would like to be able to choose the top-level module name for each Python package. This works with Betterproto, but, as far as I can tell, not with protoc.

@markdoerr

Protoc just needs to generate

from . import my_pb2 as ...
instead of:
import my_pb2 as ...

That would simply solve the issue.

@haberman
Member

haberman commented Jul 24, 2023

Protoc just needs to generate

from . import my_pb2 as ...
instead of:
import my_pb2 as ...

That would simply solve the issue.

It would break existing uses, as described in #1491 (comment)

If my_pb2.py is in another directory that is in PYTHONPATH, then from . import my_pb2 raises the following error:

$ PYTHONPATH=/tmp/t3 /tmp/venv/bin/python3 -m test_pb2
Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/private/tmp/t2/test_pb2.py", line 14, in <module>
    from . import my_pb2 as my__pb2
ImportError: attempted relative import with no known parent package

Whereas with the existing (absolute) import, it works.

@BatmanAoD

@haberman The aforementioned protoletariat tool handles this correctly, though. If, for instance, my_pb2 is in a directory named t2, then protoletariat generates this import, which works:

from ..t2 import my_pb2

@haberman
Member

The aforementioned protoletariat tool handles this correctly, though. If, for instance, my_pb2 is in a directory named t2, then protoletariat generates this import, which works: from ..t2 import my_pb2

I tried this import line in my example. It still did not work:

/tmp/t2 $ PYTHONPATH=/tmp/t3 /tmp/venv/bin/python3 -m test_pb2
Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/private/tmp/t2/test_pb2.py", line 14, in <module>
    from ..t2 import my_pb2 as my__pb2
ImportError: attempted relative import with no known parent package

Note, I am running this command from /tmp/t2, but trying to import from /tmp/t3. In my example, t2 and t3 are meant to represent separate Python packages.

@BatmanAoD

BatmanAoD commented Jul 25, 2023

@haberman As the error indicates, relative paths require a shared parent module, which requires an __init__.py file in the top-level directory. Here's a working example:

Directory structure:

.
└── protoimport
    ├── __init__.py
    ├── t1
    │   ├── __init__.py
    │   └── my_pb1.py
    └── t2
        ├── __init__.py
        └── my_pb2.py

Import line:

from ..t2 import my_pb2

Invocation (from the directory above protoimport):

PYTHONPATH=. python -m protoimport.t1.my_pb1

As I mentioned, the protoletariat tool has been recommended multiple times above, and generates working Python modules. I would recommend downloading it, running it on some example cases, and seeing what it does. (And, indeed, it generates __init__.py files when operating on a directory structure.)

@haberman
Member

haberman commented Jul 25, 2023

As the error indicates, relative paths require a shared parent module, which requires an __init__.py file in the top-level directory.

Right, but this means that if we changed protoc to output the code that you suggested (the code that protoletariat outputs), it would break some use cases that work today. It is not a benign bugfix that would "just work", it would fix some cases while breaking others.

It's true that this can be worked around by changing the project structure, as you suggested. But the same is true of the initial complaint in this issue: it can be worked around if you change your project structure.

The core of this issue is:

  1. The protobuf language itself does not support relative imports, only absolute. If you write import "foo.proto", then foo.proto must exist relative to some --proto_path.
  2. Any --python_out=DIR you gave to protoc must also be in sys.path when you run Python.

If you follow the invariant in (2), everything will work. As far as I can tell, the only reason that people need protoletariat is because people do not want to do (2).

@BatmanAoD

@haberman Would it be a breaking change even if __init__.py files are generated?

I'm not sure how this would happen, since in the above example, the absolute import path would presumably be protoimport.t2.my_pb2, so protoimport has to be a top-level module anyway.

...or are you talking about the case where t1 and t2 are both "top-level" modules, so the absolute import path is just t2.my_pb2? If the Python modules are in separate packages, yes, the imports need to remain absolute, and I'm not sure if Protoletariat does that. Have you tried running Protoletariat on the example case you constructed?

@haberman
Member

or are you talking about the case where t1 and t2 are both "top-level" modules, so the absolute import path is just t2.my_pb2?

Yes, that case. Many Python packages distribute _pb2.py files. For example, the protobuf package itself distributes _pb2.py files for the well-known types.

We need to make sure that imports of proto generated code continue to work across Python packages.

Have you tried running Protoletariat on the example case you constructed?

The example I provided is essentially the same as the example given in Protoletariat's README. It also matches the original message in this issue.

I just tried this with Protoletariat and verified that indeed it produces this output.

Have you been able to try the solution I suggested?

Any --python_out=DIR you gave to protoc must also be in sys.path when you run Python.

You can respect this invariant either by:

  1. Changing your --python_out=DIR directory to be a directory that is already in your sys.path.
  2. Changing sys.path at runtime (possibly through PYTHONPATH) to include whatever you passed to --python_dir=DIR.
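Option 1 can be sketched with a stand-in module instead of real generated code (illustrative paths; no protoc needed for the demo):

```shell
# Whatever directory received --python_out must be importable at run time.
# Here gen_demo/ stands in for the protoc output directory, and a_pb2.py
# is a hand-written placeholder for a generated module.
mkdir -p gen_demo
printf 'GREETING = "hi"\n' > gen_demo/a_pb2.py
PYTHONPATH=gen_demo python3 -c 'import a_pb2; print(a_pb2.GREETING)'   # hi
```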

@VeNoMouS

VeNoMouS commented Jul 25, 2023

As I mentioned, the protoletariat tool has been recommended multiple times above, and generates working Python modules. I would recommend downloading it, running it on some example cases, and seeing what it does. (And, indeed, it generates __init__.py files when operating on a directory structure.)

or... the original package that everyone is using could just be fixed since all of us complaining about it....


maybe if we complain for another 7 years... they might listen to us.... /s

@BatmanAoD

I just tried this with Protoletariat and verified that indeed it produces this output.

I appreciate that; thank you.

The example I provided is essentially the same as the example given in Protoletariat's README. It also matches #1491 (comment).

Those are both "flat", aren't they? Neither has .proto files importing other .proto files from another directory.

Have you been able to try the solution I suggested?

Yes, I have a working Python library generated from the standard protoc plugin; but I was specifically asked to modify it to have a single top-level module name matching the top-level package name, as is standard practice in Python.

Any --python_out=DIR you gave to protoc must also be in sys.path when you run Python.

I'm publishing pip packages, not importing directly from file paths. Moreover, the problem is not that I don't know how to import the generated code, but that the generated code forces consumers to make the top-level names of any proto import paths available as separate libraries directly on the Python path. The consumers in this case have specifically complained that this is not very conventional for Python packages.

@sbrother

@BatmanAoD For pip packages, you might check out my repo with an example of how to set up a pip installable protobuf repo: https://github.com/sbrother/python-protobuf-repo-example. Under the hood it uses a custom setuptools command to compile the protobufs properly on pip install.

More generally, this issue is never going to get fixed, because Google has millions of protobufs that live alongside the code and are compiled using an internal version of bazel. They literally "import directly from file paths" like you say, and builds are massively parallelized, so it's imperative that protoc distributes work at the file level, not the module level.

I strongly recommend using betterproto, which works the way people outside Google expect. Josh is correct that it isn't quite a drop-in replacement -- in particular it solves this issue while breaking the way that protobufs are used inside Google. But we use it at my current employer (10s of millions of DAUs on a Python app) with no issues.

@haberman
Member

Yes, I have a working Python library generated from the standard protoc plugin; but I was specifically asked to modify it to have a single top-level module name matching the top-level package name, as is standard practice in Python.

Just to make sure I understand, you have a package named something like myproject, and you've been asked to make sure that your imports are something like import myproject.some_pb2, correct?

This can be done with protoc. You just need to make sure that your protobuf import paths also start with myproject:

import "myproject/some.proto";

Proto import paths are just like Python import paths: it is expected that they will be globally unique. So they should probably be qualified by your project name, just like your Python imports are. For example, that's why the Google well-known types are imported as import "google/protobuf/timestamp.proto" instead of just import "timestamp.proto". If we used the latter, it would conflict with any other timestamp.proto file used in another project.

I have personal Python projects outside of Google that use protobuf, and I have never run into the problem described in this bug. I put the .proto files right next to the .py files in my project (so they have names like myproject/foo.proto), and then I run protoc --python_out=. myproject/*.proto, which creates files like myproject/foo_pb2.py. Then I can import them as import myproject.foo_pb2. I also add a .gitignore entry to ignore all the _pb2.py files. Everything just works. The imports work whether I am trying to import from the same package or from a different package.

@cpcloud

cpcloud commented Jul 25, 2023

@haberman Perhaps it is time to lock this issue.

@nemith

nemith commented Jul 25, 2023

Just as a side note: in Thrift at Facebook the import path is configurable, which is an alternative solution, but it would require some retooling to respect that on imports; and they have a massively large distributed build system as well.

I think the approach on this is super idealized, and not all of us have the luxury of redoing our entire project structure to fit this idealized layout.

Luckily there are workarounds, but there is a risk that they go unmaintained or break with upstream assumptions, which I think is why many of us are begging for a fix in protoc itself (one that is backwards compatible, of course).

@BatmanAoD

@sbrother thanks for the pointer to your example; I'll take a look at that.

We actually did try to switch to Betterproto, found that it was orders of magnitude slower at runtime (I wasn't involved yet when this happened, so I can't provide more detail there), and now maintain two pip packages, one generated with Betterproto and the other generated with the standard tool. That's part of why it doesn't really make sense for the protobuf import paths to start with Python package paths: there are two Python packages representing the same protobuf files.


Just to make sure I understand, you have a package named something like myproject, and you've been asked to make sure that your imports are something like import myproject.some_pb2, correct?

This can be done with protoc. You just need to make sure that your protobuf import paths also start with myproject: [example]

Correct; but the pip package name is based on what the set of protobuf files conceptually are (a specific company's service/domain model specification), whereas the import paths in the .proto files themselves are restricted to actual concepts within that domain. And unfortunately we're fairly locked in to that approach at this point in the project history.

I think I at least have a better understanding now of the fundamental disconnects between our setup and the setup that would break if relative paths were generated, so thanks for walking through it with me. I do think our setup is a pretty reasonable case, and based on the thread it doesn't seem to be unusual, so it would be great if the plugin could provide an option or something (like how the Go generator provides path= for different setups). But I realize this is a feature request and not just a bugfix.

@ElDavoo

ElDavoo commented Jul 27, 2023

or... the original package that everyone is using could just be fixed since all of us complaining about it....

Maybe just adding a non-default flag in protoc would be enough?

@vd-indriver

Just came across this issue. It's very sad seeing how maintainers keep ignoring the community for 7 (!) years when this could be simply resolved by adding a couple of flags so that users can configure their Python imports.

@protocolbuffers protocolbuffers locked as too heated and limited conversation to collaborators Sep 13, 2023