Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac #24992

Merged
merged 4 commits into from
Dec 15, 2020

Conversation

gnossen
Copy link
Contributor

@gnossen gnossen commented Dec 15, 2020

Fixes #24897

@yulin-liang This will require a backport (and patch to) to v1.34.x.

Our Mac OS build currently produces many global weak symbols corresponding to static variables in libprotobuf:

000000000041a008 gw    O __DATA,__data guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
0000000000419708 gw    O __DATA,__data guard variable for google::protobuf::util::converter::ProtoStreamObjectWriter::Options::Defaults()::defaults
000000000041a198 gw    O __DATA,__data guard variable for google::protobuf::internal::ShutdownData::get()::data
0000000000209940 gw    F __TEXT,__text google::protobuf::safe_strtod(google::protobuf::StringPiece, double*)

On Mac OS, clang will produce global weak symbols for function-local static variables when the function is a class method. For example:

class Bar {                                                                                                                    
public:                                                                                                                        
  int bar() {                                                                                                                  
    static int foo;                                                                                                            
    return foo;                                                                                                                
  }                                                                                                                            
};                                                                                                                             
                                                                                                                               
int main() {                                                                                                                   
  Bar bar;                                                                                                                     
  auto foo = bar.bar();                                                                                                        
  return foo;                                                                                                                  
}  

This C++ snippet produces the following symbol table:

$ objdump -t -wide foo.o | c++filt

foo.o:	file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000000000030 gw    F __TEXT,__text Bar::bar()
0000000000000044 gw    O __DATA,__data Bar::bar()::foo
0000000000000000 __float128     F __TEXT,__text _main

When given a choice between two weak symbols with the same name in the same process, the Mac OS runtime linker will prefer the first one:

dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772158 = _message.cpython-36m-darwin.so:guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables, *0x110772158 = 0x10EE03388
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772168 = _message.cpython-36m-darwin.so:guard variable for google::protobuf::internal::ShutdownData::get()::data, *0x110772168 = 0x10EE03370
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772968 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::SwapElements(int, int), *0x110772968 = 0x10ECF2CB0
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772970 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::Swap(google::protobuf::RepeatedField<bool>*), *0x110772970 = 0x10ECF2B50
dyld: weak bind: _protoc_compiler.cpython-36m-darwin.so:0x110772978 = _message.cpython-36m-darwin.so:google::protobuf::RepeatedField<bool>::Reserve(int), *0x110772978 = 0x10ECF1710

Above, you see references to libprotobuf symbols in _protoc_compiler.so (grpcio-tools) resolving to addresses corresponding to symbols in _message.so (protobuf wheel).

When this happens, as in #24897, the initializer for static variables may come from one version of libprotobuf while, the runtime code may come from another version of it. When data layouts or other pertinent invariants change, this may result in segmentation faults, data corruption, or other "bad behavior."

The fix in this PR is to make all symbols in _protoc_compiler.so besides _PyInit__protoc_compiler (the Python interpreter's entrypoint to the grpcio-tools C extension) local:

00000000003f87a0 lw    O __DATA,__data google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
00000000003f87a8 lw    O __DATA,__data guard variable for google::protobuf::FileDescriptorTables::GetEmptyInstance()::file_descriptor_tables
00000000003f8930 lw    O __DATA,__data google::protobuf::internal::ShutdownData::get()::data
00000000003f8938 lw    O __DATA,__data guard variable for google::protobuf::internal::ShutdownData::get()::data

This issue only ever affected Mac OS. Our Linux shared object libraries contain no global weak symbols and (somewhat perplexingly), the .pyd files in our Windows artifacts contain no symbols at all, according to winedump.

@gnossen gnossen marked this pull request as ready for review December 15, 2020 17:37
@gnossen gnossen requested a review from lidizheng December 15, 2020 17:37
Copy link
Contributor

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice fix! TIL.

s/#4897/#24897/ in description

tools/distrib/python/grpcio_tools/setup.py Show resolved Hide resolved
tools/distrib/python/grpcio_tools/setup.py Outdated Show resolved Hide resolved
@gnossen
Copy link
Contributor Author

gnossen commented Dec 15, 2020

Known failure.

@gnossen gnossen merged commit de32de4 into grpc:master Dec 15, 2020
@gnossen gnossen deleted the fix_grpc_tools_so_mac_symbols branch December 15, 2020 23:40
gnossen added a commit to gnossen/grpc that referenced this pull request Dec 15, 2020
Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac
ravidcottopia pushed a commit to ottopia-tech/grpc that referenced this pull request Dec 29, 2020
Fix visibility of libprotobuf symbols in protoc_compiler.so on Mac
rowillia pushed a commit to rowillia/protobuf that referenced this pull request Feb 25, 2021
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
TeBoring pushed a commit to protocolbuffers/protobuf that referenced this pull request Apr 24, 2021
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
goaway pushed a commit to envoyproxy/envoy-mobile that referenced this pull request Jun 8, 2021
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <[email protected]>
acozzette pushed a commit to acozzette/protobuf that referenced this pull request Jan 22, 2022
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).
acozzette added a commit to protocolbuffers/protobuf that referenced this pull request Jan 25, 2022
@gnossen gave a great overview in grpc/grpc#24992 of the overall problem.

If a python process using both protobuf _and_ another native library linking in libprotobuf
frequently can cause crashes.  This seems to frequently affect tensorflow as well:

tensorflow/tensorflow#8394,
tensorflow/tensorflow#9525 (comment)
tensorflow/tensorflow#24976,
tensorflow/tensorflow#35573,
https://github.com/tensorflow/tensorflow/blob/v2.0.0/tensorflow/contrib/makefile/rename_protobuf.sh,
tensorflow/tensorflow#16104

Testing locally this fixes both crashes when linking in multiple versions of protobuf
and fixes `DescriptorPool` clashes as well (e.g. Python and Native code import different versions of the same message).

Co-authored-by: Roy Williams <[email protected]>
jpsim pushed a commit to envoyproxy/envoy that referenced this pull request Nov 28, 2022
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <[email protected]>
Signed-off-by: JP Simard <[email protected]>
jpsim pushed a commit to envoyproxy/envoy that referenced this pull request Nov 29, 2022
Description: This PR modifies the build recipe for `envoy_engine.so` to hide all symbols except for `_PyInit_envoy_engine` by default on maOS, so that envoy_engine can be loaded alongside a [protobuf](http://pypi.org/project/protobuf) wheel without breaking.

Loading in protobuf alongside `envoy_engine.so` would register two sets of the same symbols and, on macOS, the runtime linker chooses the first symbol it finds causing problems. See the [similar protobuf change](protocolbuffers/protobuf#8346) and its [sister change in grpc](grpc/grpc#24992) for more information on what's going on.

Risk Level: Low
Testing: See the now-closed #1504 for how I've been testing this.
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Cerek Hillen <[email protected]>
Signed-off-by: JP Simard <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build kind/bug lang/Python release notes: yes Indicates if PR needs to be in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

grpcio-tools==1.34.0 seg faults on macOS Big Sur, depending on import order
2 participants