Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do name mangling on individual symbols? #79

Open
reaperhulk opened this issue Jul 24, 2017 · 9 comments
Open

Do name mangling on individual symbols? #79

reaperhulk opened this issue Jul 24, 2017 · 9 comments

Comments

@reaperhulk
Copy link
Contributor

auditwheel uses patchelf to rename vendored libraries with unique names. This works wonderfully until something links against a library with the same symbol names and loads them before we get there. This scenario occurs when using uwsgi with cryptography's new manylinux1 wheels. Since uwsgi links against libssl/libcrypto itself we're seeing segfaults where it's calling between two different versions of the library (the one uwsgi loads globally and the one cryptography has vendored). You can see more discussion (and a better explanation from @njsmith) here: pyca/cryptography#3804 (comment)

One potential solution that was put forth would be to make it so patchelf is capable of name mangling individual symbols so that even in a scenario like the one above there's no issue with namespace collision. This sounds very challenging but as more and more manylinux1 wheels start being shipped it may be the right long term solution.

@njsmith
Copy link
Member

njsmith commented Jul 25, 2017

Right, basically the idea is to solve the ELF namespace collision problem once and for all by finding all the symbols exported by our vendored libraries and giving them unique names, and then finding all the imports of those symbols and patching them up to use the new unique names.

Basically this is one of those solutions that requires some ridiculous hoop-jumping, but in some sense it's the only really reliable solution, and it offends me that we can't do it. (It's just rewriting some lists of strings stored in an admittedly-rather-complicated data structure! who's in charge around here anyway?!)

I spent like... half an hour looking at this, anyway, and it isn't obviously untenable. The main obstacle that previous attempts ran into is that if you modify the size of the .dynstr section then you need to go through the whole file and fix up all the relative offsets but... patchelf is already solving this problem, because commands like patchelf --replace-needed (which we use now) also have to modify the size of .dynstr. That said, there's still some non-trivial work to do to identify all the places where exported/imported symbols are named, modify them, and then fix things up. (In particular you need to rebuild the hash tables or otherwise deal with them somehow.) Quite possibly there are other obstacles I haven't noticed, but it might be doable in a weekend?

There's also this thing, which might already be doing most of what we want, not sure: http://mit.edu/~jfc/src/sym-rename.c

CC'ing @geofft because somehow I have the impression that this is the kind of mad science heroism that might appeal to him. Could be wrong.

@njsmith
Copy link
Member

njsmith commented Jul 25, 2017

It turns out that this is also creates somewhat nasty problems for pypy: https://bitbucket.org/pypy/pypy/issues/2617/pypy-binary-is-linked-to-too-much-stuff

@ehashman
Copy link
Member

The other thing that comes to mind for this: if we mangle the symbols, then won't we have to go back into the Python code that uses them and change the references to point to the mangled symbols? That seems like it would be very challenging if not impossible to do right, especially when we're dealing with CFFI and the like.

@njsmith
Copy link
Member

njsmith commented Oct 18, 2018

@ehashman I think Python code that accesses symbols by name at runtime – ctypes, or cffi-in-abi-mode – is also opening the library by name at runtime, and thus already has this problem due to auditwheel's renaming of things? It would be nice if we had a better way for these kinds of wrappers to work with auditwheel – I'm kind of surprised we don't get more complaints about that actually! But I don't think this changes idea itself would change it.

@shuoli84
Copy link

shuoli84 commented Oct 7, 2019

I think name mangling is not the final solution.

The reason is, some library actually does not allow multiple copy in same process, e.g, they use some sort of file lock with same name, or anything considered global state.

In my mind, final solution is to make the c shared library dependencies first class citizen in pypi. Then we can do proper dependency resolution, which opens the door that many libraries can use same c lib. Then what auditwheel should just modify the rpath.

@takluyver
Copy link
Member

@shuoli84 do you have specific examples of the kinds of problems you describe? Packaging C libraries as first class citizens is straying into becoming a general-purpose packaging system, which there already plenty of other tools doing.

Realistically, there are always going to be some limits on what's possible. So I think it's more valuable to discuss specifics like "library X doesn't work because..." rather than generalities.

@njsmith
Copy link
Member

njsmith commented Oct 7, 2019

Global state in general is certainly a problem, and there could indeed be advantages to making C libraries more first-class citizens. (There are some old notes on that here.) But it's not relevant to this issue, because even if we did have C libraries as first class citizens, it wouldn't help with the uwsgi situation: the problem here is that uwsgi is unilaterally injecting symbols into distributed binaries that aren't compatible with those symbols. And there's no way to distribute binaries that are compatible with the injected symbols, because different installations of uwsgi are injecting different versions of those symbols, and it's impossible to be compatible with all of them at once. So we should probably keep this issue specifically about avoiding symbol clashes in auditwheel's vendored libraries, and split any discussion of packaging C libraries off into another thread (perhaps at discuss.python.org).

@shuoli84
Copy link

shuoli84 commented Oct 8, 2019

Name mangling and shipping dependencies in wheel, though works most of time, is kinda risky. And I didn't find any warning message in document or readme.

If it breaks, it introduced hard to track bugs. E.g, corrupt data files. Ref: https://ericsink.com/entries/multiple_sqlite_problem.html.

User should know what they are doing, but auditwheel's repair is so easy to use, and provided by pypa, which may make this THE recommended approach.

If a third-party library author convert the library to use this approach, app dev may not aware of this.

@njsmith the link you provides is actually what I am looking for. Any follow up? There isn't much discussion in the mailing thread (I googled it, may miss something).

@njsmith
Copy link
Member

njsmith commented Oct 8, 2019

@shuoli84 figuring out how to make the pynativelib idea work on macOS turned out to be way more complicated than I anticipated, and then by the time I'd finally done that I'd run out of energy for working on the project, and no-one else has stepped up. So it is doable, but there's no-one currently working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants