Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dlopen: eliminate internal soname mapping #22828

Merged
merged 2 commits into from
Jan 27, 2018
Merged

dlopen: eliminate internal soname mapping #22828

merged 2 commits into from
Jan 27, 2018

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jul 16, 2017

This should now be handled by external packages (e.g. JuliaPackaging/BinDeps.jl#277), where we can handle errors appropriately and cache the results to ensure they are observable and constant.

fix #13719

@ararslan
Copy link
Member

Looks like you'll need

diff --git a/test/libdl.jl b/test/libdl.jl
index ffeeed83f4..c189616013 100644
--- a/test/libdl.jl
+++ b/test/libdl.jl
@@ -183,8 +183,4 @@ let dl = C_NULL
     end
 end

-if Sys.KERNEL in (:Linux, :FreeBSD)
-    ccall(:jl_read_sonames, Void, ())
-end
-
 end

@vtjnash vtjnash force-pushed the jn/soname-noname branch from 3f69ffd to 4bc1434 Compare July 16, 2017 03:00
@vtjnash
Copy link
Member Author

vtjnash commented Jul 16, 2017

Oops, I had that, then pushed the wrong commit.

@ararslan
Copy link
Member

Is it possible to deprecate this rather than a hard break?

@tkelman
Copy link
Contributor

tkelman commented Jul 16, 2017

and will this break benchmarking or do things that haven't been widely tested need to be released?
@nanosoldier runbenchmarks(ALL, vs = ":master")

@tkelman
Copy link
Contributor

tkelman commented Jul 16, 2017

Will this result in visible changes to the way the Julia Libdl.dlopen handles sonames?

@nanosoldier
Copy link
Collaborator

Something went wrong when running your job:

NanosoldierError: failed to run benchmarks against primary commit: failed process: Process(`sudo cset shield -e su nanosoldier -- -c ./benchscript.sh`, ProcessExited(1)) [1]

Logs and partial data can be found here
cc @ararslan

@vtjnash
Copy link
Member Author

vtjnash commented Jul 16, 2017

Is it possible to deprecate this rather than a hard break?

Everything is possible, but I changed BinDeps to stop using this code a few weeks ago, so there wouldn't seem to be any consumer of this code even if I did the work to add a deprecation. I just realized it still needed a release tag however (JuliaPackaging/BinDeps.jl@264e086...e2553e5)

Will this result in visible changes to the way the Julia Libdl.dlopen

Yes, but since the ecosystem uses BinDeps, this code path wasn't really being used and didn't have any tests.

@tkelman
Copy link
Contributor

tkelman commented Jul 16, 2017

There are quite a few raw uses of dlopen without going through BinDeps. The BinDeps version of the code is also largely untested.

@tkelman
Copy link
Contributor

tkelman commented Jul 16, 2017

and what about ccall without an soname? that's harder to search for, but BinDeps usage is not universal

@vtjnash
Copy link
Member Author

vtjnash commented Jul 16, 2017

and what about ccall without an soname

then you'll now either need to install the correct (usually -dev) package, compile the library yourself (not recommended), or use a distro that doesn't have this issue (e.g. not debian/ubuntu-like, afaik)

@yuyichao
Copy link
Contributor

yuyichao commented Jul 16, 2017

FWIW, there are many base libraries that do not set the correct soname.
Only developer oriented distros install the unversioned libraries by default, it includes everything other than Arch, LFS and maybe gentoo.

Copy link
Contributor

@tkelman tkelman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think we should delete this functionality from ccall and dlopen, it's useful outside of bindeps.

@vtjnash
Copy link
Member Author

vtjnash commented Jul 21, 2017

Just to clarify the purpose of this PR. The functionality in here that I'm deleting is poorly tested, fails occasionally and unpredictably (we silently hide this during CI runs, which ends up working OK, since we don't actually use or test for it), and in general this is just a bad (unreliable) way of managing dependencies. The code I added to BinDeps solves all of those issues. If you want to keep this code, you should add tests for it. But, I'm getting rid of it.

@tkelman
Copy link
Contributor

tkelman commented Jul 21, 2017

Adding code to BinDeps doesn't fix usage of this via direct ccall or dlopen of a library name without an soname. That doesn't go through BinDeps and would be broken by this.

@ararslan
Copy link
Member

Is there any estimate of how often this is used in packages and/or "in the wild"?

@vtjnash
Copy link
Member Author

vtjnash commented Sep 12, 2017

This will fix #6742, once the build system changes for this PR are also integrated

vtjnash added a commit that referenced this pull request Nov 26, 2017
This removes the last remaining dependence on `ldconfig -p` (#22828)
as well as DYLD_FALLBACK_LIBRARY_PATH (#24789).

fix #6742
vtjnash added a commit that referenced this pull request Nov 27, 2017
This removes the last remaining dependence on `ldconfig -p` (#22828)
as well as DYLD_FALLBACK_LIBRARY_PATH (#24789).

fix #6742
vtjnash added a commit that referenced this pull request Nov 27, 2017
This removes the last remaining dependence on `ldconfig -p` (#22828)
as well as DYLD_FALLBACK_LIBRARY_PATH (#24789).

fix #6742
vtjnash added a commit that referenced this pull request Nov 27, 2017
This removes the last remaining dependence on `ldconfig -p` (#22828)
as well as DYLD_FALLBACK_LIBRARY_PATH (#24789).

fix #6742
@vtjnash
Copy link
Member Author

vtjnash commented Dec 18, 2017

@tkelman Our build system is fixed now not to want this hack (as demonstrated by our working macOS Travis build, which used to use a different, CI-specific hack), so I'd like to merge this now to finish removing it from Linux in favor of predictable builds.

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

"usage of this via direct ccall or dlopen of a library name without an soname" is still a valuable feature that this would be deleting without any replacement

@StefanKarpinski
Copy link
Member

Can you elaborate? Valuable for what? Why do we need that?

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

Being able to use ccall or dlopen on a library without having to include the full numbered soname

@nalimilan
Copy link
Member

OTOH libraries are supposed to break API when they change SONAME, so why would you load a library without specifying its SONAME?

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2017

The same reason you might want to link to them without having to specify the soname, especially if you don't know or care what it is. Or write code that works across a range of old and new linux distros in a single line instead of needing to have an soname lookup table and search in user code. Having the julia loader do that resolution for you, even if there are supposedly some bugs with it (of which an example failure case hasn't been provided despite being requested over in the bindeps pr), is more convenient than needing to know the soname or install a -dev package (which non-admin users often won't be able to do without having to rebuild many things from source) to dynamically load and call functions in a library on linux.

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2017

It's hardly vestigial when it's used and useful in all the above ccall-without-BinDeps situations, all of which would be made noticeably less convenient by removing this capability.

@vtjnash
Copy link
Member Author

vtjnash commented Dec 27, 2017

It's hardly vestigial when it's used and useful in all the above ccall-without-BinDeps situations, all of which would be made noticeably less convenient by removing this capability.

You're about 4 years too late to stop BinDeps from becoming the de facto implementation of correct ccall usage. Sorry, but if you wanted this design, you would have needed to push for it back then. But since all of the examples you gave are of broken packages, it's pretty obvious why we eventually adopted BinDeps (despite its quirks) and stopped pursuing this more complicated design.

@ararslan
Copy link
Member

ararslan commented Dec 27, 2017

To me this is mostly orthogonal to BinDeps. ccall and dlopen have other uses, and I agree with Tony that this makes interactive and exploratory use of those functions far less pleasant, which is quite unfortunate given how cool a feature they are in Julia.

@tkelman
Copy link
Contributor

tkelman commented Dec 28, 2017

They aren't broken packages, currently. They would be broken by this PR's removal of the feature they rely on.

@StefanKarpinski
Copy link
Member

While I'm sympathetic to wanting to delete code, I don't understand why it's urgent to remove this code. What harm is it doing? Does deleting it enable anything we can't otherwise do?

@vtjnash
Copy link
Member Author

vtjnash commented Dec 28, 2017

It needs to be merge in advance of feature freeze so we don’t get stuck with it (code that we can’t test or fix) for 1.0.

@tkelman
Copy link
Contributor

tkelman commented Dec 28, 2017

"can't fix" is an awfully strong statement. Couldn't the better implementation replace this one without changing the way ccall works in some 1.x release, and no one would notice?

This is now handled more reliably by our build-system and external packages.
Also use the real name of the libatomic1 library (which includes the SONAME and extension).
@vtjnash
Copy link
Member Author

vtjnash commented Jan 27, 2018

Seeing no alternatives posed, or tests implemented, and having given fair warning, and wanting to finally close a fairly old bug report, I've merged this. Sometimes deleting half broken functionality is the only way to motivate improvement. Sometimes it'll just turn out that that code wasn't that necessary or useful, or that having simple, reliable models mattered more anyways.

@vtjnash vtjnash merged commit 8452531 into master Jan 27, 2018
@vtjnash vtjnash deleted the jn/soname-noname branch January 27, 2018 05:06
@ararslan
Copy link
Member

As far as I can tell no one in this thread was supportive of this change, and Stefan's question of "what harm is it doing" was not actually addressed.

@tkelman
Copy link
Contributor

tkelman commented Jan 27, 2018

Deleting a used, and useful, feature is not an improvement. It's just vindictive. You've given zero evidence that keeping the feature and improving it later hurts anything. Deleting the feature now serves no purpose.

@JeffBezanson
Copy link
Member

Wouldn't it fix the out-of-memory problem if we called jl_read_sonames at or near process start rather than on demand?

@vtjnash
Copy link
Member Author

vtjnash commented Jan 28, 2018

There’s a lot of ways to change this feature - we just don’t need this complexity here as we already have several alternatives (BinDeps, symlinks, and computed sonames) that are much more widely used and enjoy cross-platform applicability.

@tkelman
Copy link
Contributor

tkelman commented Jan 28, 2018

It's less than 150 lines of code, hardly the most complex feature. Multiple packages have been making good use of it and are now broken on master for no reason, with no deprecation. The bug that this closed hasn't been reported reliably or recently. Symlinks or computed sonames add unnecessary complexity downstream across every package that would prefer to not worry about linux-specific library versioning details.

Making a breaking change by removing a feature generally calls for a mention in news, and a stronger justification than one person not liking it.

@brenhinkeller
Copy link
Contributor

brenhinkeller commented Oct 20, 2018

This is terrible! I'm probably going to sound unreasonably irritated, but this is something like the tenth case I've found in the switch to 1.0 where my code has been broken because someone decided to remove imperfect-but-workable functionality for no reason other than ideological purity. Congratulations, @vtjnash, now all your tests pass but you've broken actual users code and wasted my time.

Can't agree strongly enough with @tkelman "Deleting a used, and useful, feature is not an improvement. It's just vindictive."

Edit: and over what looked like clear community opposition!

bors bot added a commit to JuliaGPU/CUDAapi.jl that referenced this pull request Dec 5, 2018
61: Add shared library versioning r=maleadt a=maleadt

Noticed how `libcudnn.so.7` didn't get picked up on my Arch dev system.
JuliaLang/julia#22828
bors try

Co-authored-by: Tim Besard <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent segfault in libdl test on linux64 buildbots
10 participants