libfabric 1.4.1 #6551
I can't tell why the CI test failed on El Cap...? Is this a libfabric failure, or a brew infrastructure failure, or ...? |
@jsquyres it's a libfabric failure, specifically a problem when using the Xcode 8 SDK on El Capitan caused by the weak symbol clock_gettime, which is defined in the time.h header but not actually available at run time unless you're on Sierra. |
Can you point me to the specific failure? I'm able to build libfabric 1.4 on my El Cap-based (10.11.6) laptop with Xcode 8.1 (8B62), and run its executables. |
You're probably building against the Xcode 7 Command Line Tools, not Xcode 8. The full log is here: https://bot.brew.sh/job/Homebrew%20Core/10613/version=el_capitan/consoleText If you make sure If not, |
Forgive me, on multiple levels...
So I installed Homebrew (gasp!) and then tried |
Then you can test 1.4.0 |
Perfect; that got it. I see off the bat that when I run configure/make myself, it uses gcc. But when homebrew runs configure/make, it uses clang. I'll dig into this... |
@jsquyres thanks. You may also want to peruse our collection of these and the various workarounds: https://github.com/Homebrew/homebrew-core/pulls?q=is%3Apr+is%3Aclosed+label%3Aclock_gettime Also, you may find the logs in ~/Library/Logs/Homebrew/libfabric useful since you can see what command was handed to our so-called "superenv" shims, and then what command was actually executed. |
After a bunch of builds, I figured out the difference. Recall:
The difference is in the environment that Homebrew is using to build libfabric. If I add these two environment variables to my environment, I can replicate the error that Homebrew encounters (i.e., that libfabric fails to build):
Specifically: Homebrew is setting these env variables before it invokes libfabric's configure script. If I set either one of these without the other, libfabric still builds/works fine. It is the combination of both of these env variables that causes the problem. I'll have to admit ignorance here of why the SDKROOT variable matters -- to my unknowing eye, I only have one SDK installed:
That being said, I think if you update your libfabric formula to not set |
@jsquyres you're saying a cached value of ac_cv_search_clock_gettime=no causes your configure to choose to use clock_gettime? How can that possibly be correct? |
@ilovezfs Libfabric provides its own
#if HAVE_CLOCK_GETTIME
clock_gettime(...);
#else
something_else(...);
#endif |
OK, but what does that have to do with the standard |
The problem appears to be that MacOSX10.12.sdk does contain
Here's a trivial test that shows this:
# My simple configure script for this test
$ cat configure.ac
AC_INIT([bogus], [0.1], [[email protected]])
AC_SEARCH_LIBS([clock_gettime])
# My simple test script
$ cat run
#!/bin/sh
autoconf
export SDKROOT=
echo "======= WITH DEFAULT SDK (SDKROOT is unset)"
echo "======= SDKROOT=$SDKROOT"
./configure
export 'SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk'
echo "======= WITH 10.12 SDK"
echo "======= SDKROOT=$SDKROOT"
./configure
Ok, let's now run this script on my OS X 10.11.x El Cap machine with Xcode 8.1 (8B62):
$ ./run
======= WITH DEFAULT SDK (SDKROOT is unset)
======= SDKROOT=
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for library containing clock_gettime... no
======= WITH 10.12 SDK
======= SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for library containing clock_gettime... none required
Notice the "none required" output -- meaning that |
@bturrubiates and I were looking closer at this, and he found Homebrew/brew@1d7aa1fe0. Is this perhaps a difference between Xcode 8.0 and 8.1? I ask because if I build libfabric out of the box, calls to |
10.11 does not have clock_gettime. It's only defined in the Xcode 8/8.1 time.h header and in the libsystem_c.tbd file as a weak symbol. |
clock_gettime(3) on MacOS is... complicated.

1. CLOCK_REALTIME (and friends) may or may not exist
2. clockid_t may or may not exist
3. clock_gettime, as a symbol, may or may not exist
4. Even if clock_gettime exists as a symbol, it may be a weak symbol that has no strong symbol behind it (which causes a run-time linker error when you call it).

Separate off the clock_gettime(3) checks into their own .m4 to keep the insanity limited. The m4 probably could have been written a bit more compactly, but I tried for maximum clarity instead (i.e., small, simple macros that call each other).

Thanks to Homebrew/homebrew-core#6551 for identifying the issue.

Signed-off-by: Jeff Squyres <[email protected]>
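The fourth point in this commit message is why a link-time check alone is not enough: the symbol can resolve when linking against the 10.12 SDK and still be unusable when the program runs on 10.11. Below is a minimal sketch of the kind of C body a run-time configure check could execute. The function name probe_clock_gettime is hypothetical and is not taken from libfabric, and the sketch assumes that <time.h> defines CLOCK_REALTIME, which the commit message notes is not guaranteed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Hypothetical probe (the name is illustrative only): returns 0 when
 * clock_gettime() can really be called and reports a plausible time,
 * and a non-zero value otherwise.
 */
static int probe_clock_gettime(void)
{
    struct timespec ts = { 0, 0 };

    /* If only a weak symbol exists, the failure shows up at or before this call. */
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return 1;

    /* Sanity check: the nanosecond field must be within one second. */
    if (ts.tv_nsec < 0 || ts.tv_nsec >= 1000000000L)
        return 2;

    return 0;
}
```

A run test would call this from main() and use the result as the exit status. Going by the description in this thread, a binary built against the 10.12 SDK and run on 10.11 would be expected to fail at run time rather than return 0, which is the signal the configure script needs.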
Ok, I see the issue now: MacOS can be evil with regards to I opened a PR with what I hope is the fix: ofiwg/libfabric#2508 If this is the fix, can you pull this patch into your brew recipe? Or do we need to do a v1.4.1 release? (I'm a little hesitant to do a 1.4.1 release just for this issue...) |
@jsquyres LOL. You aren't alone in that sentiment. Erlang upstream had a similar reaction: https://bugs.erlang.org/browse/ERL-256?focusedCommentId=11634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-11634 Adding the patch is no problem. No need for you to do a new release until you're ready. |
Thanks @ilovezfs. Let's see how my PR shakes out. |
OK, here's what I'm seeing with the patch in place: https://gist.githubusercontent.com/ilovezfs/23938f7c49efa89aec7323040dfa48e3/raw/68f8f29cf7d1ae732a54ed2dc6b6ac0d284bdaec/gistfile1.txt |
e263656 to 7e1c0e5
Blech! After all that, I forgot to test the SDKROOT+ac_cv_search_clock_gettime combination. PR updated with what I think is the fix. |
7e1c0e5 to 3adbf57
@jsquyres this looks like it works. However, I think you may want to use a different name for your fallback implementation. If you attempt to link libfabric with
since that symbol name is not permitted on 10.11 with This has already caused problems elsewhere where software has provided a custom implementation and called it "clock_gettime" |
Yoinks. Ok. |
clock_gettime(3) on MacOS is... complicated.

Base facts:

1. OS X 10.11 (El Capitan) does not have clock_gettime(3).
2. MacOS 10.12 (Sierra) has clock_gettime(3).
3. Xcode 8.x allows you to build 10.12 applications on 10.11 by setting the env var SDKROOT to point to the 10.12 root.

On 10.11, the SDK libraries contain a weak symbol for clock_gettime(3) (and some other POSIX functions). Meaning:

1. If you run the resulting executable on 10.11, you get a run-time linker error because there is no strong symbol for clock_gettime behind the weak symbol.
2. If you run the resulting executable on 10.12, it works fine.

What this boils down to is: not only do we have to AC_SEARCH_LIBS to check for the presence of the clock_gettime symbol, we also have to make sure that it actually works (with AC_RUN_IFELSE). This is a little gross, but... pass the beer nuts. If we don't have a functioning clock_gettime(3), then emulate it.

Also due to OS X/MacOS shenanigans, update the rest of the code base to rename our emulated clock_gettime() to be ofi_clock_gettime(). In systems with a functioning clock_gettime(3), this is just an inline passthru to clock_gettime(3). Otherwise, we call an OS-specific emulation function. The reason is that Apple is encouraging developers to link with "-Wl,-no_weak_imports", which will fail to link if the clock_gettime weak symbol is pulled in (per the problematic scenario described above). So just avoid the whole situation by always using ofi_clock_gettime().

Finally, separate off the clock_gettime(3) checks into their own .m4 to keep the insanity limited.

Thanks to @ilovezfs for identifying the issue on Homebrew/homebrew-core#6551.

Signed-off-by: Jeff Squyres <[email protected]>
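The passthrough-or-emulate arrangement described in this commit message can be sketched roughly as follows. This is not libfabric's actual code. The names sketch_clock_gettime and sketch_emulated_gettime and the macro HAVE_WORKING_CLOCK_GETTIME are hypothetical, and the gettimeofday(2) fallback is an assumption standing in for whatever OS-specific emulation the real patch uses. The sketch covers wall-clock time only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical macro that a configure run test would define.
 * Defaults to 1 here so that the sketch compiles on its own.
 */
#ifndef HAVE_WORKING_CLOCK_GETTIME
#define HAVE_WORKING_CLOCK_GETTIME 1
#endif

/* Assumed emulation: derive a timespec from gettimeofday(2). */
static int sketch_emulated_gettime(struct timespec *ts)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    ts->tv_sec = tv.tv_sec;
    ts->tv_nsec = (long)tv.tv_usec * 1000L; /* microseconds to nanoseconds */
    return 0;
}

/*
 * Wrapper with its own name, so that callers never reference the
 * clock_gettime symbol directly unless it is known to work.
 */
static inline int sketch_clock_gettime(struct timespec *ts)
{
#if HAVE_WORKING_CLOCK_GETTIME
    return clock_gettime(CLOCK_REALTIME, ts); /* plain passthrough */
#else
    return sketch_emulated_gettime(ts);
#endif
}
```

Because the wrapper has a distinct name, building with HAVE_WORKING_CLOCK_GETTIME set to 0 leaves no reference to clock_gettime in the object file, which is the property the commit message relies on when linking with "-Wl,-no_weak_imports".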
3adbf57 to cc5c3e3
The portability of clock_gettime(3) on Mac OS / OS X is... complicated. Per Homebrew/homebrew-core#6551 and ofiwg#2508, remove the use of clock_gettime(3) from the core of libfabric. clock_gettime(3) is still used in some providers that do not compile on OS X / MacOS, but that's ok. Signed-off-by: Jeff Squyres <[email protected]>
@ilovezfs It took forever, but I think we finally got the right fix committed upstream: ofiwg/libfabric#2508 |
@jsquyres nice work! Let's give it a try |
cc5c3e3 to 065f05a
@jsquyres it worked for me locally and now CI 🍏. As you can see, I've modified the formula to build 16326406506adf8b3f8b30d802453064395d3341 as 1.4.1-alpha1 so that it could go through CI, but it would be great if you could create a new version (preferably a release tarball but just a new tag would work too if we throw in the Autotools dependencies, as I've done here). |
I don't know if we're ready to do a 1.4.1 release, or if we'll do one just for this fix. Would it be possible to download the 1.4.0 tarball, but then apply a patch file corresponding to ofiwg/libfabric@0b0c889? |
065f05a to 8e59bf2
@jsquyres shipped! My apologies for the delay. |
The portability of clock_gettime(3) on Mac OS / OS X is... complicated. Per Homebrew/homebrew-core#6551 and ofiwg#2508, remove the use of clock_gettime(3) from the core of libfabric. clock_gettime(3) is still used in some providers that do not compile on OS X / MacOS, but that's ok. As noted above, this issue was discovered after v1.4.0 and was fixed on master. However, we later branched from the v1.4.0 tag to make the v1.4.x branch to fix some specific bugs, and then released v1.4.1. Unfortunately, the fix for this issue was forgotten, and not included in the v1.4.1 release (see Homebrew/homebrew-core#10292). Although it is unlikely, I'm cherry-picking this fix from master to the v1.4.x branch just in case we ever do a v1.4.2 release. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 0b0c889)
The portability of clock_gettime(3) on Mac OS / OS X is... complicated. Per Homebrew/homebrew-core#6551 and ofiwg#2508, remove the use of clock_gettime(3) from the core of libfabric. clock_gettime(3) is still used in some providers that do not compile on OS X / MacOS, but that's ok. This issue was discovered after v1.4.0 was released and was fixed on master. However, we later branched from the v1.4.0 tag to make the v1.4.x branch in order to fix some specific bugs. We then released v1.4.1. Unfortunately, the fix for this issue was forgotten / not included in the v1.4.1 release (see Homebrew/homebrew-core#10292). Although it is unlikely, I'm cherry-picking this fix from master to the v1.4.x branch just in case we ever do a v1.4.2 release. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 0b0c889)
Created with brew bump-formula-pr.