Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-0.4, RFC]: Remove 0-size and variable size array members from public headers. #14829

Merged
merged 1 commit into from
Mar 7, 2016

Conversation

nalimilan
Copy link
Member

Backport of 63f3edf to release-0.4. This allows building with GCC 6.0.

GCC 6.0 has just entered Fedora rawhide, and it seems smart enough to detect empty structs with variable-size arrays despite of the tricks used in the code. This is a manual backport, but I'm not actually sure this doesn't introduce any ABI breakage. At #13945 (comment), @yuyichao said it didn't, but I'd like him to confirm this.

There's still the API break, though, so that's not ideal for embedding. A simpler work-around would be a better solution, if at all possible.

@yuyichao
Copy link
Contributor

LGTM except the API breakage.

What's the error with GCC 6.0? Can it be work around with proper CFLAGS or CXXFLAGS?

@nalimilan
Copy link
Member Author

The exact error is:

g++ -march=x86-64 -m64 -I/usr/include/llvm33  -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -fomit-frame-pointer -std=c++11 -fvisibility-inlines-hidden -fno-exceptions -fPIC -Woverloaded-virtual -Wcast-qual  -DSYSTEM_LLVM -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -march=x86-64 -pipe -fPIC -fno-rtti -O3 -ggdb3 -falign-functions -momit-leaf-frame-pointer -D_GNU_SOURCE -Iflisp -Isupport -I/usr/include/llvm33 -I/builddir/build/BUILD/julia/build/usr/include -I/builddir/build/BUILD/julia/build/usr/include -DLIBRARY_EXPORTS -I. -I/builddir/build/BUILD/julia-0.4.3/deps/valgrind -Wall -Wno-strict-aliasing -fno-omit-frame-pointer -fvisibility=hidden -fno-common -DLLVM_SHLIB "-DJL_SYSTEM_IMAGE_PATH=\"../lib64/julia/sys.so\"" -c debuginfo.cpp -o debuginfo.o
In file included from debuginfo.cpp:42:0:
julia.h:87:34: error: flexible array member '_jl_value_t::fieldptr' in an otherwise empty 'struct _jl_value_t'
     struct _jl_value_t *fieldptr[];
                                  ^
julia.h:85:16: note: in the definition of 'struct _jl_value_t'
 typedef struct _jl_value_t {
                ^~~~~~~~~~~
In file included from disasm.cpp:81:0:
julia.h:87:34: error: flexible array member '_jl_value_t::fieldptr' in an otherwise empty 'struct _jl_value_t'
     struct _jl_value_t *fieldptr[];
                                  ^
julia.h:85:16: note: in the definition of 'struct _jl_value_t'
 typedef struct _jl_value_t {
                ^~~~~~~~~~~

I agree that only changing a flag would be ideal, but I haven't found one. Since this goes against the C standard, I'm not certain this error can be disabled.

@yuyichao
Copy link
Contributor

Is the error still there if you replace -std=c++11 with -std=gnu++11?

This is actually standard C. It's not standard C++.... edit: yeah, I think empty struct is indeed non-standard C. (I was thinking about flexible struct members)

@nalimilan
Copy link
Member Author

Is the error still there if you replace -std=c++11 with -std=gnu++11?

Doesn't seem to make any difference:
http://koji.fedoraproject.org/koji/watchlogs?taskID=12712030

(At least, assuming that the last of -std=gnu++11 and -std=c++11 wins.)

This is actually standard C. It's not standard C++....

AFAIK C99 allows variable-length arrays, but only at the end of a non-empty struct.

@tkelman
Copy link
Contributor

tkelman commented Jan 28, 2016

gnu99 maybe? I think C11 made optional a few of the gcc extensions from c99. Surprising to see gcc break their own extensions.

Adjusting the headers like this is risky for a backport. I'd like embedders to test this before merging.

@tkelman tkelman changed the title RFC: Remove 0-size and variable size array members from public headers. [release-0.4, RFC]: Remove 0-size and variable size array members from public headers. Jan 28, 2016
@nalimilan
Copy link
Member Author

gnu99 maybe? I think C11 made optional a few of the gcc extensions from c99. Surprising to see gcc break their own extensions.

Unfortunately, same result:
https://kojipkgs.fedoraproject.org//work/tasks/2242/12712242/build.log

But I've not found any reference about what happens when -std is passed twice. Are you sure the last one wins? (I don't know where -std=c++11 comes from, as it doesn't appear in the Makefiles -- LLVM?)

Adjusting the headers like this is risky for a backport. I'd like embedders to test this before merging.

Right.

@yuyichao
Copy link
Contributor

gnu99 maybe

The error is in C++ so the standard for C shouldn't make a difference...

Have you tried to reproduce it locally? (and run the failing command directly to make sure you only provide one -std flag). Seems that VLA might be in the c++1y draft (although not in zero sized struct) so might worth give that a try too.

To make this PR less risky, you can probably try to only change the jl_value_t part and leave other ones alone.

@nalimilan
Copy link
Member Author

The error is in C++ so the standard for C shouldn't make a difference...

Have you tried to reproduce it locally? (and run the failing command directly to make sure you only provide one -std flag). Seems that VLA might be in the c++1y draft (although not in zero sized struct) so might worth give that a try too.

I haven't, since I don't have the same gcc version on my distro release. But it does seem that the option takes effect, as passing -std=c++1y triggers different errors in addition to the one at stake:
https://kojipkgs.fedoraproject.org//work/tasks/2507/12712507/build.log

To make this PR less risky, you can probably try to only change the jl_value_t part and leave other ones alone.

Sure. But what's the smallest independent set of changes I can retain? Just those affecting struct jl_value_t?

@yuyichao
Copy link
Contributor

Does this help? The removal of fieldptr0 may not be necessary.

@nalimilan
Copy link
Member Author

Looks like it does, but I still get the '::isinf' has not been declared errors, which seem to have been introduced today by a package update, since the previous build now fails too. It must be due to an upgrade to glibc-headers, as that's about the only package which has been updated -- but I don't know why. I need to debug this before I can confirm it works.

@nalimilan
Copy link
Member Author

For reference, I've filed a bug at https://bugzilla.redhat.com/show_bug.cgi?id=1302825. The joys of the bleeding edge...

@nalimilan
Copy link
Member Author

OK, now it works in Fedora. So looks like @yuyichao's patch is a cleaner solution than my patch.

@nalimilan
Copy link
Member Author

Oh, and skipping the part of the patch about fieldptr0 also works fine.

@yuyichao
Copy link
Contributor

@nalimilan Is it possible to backport the non-breaking part of my patch to the 0.4 branch (replace v->fieldptr with jl_data_ptr(v)) and maintain the breaking part (remove fieldptr from jl_value_t) as a downstream packaging patch for now? I think this can avoid possible breakage (it will be very minimum and easy to fix in a backward compatible way though) and also minimize conflict in backport. When GCC 6.1 is released and if people start to hit this issue on the 0.4 branch, I think we can revisit this and merge the slightly breaking fix at that time.

@nalimilan
Copy link
Member Author

I'm fine with keeping the patch downstream and only backporting non-breaking changes. But I fear there's no good solution if we decide to support GCC6 at some point...

@yuyichao
Copy link
Contributor

Master should be fine already and I was just hoping that we can release 0.5 before GCC 6.1 is widely used in distributions. Another option is to only introduce the breaking change on GCC 6, should be as simple as conditionally not defining fieldptr on __GNUC__ >= 6.

@yuyichao
Copy link
Contributor

I've updated my branch to do that. Can you check if it works?

The only breakage would be on gcc 6.0+ in c++ code, in which case there's nothing better we can do, and it shouldn't break any existing working setup.

@nalimilan
Copy link
Member Author

I actually had the same idea. Users of GCC 6 during the lifetime of Julia 0.4 will likely be a small minority, so only breaking the API there is a good solution. But your last patch isn't correct: you got the condition on the GCC version backward :-). Also, no need to check for cplusplus, as even the C compiler raises an error with GCC 6 (this isn't valid C either). With these changes, it works fine.

That said, that behavior change in GCC isn't great, so I've filed a bug. If they close it, I guess we should merge your (fixed) patch:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550

@jwakely
Copy link

jwakely commented Jan 29, 2016

gnu99 maybe? I think C11 made optional a few of the gcc extensions from c99. Surprising to see gcc break their own extensions.

No, it's not valid in any C mode, and if you tried to compile it with a C compiler it would have failed years ago. But the issue is about C++ code not C, and previously G++ was not checking flexible array members properly. Now it is compatible with the C semantics (because flexible array members are not standard in C++, so the C rules are the sensible ones to follow). So previously it was rejected by gcc and accidentally accepted by g++, now it's rejected by both. The extension still works as designed, to be compatible with C, but you were not using it that way.

@yuyichao
Copy link
Contributor

So previously it was rejected by gcc

I don't think so. It was accepted by both gcc and g++.

@jwakely
Copy link

jwakely commented Jan 29, 2016

I don't think so. It was accepted by both gcc and g++.

No, the C compiler always rejected it long before GCC 6.

http://ideone.com/YnOrru (I don't know what version of GCC that uses)

http://melpon.org/wandbox/permlink/gsgofr0VzgxsF5Fw (GCC 4.8.2, with -std=gnu99)

@yuyichao
Copy link
Contributor

That code is not really what we have here. (Note that I'm not trying to argue it should be accepted, only it was accepted by GCC and G++ before 6.0)

what we have is

typedef struct mytype {
    struct mytype *fieldptr0[0];
    struct mytype *fieldptr[];
} mytype;

which I believe is accepted by GCC and G++ < 6.0. Note that the first member is the 0-length array extension and not the C99 standard flexible array member.

@jwakely
Copy link

jwakely commented Jan 29, 2016

Then the GCC bug report that asks for the error to be relaxed should probably use the real code, so that we don't adjust the diagnostic to allow some other invalid piece of code that doesn't actually help you!

@yuyichao
Copy link
Contributor

I see that there's a typo in the code attached to the gcc bug report. Sorry about that. You can see what we have as fieldptr0 here. I see why you say the code is non-sense (it really is...). I'm actually a little surprised that having two flexible array members was accepted by g++ at all....

@nalimilan
Copy link
Member Author

Sorry for the confusion, I must say I don't quite get how the combination of these two tricky features works. Glad you replaced them with something less hacky. :-)

@nalimilan
Copy link
Member Author

On the GCC bug tracker, somebody noted:

But if having two such arrays is nececcary for some reason, changing the flexible array member to a zero-length array should get around the error in an ABI/API compatible way.

What do you think of that solution?

@yuyichao
Copy link
Contributor

See keno's comment in my original pr. It was exactly the solution I tried first but it doesnt work since it causes a out of bound runtime error on osx.

@nalimilan
Copy link
Member Author

Too bad. Could you reply on the GCC bug tracker then? I wouldn't be able to explain the problem.

@yuyichao
Copy link
Contributor

Yeah, I'm going to do that.

@nalimilan
Copy link
Member Author

I meant, special-casing GCC without breaking the API, i.e. simply by using a zero-sized array instead of a flexible array. But if the problem is with the libc, we can't do that (except if we don't expect people to ue GCC 6 on Mac for some time).

@tkelman
Copy link
Contributor

tkelman commented Feb 5, 2016

Homebrew usually updates pretty quickly so people on mac will likely be using gcc 6 soon after it gets released.

@tkelman
Copy link
Contributor

tkelman commented Feb 12, 2016

@nalimilan @yuyichao what's the status here? Is this in a state that we can go ask some embedders to test against this branch and make sure this actually doesn't change anything relative to 0.4.3?

@nalimilan
Copy link
Member Author

The problem is, I think this will inevitably break this API for embedders. The only thing we can check is whether they use it or not.

We could also ask gcc developers whether they think they are going to do something about this, even if they don't seem super enthusiastic.

@tkelman
Copy link
Contributor

tkelman commented Feb 12, 2016

cc @waTeim would it be hard for you to test node-julia against this branch?

@waTeim
Copy link
Contributor

waTeim commented Feb 12, 2016

@tkelman sure can, do I simply check out a branch and compile-install? BTW, as far as OS/X goes for embedding, it's always going to be what is provided by Xcode; clang or if explicitly g++ then only g++ as emulated by clang, so if this were a problem of g++ not flagging an error, wouldn't clang-g++ catch it anyway? And aren't new gcc features/fixes going to lag until they get incorporated by clang?

@yuyichao
Copy link
Contributor

@waTeim You should try the yyc/no-empty-struct-0.4 branch, which should be much less breaking then this one. (It'll only make a difference with GCC6 though, so you may want to remove the gcc version check to test it on a older gcc).

@waTeim
Copy link
Contributor

waTeim commented Feb 12, 2016

@yuyichao Ok, so I'm kinda catching up on this. Are there 3 things do do then?

  1. Compile as is to see it doesn't break things
  2. Remove the compiler version check to see it doesn't break things in the future.
  3. Use gcc 6 to see it doesn't break things

Do this on both Linux and OS/X if possible (don't think 3 on OS/X is possible for me). I do have access to Ubuntu 15 though, is that where gcc 6 is landing or is this still all speculative at this point?

@yuyichao
Copy link
Contributor

I'd say 1 and 2 (or even 2 only, since 1 should be non-breaking on non-gcc 6) should be good enough. It's clear that gcc 6 might be breaking something (like this case) but if it breaks node-julia other than the case we fix in our header, there's little we can do about it (Of course, you may want to make sure that node-julia will work on gcc 6 (i.e. do 3) but that's orthogonal to this PR).

@waTeim
Copy link
Contributor

waTeim commented Feb 12, 2016

So I'm encountering this problem on step 1

ld: library not found for -lcrt1.10.6.o
collect2: error: ld returned 1 exit status
make[3]: *** [sblat1] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [tests] Error 2
*** Clean the OpenBLAS build with 'make -C deps clean-openblas'. Rebuild with 'make OPENBLAS_USE_THREAD=0 if OpenBLAS had trouble linking libpthread.so, and with 'make OPENBLAS_TARGET_ARCH=NEHALEM' if there were errors building SandyBridge support. Both these options can also be used simultaneously. ***
make[1]: *** [openblas/libopenblas64_.dylib] Error 1

is this a matter of having an old vers of gfortran?

bizarro% gfortran -v
gfortran: warning: couldn't understand kern.osversion '15.2.0
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/usr/local/homebrew/Cellar/gfortran/4.8.2/gfortran/libexec/gcc/x86_64-apple-darwin13.1.0/4.8.2/lto-wrapper
Target: x86_64-apple-darwin13.1.0
Configured with: ../configure --prefix=/usr/local/homebrew/Cellar/gfortran/4.8.2/gfortran --datarootdir=/usr/local/homebrew/Cellar/gfortran/4.8.2/share --bindir=/usr/local/homebrew/Cellar/gfortran/4.8.2/bin --enable-languages=fortran --with-system-zlib --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-cloog=/usr/local/opt/cloog --with-isl=/usr/local/opt/isl --disable-cloog-version-check --disable-isl-version-check --enable-checking=release --disable-stage1-checking --disable-libstdcxx --enable-lto --disable-nls --disable-multilib
Thread model: posix
gcc version 4.8.2 (GCC) 

@tkelman
Copy link
Contributor

tkelman commented Feb 12, 2016

That usually sounds like an xcode-select --install problem if you've upgraded osx recently

@waTeim
Copy link
Contributor

waTeim commented Feb 12, 2016

not recently, no, it's been 5 months. But I think here's the thing.

bizarro% which gcc
/usr/bin/gcc    <--- that's from xcode

bizarro% gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

bizarro% which gfortran   <--- that's from homebrew
/usr/local/bin/gfortran

It's gfortran that compiles openblas, is it not? Thus I need to update gfortran via homebrew?

@nalimilan
Copy link
Member Author

gcc developers just confirmed they won't allow to disable the error. So I think we should merge the fix conditional on GCC 6 to limit possible breakage. Anyway it can't be worse than a build failure.

@tkelman
Copy link
Contributor

tkelman commented Feb 22, 2016

That's unfortunate. This is in julia.h so getting used in both C and C++, so is the "rejected in C mode" statement right?

Which branch has your proposed conditional version?

@nalimilan
Copy link
Member Author

This commit is apparently the less breaking one: 2d9f368

@nalimilan
Copy link
Member Author

@yuyichao Could you push the most limited fix to the release-0.4 branch (after confirmation from @tkelman)? If I'm not mistaken, that's 2d9f368. We should really fix the build with GCC6 in 0.4.4.

@yuyichao
Copy link
Contributor

yuyichao commented Mar 5, 2016

Yes it is 2d9f368. I'll assume this can go into @tkelman 's release-0.4.4 PR.

@tkelman
Copy link
Contributor

tkelman commented Mar 6, 2016

Is there any reason to keep the contents of this PR's branch around? If not, we could force push that commit here and do it that way.

I'm a little concerned about what happens if we upgrade the buildbots to use gcc 6 for creating nightlies, does that mean the binaries will get built in a way that does not match how embedding clients will then need to use them?

@yuyichao
Copy link
Contributor

yuyichao commented Mar 6, 2016

There shouldn't be any ABI changes.

@nalimilan
Copy link
Member Author

No reason to keep this PR, but we've been using it as a tracker for this issue. Close once you'll have backported the other fix.

@tkelman
Copy link
Contributor

tkelman commented Mar 6, 2016

May as well have the tracker for the issue also directly associated with the resolution of the issue?

By removing flexible sized array in empty struct.
@yuyichao
Copy link
Contributor

yuyichao commented Mar 6, 2016

I've rebased the commit and pushed it to this branch.

tkelman added a commit that referenced this pull request Mar 7, 2016
[release-0.4, RFC]: Remove 0-size and variable size array members from public headers.
@tkelman tkelman merged commit 147d6d0 into release-0.4 Mar 7, 2016
@tkelman tkelman deleted the nl/gcc6 branch March 7, 2016 00:34
@nalimilan
Copy link
Member Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants