Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Headers] Don't declare unreachable() from stddef.h in C++ #86748

Merged
merged 1 commit into from
Apr 4, 2024

Conversation

ian-twilightcoder
Copy link
Contributor

@ian-twilightcoder ian-twilightcoder commented Mar 26, 2024

Even if __need_unreachable is set, stddef.h should not declare unreachable() in C++ because it conflicts with the declaration in <utility>.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Mar 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 26, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Ian Anderson (ian-twilightcoder)

Changes

Even if __need_unreachable is set, stddef.h should not declare unreachable() in C++ because it conflicts with the declaration in <utility>.


Full diff: https://github.com/llvm/llvm-project/pull/86748.diff

1 Files Affected:

  • (modified) clang/lib/Headers/__stddef_unreachable.h (+4)
diff --git a/clang/lib/Headers/__stddef_unreachable.h b/clang/lib/Headers/__stddef_unreachable.h
index 518580c92d3f5d..61df43e9732f8a 100644
--- a/clang/lib/Headers/__stddef_unreachable.h
+++ b/clang/lib/Headers/__stddef_unreachable.h
@@ -7,6 +7,8 @@
  *===-----------------------------------------------------------------------===
  */
 
+#ifndef __cplusplus
+
 /*
  * When -fbuiltin-headers-in-system-modules is set this is a non-modular header
  * and needs to behave as if it was textual.
@@ -15,3 +17,5 @@
     (__has_feature(modules) && !__building_module(_Builtin_stddef))
 #define unreachable() __builtin_unreachable()
 #endif
+
+#endif

@Bigcheese
Copy link
Contributor

I'm checking with the C and C++ Compatibility study group (SG22) for what's expected here.

Even if __need_unreachable is set, stddef.h should not declare unreachable() in C++ because it conflicts with the declaration in <utility>.
@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Mar 27, 2024

I'm checking with the C and C++ Compatibility study group (SG22) for what's expected here.

Prior to adding __need_unreachable in LLVM 18, clang would never declare unreachable() in C++ mode, but would defer to the C++ library to do that. I think we should keep with that behavior and let libc++'s <utility> and possibly <stddef.h> headers handle declaring unreachable().

@ldionne
Copy link
Member

ldionne commented Mar 27, 2024

Why do the Clang builtin headers even try to define this? Shouldn't this be provided by the platform's C library instead? I am really confused about what's the intended layering here and it seems to me like Clang is overstepping its responsibilities by basically implementing parts of the C library.

@AaronBallman
Copy link
Collaborator

Why do the Clang builtin headers even try to define this? Shouldn't this be provided by the platform's C library instead? I am really confused about what's the intended layering here and it seems to me like Clang is overstepping its responsibilities by basically implementing parts of the C library.

I did that in a9797d7

It's a freestanding feature in C23, so we need to expose it from stddef.h at least when not in hosted builds. I don't understand why this is making into C++ builds at all:

We set __need_unreachable based on __STDC_VERSION__:

#define __need_unreachable

We only include __stddef_unreachable.h if __need_unreachable is defined:

#include <__stddef_unreachable.h>

@AaronBallman AaronBallman self-requested a review March 27, 2024 16:03
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as requesting changes so this isn't accidentally landed when there's an open question as to why it's needed in the first place.

@ldionne
Copy link
Member

ldionne commented Mar 27, 2024

Why do the Clang builtin headers even try to define this? Shouldn't this be provided by the platform's C library instead? I am really confused about what's the intended layering here and it seems to me like Clang is overstepping its responsibilities by basically implementing parts of the C library.

I did that in a9797d7

It's a freestanding feature in C23, so we need to expose it from stddef.h at least when not in hosted builds. I don't understand why this is making into C++ builds at all:

Thanks for the explanation. I think I don't understand the relationship between that feature being freestanding and the fact that it's implemented in the Clang builtin headers. I might be misunderstanding (or just ignorant of) the Clang approach for freestanding in C. Doesn't Clang still rely on there being a C library even in Freestanding mode?

@AaronBallman
Copy link
Collaborator

Why do the Clang builtin headers even try to define this? Shouldn't this be provided by the platform's C library instead? I am really confused about what's the intended layering here and it seems to me like Clang is overstepping its responsibilities by basically implementing parts of the C library.

I did that in a9797d7
It's a freestanding feature in C23, so we need to expose it from stddef.h at least when not in hosted builds. I don't understand why this is making into C++ builds at all:

Thanks for the explanation. I think I don't understand the relationship between that feature being freestanding and the fact that it's implemented in the Clang builtin headers. I might be misunderstanding (or just ignorant of) the Clang approach for freestanding in C. Doesn't Clang still rely on there being a C library even in Freestanding mode?

In C, freestanding used to be "here's a pile of macros", so the compiler was able to easily supply a freestanding set of headers. That's changed with C23 (for example, we now include string.h and stdbit.h in freestanding, both of which provide a lot of function interfaces). I don't think we've documented an official policy one way or the other about what level of CRT support needs to come from outside of the compiler, but given that unreachable is a macro that expands to a builtin, it is a very reasonable thing for us to expose from our freestanding headers (there's no underlying library support needed, at least in theory). It's somewhat akin to supplying offsetof which is also a function-like macro that expands to a builtin.

@ian-twilightcoder
Copy link
Contributor Author

(for example, we now include string.h and stdbit.h in freestanding, both of which provide a lot of function interfaces)

We do? I don't see those in lib/Headers

@AaronBallman
Copy link
Collaborator

(for example, we now include string.h and stdbit.h in freestanding, both of which provide a lot of function interfaces)

We do? I don't see those in lib/Headers

Sorry for the confusion! "We" being WG14 (the C committee). We (Clang) haven't done anything about either of these headers as we're expecting the user to have to find a freestanding runtime library that provides them.

@ldionne
Copy link
Member

ldionne commented Mar 28, 2024

Thanks for the explanation, @AaronBallman . I think I am generally deeply confused about what should be provided by the compiler and what should be provided by the C Standard Library on any given platform. From your reply, it looks like there's no clear rule and Clang basically provides anything that seems "easy enough" to provide. I still don't understand how that works in case you do end up including a header from the platform that (re)defines unreachable, for example.

The same problem also applies today to e.g. size_t and anything else provided by the Clang builtin headers. If a platform decides to define size_t differently than what the Clang builtin headers define it to, I guess we run into issues? I assume it's not something that happens often because it's pretty unambiguous what these typedefs should be, but still.

Anyway, this might turn out to be nothing more than a documentation issue, but in any case I think it would be valuable to write down how this is intended to work somewhere (even if only in a comment), since I suspect it's not clear to most people. I work on the C++ Standard Library and I don't understand how this works in our own implementation :-)

@ian-twilightcoder
Copy link
Contributor Author

I still don't understand how that works in case you do end up including a header from the platform that (re)defines unreachable, for example.

The same problem also applies today to e.g. size_t and anything else provided by the Clang builtin headers. If a platform decides to define size_t differently than what the Clang builtin headers define it to, I guess we run into issues? I assume it's not something that happens often because it's pretty unambiguous what these typedefs should be, but still.

We do indeed run into issues, the redeclarations cause two problems.

  1. Modules get upset when different modules declare the same name somewhat differently.
  2. Swift uses the module name as part of the type name, and ambiguities arise when different modules define the same type, even identically.

We're needing to carefully remove our duplications in the Apple headers and always use the clang builtins. The coupling is unfortunate, but I don't know of any practical way around it.

@AaronBallman
Copy link
Collaborator

Thanks for the explanation, @AaronBallman . I think I am generally deeply confused about what should be provided by the compiler and what should be provided by the C Standard Library on any given platform. From your reply, it looks like there's no clear rule and Clang basically provides anything that seems "easy enough" to provide. I still don't understand how that works in case you do end up including a header from the platform that (re)defines unreachable, for example.

For C17 and earlier, the rule was that we provided everything required for freestanding by the standard, and anything else the user wanted had to come from a library the user supplied. For C23, which we're still in the process of implementing, the situation is different and we've not documented what the expectations are yet. I suspect we're going to end up requiring the user to provide their own freestanding library implementation and the compiler will provide only the bits that the compiler has to provide (things like limits.h).

The unreachable macro is interesting in that it could be provided by either a CRT implementation or the compiler. The whole point to the macro is "if the expansion is reached, you have undefined behavior", which fits naturally with __builtin_unreachable() which the backend knows how to do optimizations around. However, a CRT implementation could decide the macro expands to *(int *)nullptr or the likes, which also triggers undefined behavior.

The same problem also applies today to e.g. size_t and anything else provided by the Clang builtin headers. If a platform decides to define size_t differently than what the Clang builtin headers define it to, I guess we run into issues? I assume it's not something that happens often because it's pretty unambiguous what these typedefs should be, but still.

If the platform defines size_t to be different from the Clang builtin headers define it to, there are Serious Problems™ because sizeof() suddenly returns strange things, interfaces like printf will find ABI issues with %z, etc.

Anyway, this might turn out to be nothing more than a documentation issue, but in any case I think it would be valuable to write down how this is intended to work somewhere (even if only in a comment), since I suspect it's not clear to most people. I work on the C++ Standard Library and I don't understand how this works in our own implementation :-)

Definitely agreed that we need to document this, but we need to figure out what we want to document in the first place -- some of this is new-ish territory because of the additional interfaces. unreachable is morally similar to offsetof in that it's a macro interface where the implementation can give better results by expanding to a builtin than a library is likely to be able to give via a naive implementation, but the interfaces can be provided by a CRT. So which one should win? Today, if something else defined offsetof or unreachable before including stddef.h, Clang's headers won't bother re-defining the macro.

I think we may need to look at what existing freestanding CRTs and hosted CRTs expect to help give us a path forward?

@ldionne
Copy link
Member

ldionne commented Mar 28, 2024

Thanks for the explanation, @AaronBallman . I think I am generally deeply confused about what should be provided by the compiler and what should be provided by the C Standard Library on any given platform. From your reply, it looks like there's no clear rule and Clang basically provides anything that seems "easy enough" to provide. I still don't understand how that works in case you do end up including a header from the platform that (re)defines unreachable, for example.

For C17 and earlier, the rule was that we provided everything required for freestanding by the standard, and anything else the user wanted had to come from a library the user supplied. For C23, which we're still in the process of implementing, the situation is different and we've not documented what the expectations are yet. I suspect we're going to end up requiring the user to provide their own freestanding library implementation and the compiler will provide only the bits that the compiler has to provide (things like limits.h).

Thanks, this clarifies a lot of things in my head.

Definitely agreed that we need to document this, but we need to figure out what we want to document in the first place -- some of this is new-ish territory because of the additional interfaces. unreachable is morally similar to offsetof in that it's a macro interface where the implementation can give better results by expanding to a builtin than a library is likely to be able to give via a naive implementation, but the interfaces can be provided by a CRT. So which one should win? Today, if something else defined offsetof or unreachable before including stddef.h, Clang's headers won't bother re-defining the macro.

I think we may need to look at what existing freestanding CRTs and hosted CRTs expect to help give us a path forward?

Personally, my naive take on this is that Clang should basically define the least amount of things from the C Standard Library, and let a proper C Standard Library implement it. Having interacted with a good number of folks who work on C libraries (including embedded flavors), I think they all expect that they need to provide basically all the declarations/definitions in the C Standard, except perhaps stuff from <stdarg.h> and a few other oddities. Doing otherwise and trying to be "helpful" in Clang only creates confusion and forces C libraries to jump through hoops to avoid colliding with stuff already defines in Clang builtin headers.

A C library is not super complicated in the first place, so I don't think we're saving a lot of people a lot of time by trying to give them a few definitions here and there. Additionally, there's many freestanding-friendly C libraries around so it's not like most people have to write their own.

That's just my immediate reaction to this. It's pretty naive but at this specific point I believe we should basically not provide any Clang builtin headers except the ones that clearly provide compiler-related stuff (e.g. <stdarg.h> or tmmintrin.h & friends). I'm eager to hear other people's takes on this.

SLA note: I'll be OOO for a few days so I'll only see replies early next week.

@philnik777
Copy link
Contributor

unreachable is morally similar to offsetof in that it's a macro interface where the implementation can give better results by expanding to a builtin than a library is likely to be able to give via a naive implementation, but the interfaces can be provided by a CRT.

FWIW I don't think they are morally similar. unreachable could be implemented in very different ways. Consider a libc which tries to terminate on UB instead of letting the compiler optimize things away (e.g. a debug mode). Then the Clang implementation is not at all what the libc would do.

@AaronBallman
Copy link
Collaborator

unreachable is morally similar to offsetof in that it's a macro interface where the implementation can give better results by expanding to a builtin than a library is likely to be able to give via a naive implementation, but the interfaces can be provided by a CRT.

FWIW I don't think they are morally similar. unreachable could be implemented in very different ways. Consider a libc which tries to terminate on UB instead of letting the compiler optimize things away (e.g. a debug mode). Then the Clang implementation is not at all what the libc would do.

I think we're saying the same thing but in different ways. I'm saying that the compiler implementation is generally what's positioned best in terms of what the optimizer can work with but the library implementation can be positioned to have a conforming implementation with different QoI properties. There's benefits to allowing either approach; the compiler can provide a high-quality implementation so that C standard library maintainers don't have to do anything but the compiler's headers should not prevent a C standard library from providing additional/different functionality.

That's just my immediate reaction to this. It's pretty naive but at this specific point I believe we should basically not provide any Clang builtin headers except the ones that clearly provide compiler-related stuff (e.g. <stdarg.h> or tmmintrin.h & friends). I'm eager to hear other people's takes on this.

There's two angles to this. We provide something already today and we don't want to break code, so whatever we provide tomorrow needs to have a good enough user experience that code either still works, or users have an easy way to react to the changes. So reducing the set of interfaces we provide is risky, though I would argue that the C23 offerings are the easiest for us to change because I'm unaware of any fully conforming C standard library for C23 yet. The other angle is where we want to be in an ideal situation. I do not think think "because of GCC compatibility, let's do what glibc does" is the correct default ideal situation for us to look at -- if we're going to make structural changes and document a policy, I would love for us to gather feedback from glibc, MSVC CRT, musl, ulibc, llvm-libc, bionic, and any other implementations (including STL implementations) we can find contacts for so we can intentionally support a wide set of C standard libraries from Clang without introducing significant burdens for C++ standard library maintainers. For example, we have problems like "who provides what for freestanding", but we also have serious issues with "who defines what" for predefined macros, because some of them (__STDC_IEC_559__ and others) require a handshake between both parts of the implementation.

So I think I see a short-term and a long-term project here. Long-term, we should find a way to migrate our existing headers to a more intentional approach (whatever that may be). Short-term, we need to do something for unreachable in stddef.h and I think there's two initial tasks: 1) can we find evidence that we still need the __need machinery or can that be removed entirely and simplify the interface in Clang? 2) The C++ standard isn't rebased on top of C23 yet, but WG21 needs to figure out what to do about https://eel.is/c++draft/cstddef.syn#1; it would be unintuitive for cstddef to not expose one of C interfaces, but it would be a problem for cstddef to provide a macro that then whacks the function declaration in utility. I can see WG21 solving this either by removing unreachable from cstddef or by requiring utility to guard against the macro via implementation magic (personally, I think utility should guard against it because of users including stddef.h directly) and it would be nice to know which direction the committee is leaning before we get too far into a solution.

WDYT?

Also, I'd love to hear from some llvm-libc folks. @jhuber6 @lntue (feel free to tag others)

@philnik777
Copy link
Contributor

I can see WG21 solving this either by removing unreachable from cstddef or by requiring utility to guard against the macro via implementation magic (personally, I think utility should guard against it because of users including stddef.h directly) and it would be nice to know which direction the committee is leaning before we get too far into a solution.

I'm not sure whether you mean it, but I'd expect that it's handled like signbit and friends - they are functions instead of macros. For libc++ that would simply be #undef unreachable, #include <__utility/unreachable.h> and using std::unreachable.

@AaronBallman
Copy link
Collaborator

I can see WG21 solving this either by removing unreachable from cstddef or by requiring utility to guard against the macro via implementation magic (personally, I think utility should guard against it because of users including stddef.h directly) and it would be nice to know which direction the committee is leaning before we get too far into a solution.

I'm not sure whether you mean it, but I'd expect that it's handled like signbit and friends - they are functions instead of macros. For libc++ that would simply be #undef unreachable, #include <__utility/unreachable.h> and using std::unreachable.

Yeah, that would one way to handle it, but it does still defy some user expectations (and perhaps that's fine):

#include <stddef.h>
// #include "whatever.h" // transitively includes <utility>

#ifndef unreachable
#error "oh no, my macro!"
#endif

Uncommenting whatever.h suddenly causes an error which may be a surprise because some C standard library things remain macros even in C++ (like offsetof, or va_start and friends). Any C standard library function can be implemented as a macro, so the potential for confusion always exists, but for things defined explicitly as a macro in C, they sometimes stay a macro in C++. I think unreachable is probably reasonable to turn into a real function though.

But another way WG21 could perhaps approach this is:

// In <utility>
#ifndef unreachable
void unreachable();
#endif

which retains the macro rather than the function.

@philnik777
Copy link
Contributor

I mean that stddef.h would be modified to provide it as a function instead of a macro. In C++ it would simply never be a macro.

@AaronBallman
Copy link
Collaborator

I mean that stddef.h would be modified to provide it as a function instead of a macro. In C++ it would simply never be a macro.

Yeah, that could be done too, but it seems pretty unclean IMO (at least when aiming for an ideal implementation) -- C++ changing the contents of C headers in ways that are observable to the user is pretty gross. For example, writing a C header file that includes stddef.h and some inline functions may then compile quite differently whether included in a C TU or a C++ TU. e.g.,

// MyAwesomeLibrary.h

#include <stddef.h>

#ifndef unreachable
extern void my_special_unreachable_handler_only_works_in_c_for_reasons(void);
#define unreachable() (my_special_unreachable_handler_only_works_in_c_for_reasons(),*(int *)nullptr)
#endif

enum CoolStuff { Awesome, Radical, Tubular };
inline int func(enum CoolStuff CS) {
  switch (CS) {
  case Awesome: ... break;
  case Radical: ... break;
  case Tubular: ... break;
  }
  unreachable();
}

where the expectation is that because we're including stddef.h, the only time the fallback is needed is with C standard libraries that don't yet support C23. It's a pretty contrived example, so don't read into it too much, it's more that I'm not certain of the usability of quietly changing standard interfaces in the C standard library.

@lntue
Copy link
Contributor

lntue commented Mar 29, 2024

@nickdesaulniers
Copy link
Member

nickdesaulniers commented Mar 29, 2024

I think I am generally deeply confused about what should be provided by the compiler and what should be provided by the C Standard Library on any given platform.

+1

Doing otherwise and trying to be "helpful" in Clang only creates confusion and forces C libraries to jump through hoops to avoid colliding with stuff already defines in Clang builtin headers.
I believe we should basically not provide any Clang builtin headers except the ones that clearly provide compiler-related stuff (e.g. <stdarg.h> or tmmintrin.h & friends).

Weak +1 (+1, but similar levels of unsuredness).


On the llvm-libc side, we're running into pain with some of the compiler provided headers, too, particularly because they tend to themselves include additional headers (the xmm intrinsic headers including stdlib for example). I'm very tempted to resort to -nostdinc and just have our own definitions for everything:

typedef __SIZE_TYPE__ size_t;
...

and just be done with compiler resource headers.


I'll suspect that #define __need_foo #include <stddef.h> pattern makes precompiling stddef.h impossible.


cc @enh for visibility.

@michaelrj-google
Copy link
Contributor

Re: there being observable differences between a header in C vs C++ mode, that's already a thing.

The definition for isinf in math.h is specified to be a macro in C, but specified to be a function in C++. We're already doing #ifdef __cplusplus for that (see https://github.com/llvm/llvm-project/blob/main/libc/include/llvm-libc-macros/math-macros.h), and other libc's do similar (see iszero in glibc).

@philnik777
Copy link
Contributor

Re: there being observable differences between a header in C vs C++ mode, that's already a thing.

The definition for isinf in math.h is specified to be a macro in C, but specified to be a function in C++. We're already doing #ifdef __cplusplus for that (see https://github.com/llvm/llvm-project/blob/main/libc/include/llvm-libc-macros/math-macros.h), and other libc's do similar (see iszero in glibc).

That's actually really bad. Both libc++ and libstdc++ assume that these are either not provided at all by the libc or are macros. This will inevitably lead to ambiguous overloads.

@michaelrj-google
Copy link
Contributor

I'm fine with changing our definition to better coordinate with libc++, though we should probably discuss that on someplace other than this patch.

@ian-twilightcoder
Copy link
Contributor Author

I do wonder if we could have the broader builtin headers discussion independent of this patch? Is everyone happy with this patch? We can keep talking about the builtin headers in here independent of merging right?

@AaronBallman
Copy link
Collaborator

FWIW, I did verify that it's very unlikely the changes in this PR will break existing code: https://sourcegraph.com/search?q=context:global+__need_unreachable+-file:.*clang.*&patternType=keyword&sm=0, so that's a good thing.

I do wonder if we could have the broader builtin headers discussion independent of this patch? Is everyone happy with this patch? We can keep talking about the builtin headers in here independent of merging right?

I guess I don't see these as independent topics; if we decide that C++ mode should not have observable differences in C headers, then the changes here are incorrect. I think we should have this discussion in a broader context (like Discourse) before moving forward with these changes.

Also, I'd still like an explanation for this question:

I don't understand why this is making into C++ builds at all:

because it may turn out we don't need these changes in the first place because the issue is elsewhere.

@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Apr 1, 2024

FWIW, I did verify that it's very unlikely the changes in this PR will break existing code: https://sourcegraph.com/search?q=context:global+__need_unreachable+-file:.*clang.*&patternType=keyword&sm=0, so that's a good thing.

I do wonder if we could have the broader builtin headers discussion independent of this patch? Is everyone happy with this patch? We can keep talking about the builtin headers in here independent of merging right?

I guess I don't see these as independent topics; if we decide that C++ mode should not have observable differences in C headers, then the changes here are incorrect. I think we should have this discussion in a broader context (like Discourse) before moving forward with these changes.

Also, I'd still like an explanation for this question:

I don't understand why this is making into C++ builds at all:

because it may turn out we don't need these changes in the first place because the issue is elsewhere.

Right now I just noticed that in a C++ test I was writing that stddef.h alone doesn't give me unreachable, but __needs_unreachable does. And that's probably wrong because unreachable belongs to <utility> in C++ and not stddef.h. The C++ standard has some frustrating divergence with the C standard as to what the c stdlib headers declare...

@AaronBallman
Copy link
Collaborator

FWIW, I did verify that it's very unlikely the changes in this PR will break existing code: https://sourcegraph.com/search?q=context:global+__need_unreachable+-file:.*clang.*&patternType=keyword&sm=0, so that's a good thing.

I do wonder if we could have the broader builtin headers discussion independent of this patch? Is everyone happy with this patch? We can keep talking about the builtin headers in here independent of merging right?

I guess I don't see these as independent topics; if we decide that C++ mode should not have observable differences in C headers, then the changes here are incorrect. I think we should have this discussion in a broader context (like Discourse) before moving forward with these changes.
Also, I'd still like an explanation for this question:

I don't understand why this is making into C++ builds at all:

because it may turn out we don't need these changes in the first place because the issue is elsewhere.

Right now I just noticed that in a C++ test I was writing that stddef.h alone doesn't give me unreachable, but __needs_unreachable does. And that's probably wrong because unreachable belongs to in C++ and not stddef.h. The C++ standard has some frustrating divergence with the C standard as to what the c stdlib headers declare...

I don't yet agree that it's wrong -- you define the macro saying you want unreachable from stddef.h, so you get unreachable from stddef.h. Morally, it's very similar to:

#define unreachable "I am doing something which causes myself pain"
#include <utility>

@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Apr 2, 2024

Right now I just noticed that in a C++ test I was writing that stddef.h alone doesn't give me unreachable, but __needs_unreachable does. And that's probably wrong because unreachable belongs to in C++ and not stddef.h. The C++ standard has some frustrating divergence with the C standard as to what the c stdlib headers declare...

I don't yet agree that it's wrong -- you define the macro saying you want unreachable from stddef.h, so you get unreachable from stddef.h. Morally, it's very similar to:

#define unreachable "I am doing something which causes myself pain"
#include <utility>

I was thinking that a lot of the low level Apple SDK C headers use the __need_ macros to add things they use, and that shouldn't mess up C++ clients. Although I guess if they actually use unreachable then they need to have a __cplusplus in there to include <utility> instead of just not getting unreachable at all. So, I guess there's a problem either way. However, I still don't really like the idea that someone can make a mistake in the SDK and break the C++ declaration of unreachable. I also don't like that unreachable is currently inconsistent with nullptr_t, which we almost never declare in C++ mode either, for similar reasons.

@AaronBallman
Copy link
Collaborator

I was thinking that a lot of the low level Apple SDK C headers use the __need_ macros to add things they use, and that shouldn't mess up C++ clients. Although I guess if they actually use unreachable then they need to have a __cplusplus in there to include instead of just not getting unreachable at all. So, I guess there's a problem either way. However, I still don't really like the idea that someone can make a mistake in the SDK and break the C++ declaration of unreachable.

It's a danger, to be sure, but I don't think it warrants a surgical change here.

I also don't like that unreachable is currently inconsistent with nullptr_t, which we almost never declare in C++ mode either, for similar reasons.

It's not inconsistent with nullptr_t, or am I missing something?

#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \
sets __need_nullptr_t based on it being C++ or C23.
#if defined(__need_nullptr_t)
includes the header with the implementation if __need_nullptr_t is defined.
https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__stddef_nullptr_t.h defines nullptr_t in both C and C++ mode.

@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Apr 3, 2024

https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__stddef_nullptr_t.h defines nullptr_t in both C and C++ mode.

It only defines it in C++ in specific MSC circumstances (_MSC_EXTENSIONS and _NATIVE_NULLPTR_SUPPORTED)

@AaronBallman
Copy link
Collaborator

https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__stddef_nullptr_t.h defines nullptr_t in both C and C++ mode.

It only defines it in C++ in specific MSC circumstances (_MSC_EXTENSIONS and _NATIVE_NULLPTR_SUPPORTED)

Ah, you're right, we do define it in C++ but only sometimes.

That seems to be a rather ancient behavior: 158dec5

Regardless, for this patch, it's a hard call either way. GCC does not define unreachable in C++ mode in their stddef.h: https://godbolt.org/z/vY81xvE1M (they use __need macros for some things but not for nullptr_t or unreachable), so there's a compatibility argument. However, the C++ standard makes it pretty clear that this should be defined (https://eel.is/c++draft/support.c.headers#general-1) because the C standard headers are valid in C++ for interoperability with C. Also "Source files that are not intended to also be valid ISO C should not use any of the C headers." makes it somewhat unmotivating to exclude the macro definition in C++ mode.

On balance, I think we should not make changes here unless/until WG21 rebases on top of C23 because this seems to be an issue of the SDK not adhering to the existing contract for including this header with its special __need functionality. However, I will reach out to the C and C++ Compatibility study group to see if I can get some early signals as to which way folks are leaning. If there's a clear direction coming from that, we can use that to help guide our decision before the rebase happens.

@AaronBallman
Copy link
Collaborator

On balance, I think we should not make changes here unless/until WG21 rebases on top of C23 because this seems to be an issue of the SDK not adhering to the existing contract for including this header with its special __need functionality. However, I will reach out to the C and C++ Compatibility study group to see if I can get some early signals as to which way folks are leaning. If there's a clear direction coming from that, we can use that to help guide our decision before the rebase happens.

I've heard back from the compatibility study group and the sentiment there is to not expose unreachable from stddef.h for two primary reasons. 1) Conformance reasons; unreachable isn't reserved and so it's non-conforming in C++ to bring it in, 2) future compat reasons; the expectation is that when C++ rebases onto C23, it will explicitly prohibit exposing unreachable as a macro.

We should probably take a tour through the headers Clang exposes and make sure we're paying attention to the conformance aspects of all the C headers when included in C++ mode.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though we should have a release note about the change because we've been exposing the macro since Clang 17. I don't think this warrant a potentially breaking change notice, though, just a regular bugfix one.

@AaronBallman
Copy link
Collaborator

We should probably take a tour through the headers Clang exposes and make sure we're paying attention to the conformance aspects of all the C headers when included in C++ mode.

I've spotted some problems:
https://godbolt.org/z/G8W7Gs1vE
https://godbolt.org/z/vbjs787TT
https://godbolt.org/z/Ks67KEoh3

So I've filed #87668 to track this.

@ian-twilightcoder
Copy link
Contributor Author

ian-twilightcoder commented Apr 4, 2024

LGTM, though we should have a release note about the change because we've been exposing the macro since Clang 17. I don't think this warrant a potentially breaking change notice, though, just a regular bugfix one.

I think we weren't exposing unreachable in C++ at all in Clang 17

#if defined(__need_STDDEF_H_misc) && defined(__STDC_VERSION__) &&              \
    __STDC_VERSION__ >= 202000L
#define unreachable() __builtin_unreachable()
#endif /* defined(__need_STDDEF_H_misc) && >= C23 */

(__need_STDDEF_H_misc was only set if none of the other __need_ macros were set - nothing would happen if the includer set it manually.)

And then it's only exposed in Clang 18 if something sets the new __needs_unreachable.

Is that still worth mention in the release notes do you think?

@AaronBallman
Copy link
Collaborator

Is that still worth mention in the release notes do you think?

Oh duh, that's right, this requires someone to define the __need macro to get it, so no, I don't think it needs a release note. Sorry for the noise!

@ian-twilightcoder
Copy link
Contributor Author

Is that still worth mention in the release notes do you think?

Oh duh, that's right, this requires someone to define the __need macro to get it, so no, I don't think it needs a release note. Sorry for the noise!

👍 thanks for bearing with me on this!

@ian-twilightcoder ian-twilightcoder merged commit df69a30 into llvm:main Apr 4, 2024
4 checks passed
@ian-twilightcoder ian-twilightcoder deleted the cxx-unreachable branch April 4, 2024 20:02
@AaronBallman
Copy link
Collaborator

Is that still worth mention in the release notes do you think?

Oh duh, that's right, this requires someone to define the __need macro to get it, so no, I don't think it needs a release note. Sorry for the noise!

👍 thanks for bearing with me on this!

Likewise! :-)

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Apr 4, 2024
Even if __need_unreachable is set, stddef.h should not declare
unreachable() in C++ because it conflicts with the declaration in
\<utility>.

(cherry picked from commit df69a30)
tstellar pushed a commit to llvmbot/llvm-project that referenced this pull request Apr 10, 2024
Even if __need_unreachable is set, stddef.h should not declare
unreachable() in C++ because it conflicts with the declaration in
\<utility>.

(cherry picked from commit df69a30)
@pointhex pointhex mentioned this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants