HEDLEY_ASSUME_ALIGNED #1
Comments
I asked about this on a GCC ML, and the response wasn't what I was hoping for.
Basically, something like
#define HEDLEY_ASSUME_ALIGNED(arg, align) (arg = __builtin_assume_aligned(arg, align))
would work, but we can't do that because it would evaluate the argument twice.
The next idea is going in the opposite direction, and forcing everyone else to use a GCC-style macro:
void* foo = HEDLEY_ASSUME_ALIGNED(bar, align);
Unfortunately, this would have the same problem with the argument being evaluated twice for everyone other than GCC:
#define HEDLEY_ASSUME_ALIGNED(arg, align) (__assume_aligned((arg), (align)), (arg))
We want to return a value, so a do...while loop with a temp variable won't work (besides, that would restrict us to C99+). Just using a function won't work because the
Basically, I don't think there is a way to support this without pushing an ifdef into the code of people using Hedley, which is unacceptable. I'm out of ideas, other than to ask the GCC people to change |
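To make the double-evaluation concern concrete, here is a minimal sketch in C. The names EXAMPLE_ASSUME_THEN_RETURN, example_next and example_count_evaluations are hypothetical, and (void)(arg) merely stands in for ICC's __assume_aligned so that the snippet compiles with any compiler. It only demonstrates that a comma-expression macro of this shape evaluates its argument twice.

```c
/* Hypothetical stand-in for the ICC-style fallback discussed above.
 * (void)(arg) takes the place of __assume_aligned((arg), (align)), so the
 * argument still appears twice in the expansion. */
#define EXAMPLE_ASSUME_THEN_RETURN(arg, align) \
    ((void)(arg), (void)(align), (arg))

static int example_calls = 0;

/* Helper with a visible side effect: counts how often it is called. */
static int *example_next(int *p) {
    example_calls++;
    return p;
}

/* Returns how many times the macro evaluated its first argument. */
int example_count_evaluations(int *p) {
    example_calls = 0;
    (void)EXAMPLE_ASSUME_THEN_RETURN(example_next(p), 16);
    return example_calls;
}
```

Any argument with side effects (a function call, p++, and so on) would be affected in the same way.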
I also had a version like this for gcc: What's more interesting (and what I wanted to demonstrate), is that __assume is much stronger than that, |
void* const x = ...;
__assume_aligned(x, 32);
*Compiler explodes*. If you care about C, you may want to change that to something like
#if !defined(__cplusplus)
#define __assume_aligned(x,y) (x=(__typeof__(x))(__builtin_assume_aligned((void*)x,y)))
#elif /* Test for C++11 mode */
#define __assume_aligned(x,y) (x=decltype(x)(__builtin_assume_aligned((void*)x,y)))
#else
...
#endif
Obviously it's a GNU extension, but decltype doesn't work at all in C mode… |
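A minimal sketch of the C branch in use, under the assumption that the compiler is GCC 4.7+ or clang (both provide __builtin_assume_aligned and __typeof__). EXAMPLE_ASSUME_ALIGNED_INPLACE and example_scale_first are hypothetical names, not part of Hedley.

```c
/* Assumption: any compiler defining __GNUC__ here is new enough to have
 * __builtin_assume_aligned (GCC 4.7+, clang). Elsewhere this is a no-op. */
#if defined(__GNUC__)
#  define EXAMPLE_ASSUME_ALIGNED_INPLACE(x, y) \
     ((x) = (__typeof__(x))__builtin_assume_aligned((const void *)(x), (y)))
#else
#  define EXAMPLE_ASSUME_ALIGNED_INPLACE(x, y) ((void)0)
#endif

/* Works because p is a modifiable lvalue. Declaring the parameter as
 * `float *const p` would turn the assignment inside the macro into a
 * compile error, which is the problem described above. */
float example_scale_first(float *p, float factor) {
    EXAMPLE_ASSUME_ALIGNED_INPLACE(p, 16);
    return p[0] * factor;
}
```

The caller is responsible for passing a pointer that really is 16-byte aligned; the builtin is a promise to the compiler, not a check.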
If you want, I could implement __assume for GCC using __builtin_unreachable(). Unfortunately it won't work for specifying alignment, at least not currently (see that response on the gcc-help ML), but maybe in the future…
#if HEDLEY_GCC_HAS_BUILTIN(__builtin_unreachable,4,5,0)
# define HEDLEY_ASSUME(expr) ((expr) ? 1 : (__builtin_unreachable(), 0))
#elif ...
#endif |
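For illustration, a sketch of how such a macro might be used for a non-alignment assumption. EXAMPLE_ASSUME and example_sum_mult4 are hypothetical names; the GCC branch copies the expression shown above and assumes GCC 4.5+ or clang, and the fallback simply ignores the expression. Whether the optimizer actually exploits the hint depends on the compiler and version.

```c
#if defined(__GNUC__)
   /* If expr is false, control reaches __builtin_unreachable(), so the
    * compiler is allowed to assume expr is always true. */
#  define EXAMPLE_ASSUME(expr) ((expr) ? 1 : (__builtin_unreachable(), 0))
#else
   /* Fallback: no hint, and expr is not evaluated. */
#  define EXAMPLE_ASSUME(expr) (1)
#endif

/* Sums the first n elements. The caller promises that n is a multiple
 * of 4; breaking that promise is undefined behavior on the GCC branch. */
int example_sum_mult4(const int *data, unsigned n) {
    int total = 0;
    unsigned i;
    (void)EXAMPLE_ASSUME((n % 4u) == 0u);
    for (i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```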
I think that VS/IC have some kind of tracking system for value bits. Btw, what do you think about my coroutine implementation - https://github.com/Shelwien/stegdict/blob/master/Lib/coro2b.inc |
I think it does work for clang. I haven't verified, but see http://llvm.org/docs/doxygen/html/Compiler_8h.html#a2fd576fb00a760ba803c8a171bff051a (grep for LLVM_ASSUME_ALIGNED). Maybe the author was just optimistic, but if not this could take care of HEDLEY_ASSUME_ALIGNED for clang.
Interesting. It looks like it would take me a while to review the code. A coroutine implementation could be a nice feature for portable-snippets; if you're interested in that, you can file an issue there, which would be a good place to discuss it. FWIW, I've been planning on adding libcoro to Squash for some stuff, so it might be interesting to compare. Also, Microsoft has an interesting fiber API which could make for a nice backend… |
An example with __assume: |
Maybe a forum thread - https://encode.ru/threads/2714-Coroutine-class-implementation |
Have you seen this work? By the way, why not:
Isn't this enough to say the address of the pointer is aligned? |
It's a thing people do. I've never actually verified that the code generated by MSVC works as expected, but I think it does. Intel has some documentation showing it working at https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization FWIW,
That's probably okay since this is compiler-specific anyways. In theory, though, you could run into some problems because you're doing math on mixed types (an int and a pointer). With the code I have above you do the math on two pointers of the same type, then modulus on the integer type. |
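One way to sidestep the mixed-type concern is to convert the pointer to an integer first and do all the arithmetic on that integer type. This is a sketch of that alternative, not the pointer-difference expression referred to in the comment (which is not reproduced on this page). example_is_aligned is a hypothetical helper, and it assumes that converting a pointer to uintptr_t preserves the low address bits, which holds on mainstream platforms but is implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr is a multiple of align bytes, 0 otherwise.
 * The pointer is converted to uintptr_t before any arithmetic, so the
 * modulus operates on two values of the same unsigned integer type. */
int example_is_aligned(const void *ptr, size_t align) {
    return ((uintptr_t)ptr % (uintptr_t)align) == 0;
}
```

An expression of this form is the kind of condition one would hand to MSVC's __assume, though whether the optimizer acts on it is exactly what is in question here.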
I wonder if there is someone on the MSVC compiler team we can ask whether this works and makes the compiler treat the data as aligned, so that it uses aligned loads with SIMD. |
After seeing the proposal for There is still a lot of work to do (especially testing), and it's already pretty ugly, but this version might work out. If anyone has a better test case I'd be very interested. |
It might be reasonable to make |
I didn't go back and read this whole issue, but at least regarding @yuri-kilochek's comment, I don't think the reason for "assume aligned" is to generate aligned instructions (e.g., E.g., during vectorization the compiler might want the core loop to be aligned, not in order to use aligned load instructions, but because aligned access is more efficient. To do that, it may insert a prologue which does some scalar iterations to get to an alignment point: this code is redundant if the input is already aligned. I think that's the main purpose of assume aligned. Now, I am not sure if MSVC actually ever does that; my experience is more with gcc and clang. Another reason to assume alignment is to allow load-op instructions (instructions that include a memory operand) for SSE, which requires alignment for memory operands (AVX changed that, and load-op instructions no longer need to be aligned). |
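The prologue idea can be sketched by hand in C. This is only an illustration of the loop shape a vectorizer might produce, not actual compiler output; example_sum_with_prologue is a hypothetical function, and the main loop assumes a 4-byte int so that four elements span 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums n ints. *prologue_iters receives the number of scalar iterations
 * that were needed before data + i reached a 16-byte boundary. */
long example_sum_with_prologue(const int *data, size_t n,
                               size_t *prologue_iters) {
    long total = 0;
    size_t i = 0;

    /* Scalar prologue: step one element at a time until aligned.
     * If the caller could promise alignment, this loop would be dead code. */
    while (i < n && ((uintptr_t)(data + i) % 16) != 0) {
        total += data[i];
        i++;
    }
    if (prologue_iters != NULL) {
        *prologue_iters = i;
    }

    /* Main loop: stands in for the vectorized body, 4 ints per step. */
    for (; i + 4 <= n; i += 4) {
        total += (long)data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }

    /* Scalar epilogue for the leftover elements. */
    for (; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

With an alignment promise, the compiler can drop the first loop and its branch entirely, which is the saving described above.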
I don't want to forget about all this again, so…
I'm having trouble with __builtin_assume_aligned.
For ICC there is __assume_aligned. This is what I'm looking to emulate; you just pass it a pointer and the desired alignment:
For MSVC there is __assume, which is a bit of a pain to use, but basically you pass an expression (which evaluates to true if the variable is aligned as expected). Something like
GCC 4.7+, OTOH, does things a bit differently. It returns a value, and you're supposed to use the returned value:
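A minimal sketch of the GCC-style, value-returning pattern (the original snippet is not reproduced on this page, so this is an illustration rather than the author's code). EXAMPLE_ASSUME_ALIGNED and example_sum are hypothetical names; the detection logic assumes GCC 4.7+ or a compiler that reports the builtin through __has_builtin, and falls back to returning the pointer unchanged.

```c
#include <stddef.h>

/* Detect __builtin_assume_aligned (assumption: GCC 4.7+, or any compiler
 * that answers __has_builtin for it, such as clang). */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assume_aligned)
#    define EXAMPLE_HAVE_ASSUME_ALIGNED 1
#  endif
#endif
#if !defined(EXAMPLE_HAVE_ASSUME_ALIGNED) && defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
#  define EXAMPLE_HAVE_ASSUME_ALIGNED 1
#endif

#if defined(EXAMPLE_HAVE_ASSUME_ALIGNED)
#  define EXAMPLE_ASSUME_ALIGNED(ptr, align) \
     __builtin_assume_aligned((ptr), (align))
#else
#  define EXAMPLE_ASSUME_ALIGNED(ptr, align) (ptr)
#endif

/* arg is the incoming pointer; x is the returned pointer, which is the
 * one the compiler is told it may treat as 16-byte aligned. */
float example_sum(const float *const arg, size_t n) {
    const float *x = (const float *)EXAMPLE_ASSUME_ALIGNED(arg, 16);
    float total = 0.0f;
    size_t i;
    for (i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}
```

Because the result is assigned to a new variable, this form also works when the original pointer is const-qualified, and each argument is evaluated once.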
The question is, with __builtin_assume_aligned, does GCC know that arg (not x) is 16-byte aligned? Basically, can I use it like __assume_aligned?
Also, for 4.5+ (when __builtin_unreachable was added), if I do something like
Would GCC know that arg is aligned?
Next, there is the question of OpenMP. If _OPENMP >= 201307L (4.0), should we output #pragma omp simd aligned(arg:align)? If so, in addition to the builtin/intrinsic, or instead of it?
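For reference, a sketch of what the OpenMP variant could look like when written by hand. example_sum_simd is a hypothetical function and the alignment of 16 is an arbitrary choice for illustration. The pragma is guarded so that the code compiles unchanged when OpenMP is disabled or older than 4.0.

```c
#include <stddef.h>

/* Sums n floats. With OpenMP 4.0+ enabled, the aligned clause promises
 * that arg is 16-byte aligned; the caller must make sure that is true. */
float example_sum_simd(const float *arg, size_t n) {
    float total = 0.0f;
    size_t i;
#if defined(_OPENMP) && (_OPENMP >= 201307L)
#  pragma omp simd aligned(arg : 16) reduction(+ : total)
#endif
    for (i = 0; i < n; i++) {
        total += arg[i];
    }
    return total;
}
```

Note that the pragma attaches to the loop rather than to the pointer, and a macro would have to emit it through the C99 _Pragma operator, so it does not map cleanly onto an expression-style HEDLEY_ASSUME_ALIGNED.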