-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[x86/Linux] Stack align 16 bytes for JIT code #8849
Conversation
CC @parjong |
src/jit/target.h
Outdated
@@ -484,9 +484,15 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits | |||
#define MIN_ARG_AREA_FOR_CALL 0 // Minimum required outgoing argument space for a call. | |||
|
|||
#define CODE_ALIGN 1 // code alignment requirement | |||
#if !defined(FEATURE_PAL) | |||
#define STACK_ALIGN 4 // stack alignment requirement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 16-byte stack alignment requirement comes from GCC/Clang (as discussed in #8590). I'm not sure which flag will be better: __GNUC__
or FEATURE_PAL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the stack alignment requirement part of the ABI? Perhaps there should be a UNIX_X86_ABI, similar to UNIX_AMD64_ABI and UNIX_ARM_ABI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, it is not a part of standard ABI. It is just a requirement of GCC (> 4.5) and corresponding Clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parjong. Good point. __GNUC__
vs FEATURE_PAL
for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this case, __GNUC__
would be the right choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. From https://en.wikipedia.org/wiki/X86_calling_conventions: In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GNUC is not cross-compilation friendly though: It describes the host compiler; not the target environment. Introducing UNIX_X86_ABI
maybe be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNIX_X86_ABI
maybe be best.
Thank you!. I'll add this to another PR and later rebase this patch. I think this PR will take some time :)
CC: @dotnet/jit-contrib |
Do you see the stack actually aligned with this change? E.g. replace the code to pad the stack in ThePreStub with check that the stack is aligned - does it work? The x86 calling convention is stdcall (=callee pop) and so you may need to pad each callsite by variable pad instead or in addition to this. |
src/jit/codegenxarch.cpp
Outdated
// Push a single 4-byte zero. This matches the 4-byte STACK_ALIGN value. | ||
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES); | ||
inst_IV(INS_push_hide, 0); // --- push 4-byte 0 | ||
#else | ||
// Push four 4-byte zeros. This matches the 16-byte STACK_ALIGN value. | ||
static_assert_no_msg(STACK_ALIGN == REGSIZE_BYTES * 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is better to replace all of these #ifdefs with a loop that pushes STACK_ALIGN bytes of zeros onto the stack.
static_assert_no_msg((STACK_ALIGN % REGSIZE_BYTES) == 0));
unsigned const count = (STACK_ALIGN / REGSIZE_BYTES);
for (unsigned i=0; i<count; i++)
{
inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0
}
// Note that the stack must always be aligned to STACK_ALIGN bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. It's more clean and nice.
src/jit/lclvars.cpp
Outdated
// We add 16 so compLclFrameSize is still a multiple of 16. | ||
lvaIncrementFrameSize(STACK_ALIGN); | ||
} | ||
assert((compLclFrameSize % STACK_ALIGN) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this need to be written differently:
First if it isn't the final frame layout phase you have to add on the maximum pad of 12 bytes.
Then after that you have to align compLclFrameSize to a multiple of 16.
This can probably be unconditionally as well: (not inside the #else)
if (STACK_ALIGN > REGSIZE_BYTES)
{
if (lvaDoneFrameLayout != FINAL_FRAME_LAYOUT)
{
// If we are not doing final layout, we don't know the exact value of compLclFrameSize
// and thus do not know how much we will need to add in order to be aligned.
// We add the maximum pad that we could ever have (which is 12)
lvaIncrementFrameSize(STACK_ALIGN - REGSIZE_BYTES);
}
// The stack must always be 16 byte aligned.
if ((compLclFrameSize % STACK_ALIGN) != 0)
{
lvaIncrementFrameSize(STACK_ALIGN - (compLclFrameSize % STACK_ALIGN));
}
}
@jkotas , thank you for the comment!
I've checked with one unit test case that failed when alignment, but I think it's better to check this in the
Yes, I also think so. I've done a unit test compare with 4 byte alignment in |
With 967a45b patch, I've done a small test with
If this is somewhat incorrect, please let me know. I'll start over again. This method is actually
As there is three And then the when entering the second JITted code which is
and as there is no two pushes compared to
It seems that pushing two registers somehow gave an 16 byte alignment but not for the second JIT. |
I need to update again to apply |
I think I've found what I was thinking: |
Some changes are from |
@janvorli , could you please help me with build break on Windows? I cann't find the reason. Line 3175 is empty but says expanded from the macro.
|
src/jit/codegenxarch.cpp
Outdated
#endif // FEATURE_PUT_STRUCT_ARG_STK | ||
|
||
{ | ||
stackArgBytes += genTypeSize(genActualType(arg->TypeGet())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a simple fix for if ~ else code block that is surrounded with #if
directive. Bigger block is surrounded with _TARGET_X86_ || FEATURE_UNIX_AMD64_STRUCT_PASSING
so no effect other arch/defines.
I've found the reason. Line number was I needed to rebase and |
@dotnet-bot test Ubuntu x64 Checked Build and Test please |
@briansull can you please take a look at the last iteration of the JIT changes? |
@janvorli , Thank you! |
src/jit/codegencommon.cpp
Outdated
(compiler->compTailCallUsed ? 4 : 0); // CORINFO_HELP_TAILCALL args | ||
|
||
noway_assert(getEmitter()->emitMaxStackDepth <= accStackDepth); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif // EMIT_TRACK_STACK_DEPTH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/jit/codegencommon.cpp
Outdated
genTypeStSz(TYP_LONG) + // longs/doubles may be transferred via stack, etc | ||
#if defined(UNIX_X86_ABI) | ||
(genTypeStSz(TYP_INT) * 3) + // stack align for x86 0 to 3 padding | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we just write this as two statements, with the second statement being:
#if defined(UNIX_X86_ABI)
accStackDepth += (genTypeStSz(TYP_INT) * 3); // stack align for x86 - allow up to 3 INT's for padding
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thank you
@dotnet-bot test OSX x64 Checked Build and Test please |
src/jit/target.h
Outdated
#define STACK_ALIGN 16 // stack alignment requirement | ||
#define STACK_ALIGN_SHIFT 4 // Shift-right amount to convert stack size in bytes to size in DWORD_PTRs | ||
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units | ||
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and reset for offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct" A shift by 16 bits? This seems incorrect to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High 16 bit is amount of padding, "rest", not "reset" - seems typo, low 16 bit is argument offset.
And STACK_ALIGN_STKOFFSET
should be ((1<<STACK_ALIGN_PADDING)-1)
to get lower 16 bits.
@briansull, could you please look again? |
src/jit/target.h
Outdated
#define STACK_ALIGN_SHIFT 4 // Shift-right amount to convert stack size in bytes to size in DWORD_PTRs | ||
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units | ||
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset | ||
#define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset
I don't see how shifting by 16 bits can possibly be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see what you are doing here.
You are packing two values into an "int" with the second value encodes in the upper 16 bits.
@@ -4635,10 +4635,22 @@ struct GenTreePutArgStk : public GenTreeUnOp | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead on encoding this in the upper 16 bits
It seems better to me to add a new field on line 4549:
#if defined(UNIX_X86_ABI)
unsigned gtPadAlign; // num of padding slots for stack alignment
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thank you
src/jit/lower.cpp
Outdated
@@ -933,15 +933,20 @@ GenTreePtr Lowering::NewPutArg(GenTreeCall* call, GenTreePtr arg, fgArgTabEntryP | |||
|
|||
PUT_STRUCT_ARG_STK_ONLY(assert(info->isStruct == varTypeIsStruct(type))); // Make sure state is | |||
// correct | |||
#if defined(UNIX_X86_ABI) | |||
int slotNum = info->slotNum + (info->padStkAlign << STACK_ALIGN_PADDING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the shift by 16 as the way to encode this information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll use new setArgPadding
to pass info->padStkAlign
src/jit/gentree.h
Outdated
{ | ||
return gtSlotNum >> STACK_ALIGN_PADDING; | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also needed is setArgPadding
These can get/set the value of a new field gtPadAlign
Also @BruceForstall TAL |
src/jit/target.h
Outdated
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units | ||
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset | ||
#define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1) | ||
#endif // !FEATURE_PAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need a much better comment here to describe STACK_ALIGN_PADDING
and STACK_ALIGN_STKOFFSET
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new variable can remove STACK_ALIGN_STKOFFSET
definition
src/jit/compiler.h
Outdated
@@ -1262,6 +1265,9 @@ class fgArgInfo | |||
unsigned argCount; // Updatable arg count value | |||
unsigned nextSlotNum; // Updatable slot count value | |||
unsigned stkLevel; // Stack depth when we make this call (for x86) | |||
#if defined(UNIX_X86_ABI) | |||
unsigned padStkAlign; // Count of padding for stack alignment | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment could be improved: Is this bytes of padding? Where is the padding? Is this a sum of all the padding for each individual argument (since fgArgTabEntryPtr also has a padding field), or does it represent some pre-/post- argument padding?
src/jit/morph.cpp
Outdated
#if defined(UNIX_X86_ABI) | ||
void fgArgInfo::ArgsAlignPadding() | ||
{ | ||
// To get the padding amount, sum up all the slots and get the remainer for padding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remainer [](start = 67, length = 8)
typo: remainer => remainder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
src/jit/morph.cpp
Outdated
if (firstArgTabEntry != nullptr) | ||
{ | ||
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned; | ||
this->padStkAlign = firstArgTabEntry->padStkAlign; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this calculation. Can you explain? What are you trying to compute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get remainder of slots to align by, multiple of, 4(for x86/linux). For example, if numSlots is 7, we need one more to make it 8, which will make . So, 4 - (7 % 4) = 1. It seems % numSlotsAligned
at the end isn't needed. What was I thinking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use AlignmentPad(numSlots, numSlotsAligned)
In reply to: 99253550 [](ancestors = 99253550)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When numSlots is 8, it would be 4 - (8 % 4) = 4. If this happens, we don't need to add as it is already aligned. So we need the last % numSlotsAligned
src/jit/target.h
Outdated
#define STACK_ALIGN_SHIFT_ALL 4 // Shift-right amount to convert stack size in bytes to size in STACK_ALIGN units | ||
#define STACK_ALIGN_PADDING 16 // Shift-right amount for padding and rest for offset | ||
#define STACK_ALIGN_STKOFFSET ((1<<STACK_ALIGN_PADDING)-1) | ||
#endif // !FEATURE_PAL | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on the #endif
is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
src/jit/codegencommon.cpp
Outdated
genTypeStSz(TYP_LONG) + // longs/doubles may be transferred via stack, etc | ||
(compiler->compTailCallUsed ? 4 : 0))); // CORINFO_HELP_TAILCALL args | ||
{ | ||
unsigned accStackDepth = compiler->fgPtrArgCntMax + // Max number of pointer-sized stack arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accStackDepth [](start = 17, length = 13)
I don't know what the prefix "acc" means. Can you spell it out? Maybe, maxAllowedStackDepth
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of "accumulation" but I'll go with maxAllowedStackDepth
{ | ||
inst_IV(INS_push_hide, 0); // --- push REG_SIZE bytes of 0 | ||
} | ||
// Note that the stack must always be aligned to STACK_ALIGN bytes | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was from @briansull :)
src/jit/lclvars.cpp
Outdated
} | ||
|
||
// The stack must always be 16 byte aligned. | ||
int adjustFrameSize = compLclFrameSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only true for unix/x86 case, so either state that, or move the comment within the #if
that follows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
I think I finally understand your code (which perhaps should be commented somewhere). You add a It seems like this should work. I wonder if a "cleaner" implementation would be to create a new GT_PUTARG_ALIGN node that contains the alignment, instead of overloading the GT_PUTARG_STK node and storing a possibly unused value in all the arg node call info structs. Anyway, I wouldn't worry about that possibility, since it has its own drawbacks. |
OK,
It would be more risky for me for now and I was thinking enable |
@briansull , @BruceForstall , is there anything else needed? |
src/jit/morph.cpp
Outdated
// Set stack align pad for the first argument | ||
// Padding value will be 3 to 1 when numSlots are from 1 to 3 or 5 to 7. | ||
// we need extra '% numSlotsAligned' at the end for numSlots be multiple of 4 | ||
firstArgTabEntry->padStkAlign = (numSlotsAligned - (numSlots % numSlotsAligned)) % numSlotsAligned; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
firstArgTabEntry->padStkAlign = AlignmentPad(numSlots, numSlotsAligned);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
src/jit/morph.cpp
Outdated
* making a "Call". After the Call, stack is re-adjusted to the value it | ||
* was with fgArgInfo->padStkAlign value as we cann't use the one in | ||
* fgArgTabEntry. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use standard function header comment, using //
style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sure
A couple nits, but otherwise LGTM |
Change JIT code to align stack in 16 byte used in modern compiler
@dotnet-bot test Ubuntu x64 Checked Build and Test please |
I can't decode any real failure: https://ci.dot.net/job/dotnet_coreclr/job/master/job/checked_osx_flow_prtest/2581/ try again. |
@BruceForstall , thank you :) |
[x86/Linux] Stack align 16 bytes for JIT code Commit migrated from dotnet/coreclr@b05cf50
Change JIT code to align stack in 16 byte used in modern compiler