-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLD][ELF][AArch64] Add BTI Aware long branch thunks #108989
Conversation
When Branch Target Identification BTI is enabled all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date LLD has been assuming that the object producer is responsible for putting a BTI instruction at all places the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for the destination when a long range stub was needed [1]. This means that using GCC compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is a static linker's responsibility to generate a landing pad when the target does not have a BTI. This patch implements the same mechansim as GNU ld. When the output ELF file is setting the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the destination to see if it has a BTI instruction. If it does not we generate a landing pad consisting of: BTI c B <destination> The B <destination> can be elided if the thunk can be placed so that control flow drops through. For example: BTI c <destination>: This will be common when -ffunction-sections is used. The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected but any linker generated indirect branch needs to use the alternative. We place these as close as possible to the destination section. There is some further optimization possible. Consider the case: .text fn1 ... fn2 ... If we need landing pad thunks for both fn1 and fn2 we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However I didn't think that would be worth the additional complexity. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 [2] ARM-software/abi-aa#196
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-elf Author: Peter Smith (smithp35) ChangesWhen Branch Target Identification BTI is enabled all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date LLD has been assuming that the object producer is responsible for putting a BTI instruction at all places the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for the destination when a long range stub was needed [1]. This means that using GCC compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is a static linker's responsibility to generate a landing pad when the target does not have a BTI. This patch implements the same mechansim as GNU ld. When the output ELF file is setting the The B <destination> can be elided if the thunk can be placed so that control flow drops through. For example: The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected but any linker generated indirect branch needs to use the alternative. We place these as close as possible to the destination section. There is some further optimization possible. Consider the case: If we need landing pad thunks for both fn1 and fn2 we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However I didn't think that would be worth the additional complexity. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 Patch is 35.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108989.diff 7 Files Affected:
diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 36880bf67e9f23..d56f4b4ae5136a 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -28,6 +28,38 @@ uint64_t elf::getAArch64Page(uint64_t expr) {
return expr & ~static_cast<uint64_t>(0xFFF);
}
+// A BTI landing pad is a valid target for an indirect branch
+// when the Branch Target Identification has been enabled.
+// As linker generated branches are via x16 the
+// BTI landing pads are defined as:
+// BTI C, BTI J, BTI JC, PACIASP, PACIBSP.
+bool elf::isAArch64BTILandingPad(Symbol &s, int64_t a) {
+ // PLT entries accessed indirectly have a BTI c.
+ if (s.isInPlt())
+ return true;
+ Defined *d = dyn_cast_or_null<Defined>(&s);
+ if (d == nullptr || d->section == nullptr ||
+ d->section->kind() != InputSectionBase::Regular)
+ // All places that we cannot disassemble are responsible for making
+ // the target a BTI landing pad.
+ return true;
+ InputSection *isec = cast<InputSection>(d->section);
+ int64_t off = d->value + a;
+ // Likely user error, but protect ourselves against out of bounds
+ // access.
+ if (off < 0 || static_cast<uint64_t>(off) >= isec->getSize())
+ return true;
+ const uint8_t *buf = isec->content().begin();
+ const uint32_t instr = read32le(buf + off);
+ if (instr == 0xd503233f || // PACIASP.
+ instr == 0xd503237f || // PACIBSP.
+ instr == 0xd503245f || // BTI C.
+ instr == 0xd503249f || // BTI J.
+ instr == 0xd50324df) // BTI JC.
+ return true;
+ return false;
+}
+
namespace {
class AArch64 : public TargetInfo {
public:
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index 6c07051a231537..b7f714bc2158b7 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -2243,6 +2243,14 @@ std::pair<Thunk *, bool> ThunkCreator::getThunk(InputSection *isec,
return std::make_pair(t, true);
}
+std::pair<Thunk *, bool> ThunkCreator::getSyntheticLandingPad(Defined &d,
+ int64_t a) {
+ Thunk *lp = landingPadsBySectionAndAddend[{{d.section, d.value}, a}];
+ if (lp)
+ return std::make_pair(lp, false);
+ return std::make_pair(addLandingPadThunk(d, a), true);
+}
+
// Return true if the relocation target is an in range Thunk.
// Return false if the relocation is not to a Thunk. If the relocation target
// was originally to a Thunk, but is no longer in range we revert the
@@ -2326,6 +2334,21 @@ bool ThunkCreator::createThunks(uint32_t pass,
ts = getISDThunkSec(os, isec, isd, rel, src);
ts->addThunk(t);
thunks[t->getThunkTargetSym()] = t;
+
+ // When indirect branches are restricted, such as AArch64 BTI
+ // Thunks may need to target a linker generated landing pad
+ // instead of the target.
+ if (t->needsSyntheticLandingPad()) {
+ Thunk *lpt;
+ auto &dr = cast<Defined>(t->destination);
+ std::tie(lpt, isNew) = getSyntheticLandingPad(dr, t->addend);
+ if (isNew) {
+ InputSection *targetsec = dyn_cast<InputSection>(dr.section);
+ ts = getISThunkSec(targetsec);
+ ts->addThunk(lpt);
+ }
+ t->landingPad = lpt->getThunkTargetSym();
+ }
}
// Redirect relocation to Thunk, we never go via the PLT to a Thunk
diff --git a/lld/ELF/Relocations.h b/lld/ELF/Relocations.h
index aaa4581490a285..2e53962d800fc1 100644
--- a/lld/ELF/Relocations.h
+++ b/lld/ELF/Relocations.h
@@ -16,6 +16,7 @@
#include <vector>
namespace lld::elf {
+class Defined;
class Symbol;
class InputSection;
class InputSectionBase;
@@ -173,6 +174,8 @@ class ThunkCreator {
std::pair<Thunk *, bool> getThunk(InputSection *isec, Relocation &rel,
uint64_t src);
+ std::pair<Thunk *, bool> getSyntheticLandingPad(Defined &d, int64_t a);
+
ThunkSection *addThunkSection(OutputSection *os, InputSectionDescription *,
uint64_t off);
@@ -200,6 +203,14 @@ class ThunkCreator {
// The Mips LA25 Thunk is an example of an inline ThunkSection.
llvm::DenseMap<InputSection *, ThunkSection *> thunkedSections;
+ // Record landing pads, generated for a section + offset destination.
+ // Landling pads are alternative entry points for destinations that need
+ // to be reached via thunks that use indirect branches. A destination
+ // needs at most one landing pad as that can be reused by all callers.
+ llvm::DenseMap<std::pair<std::pair<SectionBase *, uint64_t>, int64_t>,
+ Thunk *>
+ landingPadsBySectionAndAddend;
+
// The number of completed passes of createThunks this permits us
// to do one time initialization on Pass 0 and put a limit on the
// number of times it can be called to prevent infinite loops.
diff --git a/lld/ELF/Target.h b/lld/ELF/Target.h
index 9894fb32c503c3..3142a3ec9959d9 100644
--- a/lld/ELF/Target.h
+++ b/lld/ELF/Target.h
@@ -230,6 +230,7 @@ void writePrefixedInstruction(uint8_t *loc, uint64_t insn);
void addPPC64SaveRestore();
uint64_t getPPC64TocBase();
uint64_t getAArch64Page(uint64_t expr);
+bool isAArch64BTILandingPad(Symbol &s, int64_t a);
template <typename ELFT> void writeARMCmseImportLib();
uint64_t getLoongArchPageDelta(uint64_t dest, uint64_t pc, RelType type);
void riscvFinalizeRelax(int passes);
diff --git a/lld/ELF/Thunks.cpp b/lld/ELF/Thunks.cpp
index 349e31332f59aa..599e053c13a6e5 100644
--- a/lld/ELF/Thunks.cpp
+++ b/lld/ELF/Thunks.cpp
@@ -51,12 +51,20 @@ namespace {
// distance from the thunk to the target is less than 128MB. Long thunks can
// branch to any virtual address and they are implemented in the derived
// classes. This class tries to create a short thunk if the target is in range,
-// otherwise it creates a long thunk.
+// otherwise it creates a long thunk. When BTI is enabled indirect branches
+// must land on a BTI instruction. If the destination does not have a BTI
+// instruction mayNeedLandingPad is set to true and Thunk::landingPad points
+// to an alternative entry point with a BTI.
class AArch64Thunk : public Thunk {
public:
- AArch64Thunk(Symbol &dest, int64_t addend) : Thunk(dest, addend) {}
+ AArch64Thunk(Symbol &dest, int64_t addend, bool mayNeedLandingPad)
+ : Thunk(dest, addend), mayNeedLandingPad(mayNeedLandingPad) {}
bool getMayUseShortThunk();
void writeTo(uint8_t *buf) override;
+ bool needsSyntheticLandingPad() override;
+
+protected:
+ bool mayNeedLandingPad;
private:
bool mayUseShortThunk = true;
@@ -66,8 +74,8 @@ class AArch64Thunk : public Thunk {
// AArch64 long range Thunks.
class AArch64ABSLongThunk final : public AArch64Thunk {
public:
- AArch64ABSLongThunk(Symbol &dest, int64_t addend)
- : AArch64Thunk(dest, addend) {}
+ AArch64ABSLongThunk(Symbol &dest, int64_t addend, bool mayNeedLandingPad)
+ : AArch64Thunk(dest, addend, mayNeedLandingPad) {}
uint32_t size() override { return getMayUseShortThunk() ? 4 : 16; }
void addSymbols(ThunkSection &isec) override;
@@ -77,7 +85,8 @@ class AArch64ABSLongThunk final : public AArch64Thunk {
class AArch64ADRPThunk final : public AArch64Thunk {
public:
- AArch64ADRPThunk(Symbol &dest, int64_t addend) : AArch64Thunk(dest, addend) {}
+ AArch64ADRPThunk(Symbol &dest, int64_t addend, bool mayNeedLandingPad)
+ : AArch64Thunk(dest, addend, mayNeedLandingPad) {}
uint32_t size() override { return getMayUseShortThunk() ? 4 : 12; }
void addSymbols(ThunkSection &isec) override;
@@ -85,6 +94,27 @@ class AArch64ADRPThunk final : public AArch64Thunk {
void writeLong(uint8_t *buf) override;
};
+// AArch64 BTI Landing Pad
+// When BTI is enabled indirect branches must land on a BTI
+// compatible instruction. When the destination does not have a
+// BTI compatible instruction a Thunk doing an indirect branch
+// targets a Landing Pad Thunk that direct branches to the target.
+class AArch64BTILandingPadThunk final : public Thunk {
+public:
+ AArch64BTILandingPadThunk(Symbol &dest, int64_t addend)
+ : Thunk(dest, addend) {}
+
+ uint32_t size() override { return getMayUseShortThunk() ? 4 : 8; }
+ void addSymbols(ThunkSection &isec) override;
+ InputSection *getTargetInputSection() const override;
+ void writeTo(uint8_t *buf) override;
+
+private:
+ bool getMayUseShortThunk();
+ void writeLong(uint8_t *buf);
+ bool mayUseShortThunk = true;
+};
+
// Base class for ARM thunks.
//
// An ARM thunk may be either short or long. A short thunk is simply a branch
@@ -532,6 +562,12 @@ void AArch64Thunk::writeTo(uint8_t *buf) {
ctx.target->relocateNoSym(buf, R_AARCH64_CALL26, s - p);
}
+bool AArch64Thunk::needsSyntheticLandingPad() {
+ // Short Thunks use a direct branch, no synthetic landing pad
+ // required.
+ return mayNeedLandingPad && !getMayUseShortThunk();
+}
+
// AArch64 long range Thunks.
void AArch64ABSLongThunk::writeLong(uint8_t *buf) {
const uint8_t data[] = {
@@ -540,7 +576,11 @@ void AArch64ABSLongThunk::writeLong(uint8_t *buf) {
0x00, 0x00, 0x00, 0x00, // L0: .xword S
0x00, 0x00, 0x00, 0x00,
};
- uint64_t s = getAArch64ThunkDestVA(destination, addend);
+ // if mayNeedLandingPad is true then destination is an
+ // AArch64BTILandingPadThunk that defines landingPad.
+ assert(!mayNeedLandingPad || landingPad != nullptr);
+ uint64_t s = mayNeedLandingPad ? landingPad->getVA(0)
+ : getAArch64ThunkDestVA(destination, addend);
memcpy(buf, data, sizeof(data));
ctx.target->relocateNoSym(buf + 8, R_AARCH64_ABS64, s);
}
@@ -564,7 +604,11 @@ void AArch64ADRPThunk::writeLong(uint8_t *buf) {
0x10, 0x02, 0x00, 0x91, // add x16, x16, R_AARCH64_ADD_ABS_LO12_NC(Dest)
0x00, 0x02, 0x1f, 0xd6, // br x16
};
- uint64_t s = getAArch64ThunkDestVA(destination, addend);
+ // if mayNeedLandingPad is true then destination is an
+ // AArch64BTILandingPadThunk that defines landingPad.
+ assert(!mayNeedLandingPad || landingPad != nullptr);
+ uint64_t s = mayNeedLandingPad ? landingPad->getVA(0)
+ : getAArch64ThunkDestVA(destination, addend);
uint64_t p = getThunkTargetSym()->getVA();
memcpy(buf, data, sizeof(data));
ctx.target->relocateNoSym(buf, R_AARCH64_ADR_PREL_PG_HI21,
@@ -578,6 +622,48 @@ void AArch64ADRPThunk::addSymbols(ThunkSection &isec) {
addSymbol("$x", STT_NOTYPE, 0, isec);
}
+void AArch64BTILandingPadThunk::addSymbols(ThunkSection &isec) {
+ addSymbol(saver().save("__AArch64BTIThunk_" + destination.getName()),
+ STT_FUNC, 0, isec);
+ addSymbol("$x", STT_NOTYPE, 0, isec);
+}
+
+InputSection *AArch64BTILandingPadThunk::getTargetInputSection() const {
+ auto &dr = cast<Defined>(destination);
+ return dyn_cast<InputSection>(dr.section);
+}
+
+void AArch64BTILandingPadThunk::writeTo(uint8_t *buf) {
+ if (!getMayUseShortThunk()) {
+ writeLong(buf);
+ return;
+ }
+ write32(buf, 0xd503245f); // BTI c
+ // Control falls through to target in following section.
+}
+
+bool AArch64BTILandingPadThunk::getMayUseShortThunk() {
+ if (!mayUseShortThunk)
+ return false;
+ // If the target is the following instruction then
+ // we can fall through without the indirect branch.
+ uint64_t s = destination.getVA(addend);
+ uint64_t p = getThunkTargetSym()->getVA();
+ // <= 4 as Thunks start off with the same offset
+ // within the section as the destination, so
+ // s - p == 0 until addresses are assigned.
+ mayUseShortThunk = (s - p <= 4);
+ return mayUseShortThunk;
+}
+
+void AArch64BTILandingPadThunk::writeLong(uint8_t *buf) {
+ uint64_t s = destination.getVA(addend);
+ uint64_t p = getThunkTargetSym()->getVA() + 4;
+ write32(buf, 0xd503245f); // BTI c
+ write32(buf + 4, 0x14000000); // B S
+ ctx.target->relocateNoSym(buf + 4, R_AARCH64_CALL26, s - p);
+}
+
// ARM Target Thunks
static uint64_t getARMThunkDestVA(const Symbol &s) {
uint64_t v = s.isInPlt() ? s.getPltVA() : s.getVA();
@@ -1264,9 +1350,12 @@ static Thunk *addThunkAArch64(RelType type, Symbol &s, int64_t a) {
if (type != R_AARCH64_CALL26 && type != R_AARCH64_JUMP26 &&
type != R_AARCH64_PLT32)
fatal("unrecognized relocation type");
+ bool mayNeedLandingPad =
+ (config->andFeatures & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) &&
+ !isAArch64BTILandingPad(s, a);
if (config->picThunk)
- return make<AArch64ADRPThunk>(s, a);
- return make<AArch64ABSLongThunk>(s, a);
+ return make<AArch64ADRPThunk>(s, a, mayNeedLandingPad);
+ return make<AArch64ABSLongThunk>(s, a, mayNeedLandingPad);
}
// Creates a thunk for long branches or Thumb-ARM interworking.
@@ -1480,3 +1569,12 @@ Thunk *elf::addThunk(const InputSection &isec, Relocation &rel) {
llvm_unreachable("add Thunk only supported for ARM, AVR, Mips and PowerPC");
}
}
+
+Thunk *elf::addLandingPadThunk(Symbol &s, int64_t a) {
+ switch (config->emachine) {
+ case EM_AARCH64:
+ return make<AArch64BTILandingPadThunk>(s, a);
+ default:
+ llvm_unreachable("add landing pad only supported for AArch64");
+ }
+}
diff --git a/lld/ELF/Thunks.h b/lld/ELF/Thunks.h
index 12ddf08cadc090..4d797434aa70d0 100644
--- a/lld/ELF/Thunks.h
+++ b/lld/ELF/Thunks.h
@@ -54,10 +54,17 @@ class Thunk {
return true;
}
+ // Thunks that indirectly branch to targets may need a synthetic landing
+ // pad generated close to the target. For example AArch64 when BTI is
+ // enabled.
+ virtual bool needsSyntheticLandingPad() { return false; }
+
Defined *getThunkTargetSym() const { return syms[0]; }
Symbol &destination;
int64_t addend;
+ // Alternative target when indirect branch to destination can't be used.
+ Symbol *landingPad = nullptr;
llvm::SmallVector<Defined *, 3> syms;
uint64_t offset = 0;
// The alignment requirement for this Thunk, defaults to the size of the
@@ -68,7 +75,9 @@ class Thunk {
// For a Relocation to symbol S create a Thunk to be added to a synthetic
// ThunkSection.
Thunk *addThunk(const InputSection &isec, Relocation &rel);
-
+// Create a landing pad Thunk for use when indirect branches from Thunks
+// are restricted.
+Thunk *addLandingPadThunk(Symbol &s, int64_t a);
void writePPC32PltCallStub(uint8_t *buf, uint64_t gotPltVA,
const InputFile *file, int64_t addend);
void writePPC64LoadAndBranch(uint8_t *buf, int64_t offset);
diff --git a/lld/test/ELF/aarch64-thunk-bti.s b/lld/test/ELF/aarch64-thunk-bti.s
new file mode 100644
index 00000000000000..f23c7c1c789b09
--- /dev/null
+++ b/lld/test/ELF/aarch64-thunk-bti.s
@@ -0,0 +1,513 @@
+// REQUIRES: aarch64
+// RUN: split-file %s %t
+// RUN: llvm-mc -filetype=obj -triple=aarch64 %t/asm -o %t.o
+// RUN: ld.lld --shared --script=%t/lds %t.o -o %t.so --defsym absolute=0xf0000000
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x10000000 --stop-address=0x10000028 %t.so | FileCheck --check-prefix=CHECK-SO %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x17fc0028 --stop-address=0x17fc0050 %t.so | FileCheck --check-prefix=CHECK-SO2 %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x18000050 --stop-address=0x180000a8 %t.so | FileCheck --check-prefix=CHECK-BTI %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x180000b0 --stop-address=0x18000100 %t.so | FileCheck --check-prefix=CHECK-SO3 %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x30000000 --stop-address=0x300000c0 %t.so | FileCheck --check-prefix=CHECK-SO4 %s
+// RUN: rm %t.so
+// RUN: llvm-mc -filetype=obj -triple=aarch64 %t/shared -o %tshared.o
+// RUN: ld.lld --shared -o %tshared.so %tshared.o
+// RUN: ld.lld %tshared.so --script=%t/lds %t.o -o %t.exe --defsym absolute=0xf0000000
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x10000000 --stop-address=0x10000028 %t.exe | FileCheck --check-prefix=CHECK-EXE %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x17fc0028 --stop-address=0x17fc0050 %t.exe | FileCheck --check-prefix=CHECK-EXE2 %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x18000050 --stop-address=0x180000a8 %t.exe | FileCheck --check-prefix=CHECK-BTI %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x180000b0 --stop-address=0x180000e8 %t.exe | FileCheck --check-prefix=CHECK-EXE3 %s
+// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x30000000 --stop-address=0x300000ec %t.exe | FileCheck --check-prefix=CHECK-EXE4 %s
+// RUN: rm %t.o %tshared.o %tshared.so
+
+/// Test thunk generation when destination does not have
+/// a BTI compatible landing pad. Linker must generate
+/// landing pad sections for thunks that use indirect
+/// branches.
+
+//--- asm
+.section ".note.gnu.property", "a"
+.p2align 3
+.long 4
+.long 0x10
+.long 0x5
+.asciz "GNU"
+
+/// Enable BTI.
+.long 0xc0000000 // GNU_PROPERTY_AARCH64_FEATURE_1_AND.
+.long 4
+.long 1 // GNU_PROPERTY_AARCH64_FEATURE_1_BTI.
+.long 0
+
+
+/// Short thunks are direct branches so we don't
+/// need landing pads. Expect all thunks to branch
+/// to target.
+.section .text.0, "ax", %progbits
+.balign 0x1000
+.global _start
+.type _start, %function
+_start:
+ bl bti_c_target
+ bl bti_j_target
+ bl bti_jc_target
+ bl paciasp_target
+ bl pacibsp_target
+ bl fn1
+ bl fn2
+ bl fn3
+ bl fn4
+ bl via_plt
+
+// CHECK-SO-LABEL: <_start>:
+// CHECK-SO-NEXT: 10000000: bl 0x17fc0028 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc002c <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0030 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0034 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0038 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc003c <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0040 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0044 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc0048 <__AArch64ADRPThunk_>
+// CHECK-SO-NEXT: bl 0x17fc004c <__AArch64ADRPThunk_via_plt>
+
+// CHECK-SO2: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT:17fc0028: b 0x18000050 <bti_c_target>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000058 <bti_j_target>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000060 <bti_jc_target>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000068 <paciasp_target>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000070 <pacibsp_target>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000088 <fn1>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x1800008c <fn2>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x18000094 <fn3>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_>:
+// CHECK-SO2-NEXT: b 0x180000a0 <fn4>
+// CHECK-SO2-EMPTY:
+// CHECK-SO2-NEXT: <__AArch64ADRPThunk_via_plt>:
+// CHECK-SO2-NEXT: b 0x180000d0 <fn4+0x30>
+
+// CHECK-EXE-LABEL: <_start>:
+// CHECK-EXE-NEXT: 10000000: bl 0x17fc0028 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc002c <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0030 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0034 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0038 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc003c <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0040 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0044 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc0048 <__AArch64AbsLongThunk_>
+// CHECK-EXE-NEXT: bl 0x17fc004c <__AArch64AbsLongThunk_via_plt>
+
+// CHECK-EXE2: <__AArch64AbsLongThunk_>:
+// CHECK-EXE2-NEXT: 17fc0028: b 0x18000050 <bti_c_target>
+// CH...
[truncated]
|
Hmm, the tests are passing on my local linux machine, not sure what has happened with Windows. Will need to try and find a Windows machine and investigate. [UPDATE] It looks like it is due to the positioning of the .dynstr section, the test's .dynamic section has a DT_NEEDED which has a file path in it, this will affect the offset between the .plt and .got.plt. I'll update the test to be more robust. |
The .dynstr is in between the .plt and the .got.plt. If this contains a DT_NEEDED tag then the pathname of the shared object will be used. This path is machine dependent. Don't hard code the offsets in the .plt in this case. We are only disassembling the .plt to show that there is a BTI at the start of it.
Thanks for the patch. I am back from vacation and will check this shortly. FYI: I plan to replace |
If an output DSO is used to link another component, we do This is error-prone, which is one of the reasons I prefer The initial |
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// RUN: split-file %s %t | ||
// RUN: llvm-mc -filetype=obj -triple=aarch64 %t/asm -o %t.o | ||
// RUN: ld.lld --shared --script=%t/lds %t.o -o %t.so --defsym absolute=0xf0000000 | ||
// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x10000000 --stop-address=0x10000028 %t.so | FileCheck --check-prefix=CHECK-SO %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--print-imm-hex
may be unnecessary.
#109553 might be useful to allow removing some llvm-objdump commands, making maintenance more convenient.
llvm/tools/llvm-objdump/llvm-objdump.cpp:countSkippableZeroBytes
needs to spend time to skip zeroes, so the performance is not as good as multiple llvm-objdump invocations though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten the test using OutputSections to need much fewer zeroes.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
high PT_LOAD FLAGS(0x1 | 0x4); | ||
} | ||
SECTIONS { | ||
.text_low 0x10000000 : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintended 4-column when PHDRS
uses 2-column indentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. I've made the indentation consistent.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
/// alignment requirement. Check that we don't fall through into | ||
/// alignment padding. | ||
.section .text.9, "ax", %progbits | ||
.balign 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the output DSO, fn4
would be placed at 0x180000b0 even without .balign 16
.
Perhaps we can remove fn4
and add a suitable alignment to fn1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test I've made sure that we don't end up on an alignment boundary for this case.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// CHECK-BTI-NEXT: ret | ||
|
||
/// PLT has bti c. | ||
// CHECK-SO3: 180000b0: bti c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What triggers the creation of this piece of code at 0x180000b0? I don't find any branches to this address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's PLT[0] which isn't particularly useful so I've taken it out.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x30000000 --stop-address=0x300000ec %t.exe | FileCheck --check-prefix=CHECK-EXE4 %s | ||
// RUN: rm %t.o %tshared.o %tshared.so %t.exe | ||
|
||
/// Test thunk generation when destination does not have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments seem to wrap around column 56, much smaller than the typical 80 (though many tests essentially wrap at a much larger column as we don't rigidly require 80).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reformatted all the comments.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// place a pool that can short thunk | ||
// to targets. | ||
.section .text.1, "ax", %progbits | ||
.space 0x2000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.space 0x2000000
=> 32MiB in the relocatable file.
We can use multiple output sections with the output section address feature to reduce file sizes, make llvm-objdump faster, and potentially reduce llvm-objdump --start-address= invocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I've used OutputSections and the output is much smaller now.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
.section .long_calls, "ax", %progbits | ||
.global long_calls | ||
.type long_calls, %function | ||
/// Expect thunk to target as targets have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These commands have a wrap column at about 45.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. Reformatted all comments in file.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// RUN: llvm-mc -filetype=obj -triple=aarch64 %t/shared -o %tshared.o | ||
// RUN: ld.lld --shared -o %tshared.so %tshared.o | ||
// RUN: ld.lld %tshared.so --script=%t/lds %t.o -o %t.exe --defsym absolute=0xf0000000 | ||
// RUN: llvm-objdump -d --no-show-raw-insn --print-imm-hex --start-address=0x10000000 --stop-address=0x10000028 %t.exe | FileCheck --check-prefix=CHECK-EXE %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -no-pie executable link perhaps provide coverage for abs thunks. Since many check strings are similar to pic thunks, some check prefixes can optionally be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a go at sharing more parts.
lld/test/ELF/aarch64-thunk-bti.s
Outdated
|
||
/// functions do not have BTI compatible landing pads. | ||
/// Expect linker generated landing pads. | ||
.section .text.7, "ax", %progbits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a large alignment for fn1
. bl fn2
can probably be changed to b .text.7+offset
to test thunks with a non-zero addend. (https://reviews.llvm.org/D70637)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a .text+offset case.
I've not added in the alignment here. I reordered fn1 and fn2 so that I could show that the fallthrough didn't happen with alignment, only to discover that the heuristic I was using for short thunks wasn't good enough. I've made a change to the heuristic, which I now need a test for.
lld/ELF/Arch/AArch64.cpp
Outdated
return true; | ||
Defined *d = dyn_cast_or_null<Defined>(&s); | ||
if (d == nullptr || d->section == nullptr || | ||
d->section->kind() != InputSectionBase::Regular) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!isa_and_nonnull<InputSection>(d->section)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
lld/ELF/Arch/AArch64.cpp
Outdated
// PLT entries accessed indirectly have a BTI c. | ||
if (s.isInPlt()) | ||
return true; | ||
Defined *d = dyn_cast_or_null<Defined>(&s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dyn_cast
since &s
cannot be nullptr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
lld/ELF/Arch/AArch64.cpp
Outdated
// the target a BTI landing pad. | ||
return true; | ||
InputSection *isec = cast<InputSection>(d->section); | ||
int64_t off = d->value + a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sum has a type of uint64_t
. Using uint64_t
can avoid off < 0
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
lld/ELF/Arch/AArch64.cpp
Outdated
return true; | ||
const uint8_t *buf = isec->content().begin(); | ||
const uint32_t instr = read32le(buf + off); | ||
if (instr == 0xd503233f || // PACIASP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return instr == ... || ...
or perhaps return is_contained(/*PACIASP*/0xd503233f, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. I've also used the common parts of the HINT encoding to avoid testing all combinations.
lld/ELF/Relocations.cpp
Outdated
@@ -2243,6 +2243,14 @@ std::pair<Thunk *, bool> ThunkCreator::getThunk(InputSection *isec, | |||
return std::make_pair(t, true); | |||
} | |||
|
|||
std::pair<Thunk *, bool> ThunkCreator::getSyntheticLandingPad(Defined &d, | |||
int64_t a) { | |||
Thunk *lp = landingPadsBySectionAndAddend[{{d.section, d.value}, a}]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer try_emplace
over operator[]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
lld/ELF/Relocations.cpp
Outdated
auto &dr = cast<Defined>(t->destination); | ||
std::tie(lpt, isNew) = getSyntheticLandingPad(dr, t->addend); | ||
if (isNew) { | ||
InputSection *targetsec = dyn_cast<InputSection>(dr.section); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that targetSec
could be nullptr in erroneous non-InputSection cases (MergeInputSection) and would cause a null pointer dereference in getISThunkSec
.
If we don't write code guarding against such cases, perhaps just use cast<>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used cast<>
The code in isAArch64BTILandingPad has a test
if (!isa_and_nonnull<InputSection>(d->section))
so t->needsSyntheticLandingPad()
should return false in those cases.
lld/ELF/Thunks.cpp
Outdated
bool AArch64BTILandingPadThunk::getMayUseShortThunk() { | ||
if (!mayUseShortThunk) | ||
return false; | ||
// If the target is the following instruction then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrap column is smaller than 80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. I've reformatted.
lld/ELF/Thunks.cpp
Outdated
addSymbol("$x", STT_NOTYPE, 0, isec); | ||
} | ||
|
||
InputSection *AArch64BTILandingPadThunk::getTargetInputSection() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not needed.
Relocations.cpp calls ts = getISThunkSec(targetsec)
without checking getTargetInputSection
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is called in mergeCmp()
but I've removed it anyway as it is not a strict requirement that the landing pads are before the target section, they just need to be within direct branch range.
Thanks for the review. I'll work on an updated patch, hopefully this week. Please forgive the github fumbling. Last time I tried to update a PR with a merge conflict I tried rebasing and my pull-request ended up with thousands of commits. I'll post a comment when I've got an update. |
* Remove getTargetInputSection. * Simplify AArch64 BTI detection. * Make short thunk heuristic more robust when there is more than one landing pad thunk.
Use more Output sections to avoid needing close to 128 MiB of zeroes. Permits fewer llvm-objcopy instances as we don't need --start-address etc.
Thanks for your patience. I've reworked the test case to use a lot fewer zeroes. |
lld/ELF/Relocations.cpp
Outdated
{{d.section, d.value}, a}, nullptr); | ||
if (res.second) | ||
res.first->second = addLandingPadThunk(d, a); | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete stray ;
lld/ELF/Relocations.cpp
Outdated
@@ -2257,6 +2257,16 @@ std::pair<Thunk *, bool> ThunkCreator::getThunk(InputSection *isec, | |||
return std::make_pair(t, true); | |||
} | |||
|
|||
std::pair<Thunk *, bool> ThunkCreator::getSyntheticLandingPad(Defined &d, | |||
int64_t a) { | |||
auto res = landingPadsBySectionAndAddend.try_emplace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use auto [it, isNew]
to improve clarity
lld/ELF/Relocations.cpp
Outdated
auto &dr = cast<Defined>(t->destination); | ||
std::tie(lpt, isNew) = getSyntheticLandingPad(dr, t->addend); | ||
if (isNew) { | ||
InputSection *targetsec = cast<InputSection>(dr.section); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(targetsec
=> targetSec
) since the variable is only used once, just inline it.
lld/ELF/Relocations.h
Outdated
@@ -200,6 +203,14 @@ class ThunkCreator { | |||
// The Mips LA25 Thunk is an example of an inline ThunkSection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps update this comment to mention AArch64BTILandingPadThunk
lld/test/ELF/aarch64-thunk-bti.s
Outdated
@@ -0,0 +1,482 @@ | |||
// REQUIRES: aarch64 | |||
// RUN: rm -rf %t && split-file %s %t && cd %t | |||
// RUN: llvm-mc -filetype=obj -triple=aarch64 asm -o %t.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rm -rf %t && split-file %s %t && cd %t
, %t
can be completely avoided.
For example, name an output file a.o
and output as out
instead %t.exe
asm
Name assembly files .s
. For example, llvm-mc ... a.s -o a.o
Sorry that Relocations.cpp Thunks.cpp need rebasing due to my |
Remove spurious semi-colon. Use structured bindings. Inline function call to remove variable. Improve comment about ThunkedSections to include landing pads. Remove uses of %t in test.
Thanks for the warning. I've applied the final set of review comments and rebased. I'll wait till tomorrow before merging. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
lld/ELF/Relocations.cpp
Outdated
{{d.section, d.value}, a}, nullptr); | ||
if (isNew) | ||
it->second = addLandingPadThunk(ctx, d, a); | ||
return std::make_pair(it->second, isNew); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{it->second, isNew}
is more idiomatic post C++11
lld/ELF/Arch/AArch64.cpp
Outdated
@@ -28,6 +28,39 @@ uint64_t elf::getAArch64Page(uint64_t expr) { | |||
return expr & ~static_cast<uint64_t>(0xFFF); | |||
} | |||
|
|||
// A BTI landing pad is a valid target for an indirect branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reflow the comment after adopting 80-column wrap?
lld/ELF/Thunks.cpp
Outdated
@@ -553,7 +588,11 @@ void AArch64ABSLongThunk::writeLong(uint8_t *buf) { | |||
0x00, 0x00, 0x00, 0x00, // L0: .xword S | |||
0x00, 0x00, 0x00, 0x00, | |||
}; | |||
uint64_t s = getAArch64ThunkDestVA(destination, addend); | |||
// if mayNeedLandingPad is true then destination is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If
@@ -577,7 +616,11 @@ void AArch64ADRPThunk::writeLong(uint8_t *buf) { | |||
0x10, 0x02, 0x00, 0x91, // add x16, x16, R_AARCH64_ADD_ABS_LO12_NC(Dest) | |||
0x00, 0x02, 0x1f, 0xd6, // br x16 | |||
}; | |||
uint64_t s = getAArch64ThunkDestVA(destination, addend); | |||
// if mayNeedLandingPad is true then destination is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If
lld/test/ELF/aarch64-thunk-bti.s
Outdated
// RUN: llvm-objdump -d --no-show-raw-insn out.so | FileCheck %s | ||
// RUN: llvm-objdump -d --no-show-raw-insn out.so | FileCheck %s --check-prefix=CHECK-PADS | ||
// RUN: llvm-mc -filetype=obj -triple=aarch64 shared -o shared.o | ||
// RUN: ld.lld --shared -o shared.so shared.o --soname=shared.so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--soname=
could be omitted as we are using relative paths
use {} rather than std::make_pair. reflow comments. remove --soname from test.
Thanks for the comments. I've applied those and merged. |
When Branch Target Identification BTI is enabled all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date LLD has been assuming that the object producer is responsible for putting a BTI instruction at all places the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for the destination when a long range stub was needed [1]. This means that using GCC compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is a static linker's responsibility to generate a landing pad when the target does not have a BTI. This patch implements the same mechansim as GNU ld. When the output ELF file is setting the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the destination to see if it has a BTI instruction. If it does not we generate a landing pad consisting of: BTI c B <destination> The B <destination> can be elided if the thunk can be placed so that control flow drops through. For example: BTI c <destination>: This will be common when -ffunction-sections is used. The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected but any linker generated indirect branch needs to use the alternative. We place these as close as possible to the destination section. There is some further optimization possible. Consider the case: .text fn1 ... fn2 ... If we need landing pad thunks for both fn1 and fn2 we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However I didn't think that would be worth the additional complexity. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671 [2] ARM-software/abi-aa#196
We're hitting asserts after this. I've attached an lld reproducer tarball at https://crbug.com/377438309#comment8
Could you take a look? |
Thanks for the reproducer. I've downloaded it and reproduced. I'll take a look today. |
To give you a quick update on the cause. There is an optimisation that won't create a BTI landing pad for a location if the linker can use a direct branch in the Thunk. https://github.com/llvm/llvm-project/blob/main/lld/ELF/Thunks.cpp#L580 It looks like what is happening is:
I need to make a smaller test case, and to make the landing pad generation run each pass. Hopefully I can get this done Today, but just in case it can't, how urgent is this for you? Right now the hwasan checks
Do not have a BTI or compatible instruction, so these would cause a runtime failure if BTI was enabled and the linker did generate an indirect branch (and code-path went through it). There's a quick fix to disable the optimisation, but that would mean updating a rather large test, so I'd prefer to take the time to fix with the optimisation. |
Move Landing Pad Creation to a new function that checks each thunk every pass to see if it needs a landing pad. This permits a thunk to be created without needing a landing pad, but later needing one due to drifting out of direct branch range and requiring an indirect branch. We record all the Thunks created so far in a new vector rather than trying to iterate over the DenseMap as we need a deterministic order of adding LandingPadThunks due to the short branch fall through. We cannot use normalizeExistingThunk() either as that only iterates through live thunks. Fixes: https://crbug.com/377438309 Original PR: llvm#108989 Sending without a new test case to fix existing test. A new regression test will come in a separate PR as coming up with a small enough reproducer for this case is non-trivial.
I've created #116402 to fix the repro in https://crbug.com/377438309#comment8 I've not got a regression test yet as that will take some time to get one small enough. I hope to add one next week. |
Move Landing Pad Creation to a new function that checks each thunk every pass to see if it needs a landing pad. This permits a thunk to be created without needing a landing pad, but later needing one due to drifting out of direct branch range and requiring an indirect branch. We record all the Thunks created so far in a new vector rather than trying to iterate over the DenseMap as we need a deterministic order of adding LandingPadThunks due to the short branch fall through. We cannot use normalizeExistingThunk() either as that only iterates through live thunks. Fixes: https://crbug.com/377438309 Original PR: #108989 Sending without a new test case to fix existing test. A new regression test will come in a separate PR as coming up with a small enough reproducer for this case is non-trivial.
Thanks for the quick fix! |
When we create a thunk we don't know whether it will be short or long. Move the emission of the long thunk mapping symbol to when we transition to a long thunk. This improves disassembly and binary analysis as tools like BOLT identify thunks by disassembly. This removes a FIXME added in #108989 aarch64-thunk-bti-multipass.s which had a corrupt disassembly due to missing mapping symbols.
When Branch Target Identification BTI is enabled all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date LLD has been assuming that the object producer is responsible for putting a BTI instruction at all places the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (gnu ld speak for thunk) for the destination when a long range stub was needed [1].
This means that using GCC compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is a static linker's responsibility to generate a landing pad when the target does not have a BTI.
This patch implements the same mechansim as GNU ld. When the output ELF file is setting the
GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, then we check the destination to see if it has a BTI instruction. If it does not we generate a landing pad consisting of:
BTI c
B
The B can be elided if the thunk can be placed so that control flow drops through. For example:
BTI c
:
This will be common when -ffunction-sections is used.
The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected but any linker generated indirect branch needs to use the alternative. We place these as close as possible to the destination section.
There is some further optimization possible. Consider the case:
.text
fn1
...
fn2
...
If we need landing pad thunks for both fn1 and fn2 we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However I didn't think that would be worth the additional complexity.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] ARM-software/abi-aa#196