[CGData][MachineOutliner] Global Outlining (#90074)
This commit introduces support for outlining functions across modules
using codegen data generated by a previous codegen run. The codegen data
currently manages the outlined hash tree, which records outlining
instances that occurred locally in earlier compilations.
    
The machine outliner now operates in one of three modes:

1. CGDataMode::None: This is the default outliner mode that uses the
suffix tree to identify (local) outlining candidates within a module.
This mode is also used by (full)LTO to maintain optimal behavior with
the combined module.
2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical
to the default mode, but it also publishes the stable hash sequences of
instructions in the outlined functions into a local outlined hash tree.
It then encodes this into the `__llvm_outline` section, which will be
dead-stripped at link time.
3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode
reads a codegen data file (.cgdata) and initializes a global outlined
hash tree. This tree is used to generate global outlining candidates.
Note that the codegen data file is produced by post-processing the raw
`__llvm_outline` sections from all native objects with the
`llvm-cgdata` tool (or, later, with a linker such as `LLD` or a new
ThinLTO pipeline).
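
The three modes above can be pictured as a small selection function. This is only a sketch under stated assumptions, not the outliner's actual code: `selectMode` and its parameters are hypothetical names standing in for the two command-line options, and letting generation win when both options are set is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the outliner mode; the enumerator names follow
// the commit message.
enum class CGDataMode { None, Write, Read };

// Hypothetical helper: derive a mode from the two option values.
// Assumption: generation takes precedence if both options are given.
inline CGDataMode selectMode(bool Generate, const std::string &UsePath) {
  if (Generate)
    return CGDataMode::Write; // -codegen-data-generate
  if (!UsePath.empty())
    return CGDataMode::Read;  // -codegen-data-use-path=<file>
  return CGDataMode::None;    // default local outlining
}
```

With neither option set, the sketch returns `CGDataMode::None`, matching the default behavior described in item 1.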

This depends on #105398. After
this PR, changes to LLD (#90166) and Clang
(#90304) will follow to add
client-side support.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
kyulee-com authored Sep 10, 2024
1 parent 46a76c3 commit 0f52545
Showing 16 changed files with 890 additions and 4 deletions.
6 changes: 6 additions & 0 deletions llvm/include/llvm/ADT/StableHashing.h
@@ -53,6 +53,12 @@ inline stable_hash stable_hash_combine(stable_hash A, stable_hash B,
// Removes suffixes introduced by LLVM from the name to enhance stability and
// maintain closeness to the original name across different builds.
inline StringRef get_stable_name(StringRef Name) {
  // Return the part after ".content." that represents contents.
  auto [P0, S0] = Name.rsplit(".content.");
  if (!S0.empty())
    return S0;

  // Ignore these suffixes.
  auto [P1, S1] = Name.rsplit(".llvm.");
  auto [P2, S2] = P1.rsplit(".__uniq.");
  return P2;
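
To see what the suffix handling in `get_stable_name` does to a name, here is a standalone sketch using `std::string_view` instead of `llvm::StringRef`. The local `rsplit` helper is an approximation of `StringRef::rsplit` (split around the last occurrence of the separator, or return the whole string and an empty remainder if it is absent), and the sample names mentioned afterwards are made up for illustration.

```cpp
#include <string_view>
#include <utility>

// Approximation of llvm::StringRef::rsplit: split around the LAST occurrence
// of Sep. If Sep does not occur, return {S, ""}.
inline std::pair<std::string_view, std::string_view>
rsplit(std::string_view S, std::string_view Sep) {
  const auto Pos = S.rfind(Sep);
  if (Pos == std::string_view::npos)
    return {S, std::string_view()};
  return {S.substr(0, Pos), S.substr(Pos + Sep.size())};
}

// Same steps as the hunk above, on std::string_view.
inline std::string_view getStableName(std::string_view Name) {
  // Prefer the part after ".content." when it is non-empty.
  auto [P0, S0] = rsplit(Name, ".content.");
  if (!S0.empty())
    return S0;

  // Otherwise drop a trailing ".llvm.*" and then a trailing ".__uniq.*".
  auto [P1, S1] = rsplit(Name, ".llvm.");
  auto [P2, S2] = rsplit(P1, ".__uniq.");
  return P2;
}
```

Under these assumptions, a hypothetical name such as `foo.__uniq.456.llvm.789` reduces to `foo`, while `bar.content.abc` yields `abc`.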
40 changes: 38 additions & 2 deletions llvm/include/llvm/CodeGen/MachineOutliner.h
@@ -18,6 +18,7 @@
#include "llvm/CodeGen/LiveRegUnits.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineStableHash.h"
#include <initializer_list>

namespace llvm {
@@ -234,11 +235,11 @@ struct OutlinedFunction {
unsigned FrameConstructionID = 0;

/// Return the number of candidates for this \p OutlinedFunction.
-unsigned getOccurrenceCount() const { return Candidates.size(); }
+virtual unsigned getOccurrenceCount() const { return Candidates.size(); }

/// Return the number of bytes it would take to outline this
/// function.
-unsigned getOutliningCost() const {
+virtual unsigned getOutliningCost() const {
unsigned CallOverhead = 0;
for (const Candidate &C : Candidates)
CallOverhead += C.getCallOverhead();
@@ -272,7 +273,42 @@ struct OutlinedFunction {
}

OutlinedFunction() = delete;
virtual ~OutlinedFunction() = default;
};

/// The information necessary to create an outlined function that is matched
/// globally.
struct GlobalOutlinedFunction : public OutlinedFunction {
  explicit GlobalOutlinedFunction(std::unique_ptr<OutlinedFunction> OF,
                                  unsigned GlobalOccurrenceCount)
      : OutlinedFunction(*OF), GlobalOccurrenceCount(GlobalOccurrenceCount) {}

  unsigned GlobalOccurrenceCount;

  /// Return the number of times this sequence appears globally.
  /// A global outlining candidate is created uniquely for each match, but it
  /// might be erased when it overlaps with a previous outlining instance.
  unsigned getOccurrenceCount() const override {
    assert(Candidates.size() <= 1);
    return Candidates.empty() ? 0 : GlobalOccurrenceCount;
  }

  /// Return the outlining cost using the global occurrence count
  /// with the same cost as the first (unique) candidate.
  unsigned getOutliningCost() const override {
    assert(Candidates.size() <= 1);
    unsigned CallOverhead =
        Candidates.empty()
            ? 0
            : Candidates[0].getCallOverhead() * getOccurrenceCount();
    return CallOverhead + SequenceSize + FrameOverhead;
  }

  GlobalOutlinedFunction() = delete;
  ~GlobalOutlinedFunction() = default;
};

} // namespace outliner
} // namespace llvm
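
The cost arithmetic in `GlobalOutlinedFunction::getOutliningCost` can be checked with a standalone sketch. `CostSketch` and `globalOutliningCost` are hypothetical names, not part of the patch; the struct keeps only the fields the formula needs, and the numbers in the note below are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for outliner::OutlinedFunction.
struct CostSketch {
  std::vector<unsigned> CandidateCallOverheads; // at most one entry here
  unsigned SequenceSize = 0;
  unsigned FrameOverhead = 0;
};

// Mirrors the formula in the hunk above: the single local candidate's call
// overhead is scaled by the global occurrence count, then the sequence size
// and frame overhead are added once.
inline unsigned globalOutliningCost(const CostSketch &F,
                                    unsigned GlobalOccurrenceCount) {
  assert(F.CandidateCallOverheads.size() <= 1);
  const unsigned CallOverhead =
      F.CandidateCallOverheads.empty()
          ? 0
          : F.CandidateCallOverheads[0] * GlobalOccurrenceCount;
  return CallOverhead + F.SequenceSize + F.FrameOverhead;
}
```

For example, with a call overhead of 4, a sequence size of 12, a frame overhead of 4, and a global occurrence count of 10, the sketch gives 4 * 10 + 12 + 4 = 56. If the candidate has been erased, only the sequence size and frame overhead remain.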

26 changes: 25 additions & 1 deletion llvm/lib/CGData/CodeGenData.cpp
@@ -24,6 +24,13 @@
using namespace llvm;
using namespace cgdata;

cl::opt<bool>
CodeGenDataGenerate("codegen-data-generate", cl::init(false), cl::Hidden,
cl::desc("Emit CodeGen Data into custom sections"));
cl::opt<std::string>
CodeGenDataUsePath("codegen-data-use-path", cl::init(""), cl::Hidden,
cl::desc("File path to where .cgdata file is read"));

static std::string getCGDataErrString(cgdata_error Err,
const std::string &ErrMsg = "") {
std::string Msg;
@@ -132,7 +139,24 @@ CodeGenData &CodeGenData::getInstance() {
  std::call_once(CodeGenData::OnceFlag, []() {
    Instance = std::unique_ptr<CodeGenData>(new CodeGenData());

-    // TODO: Initialize writer or reader mode for the client optimization.
    if (CodeGenDataGenerate)
      Instance->EmitCGData = true;
    else if (!CodeGenDataUsePath.empty()) {
      // Initialize the global CGData if the input file name is given.
      // We do not error-out when failing to parse the input file.
      // Instead, just emit a warning message and fall back as if no CGData
      // were available.
      auto FS = vfs::getRealFileSystem();
      auto ReaderOrErr = CodeGenDataReader::create(CodeGenDataUsePath, *FS);
      if (Error E = ReaderOrErr.takeError()) {
        warn(std::move(E), CodeGenDataUsePath);
        return;
      }
      // Publish each CGData based on the data type in the header.
      auto Reader = ReaderOrErr->get();
      if (Reader->hasOutlinedHashTree())
        Instance->publishOutlinedHashTree(Reader->releaseOutlinedHashTree());
    }
  });
  return *(Instance.get());
}
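
The initialize-once, warn-and-fall-back behavior of `getInstance` can be sketched outside LLVM as follows. Everything here is hypothetical: `Store`, `Tree`, `readTree`, and `getStore` are illustrative names, the reader merely pretends that only paths ending in `.cgdata` parse, and a function-local static replaces `std::call_once` to keep the example dependency-free.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical stand-ins; none of these are LLVM APIs.
struct Tree {
  int NodeCount = 0;
};

struct Store {
  bool Emit = false;             // set in "generate" mode
  std::optional<Tree> Published; // set in "use" mode when reading succeeds
};

// Hypothetical reader: pretends only paths ending in ".cgdata" can be parsed.
inline std::optional<Tree> readTree(const std::string &Path) {
  const std::string Ext = ".cgdata";
  if (Path.size() < Ext.size() ||
      Path.compare(Path.size() - Ext.size(), Ext.size(), Ext) != 0)
    return std::nullopt;
  return Tree{1};
}

inline Store makeStore(bool Generate, const std::string &UsePath) {
  Store S;
  if (Generate) {
    S.Emit = true;
  } else if (!UsePath.empty()) {
    if (auto T = readTree(UsePath))
      S.Published = *T;
    else // Do not fail hard: warn and continue as if no data were given.
      std::cerr << "warning: could not read " << UsePath << "\n";
  }
  return S;
}

// The function-local static is initialized exactly once, on the first call;
// arguments passed on later calls are ignored.
inline Store &getStore(bool Generate, const std::string &UsePath) {
  static Store Instance = makeStore(Generate, UsePath);
  return Instance;
}
```

In this sketch a failed read leaves both `Emit` and `Published` unset, so callers behave as in the default mode, and every later call returns the same object regardless of its arguments.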
1 change: 1 addition & 0 deletions llvm/lib/CodeGen/CMakeLists.txt
@@ -268,6 +268,7 @@ add_llvm_component_library(LLVMCodeGen
Analysis
BitReader
BitWriter
CGData
CodeGenTypes
Core
MC