Skip to content

Commit

Permalink
[serialization] no transitive decl change (#92083)
Browse files Browse the repository at this point in the history
Following of #86912

#### Motivation Example

The motivation of the patch series is that, for a module interface unit
`X`, when the dependent modules of `X` changes, if the changes is not
relevant with `X`, we hope the BMI of `X` won't change. For the specific
patch, we hope if the changes was about irrelevant declaration changes,
we hope the BMI of `X` won't change. **However**, I found the patch
itself is not very useful in practice, since the adding or removing
declarations, will change the state of identifiers and types in most
cases.

That said, for the most simple example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of
`partA.cppm` to `partA.v1.cppm`. Since `partA.v1.cppm` introduces new
identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the new introduced declaration `int getA(int)` doesn't introduce
new identifiers and types, then the BMI of `onlyUseB` can keep
unchanged.

While it looks not so great, the patch should be the base of the patch
to erase the transitive change for identifiers and types since I don't
know how can we introduce new types and identifiers without introducing
new declarations. Given how tightly the relationship between
declarations, types and identifiers, I think we can only reach the ideal
state after we made the series for all of the three entties.

#### Design details

The design of the patch is similar to
#86912, which extends the
32-bit DeclID to 64-bit and use the higher bits to store the module file
index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID
since we try to use the higher 16 bits to store the module ID in the
prefix of Decl class. Previously, we use 32 bits to store the module ID
and 32 bits to store the DeclID. I don't want to allocate additional
space so I tried to make the additional space the same as 64 bits. An
potential interesting thing here is about the relationship between the
module ID and the module file index. I feel we can get the module file
index by the module ID. But I didn't prove it or implement it. Since I
want to make the patch itself as small as possible. We can make it in
the future if we want.

Another change in the patch is the new concept Decl Index, which means
the index of the very big array `DeclsLoaded` in ASTReader. Previously,
the index of a loaded declaration is simply the Decl ID minus
PREDEFINED_DECL_NUMs. So there are some places they got used
ambiguously. But this patch tried to split these two concepts.

#### Overhead

As #86912 did, the change will
increase the on-disk PCM file sizes. As the declaration ID may be the
most IDs in the PCM file, this can have the biggest impact on the size.
In my experiments, this change will bring 6.6% increase of the on-disk
PCM size. No compile-time performance regression observed. Given the
benefits in the motivation example, I think the cost is worthwhile.
  • Loading branch information
ChuanqiXu9 authored Jun 3, 2024
1 parent 84742cd commit ccb73e8
Show file tree
Hide file tree
Showing 12 changed files with 283 additions and 147 deletions.
17 changes: 3 additions & 14 deletions clang/include/clang/AST/DeclBase.h
Original file line number Diff line number Diff line change
Expand Up @@ -701,10 +701,7 @@ class alignas(8) Decl {

/// Set the owning module ID. This may only be called for
/// deserialized Decls.
void setOwningModuleID(unsigned ID) {
assert(isFromASTFile() && "Only works on a deserialized declaration");
*((unsigned*)this - 2) = ID;
}
void setOwningModuleID(unsigned ID);

public:
/// Determine the availability of the given declaration.
Expand Down Expand Up @@ -777,19 +774,11 @@ class alignas(8) Decl {

/// Retrieve the global declaration ID associated with this
/// declaration, which specifies where this Decl was loaded from.
GlobalDeclID getGlobalID() const {
if (isFromASTFile())
return (*((const GlobalDeclID *)this - 1));
return GlobalDeclID();
}
GlobalDeclID getGlobalID() const;

/// Retrieve the global ID of the module that owns this particular
/// declaration.
unsigned getOwningModuleID() const {
if (isFromASTFile())
return *((const unsigned*)this - 2);
return 0;
}
unsigned getOwningModuleID() const;

private:
Module *getOwningModuleSlow() const;
Expand Down
18 changes: 17 additions & 1 deletion clang/include/clang/AST/DeclID.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/iterator.h"

#include <climits>

namespace clang {

/// Predefined declaration IDs.
Expand Down Expand Up @@ -107,12 +109,16 @@ class DeclIDBase {
///
/// DeclID should only be used directly in serialization. All other users
/// should use LocalDeclID or GlobalDeclID.
using DeclID = uint32_t;
using DeclID = uint64_t;

protected:
DeclIDBase() : ID(PREDEF_DECL_NULL_ID) {}
explicit DeclIDBase(DeclID ID) : ID(ID) {}

explicit DeclIDBase(unsigned LocalID, unsigned ModuleFileIndex) {
ID = (DeclID)LocalID | ((DeclID)ModuleFileIndex << 32);
}

public:
DeclID get() const { return ID; }

Expand All @@ -124,6 +130,10 @@ class DeclIDBase {

bool isInvalid() const { return ID == PREDEF_DECL_NULL_ID; }

unsigned getModuleFileIndex() const { return ID >> 32; }

unsigned getLocalDeclIndex() const;

friend bool operator==(const DeclIDBase &LHS, const DeclIDBase &RHS) {
return LHS.ID == RHS.ID;
}
Expand Down Expand Up @@ -156,6 +166,9 @@ class LocalDeclID : public DeclIDBase {
LocalDeclID(PredefinedDeclIDs ID) : Base(ID) {}
explicit LocalDeclID(DeclID ID) : Base(ID) {}

explicit LocalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
: Base(LocalID, ModuleFileIndex) {}

LocalDeclID &operator++() {
++ID;
return *this;
Expand All @@ -175,6 +188,9 @@ class GlobalDeclID : public DeclIDBase {
GlobalDeclID() : Base() {}
explicit GlobalDeclID(DeclID ID) : Base(ID) {}

explicit GlobalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
: Base(LocalID, ModuleFileIndex) {}

// For DeclIDIterator<GlobalDeclID> to be able to convert a GlobalDeclID
// to a LocalDeclID.
explicit operator LocalDeclID() const { return LocalDeclID(this->ID); }
Expand Down
6 changes: 6 additions & 0 deletions clang/include/clang/Serialization/ASTBitCodes.h
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,12 @@ class DeclOffset {
}
};

// The unaligned decl ID used in the Blobs of bistreams.
using unaligned_decl_id_t =
llvm::support::detail::packed_endian_specific_integral<
serialization::DeclID, llvm::endianness::native,
llvm::support::unaligned>;

/// The number of predefined preprocessed entity IDs.
const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1;

Expand Down
36 changes: 16 additions & 20 deletions clang/include/clang/Serialization/ASTReader.h
Original file line number Diff line number Diff line change
Expand Up @@ -501,12 +501,6 @@ class ASTReader
/// = I + 1 has already been loaded.
llvm::PagedVector<Decl *> DeclsLoaded;

using GlobalDeclMapType = ContinuousRangeMap<GlobalDeclID, ModuleFile *, 4>;

/// Mapping from global declaration IDs to the module in which the
/// declaration resides.
GlobalDeclMapType GlobalDeclMap;

using FileOffset = std::pair<ModuleFile *, uint64_t>;
using FileOffsetsTy = SmallVector<FileOffset, 2>;
using DeclUpdateOffsetsMap = llvm::DenseMap<GlobalDeclID, FileOffsetsTy>;
Expand Down Expand Up @@ -589,10 +583,11 @@ class ASTReader

struct FileDeclsInfo {
ModuleFile *Mod = nullptr;
ArrayRef<LocalDeclID> Decls;
ArrayRef<serialization::unaligned_decl_id_t> Decls;

FileDeclsInfo() = default;
FileDeclsInfo(ModuleFile *Mod, ArrayRef<LocalDeclID> Decls)
FileDeclsInfo(ModuleFile *Mod,
ArrayRef<serialization::unaligned_decl_id_t> Decls)
: Mod(Mod), Decls(Decls) {}
};

Expand All @@ -601,11 +596,7 @@ class ASTReader

/// An array of lexical contents of a declaration context, as a sequence of
/// Decl::Kind, DeclID pairs.
using unaligned_decl_id_t =
llvm::support::detail::packed_endian_specific_integral<
serialization::DeclID, llvm::endianness::native,
llvm::support::unaligned>;
using LexicalContents = ArrayRef<unaligned_decl_id_t>;
using LexicalContents = ArrayRef<serialization::unaligned_decl_id_t>;

/// Map from a DeclContext to its lexical contents.
llvm::DenseMap<const DeclContext*, std::pair<ModuleFile*, LexicalContents>>
Expand Down Expand Up @@ -1486,22 +1477,23 @@ class ASTReader
unsigned ClientLoadCapabilities);

public:
class ModuleDeclIterator : public llvm::iterator_adaptor_base<
ModuleDeclIterator, const LocalDeclID *,
std::random_access_iterator_tag, const Decl *,
ptrdiff_t, const Decl *, const Decl *> {
class ModuleDeclIterator
: public llvm::iterator_adaptor_base<
ModuleDeclIterator, const serialization::unaligned_decl_id_t *,
std::random_access_iterator_tag, const Decl *, ptrdiff_t,
const Decl *, const Decl *> {
ASTReader *Reader = nullptr;
ModuleFile *Mod = nullptr;

public:
ModuleDeclIterator() : iterator_adaptor_base(nullptr) {}

ModuleDeclIterator(ASTReader *Reader, ModuleFile *Mod,
const LocalDeclID *Pos)
const serialization::unaligned_decl_id_t *Pos)
: iterator_adaptor_base(Pos), Reader(Reader), Mod(Mod) {}

value_type operator*() const {
return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, *I));
return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, (LocalDeclID)*I));
}

value_type operator->() const { return **this; }
Expand Down Expand Up @@ -1541,6 +1533,9 @@ class ASTReader
StringRef Arg2 = StringRef(), StringRef Arg3 = StringRef()) const;
void Error(llvm::Error &&Err) const;

/// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;

public:
/// Load the AST file and validate its contents against the given
/// Preprocessor.
Expand Down Expand Up @@ -1912,7 +1907,8 @@ class ASTReader

/// Retrieve the module file that owns the given declaration, or NULL
/// if the declaration is not from a module file.
ModuleFile *getOwningModuleFile(const Decl *D);
ModuleFile *getOwningModuleFile(const Decl *D) const;
ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;

/// Returns the source location for the decl \p ID.
SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
Expand Down
18 changes: 3 additions & 15 deletions clang/include/clang/Serialization/ModuleFile.h
Original file line number Diff line number Diff line change
Expand Up @@ -454,23 +454,11 @@ class ModuleFile {
/// by the declaration ID (-1).
const DeclOffset *DeclOffsets = nullptr;

/// Base declaration ID for declarations local to this module.
serialization::DeclID BaseDeclID = 0;

/// Remapping table for declaration IDs in this module.
ContinuousRangeMap<serialization::DeclID, int, 2> DeclRemap;

/// Mapping from the module files that this module file depends on
/// to the base declaration ID for that module as it is understood within this
/// module.
///
/// This is effectively a reverse global-to-local mapping for declaration
/// IDs, so that we can interpret a true global ID (for this translation unit)
/// as a local ID (for this module file).
llvm::DenseMap<ModuleFile *, serialization::DeclID> GlobalToLocalDeclIDs;
/// Base declaration index in ASTReader for declarations local to this module.
unsigned BaseDeclIndex = 0;

/// Array of file-level DeclIDs sorted by file.
const LocalDeclID *FileSortedDecls = nullptr;
const serialization::unaligned_decl_id_t *FileSortedDecls = nullptr;
unsigned NumFileSortedDecls = 0;

/// Array of category list location information within this
Expand Down
2 changes: 1 addition & 1 deletion clang/include/clang/Serialization/ModuleManager.h
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ namespace serialization {
/// Manages the set of modules loaded by an AST reader.
class ModuleManager {
/// The chain of AST files, in the order in which we started to load
/// them (this order isn't really useful for anything).
/// them.
SmallVector<std::unique_ptr<ModuleFile>, 2> Chain;

/// The chain of non-module PCH files. The first entry is the one named
Expand Down
40 changes: 33 additions & 7 deletions clang/lib/AST/DeclBase.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -74,18 +74,17 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Context,
GlobalDeclID ID, std::size_t Extra) {
// Allocate an extra 8 bytes worth of storage, which ensures that the
// resulting pointer will still be 8-byte aligned.
static_assert(sizeof(unsigned) * 2 >= alignof(Decl),
"Decl won't be misaligned");
static_assert(sizeof(uint64_t) >= alignof(Decl), "Decl won't be misaligned");
void *Start = Context.Allocate(Size + Extra + 8);
void *Result = (char*)Start + 8;

unsigned *PrefixPtr = (unsigned *)Result - 2;
uint64_t *PrefixPtr = (uint64_t *)Result - 1;

// Zero out the first 4 bytes; this is used to store the owning module ID.
PrefixPtr[0] = 0;
*PrefixPtr = ID.get();

// Store the global declaration ID in the second 4 bytes.
PrefixPtr[1] = ID.get();
// We leave the upper 16 bits to store the module IDs. 48 bits should be
// sufficient to store a declaration ID.
assert(*PrefixPtr < llvm::maskTrailingOnes<uint64_t>(48));

return Result;
}
Expand All @@ -111,6 +110,29 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Ctx,
return ::operator new(Size + Extra, Ctx);
}

GlobalDeclID Decl::getGlobalID() const {
if (!isFromASTFile())
return GlobalDeclID();
// See the comments in `Decl::operator new` for details.
uint64_t ID = *((const uint64_t *)this - 1);
return GlobalDeclID(ID & llvm::maskTrailingOnes<uint64_t>(48));
}

unsigned Decl::getOwningModuleID() const {
if (!isFromASTFile())
return 0;

uint64_t ID = *((const uint64_t *)this - 1);
return ID >> 48;
}

void Decl::setOwningModuleID(unsigned ID) {
assert(isFromASTFile() && "Only works on a deserialized declaration");
uint64_t *IDAddress = (uint64_t *)this - 1;
assert(!((*IDAddress) >> 48) && "We should only set the module ID once");
*IDAddress |= (uint64_t)ID << 48;
}

Module *Decl::getOwningModuleSlow() const {
assert(isFromASTFile() && "Not from AST file?");
return getASTContext().getExternalSource()->getModule(getOwningModuleID());
Expand Down Expand Up @@ -2164,3 +2186,7 @@ DependentDiagnostic *DependentDiagnostic::Create(ASTContext &C,

return DD;
}

unsigned DeclIDBase::getLocalDeclIndex() const {
return ID & llvm::maskTrailingOnes<DeclID>(32);
}
Loading

0 comments on commit ccb73e8

Please sign in to comment.