Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serialization] no transitive decl change #92083

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

ChuanqiXu9
Copy link
Member

Following of #86912

Motivation Example

The motivation of the patch series is that, for a module interface unit X, when the dependent modules of X changes, if the changes is not relevant with X, we hope the BMI of X won't change. For the specific patch, we hope if the changes was about irrelevant declaration changes, we hope the BMI of X won't change. However, I found the patch itself is not very useful in practice, since the adding or removing declarations, will change the state of identifiers and types in most cases.

That said, for the most simple example,

// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}

the BMI of onlyUseB will change after we change the implementation of partA.cppm to partA.v1.cppm. Since partA.v1.cppm introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}

so that the new introduced declaration int getA(int) doesn't introduce new identifiers and types, then the BMI of onlyUseB can keep unchanged.

While it looks not so great, the patch should be the base of the patch to erase the transitive change for identifiers and types since I don't know how can we introduce new types and identifiers without introducing new declarations. Given how tightly the relationship between declarations, types and identifiers, I think we can only reach the ideal state after we made the series for all of the three entties.

Design details

The design of the patch is similar to #86912, which extends the 32-bit DeclID to 64-bit and use the higher bits to store the module file index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID since we try to use the higher 16 bits to store the module ID in the prefix of Decl class. Previously, we use 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space so I tried to make the additional space the same as 64 bits. An potential interesting thing here is about the relationship between the module ID and the module file index. I feel we can get the module file index by the module ID. But I didn't prove it or implement it. Since I want to make the patch itself as small as possible. We can make it in the future if we want.

Another change in the patch is the new concept Decl Index, which means the index of the very big array DeclsLoaded in ASTReader. Previously, the index of a loaded declaration is simply the Decl ID minus PREDEFINED_DECL_NUMs. So there are some places they got used ambiguously. But this patch tried to split these two concepts.

Overhead

As #86912 did, the change will increase the on-disk PCM file sizes. As the declaration ID may be the most IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change will bring 6.6% increase of the on-disk PCM size. No compile-time performance regression observed. Given the benefits in the motivation example, I think the cost is worthwhile.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label May 14, 2024
@ChuanqiXu9 ChuanqiXu9 self-assigned this May 14, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 14, 2024
@llvmbot
Copy link
Member

llvmbot commented May 14, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

Following of #86912

Motivation Example

The motivation of the patch series is that, for a module interface unit X, when the dependent modules of X changes, if the changes is not relevant with X, we hope the BMI of X won't change. For the specific patch, we hope if the changes was about irrelevant declaration changes, we hope the BMI of X won't change. However, I found the patch itself is not very useful in practice, since the adding or removing declarations, will change the state of identifiers and types in most cases.

That said, for the most simple example,

// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}

the BMI of onlyUseB will change after we change the implementation of partA.cppm to partA.v1.cppm. Since partA.v1.cppm introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}

so that the new introduced declaration int getA(int) doesn't introduce new identifiers and types, then the BMI of onlyUseB can keep unchanged.

While it looks not so great, the patch should be the base of the patch to erase the transitive change for identifiers and types since I don't know how can we introduce new types and identifiers without introducing new declarations. Given how tightly the relationship between declarations, types and identifiers, I think we can only reach the ideal state after we made the series for all of the three entties.

Design details

The design of the patch is similar to #86912, which extends the 32-bit DeclID to 64-bit and use the higher bits to store the module file index and the lower bits to store the Local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID since we try to use the higher 16 bits to store the module ID in the prefix of Decl class. Previously, we use 32 bits to store the module ID and 32 bits to store the DeclID. I don't want to allocate additional space so I tried to make the additional space the same as 64 bits. An potential interesting thing here is about the relationship between the module ID and the module file index. I feel we can get the module file index by the module ID. But I didn't prove it or implement it. Since I want to make the patch itself as small as possible. We can make it in the future if we want.

Another change in the patch is the new concept Decl Index, which means the index of the very big array DeclsLoaded in ASTReader. Previously, the index of a loaded declaration is simply the Decl ID minus PREDEFINED_DECL_NUMs. So there are some places they got used ambiguously. But this patch tried to split these two concepts.

Overhead

As #86912 did, the change will increase the on-disk PCM file sizes. As the declaration ID may be the most IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change will bring 6.6% increase of the on-disk PCM size. No compile-time performance regression observed. Given the benefits in the motivation example, I think the cost is worthwhile.


Patch is 32.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92083.diff

12 Files Affected:

  • (modified) clang/include/clang/AST/DeclBase.h (+3-14)
  • (modified) clang/include/clang/AST/DeclID.h (+22-1)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+6)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+16-20)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (+3-15)
  • (modified) clang/include/clang/Serialization/ModuleManager.h (+1-1)
  • (modified) clang/lib/AST/DeclBase.cpp (+27-7)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+85-74)
  • (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+5-7)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+1-6)
  • (modified) clang/lib/Serialization/ModuleFile.cpp (+1-2)
  • (added) clang/test/Modules/no-transitive-decls-change.cppm (+112)
diff --git a/clang/include/clang/AST/DeclBase.h b/clang/include/clang/AST/DeclBase.h
index e43e812cd9455..4bdf27aa99405 100644
--- a/clang/include/clang/AST/DeclBase.h
+++ b/clang/include/clang/AST/DeclBase.h
@@ -701,10 +701,7 @@ class alignas(8) Decl {
 
   /// Set the owning module ID.  This may only be called for
   /// deserialized Decls.
-  void setOwningModuleID(unsigned ID) {
-    assert(isFromASTFile() && "Only works on a deserialized declaration");
-    *((unsigned*)this - 2) = ID;
-  }
+  void setOwningModuleID(unsigned ID);
 
 public:
   /// Determine the availability of the given declaration.
@@ -777,19 +774,11 @@ class alignas(8) Decl {
 
   /// Retrieve the global declaration ID associated with this
   /// declaration, which specifies where this Decl was loaded from.
-  GlobalDeclID getGlobalID() const {
-    if (isFromASTFile())
-      return (*((const GlobalDeclID *)this - 1));
-    return GlobalDeclID();
-  }
+  GlobalDeclID getGlobalID() const;
 
   /// Retrieve the global ID of the module that owns this particular
   /// declaration.
-  unsigned getOwningModuleID() const {
-    if (isFromASTFile())
-      return *((const unsigned*)this - 2);
-    return 0;
-  }
+  unsigned getOwningModuleID() const;
 
 private:
   Module *getOwningModuleSlow() const;
diff --git a/clang/include/clang/AST/DeclID.h b/clang/include/clang/AST/DeclID.h
index 614ba06b63860..a6e4b31f3a6fb 100644
--- a/clang/include/clang/AST/DeclID.h
+++ b/clang/include/clang/AST/DeclID.h
@@ -19,6 +19,8 @@
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/ADT/iterator.h"
 
+#include <climits>
+
 namespace clang {
 
 /// Predefined declaration IDs.
@@ -107,12 +109,16 @@ class DeclIDBase {
   ///
   /// DeclID should only be used directly in serialization. All other users
   /// should use LocalDeclID or GlobalDeclID.
-  using DeclID = uint32_t;
+  using DeclID = uint64_t;
 
 protected:
   DeclIDBase() : ID(PREDEF_DECL_NULL_ID) {}
   explicit DeclIDBase(DeclID ID) : ID(ID) {}
 
+  explicit DeclIDBase(unsigned LocalID, unsigned ModuleFileIndex) {
+    ID = (DeclID)LocalID | ((DeclID)ModuleFileIndex << 32);
+  }
+
 public:
   DeclID get() const { return ID; }
 
@@ -124,6 +130,15 @@ class DeclIDBase {
 
   bool isInvalid() const { return ID == PREDEF_DECL_NULL_ID; }
 
+  unsigned getModuleFileIndex() const { return ID >> 32; }
+
+  unsigned getLocalDeclIndex() const {
+    // Implement it directly instead of calling `llvm::maskTrailingOnes` since
+    // we don't want `MathExtras.h` to be inclued here.
+    const unsigned Bits = CHAR_BIT * sizeof(DeclID);
+    return ID & (DeclID(-1) >> (Bits - 32));
+  }
+
   friend bool operator==(const DeclIDBase &LHS, const DeclIDBase &RHS) {
     return LHS.ID == RHS.ID;
   }
@@ -156,6 +171,9 @@ class LocalDeclID : public DeclIDBase {
   LocalDeclID(PredefinedDeclIDs ID) : Base(ID) {}
   explicit LocalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit LocalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   LocalDeclID &operator++() {
     ++ID;
     return *this;
@@ -175,6 +193,9 @@ class GlobalDeclID : public DeclIDBase {
   GlobalDeclID() : Base() {}
   explicit GlobalDeclID(DeclID ID) : Base(ID) {}
 
+  explicit GlobalDeclID(unsigned LocalID, unsigned ModuleFileIndex)
+      : Base(LocalID, ModuleFileIndex) {}
+
   // For DeclIDIterator<GlobalDeclID> to be able to convert a GlobalDeclID
   // to a LocalDeclID.
   explicit operator LocalDeclID() const { return LocalDeclID(this->ID); }
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index d3538e43d3d78..772452e3afc55 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -255,6 +255,12 @@ class DeclOffset {
   }
 };
 
+// The unaligned decl ID used in the Blobs of bistreams.
+using unalighed_decl_id_t =
+    llvm::support::detail::packed_endian_specific_integral<
+        serialization::DeclID, llvm::endianness::native,
+        llvm::support::unaligned>;
+
 /// The number of predefined preprocessed entity IDs.
 const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1;
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 1bb5fa27a2419..d028d52fc5ef1 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -501,12 +501,6 @@ class ASTReader
   /// = I + 1 has already been loaded.
   llvm::PagedVector<Decl *> DeclsLoaded;
 
-  using GlobalDeclMapType = ContinuousRangeMap<GlobalDeclID, ModuleFile *, 4>;
-
-  /// Mapping from global declaration IDs to the module in which the
-  /// declaration resides.
-  GlobalDeclMapType GlobalDeclMap;
-
   using FileOffset = std::pair<ModuleFile *, uint64_t>;
   using FileOffsetsTy = SmallVector<FileOffset, 2>;
   using DeclUpdateOffsetsMap = llvm::DenseMap<GlobalDeclID, FileOffsetsTy>;
@@ -589,10 +583,11 @@ class ASTReader
 
   struct FileDeclsInfo {
     ModuleFile *Mod = nullptr;
-    ArrayRef<LocalDeclID> Decls;
+    ArrayRef<serialization::unalighed_decl_id_t> Decls;
 
     FileDeclsInfo() = default;
-    FileDeclsInfo(ModuleFile *Mod, ArrayRef<LocalDeclID> Decls)
+    FileDeclsInfo(ModuleFile *Mod,
+                  ArrayRef<serialization::unalighed_decl_id_t> Decls)
         : Mod(Mod), Decls(Decls) {}
   };
 
@@ -601,11 +596,7 @@ class ASTReader
 
   /// An array of lexical contents of a declaration context, as a sequence of
   /// Decl::Kind, DeclID pairs.
-  using unalighed_decl_id_t =
-      llvm::support::detail::packed_endian_specific_integral<
-          serialization::DeclID, llvm::endianness::native,
-          llvm::support::unaligned>;
-  using LexicalContents = ArrayRef<unalighed_decl_id_t>;
+  using LexicalContents = ArrayRef<serialization::unalighed_decl_id_t>;
 
   /// Map from a DeclContext to its lexical contents.
   llvm::DenseMap<const DeclContext*, std::pair<ModuleFile*, LexicalContents>>
@@ -1486,10 +1477,11 @@ class ASTReader
                                unsigned ClientLoadCapabilities);
 
 public:
-  class ModuleDeclIterator : public llvm::iterator_adaptor_base<
-                                 ModuleDeclIterator, const LocalDeclID *,
-                                 std::random_access_iterator_tag, const Decl *,
-                                 ptrdiff_t, const Decl *, const Decl *> {
+  class ModuleDeclIterator
+      : public llvm::iterator_adaptor_base<
+            ModuleDeclIterator, const serialization::unalighed_decl_id_t *,
+            std::random_access_iterator_tag, const Decl *, ptrdiff_t,
+            const Decl *, const Decl *> {
     ASTReader *Reader = nullptr;
     ModuleFile *Mod = nullptr;
 
@@ -1497,11 +1489,11 @@ class ASTReader
     ModuleDeclIterator() : iterator_adaptor_base(nullptr) {}
 
     ModuleDeclIterator(ASTReader *Reader, ModuleFile *Mod,
-                       const LocalDeclID *Pos)
+                       const serialization::unalighed_decl_id_t *Pos)
         : iterator_adaptor_base(Pos), Reader(Reader), Mod(Mod) {}
 
     value_type operator*() const {
-      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, *I));
+      return Reader->GetDecl(Reader->getGlobalDeclID(*Mod, (LocalDeclID)*I));
     }
 
     value_type operator->() const { return **this; }
@@ -1541,6 +1533,9 @@ class ASTReader
              StringRef Arg2 = StringRef(), StringRef Arg3 = StringRef()) const;
   void Error(llvm::Error &&Err) const;
 
+  /// Translate a \param GlobalDeclID to the index of DeclsLoaded array.
+  unsigned translateGlobalDeclIDToIndex(GlobalDeclID ID) const;
+
 public:
   /// Load the AST file and validate its contents against the given
   /// Preprocessor.
@@ -1912,7 +1907,8 @@ class ASTReader
 
   /// Retrieve the module file that owns the given declaration, or NULL
   /// if the declaration is not from a module file.
-  ModuleFile *getOwningModuleFile(const Decl *D);
+  ModuleFile *getOwningModuleFile(const Decl *D) const;
+  ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;
 
   /// Returns the source location for the decl \p ID.
   SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 7d8cbe3d40f56..5cada272f59d3 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -454,23 +454,11 @@ class ModuleFile {
   /// by the declaration ID (-1).
   const DeclOffset *DeclOffsets = nullptr;
 
-  /// Base declaration ID for declarations local to this module.
-  serialization::DeclID BaseDeclID = 0;
-
-  /// Remapping table for declaration IDs in this module.
-  ContinuousRangeMap<serialization::DeclID, int, 2> DeclRemap;
-
-  /// Mapping from the module files that this module file depends on
-  /// to the base declaration ID for that module as it is understood within this
-  /// module.
-  ///
-  /// This is effectively a reverse global-to-local mapping for declaration
-  /// IDs, so that we can interpret a true global ID (for this translation unit)
-  /// as a local ID (for this module file).
-  llvm::DenseMap<ModuleFile *, serialization::DeclID> GlobalToLocalDeclIDs;
+  /// Base declaration index in ASTReader for declarations local to this module.
+  unsigned BaseDeclIndex = 0;
 
   /// Array of file-level DeclIDs sorted by file.
-  const LocalDeclID *FileSortedDecls = nullptr;
+  const serialization::unalighed_decl_id_t *FileSortedDecls = nullptr;
   unsigned NumFileSortedDecls = 0;
 
   /// Array of category list location information within this
diff --git a/clang/include/clang/Serialization/ModuleManager.h b/clang/include/clang/Serialization/ModuleManager.h
index 3bd379acf7eda..097a1c16a0573 100644
--- a/clang/include/clang/Serialization/ModuleManager.h
+++ b/clang/include/clang/Serialization/ModuleManager.h
@@ -46,7 +46,7 @@ namespace serialization {
 /// Manages the set of modules loaded by an AST reader.
 class ModuleManager {
   /// The chain of AST files, in the order in which we started to load
-  /// them (this order isn't really useful for anything).
+  /// them.
   SmallVector<std::unique_ptr<ModuleFile>, 2> Chain;
 
   /// The chain of non-module PCH files. The first entry is the one named
diff --git a/clang/lib/AST/DeclBase.cpp b/clang/lib/AST/DeclBase.cpp
index 03e1055251c24..225a0015f26ae 100644
--- a/clang/lib/AST/DeclBase.cpp
+++ b/clang/lib/AST/DeclBase.cpp
@@ -74,18 +74,16 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Context,
                          GlobalDeclID ID, std::size_t Extra) {
   // Allocate an extra 8 bytes worth of storage, which ensures that the
   // resulting pointer will still be 8-byte aligned.
-  static_assert(sizeof(unsigned) * 2 >= alignof(Decl),
-                "Decl won't be misaligned");
+  static_assert(sizeof(uint64_t) >= alignof(Decl), "Decl won't be misaligned");
   void *Start = Context.Allocate(Size + Extra + 8);
   void *Result = (char*)Start + 8;
 
-  unsigned *PrefixPtr = (unsigned *)Result - 2;
+  uint64_t *PrefixPtr = (uint64_t *)Result - 1;
 
-  // Zero out the first 4 bytes; this is used to store the owning module ID.
-  PrefixPtr[0] = 0;
+  *PrefixPtr = ID.get();
 
-  // Store the global declaration ID in the second 4 bytes.
-  PrefixPtr[1] = ID.get();
+  // We leave the upper 16 bits to store the module IDs.
+  assert(*PrefixPtr < llvm::maskTrailingOnes<uint64_t>(48));
 
   return Result;
 }
@@ -111,6 +109,28 @@ void *Decl::operator new(std::size_t Size, const ASTContext &Ctx,
   return ::operator new(Size + Extra, Ctx);
 }
 
+GlobalDeclID Decl::getGlobalID() const {
+  if (!isFromASTFile())
+    return GlobalDeclID();
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return GlobalDeclID(ID & llvm::maskTrailingOnes<uint64_t>(48));
+}
+
+unsigned Decl::getOwningModuleID() const {
+  if (!isFromASTFile())
+    return 0;
+
+  uint64_t ID = *((const uint64_t *)this - 1);
+  return ID >> 48;
+}
+
+void Decl::setOwningModuleID(unsigned ID) {
+  assert(isFromASTFile() && "Only works on a deserialized declaration");
+  uint64_t *IDAddress = (uint64_t *)this - 1;
+  assert(!((*IDAddress) >> 48) && "We should only set the module ID once");
+  *IDAddress |= (uint64_t)ID << 48;
+}
+
 Module *Decl::getOwningModuleSlow() const {
   assert(isFromASTFile() && "Not from AST file?");
   return getASTContext().getExternalSource()->getModule(getOwningModuleID());
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 510f61d9cccd8..b0059e449182b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -1657,7 +1657,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
 
     unsigned NumFileDecls = Record[7];
     if (NumFileDecls && ContextObj) {
-      const LocalDeclID *FirstDecl = F->FileSortedDecls + Record[6];
+      const unalighed_decl_id_t *FirstDecl = F->FileSortedDecls + Record[6];
       assert(F->FileSortedDecls && "FILE_SORTED_DECLS not encountered yet ?");
       FileDeclIDs[FID] =
           FileDeclsInfo(F, llvm::ArrayRef(FirstDecl, NumFileDecls));
@@ -3377,26 +3377,11 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             "duplicate DECL_OFFSET record in AST file");
       F.DeclOffsets = (const DeclOffset *)Blob.data();
       F.LocalNumDecls = Record[0];
-      unsigned LocalBaseDeclID = Record[1];
-      F.BaseDeclID = getTotalNumDecls();
-
-      if (F.LocalNumDecls > 0) {
-        // Introduce the global -> local mapping for declarations within this
-        // module.
-        GlobalDeclMap.insert(std::make_pair(
-            GlobalDeclID(getTotalNumDecls() + NUM_PREDEF_DECL_IDS), &F));
-
-        // Introduce the local -> global mapping for declarations within this
-        // module.
-        F.DeclRemap.insertOrReplace(
-          std::make_pair(LocalBaseDeclID, F.BaseDeclID - LocalBaseDeclID));
-
-        // Introduce the global -> local mapping for declarations within this
-        // module.
-        F.GlobalToLocalDeclIDs[&F] = LocalBaseDeclID;
+      F.BaseDeclIndex = getTotalNumDecls();
 
+      if (F.LocalNumDecls > 0)
         DeclsLoaded.resize(DeclsLoaded.size() + F.LocalNumDecls);
-      }
+
       break;
     }
 
@@ -3631,7 +3616,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case FILE_SORTED_DECLS:
-      F.FileSortedDecls = (const LocalDeclID *)Blob.data();
+      F.FileSortedDecls = (const unalighed_decl_id_t *)Blob.data();
       F.NumFileSortedDecls = Record[0];
       break;
 
@@ -4058,7 +4043,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
   RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
   RemapBuilder SubmoduleRemap(F.SubmoduleRemap);
   RemapBuilder SelectorRemap(F.SelectorRemap);
-  RemapBuilder DeclRemap(F.DeclRemap);
   RemapBuilder TypeRemap(F.TypeRemap);
 
   auto &ImportedModuleVector = F.DependentModules;
@@ -4097,8 +4081,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t SelectorIDOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
-    uint32_t DeclIDOffset =
-        endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t TypeIndexOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
 
@@ -4116,11 +4098,7 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
               PreprocessedEntityRemap);
     mapOffset(SubmoduleIDOffset, OM->BaseSubmoduleID, SubmoduleRemap);
     mapOffset(SelectorIDOffset, OM->BaseSelectorID, SelectorRemap);
-    mapOffset(DeclIDOffset, OM->BaseDeclID, DeclRemap);
     mapOffset(TypeIndexOffset, OM->BaseTypeIndex, TypeRemap);
-
-    // Global -> local mappings.
-    F.GlobalToLocalDeclIDs[OM] = DeclIDOffset;
   }
 }
 
@@ -7644,18 +7622,25 @@ CXXBaseSpecifier *ASTReader::GetExternalCXXBaseSpecifiers(uint64_t Offset) {
 
 GlobalDeclID ASTReader::getGlobalDeclID(ModuleFile &F,
                                         LocalDeclID LocalID) const {
-  DeclID ID = LocalID.get();
-  if (ID < NUM_PREDEF_DECL_IDS)
-    return GlobalDeclID(ID);
+  if (LocalID.get() < NUM_PREDEF_DECL_IDS)
+    return GlobalDeclID(LocalID.get());
+
+  unsigned OwningModuleFileIndex = LocalID.getModuleFileIndex();
+  DeclID ID = LocalID.getLocalDeclIndex();
 
   if (!F.ModuleOffsetMap.empty())
     ReadModuleOffsetMap(F);
 
-  ContinuousRangeMap<DeclID, int, 2>::iterator I =
-      F.DeclRemap.find(ID - NUM_PREDEF_DECL_IDS);
-  assert(I != F.DeclRemap.end() && "Invalid index into decl index remap");
+  ModuleFile *OwningModuleFile =
+      OwningModuleFileIndex == 0
+          ? &F
+          : F.DependentModules[OwningModuleFileIndex - 1];
+
+  if (OwningModuleFileIndex == 0)
+    ID -= NUM_PREDEF_DECL_IDS;
 
-  return GlobalDeclID(ID + I->second);
+  uint64_t NewModuleFileIndex = OwningModuleFile->Index + 1;
+  return GlobalDeclID(ID, NewModuleFileIndex);
 }
 
 bool ASTReader::isDeclIDFromModule(GlobalDeclID ID, ModuleFile &M) const {
@@ -7663,31 +7648,33 @@ bool ASTReader::isDeclIDFromModule(GlobalDeclID ID, ModuleFile &M) const {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return false;
 
-  return ID.get() - NUM_PREDEF_DECL_IDS >= M.BaseDeclID &&
-         ID.get() - NUM_PREDEF_DECL_IDS < M.BaseDeclID + M.LocalNumDecls;
+  unsigned ModuleFileIndex = ID.getModuleFileIndex();
+  return M.Index == ModuleFileIndex - 1;
+}
+
+ModuleFile *ASTReader::getOwningModuleFile(GlobalDeclID ID) const {
+  // Predefined decls aren't from any module.
+  if (ID.get() < NUM_PREDEF_DECL_IDS)
+    return nullptr;
+
+  uint64_t ModuleFileIndex = ID.getModuleFileIndex();
+  assert(ModuleFileIndex && "Untranslated Local Decl?");
+
+  return &getModuleManager()[ModuleFileIndex - 1];
 }
 
-ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) {
+ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) const {
   if (!D->isFromASTFile())
     return nullptr;
-  GlobalDeclMapType::const_iterator I =
-      GlobalDeclMap.find(GlobalDeclID(D->getGlobalID()));
-  assert(I != GlobalDeclMap.end() && "Corrupted global declaration map");
-  return I->second;
+
+  return getOwningModuleFile(GlobalDeclID(D->getGlobalID()));
 }
 
 SourceLocation ASTReader::getSourceLocationForDeclID(GlobalDeclID ID) {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return SourceLocation();
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
-
-  if (Index > DeclsLoaded.size()) {
-    Error("declaration ID out-of-range for AST file");
-    return SourceLocation();
-  }
-
-  if (Decl *D = DeclsLoaded[Index])
+  if (Decl *D = GetExistingDecl(ID))
     return D->getLocation();
 
   SourceLocation Loc;
@@ -7754,8 +7741,19 @@ static Decl *getPredefinedDecl(ASTContext &Context, PredefinedDeclIDs ID) {
   llvm_unreachable("PredefinedDeclIDs unknown enum value");
 }
 
+unsigned ASTReader::translateGlobalDeclIDToIndex(GlobalDeclID GlobalID) const {
+  ModuleFile *OwningModuleFile = getOwningModuleFile(GlobalID);
+  if (!OwningModuleFile) {
+    assert(GlobalID.get() < NUM_PREDEF_DECL_IDS && "Untransalted Global ID?");
+    return GlobalID.get();
+  }
+
+  return OwningModuleFile->BaseDeclIndex + GlobalID.getLocalDeclIndex();
+}
+
 Decl *ASTReader::GetExistingDecl(GlobalDeclID ID) {
   assert(ContextObj && "reading decl with no AST context");
+
   if (ID.get() < NUM_PREDEF_DECL_IDS) {
     Decl *D = getPredefinedDecl(*ContextObj, (PredefinedDeclIDs)ID);
     if (D) {
@@ -7768,7 +7766,7 @@ Decl *ASTReader::GetExistingDecl(GlobalDeclID ID) {
     return D;
   }
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
+  unsigned Index = translateGlobalDeclIDToIndex(ID);
 
   if (Index >= DeclsLoaded.size()) {
     assert(0 && "declaration ID out-of-range for AST file");
@@ -7783,7 +7781,7 @@ Decl *ASTReader::GetDecl(GlobalDeclID ID) {
   if (ID.get() < NUM_PREDEF_DECL_IDS)
     return GetExistingDecl(ID);
 
-  unsigned Index = ID.get() - NUM_PREDEF_DECL_IDS;
+  unsigned Index = translateGlobalDeclIDToIndex(ID);
 
   if (Index >= DeclsLoaded.size()) {
     assert(0 && "declaration ID out-of-range for AST file");
@@ -7802,20 +7800,31 @@ Decl *ASTReader::GetDecl(GlobalDeclID ID) {
 
 LocalDeclID ASTReader::mapGlobalIDToModuleFileGlobalID(ModuleFile &M,
                                                        GlobalDeclID GlobalID) {
-  DeclID ID = GlobalID.get();
-  if (ID < NUM_PREDEF_DECL_IDS)
+  ...
[truncated]

@ChuanqiXu9
Copy link
Member Author

@jansvoboda11 @Bigcheese gentle ping

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good conceptually, I left a couple of minor notes.

ChuanqiXu9 added a commit that referenced this pull request May 27, 2024
@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTransitiveDeclChange branch from bf9f481 to 8b7a650 Compare May 27, 2024 06:28
@ChuanqiXu9
Copy link
Member Author

Thanks for reviewing : )

@ChuanqiXu9
Copy link
Member Author

I'd like to land this since:

  • I want to give more time testing this
  • the design of the patch is pretty similar with the previous one ([Modules] No transitive source location change #86912)
  • the scale of the patch is not very big (100+ lines of code except new tests)
  • I do think the functionality is very very important to modules.

@ChuanqiXu9 ChuanqiXu9 merged commit ccb73e8 into main Jun 3, 2024
8 checks passed
@ChuanqiXu9 ChuanqiXu9 deleted the users/ChuanqiXu9/NoTransitiveDeclChange branch June 3, 2024 08:13
@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Jun 3, 2024

I'll revert it as there are some crashes.


The crash log:

********** TEST 'Clang :: Modules/redecl-ivars.m' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp
+ rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp
RUN: at line 3: split-file /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Modules/redecl-ivars.m /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp
+ split-file /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Modules/redecl-ivars.m /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp
RUN: at line 4: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/lib/clang/19/include -nostdsysteminc -fsyntax-only -fobjc-runtime=macosx-10.9 -verify -I/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/include /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/test-mismatch-in-extension.m
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/lib/clang/19/include -nostdsysteminc -fsyntax-only -fobjc-runtime=macosx-10.9 -verify -I/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/include /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/test-mismatch-in-extension.m
RUN: at line 5: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/lib/clang/19/include -nostdsysteminc -fsyntax-only -fobjc-runtime=macosx-10.9 -verify -I/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/include /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/test-mismatch-in-extension.m             -fmodules -fimplicit-module-maps -fmodules-cache-path=/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/modules.cache
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/lib/clang/19/include -nostdsysteminc -fsyntax-only -fobjc-runtime=macosx-10.9 -verify -I/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/include /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/test-mismatch-in-extension.m -fmodules -fimplicit-module-maps -fmodules-cache-path=/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Modules/Output/redecl-ivars.m.tmp/modules.cache
/b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan_ubsan/include/c++/v1/__algorithm/lower_bound.h:39:53: runtime error: reference binding to misaligned address 0x529000045c44 for type 'const clang::serialization::ObjCCategoriesInfo', which requires 8 byte alignment
0x529000045c44: note: pointer points here
  00 00 00 00 02 00 00 00  01 00 00 00 02 00 00 00  00 00 00 00 c3 0d 88 60  0a a8 81 10 03 20 08 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan_ubsan/include/c++/v1/__algorithm/lower_bound.h:39:53 

--

It looks like misaligned access too. I'll try to fix it.

(It is somewhat hurting that we can't find them on the trunk.)

ChuanqiXu9 added a commit that referenced this pull request Jun 3, 2024
This reverts commit ccb73e8.

It looks like there are some bots complaining about the patch.
See the post commit comment in
#92083 to track it.
@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Jun 3, 2024

This also broke an lldb test: https://lab.llvm.org/buildbot/#/builders/96/builds/58449

lldb-test: ../llvm-project/clang/lib/AST/DeclBase.cpp:132: void clang::Decl::setOwningModuleID(unsigned int): Assertion !((*IDAddress) >> 48) && "We should only set the module ID once"' failed.`

I can't say if right now whether lldb is being too strict or this is a real problem. Could be the result of the same thing UBSAN found.

If you want to reproduce that, there is a ptifall to note:

$ ninja && ninja lldb-test && ./bin/llvm-lit ../llvm-project/lldb/test/Shell/SymbolFile/DWARF/x86/module-ownership.mm -a

Make sure you're rebuilding lldb-test each time, it's not included in ninja but is in ninja check-lldb.

@hctim
Copy link
Collaborator

hctim commented Jun 3, 2024

Hi from the sanitizer buildbot mainter, thanks for the fast revert! For full reproduction, you can use https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild and buildbot_fast.sh. But, you can reproduce it quicker by using -DCMAKE_C_FLAGS="-fsanitize=undefined" -DCMAKE_CXX_FLAGS="-fsanitize=undefined" -DLLVM_USE_SANITIZER=Undefined in your cmake (then {ninja,make} check-clang).

@Michael137
Copy link
Member

Michael137 commented Jun 3, 2024

FYI, also broke the macOS LLDB buildbots: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/lldb-cmake/2637/execution/node/97/log/

********************
Unresolved Tests (3):
  lldb-api :: commands/expression/namespace_local_var_same_name_obj_c/TestNamespaceLocalVarSameNameObjC.py
  lldb-api :: lang/cpp/gmodules/template-with-same-arg/TestTemplateWithSameArg.py
  lldb-api :: lang/cpp/gmodules/templates/TestGModules.py

********************
Failed Tests (1):
  lldb-shell :: SymbolFile/DWARF/x86/module-ownership.mm
Assertion failed: (!((*IDAddress) >> 48) && "We should only set the module ID once"), function setOwningModuleID, file DeclBase.cpp, line 132.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

@alexfh
Copy link
Contributor

alexfh commented Jun 13, 2024

Spending ~80% of the compilation time in clang::serialization::MultiOnDiskHashTable seems wrong.

It also seems like #95348 won't solve this problem, though if you have strong reasons to believe it will, I can try.

As for a reproducer, providing reproducers for modular compilations proved to be an extremely time consuming task. In this particular case we're talking about a compilation that loads more than a thousand dependent modules. It can take quite some time to figure out which ones are contributing most to the clang runtime (-ftime-report only shows itemization for Preloading ....pcm, which sum up to less than a second, while the full compilation slowed down from ~200s to ~1000s).

@ilya-biryukov
Copy link
Contributor

I added a printf into FindExternalVisibleDeclsByName where it calls find on MultiOnDiskHashTable to see what names have been causing this and I can see that we stall on the first name we try to look up. The name std is the first one and was previously processed almost instantly and now it takes >5 minutes to return results for it. I believe this is where most, if not all, of the overhead is coming from.

Will be digging more, but maybe this gives some clues to where the problem is.

@ilya-biryukov
Copy link
Contributor

In both cases we are condensing 1105 on-disk hash tables, but it took seconds before and takes multiple minutes after the change.

@ilya-biryukov
Copy link
Contributor

I only managed to confirm that the problem is that merging of individual PCMs into a merged hash table has become much slower with this change.
I tried with #95348 too, but it didn't help.

I will continue my investigations and reductions tomorrow.

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Jun 14, 2024

Thanks for the profiling data. It narrows the scope a lot. But it makes me confusing too. Since the scope is pretty narrow, we can do an analysis here:

void condense() {
MergedTable *Merged = getMergedTable();
if (!Merged)
Merged = new MergedTable;
// Read in all the tables and merge them together.
// FIXME: Be smarter about which tables we merge.
for (auto *ODT : tables()) {
auto &HT = ODT->Table;
Info &InfoObj = HT.getInfoObj();
for (auto I = HT.data_begin(), E = HT.data_end(); I != E; ++I) {
auto *LocalPtr = I.getItem();
// FIXME: Don't rely on the OnDiskHashTable format here.
auto L = InfoObj.ReadKeyDataLength(LocalPtr);
const internal_key_type &Key = InfoObj.ReadKey(LocalPtr, L.first);
data_type_builder ValueBuilder(Merged->Data[Key]);
InfoObj.ReadDataInto(Key, LocalPtr + L.first, L.second,
ValueBuilder);
}
Merged->Files.push_back(ODT->File);
delete ODT;
}
Tables.clear();
Tables.push_back(Table(Merged).getOpaqueValue());
}

In clang::serialization::MultiOnDiskHashTable::condense(), only

auto L = InfoObj.ReadKeyDataLength(LocalPtr);
and
InfoObj.ReadDataInto(Key, LocalPtr + L.first, L.second,
ValueBuilder);
should be affected by this patch. Where the change in
auto L = InfoObj.ReadKeyDataLength(LocalPtr);
should almost be constant since it reads the length.

And for

InfoObj.ReadDataInto(Key, LocalPtr + L.first, L.second,
ValueBuilder);
, its implementation is:
void ASTDeclContextNameLookupTrait::ReadDataInto(internal_key_type,
const unsigned char *d,
unsigned DataLen,
data_type_builder &Val) {
using namespace llvm::support;
for (unsigned NumDecls = DataLen / sizeof(DeclID); NumDecls; --NumDecls) {
LocalDeclID LocalID(endian::readNext<DeclID, llvm::endianness::little>(d));
Val.insert(Reader.getGlobalDeclID(F, LocalID));
}
}

For ASTReader::getGlobalDeclID(), the implementation is

GlobalDeclID ASTReader::getGlobalDeclID(ModuleFile &F,
LocalDeclID LocalID) const {
if (LocalID.get() < NUM_PREDEF_DECL_IDS)
return GlobalDeclID(LocalID.get());
unsigned OwningModuleFileIndex = LocalID.getModuleFileIndex();
DeclID ID = LocalID.getLocalDeclIndex();
if (!F.ModuleOffsetMap.empty())
ReadModuleOffsetMap(F);
ModuleFile *OwningModuleFile =
OwningModuleFileIndex == 0
? &F
: F.TransitiveImports[OwningModuleFileIndex - 1];
if (OwningModuleFileIndex == 0)
ID -= NUM_PREDEF_DECL_IDS;
uint64_t NewModuleFileIndex = OwningModuleFile->Index + 1;
return GlobalDeclID(ID, NewModuleFileIndex);
}
, what is almost bit operation comparing to the previous state.

So it just looks like the reason is just that we extend the DeclID from uint32_t to uint64_t and there are too many on-disk hash tables (not a blame). I feel it is related to memory allocations. But no confidence.

@alexfh could you try to profile on the statements level?


BTW, I feel we may read too many times in MultiOnDiskHashTable , maybe we can improve this.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this pull request Jun 14, 2024
…tables

See the post commit message in
llvm#92083 for rationale.

Previously, when we merge the lookup tables, we will read the tables
completely to get the data. But the above page shows that it might be
problematic in specific cases where there a lot of DeclIDs to be read.

In this patch, we mitigated the problem by not reading the contents of
the merged tables. But we may need to pay for the additional memory
usages.
@ChuanqiXu9
Copy link
Member Author

I sent #95506. It is a independent patch which may mitigate the issue you met. @alexfh you can try this when you have time.

@ilya-biryukov
Copy link
Contributor

While this looks like a useful optimization, I am still not sure why 2x more data (is that a correct upper bound?) can lead to a 10x slowdown. I will try to dig further to understand what's going on here.

@ilya-biryukov
Copy link
Contributor

I got to the bottom of it. The problem is that our hash function for 64 bit ints is not very good.

It will have a lot of collision when lower 32 bits are the same and the higher 32 bits of a 64 bit value change. Coincidentally, this is quite common in this case because collisions in low bits of GlobalDeclID are very much expected.

I tried replacing it with a variant of MurmurHash I found randomly on the internet and performance went back to normal.
@ChuanqiXu9 I think changing the default hash function for ints might have some far-reaching effects for LLVM code using DenseSet and would warrant a bigger discussion on what that function should be (I am not an expert in hash functions myself).
I will be happy to start that discussion on Monday, but feel free to jump into it earlier, if you have time.

In the meantime, I would again propose to revert the change until we can fix the hash function.
(Alternatively, we can change the hash function for GlobalDeclID specifically to workaround more locally, but we surely want a better hash function for uint64 in general)

@alexfh
Copy link
Contributor

alexfh commented Jun 14, 2024

Thank you for the analysis, Ilya!

Re: a possible resolution, I wonder why llvm::hash_value from llvm/ADT/Hashing.h is not used in DenseSet? In any case, if this can't be resolved quickly and needs a design discussion, it would be appropriate to revert the patch first.

@ChuanqiXu9
Copy link
Member Author

I got to the bottom of it. The problem is that our hash function for 64 bit ints is not very good.

It will have a lot of collision when lower 32 bits are the same and the higher 32 bits of a 64 bit value change. Coincidentally, this is quite common in this case because collisions in low bits of GlobalDeclID are very much expected.

I tried replacing it with a variant of MurmurHash I found randomly on the internet and performance went back to normal. @ChuanqiXu9 I think changing the default hash function for ints might have some far-reaching effects for LLVM code using DenseSet and would warrant a bigger discussion on what that function should be (I am not an expert in hash functions myself). I will be happy to start that discussion on Monday, but feel free to jump into it earlier, if you have time.

In the meantime, I would again propose to revert the change until we can fix the hash function. (Alternatively, we can change the hash function for GlobalDeclID specifically to workaround more locally, but we surely want a better hash function for uint64 in general)

Thanks for the analysis. It's pretty helpful. For reverting, on the one hand, we've already targeted the root issue. I think then it is not so rationale to me to block the patch to problems in that level. On the other hand, (I think) the patch (series) is pretty important for named modules (and probably clang explicit modules). And I really want to make that in 19. Given 19 is going to be branched in the end of the next month. From my experience, it is not very hopeful to land such a fundamental change quickly. So I propose to change the hash function in Serialization locally as a workaround and propose a discussion about replacing the hash function in discussion parallelly.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this pull request Jun 17, 2024
See the comment:
llvm#92083 (comment)

After the patch, llvm#92083, the
lower 32 bits of DeclID can be the same commonly. This may produce many
collisions. It will be best to change the default hash implementation
for uint64_t. But sent this one as a quick workaround.
@ilya-biryukov
Copy link
Contributor

Oh, I didn't realize you were eager to land this in Clang 19, thanks for sharing that.
In that case, I think a specialiazed hash function for GlobalDeclID is indeed the way to go.

I was also worried a little there are other performance implications of this change that would block us, but we won't know until we run a full release testing cycle, which may take a week or more. Hopefully it will be okay, as my observations around the increase in PCM sizes align with the numbers from this patch and performance seems to be on par as well, if we change the hash function.
However, I still wanted to mention it so that we won't come out completely of the blue.

I'll try to help reviewing the two patches you posted, thanks!

@ChuanqiXu9
Copy link
Member Author

Oh, I didn't realize you were eager to land this in Clang 19, thanks for sharing that. In that case, I think a specialiazed hash function for GlobalDeclID is indeed the way to go.

I was also worried a little there are other performance implications of this change that would block us, but we won't know until we run a full release testing cycle, which may take a week or more. Hopefully it will be okay, as my observations around the increase in PCM sizes align with the numbers from this patch and performance seems to be on par as well, if we change the hash function. However, I still wanted to mention it so that we won't come out completely of the blue.

I'll try to help reviewing the two patches you posted, thanks!

Thanks. I've added you as reviewers. I hope to land these 2 patches soon so that we can have more baking time for them.

ChuanqiXu9 added a commit that referenced this pull request Jun 18, 2024
…95730)

See the comment:
#92083 (comment)

After the patch, #92083, the
lower 32 bits of DeclID can be the same commonly. This may produce many
collisions. It will be best to change the default hash implementation
for uint64_t. But sent this one as a quick workaround.

Feel free to update this if you prefer other hash functions.
ChuanqiXu9 added a commit that referenced this pull request Jun 20, 2024
(Background: See the comment:
#92083 (comment))

It looks like the hash function for 64bits integers are not very good:

```
  static unsigned getHashValue(const unsigned long long& Val) {
    return (unsigned)(Val * 37ULL);
  }
```

Since the result is truncated to 32 bits. It looks like the higher 32
bits won't contribute to the result. So that `0x1'00000001` will have
the the same results to `0x2'00000001`, `0x3'00000001`, ...

Then we may meet a lot collisions in such cases. I feel it should
generally good to include higher 32 bits for hashing functions.

Not sure who's the appropriate reviewer, adding some people by
impressions.
ChuanqiXu9 added a commit that referenced this pull request Jun 20, 2024
Following of #92083

The motivation is still cutting of the unnecessary change in the
dependency chain. See the above link (recursively) for details.

After this patch, (and the above patch), we can already do something
pretty interesting. For example,

#### Motivation example

```

//--- m-partA.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

export class A {
public:
    int getMem();
};

export template <typename T>
class ATempl {
public:
    T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
    return 88;
}

export class A {
public:
    int getMem();
    // Now we add a new declaration without introducing a new type.
    // The consuming module which didn't use m:partA completely is expected to be
    // not changed.
    int getMem2();
};

export template <typename T>
class ATempl {
public:
    T getT();
    // Add a new declaration without introducing a new type.
    T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

In this example, module `m` exports two partitions `:partA` and
`:partB`. And a consumer `useBOnly` only consumes the entities from
`:partB`. So we don't hope the BMI of `useBOnly` changes if only
`:partA` changes. After this patch, we can make it if the change of
`:partA` doesn't introduce new types. (And we can get rid of this if we
make no-transitive-type-change).

As the example shows, when we change the implementation of `:partA` from
`m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration
`getA2()` at the global namespace, add a new member function `getMem2()`
to class `A` and add a new member function to `getT2()` to class
template `ATempl`. And since `:partA` is not used by `useBOnly`
completely, the BMI of `useBOnly` won't change after we made above
changes.

#### Design details

Method used in this patch is similar with
#92083 and
#86912. It extends the 32 bit
IdentifierID to 64 bits and use the higher 32 bits to store the module
file index. So that the encoding of the identifier won't get affected by
other modules.

#### Overhead

Similar with #92083 and
#86912. The change is only
expected to increase the size of the on-disk .pcm files and not affect
the compile-time performances. And from my experiment, the size of the
on-disk change only increase 1%+ and observe no compile-time impacts.

#### Future Plans

I'll try to do the same thing for type ids. IIRC, it won't change the
dependency graph if we add a new type in an unused units. I do think
this is a significant win. And this will be a pretty good answer to "why
modules are better than headers."
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
(Background: See the comment:
llvm#92083 (comment))

It looks like the hash function for 64bits integers are not very good:

```
  static unsigned getHashValue(const unsigned long long& Val) {
    return (unsigned)(Val * 37ULL);
  }
```

Since the result is truncated to 32 bits. It looks like the higher 32
bits won't contribute to the result. So that `0x1'00000001` will have
the the same results to `0x2'00000001`, `0x3'00000001`, ...

Then we may meet a lot collisions in such cases. I feel it should
generally good to include higher 32 bits for hashing functions.

Not sure who's the appropriate reviewer, adding some people by
impressions.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Following of llvm#92083

The motivation is still cutting of the unnecessary change in the
dependency chain. See the above link (recursively) for details.

After this patch, (and the above patch), we can already do something
pretty interesting. For example,

#### Motivation example

```

//--- m-partA.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

export class A {
public:
    int getMem();
};

export template <typename T>
class ATempl {
public:
    T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
    return 88;
}

export class A {
public:
    int getMem();
    // Now we add a new declaration without introducing a new type.
    // The consuming module which didn't use m:partA completely is expected to be
    // not changed.
    int getMem2();
};

export template <typename T>
class ATempl {
public:
    T getT();
    // Add a new declaration without introducing a new type.
    T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

In this example, module `m` exports two partitions `:partA` and
`:partB`. And a consumer `useBOnly` only consumes the entities from
`:partB`. So we don't hope the BMI of `useBOnly` changes if only
`:partA` changes. After this patch, we can make it if the change of
`:partA` doesn't introduce new types. (And we can get rid of this if we
make no-transitive-type-change).

As the example shows, when we change the implementation of `:partA` from
`m-partA.cppm` to `m-partA.v1.cppm`, we add new function declaration
`getA2()` at the global namespace, add a new member function `getMem2()`
to class `A` and add a new member function to `getT2()` to class
template `ATempl`. And since `:partA` is not used by `useBOnly`
completely, the BMI of `useBOnly` won't change after we made above
changes.

#### Design details

Method used in this patch is similar with
llvm#92083 and
llvm#86912. It extends the 32 bit
IdentifierID to 64 bits and use the higher 32 bits to store the module
file index. So that the encoding of the identifier won't get affected by
other modules.

#### Overhead

Similar with llvm#92083 and
llvm#86912. The change is only
expected to increase the size of the on-disk .pcm files and not affect
the compile-time performances. And from my experiment, the size of the
on-disk change only increase 1%+ and observe no compile-time impacts.

#### Future Plans

I'll try to do the same thing for type ids. IIRC, it won't change the
dependency graph if we add a new type in an unused units. I do think
this is a significant win. And this will be a pretty good answer to "why
modules are better than headers."
ChuanqiXu9 added a commit that referenced this pull request Jul 19, 2024
(Some backgrounds, not required to read:
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755)

This is the document part for the no-transitive-change
(#86912,
#92083,
#92085,
#92511) to provide the ability
for build system to skip some unnecessary recompilations.

See the patch for examples.
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
(Some backgrounds, not required to read:
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755)

This is the document part for the no-transitive-change
(llvm#86912,
llvm#92083,
llvm#92085,
llvm#92511) to provide the ability
for build system to skip some unnecessary recompilations.

See the patch for examples.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
(Some backgrounds, not required to read:
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755)

This is the document part for the no-transitive-change
(#86912,
#92083,
#92085,
#92511) to provide the ability
for build system to skip some unnecessary recompilations.

See the patch for examples.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251509
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.