[Modules] No transitive source location change #86912

ChuanqiXu9 · 2024-03-28T05:53:58Z

This is part of the "no transitive change" patch series: "no transitive source location change". I discussed this with @Bigcheese at the WG21 meeting in Tokyo.

The idea comes from a post by @jyknight on LLVM Discourse. Consider:

// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import B;

Almost every time A.cppm changes, we need to recompile C, because we treat source locations as significant to the semantics. But it would be good to avoid recompiling C if the change to A doesn't change the BMI of B.

Motivation Example

This patch only deals with source locations, so let's focus on a source location example. The full example is in the attached test.

//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}

Here the only difference between A.cppm and A.v1.cppm is that A.v1.cppm has an additional blank line. The test shows that the two BMIs of B.cppm, one built with -fmodule-file=A=A.pcm and the other with -fmodule-file=A=A.v1.pcm, should have bit-wise identical contents.

However, it is a different story for C. C instantiates templates from A, and the instantiation records source information from module A, which differs between A and A.v1. So it is expected that the BMIs C.pcm and C.v1.pcm can and should differ.

Internal perspective of status quo

To fully understand the patch, we need to understand how we encode source locations and how we serialize and deserialize them.

Currently, source locations are encoded as:

|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0

As the diagram shows, we encode local (unloaded) source locations from 0 upward, and we allocate space for source locations from loaded modules from the high end downward toward 0. Source locations from a loaded module are then mapped into our source location space according to the allocated offset.

For example, given:

// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...

Assume the offset of a source location (let's call it S) in a.cppm is 45, so we record the value 45 in the BMI a.pcm. Then in b.cppm, when we import a, the source manager allocates a space for module 'a' (sized according to the recorded number of source locations) and uses it as the base offset of module 'a' in the current source location space. Let's assume the allocated base offset is 90 in this example. To get the location of S in the current source location space, we simply add 45 to 90 to get 135. So the source location of S in module b is 135.

And when we write module b, we also write the source location of S as 135 directly in the BMI. To make clear that S comes from module a, we also need to record the base offset of module a, 90, in the BMI of b.

Here the problem comes. The base offset of module 'a' is computed from the number of source locations in module 'a'. So in module 'b', the recorded base offset of module 'a' changes every time the number of source locations in module 'a' increases or decreases. In other words, the contents of the BMI of b change every time the number of locations in module 'a' changes. This is pretty sensitive, since almost every change alters the number of locations. This is the problem this patch wants to solve.

Let's continue with the existing design to understand what's going on. Another interesting case is:

// c.cppm
export module c;
import whatever;
import a;
import b;
...

In c.cppm, when we import a, we still need to allocate a base location offset for it; let's say the value becomes 200 somehow. Then when we reach the location S recorded in module b, we need to translate it into the current source location space. The solution is quite simple: we can get it by 135 + (200 - 90) = 245. In other words, the offset of a source location in the current module can be computed as Recorded Offset + Base Offset of its module file - Recorded Base Offset.

That is almost everything about how we handle source location offsets in the serializers.
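To make the arithmetic of the existing scheme concrete, here is a minimal sketch. It is a simplified model only: RecordedLoc, mapIntoImporter and retranslate are made-up names for illustration and are not clang APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a location as the existing design writes it to a BMI.
struct RecordedLoc {
  std::uint32_t Offset;       // location in the writer's source location space
  std::uint32_t RecordedBase; // base offset of the owning module, as recorded by the writer
};

// Map an offset local to an imported module into the importer's space.
constexpr std::uint32_t mapIntoImporter(std::uint32_t LocalOffset,
                                        std::uint32_t Base) {
  return Base + LocalOffset;
}

// Recorded Offset + Base Offset of its module file - Recorded Base Offset.
constexpr std::uint32_t retranslate(RecordedLoc L, std::uint32_t CurrentBase) {
  return L.Offset + CurrentBase - L.RecordedBase;
}

// S in b.cppm: 45 + 90.
static_assert(mapIntoImporter(45, 90) == 135, "S as seen from module b");
// S in c.cppm, read from b.pcm: 135 + (200 - 90).
static_assert(retranslate(RecordedLoc{135, 90}, 200) == 245,
              "S as seen from module c");
```

Note that both 135 and 90 end up in b.pcm, and both depend on how much space module a needed when b was compiled.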

The high level design of current patch

At an abstract level, what we want is to remove the hardcoded base offset of imported modules while retaining the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location.

So in this patch, for each source location, we store the local offset of the location and the module file index. For the above example, in the existing design the source location of S is recorded in b.pcm as 135 directly. In the new design, it is recorded as <1, 45>. Here 1 stands for the module file index of a in module b, and 45 is the offset of S relative to the base offset of module a.
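A minimal sketch of the new scheme, under simplifying assumptions: StoredLoc, resolve and the Bases table are hypothetical names for illustration, and the table stands in for whatever base offsets the reading compilation has allocated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a location is stored as <module file index, local offset>.
struct StoredLoc {
  unsigned ModuleFileIndex;  // 0 = the module file itself, N = its N-th imported module file
  std::uint32_t LocalOffset; // offset relative to the base of the owning module file
};

// Bases[0] is the base of the module file being read; Bases[N] is the base the
// current compilation allocated for that file's N-th imported module file.
inline std::uint32_t resolve(StoredLoc L,
                             const std::vector<std::uint32_t> &Bases) {
  return Bases.at(L.ModuleFileIndex) + L.LocalOffset;
}
```

With S stored as <1, 45>, a reader that allocated base 200 for module a gets resolve(StoredLoc{1, 45}, {300, 200}) == 245 (the 300 is an arbitrary base for b itself). Nothing in the stored pair depends on the base that a had when b was compiled, which is the point of the design.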

So the trade-off here is that, to make the BMI more independent, we need to record more abstract information. I feel it is worth it. The recompilation problem of modules is really annoying and people still complain about it. If we can make this work (including stopping other changes from propagating transitively), I think this may be a killer feature for modules. And according to @Bigcheese, this should be helpful for clang explicit modules too.

On the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. There is no significant change in compilation time. The size of the .pcm files grows from 200M to 204M. I think the trade-off is pretty fair.

Some low level details

I didn't use another slot to record the module file index. Instead, I use the higher 32 bits of the existing source location encoding to store that information. This design should be safe: we use unsigned to store source locations in memory but uint64_t in serialization, and unsigned is 32 bits wide on most platforms. So all the bits we use to store the module file index were unused before. The new encoding looks like:

   |-----------------------|-----------------------|
   |           A           |         B         | C |

  * A: 32 bit. The index of the module file in the module manager + 1. The +1
          here is necessary since we wish 0 stands for the current module file.
  * B: 31 bit. The offset of the source location to the module file containing it.
  * C: The macro bit. We rotate it to the lowest bit so that we can save some 
          space in case the index of the module file is 0.

(B and C together are the existing raw encoding for source locations.)
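The layout above can be sketched as follows. This is an illustration rather than the patch's code: pack, unpackIndex, unpackRawLoc and the rotate helpers are made-up names, and the sketch assumes the in-memory 32-bit encoding keeps the macro bit in its top bit.

```cpp
#include <cassert>
#include <cstdint>

using RawLocEncoding = std::uint64_t;

// Rotate left by one: the macro bit (top bit in memory) lands in bit 0 (field C).
constexpr std::uint32_t rotateMacroBitToLow(std::uint32_t Raw) {
  return (Raw << 1) | (Raw >> 31);
}

// Inverse rotation: bit 0 goes back to the top bit.
constexpr std::uint32_t rotateMacroBitToHigh(std::uint32_t Enc) {
  return (Enc >> 1) | (Enc << 31);
}

// Field A (module file index + 1, or 0 for the current file) goes in the high 32 bits.
constexpr RawLocEncoding pack(std::uint32_t ModuleFileIndexPlusOne,
                              std::uint32_t RawLoc) {
  return (static_cast<RawLocEncoding>(ModuleFileIndexPlusOne) << 32) |
         rotateMacroBitToLow(RawLoc);
}

constexpr std::uint32_t unpackIndex(RawLocEncoding E) {
  return static_cast<std::uint32_t>(E >> 32);
}

constexpr std::uint32_t unpackRawLoc(RawLocEncoding E) {
  return rotateMacroBitToHigh(static_cast<std::uint32_t>(E & 0xFFFFFFFFu));
}

static_assert(pack(1, 45) == 0x10000005AULL, "<1, 45> with macro bit clear");
static_assert(unpackIndex(pack(1, 45)) == 1, "index round-trips");
static_assert(unpackRawLoc(pack(1, 45)) == 45, "offset round-trips");
// A macro location in the current module file stays a small number.
static_assert(pack(0, 0x8000002Du) == 91, "macro bit ends up in bit 0");
```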

Another reason to reuse the same slot is to reduce the impact of the patch. A lot of places assume we can store and get a source location from a single slot, and when I tried to add another slot, a lot of code broke. I don't feel it is worth it.

Another impact of this decision is that the existing small optimizations for encoding source locations may be invalidated. The key to these optimizations is turning large values into small values so that the VBR6 format can reduce the size. But once we put the module file index into the higher bits, they may simply stop working. An example is the SourceLocationSequence optimization.

This will only affect the size of on-disk .pcm files. I don't expect it to impact the speed or memory use of compilations. And given my small experiment above, I feel this trade-off is worth it.
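To see why a nonzero module file index defeats the small-value trick, here is a rough sketch that counts VBR6 chunks. It assumes the usual LLVM bitstream VBR layout, where each 6-bit chunk carries 5 payload bits plus one continuation bit; vbr6Chunks is a made-up helper, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Number of 6-bit chunks needed to emit V in VBR6.
constexpr unsigned vbr6Chunks(std::uint64_t V) {
  return V < 32 ? 1u : 1u + vbr6Chunks(V >> 5);
}

// A small value such as 135 fits in 2 chunks (12 bits).
static_assert(vbr6Chunks(135) == 2, "small value, few chunks");
// With index 1 in the high 32 bits the value has 33 significant bits: 7 chunks (42 bits).
static_assert(vbr6Chunks((1ULL << 32) | 90) == 7, "high bits set, many chunks");
```

Locations owned by the current module file keep index 0 in the high bits, so they still encode as small values; only locations from imported module files pay the extra chunks.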

Correctness

The mental model for handling source location offsets is not so complex, and I believe we can solve it by adding the module file index to each stored source location.

On the practical side, since source locations are pretty sensitive, and the patch passes all the in-tree tests and a small-scale project, I feel it should be correct.

Future Plans

I'll continue to work on no transitive decl change and no transitive identifier change (if it matters) to achieve the goal of stopping the propagation of unnecessary changes. But all of this depends on this patch, since source locations are clearly the most sensitive thing.

The release notes and documentation will be added separately.

github-actions · 2024-03-28T05:56:52Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2024-03-28T08:02:12Z

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Chuanqi Xu (ChuanqiXu9)

Changes

This is part of the "no transitive change" patch series: "no transitive source location change". I discussed this with @Bigcheese at the WG21 meeting in Tokyo.

The idea comes from a post by @jyknight on LLVM Discourse. Consider:

// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import B;

Almost every time A.cppm changes, we need to recompile C, because we treat source locations as significant to the semantics. But it would be good to avoid recompiling C if the change to A doesn't change the BMI of B.

Motivation Example

This patch only deals with source locations, so let's focus on a source location example. The full example is in the attached test.

//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}

Here the only difference between A.cppm and A.v1.cppm is that A.v1.cppm has an additional blank line. The test shows that the two BMIs of B.cppm, one built with -fmodule-file=A=A.pcm and the other with -fmodule-file=A=A.v1.pcm, should have bit-wise identical contents.

However, it is a different story for C. C instantiates templates from A, and the instantiation records source information from module A, which differs between A and A.v1. So it is expected that the BMIs C.pcm and C.v1.pcm can and should differ.

Internal perspective of status quo

To fully understand the patch, we need to understand how we encode source locations and how we serialize and deserialize them.

Currently, source locations are encoded as:

|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0

As the diagram shows, we encode local (unloaded) source locations from 0 upward, and we allocate space for source locations from loaded modules from the high end downward toward 0. Source locations from a loaded module are then mapped into our source location space according to the allocated offset.

For example, given:

// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...

Assume the offset of a source location (let's call it S) in a.cppm is 45, so we record the value 45 in the BMI a.pcm. Then in b.cppm, when we import a, the source manager allocates a space for module 'a' (sized according to the recorded number of source locations) and uses it as the base offset of module 'a' in the current source location space. Let's assume the allocated base offset is 90 in this example. To get the location of S in the current source location space, we simply add 45 to 90 to get 135. So the source location of S in module b is 135.

And when we write module b, we also write the source location of S as 135 directly in the BMI. To make clear that S comes from module a, we also need to record the base offset of module a, 90, in the BMI of b.

Here the problem comes. The base offset of module 'a' is computed from the number of source locations in module 'a'. So in module 'b', the recorded base offset of module 'a' changes every time the number of source locations in module 'a' increases or decreases. In other words, the contents of the BMI of b change every time the number of locations in module 'a' changes. This is pretty sensitive, since almost every change alters the number of locations. This is the problem this patch wants to solve.

Let's continue with the existing design to understand what's going on. Another interesting case is:

// c.cppm
export module c;
import a;
import b;
...

In c.cppm, when we import a, we still need to allocate a base location offset for it; let's say the value becomes 200 somehow. Then when we reach the location S recorded in module b, we need to translate it into the current source location space. The solution is quite simple: we can get it by 135 + (200 - 90) = 245. In other words, the offset of a source location in the current module can be computed as Recorded Offset + Base Offset of its module file - Recorded Base Offset.

That is almost everything about how we handle source location offsets in the serializers.

The high level design of current patch

At an abstract level, what we want is to remove the hardcoded base offset of imported modules while retaining the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location.

So in this patch, for each source location, we store the local offset of the location and the module file index. For the above example, in the existing design the source location of S is recorded in b.pcm as 135 directly. In the new design, it is recorded as <1, 45>. Here 1 stands for the module file index of a in module b, and 45 is the offset of S relative to the base offset of module a.

So the trade-off here is that, to make the BMI more independent, we need to record more abstract information. I feel it is worth it. The recompilation problem of modules is really annoying and people still complain about it. If we can make this work (including stopping other changes from propagating transitively), I think this may be a killer feature for modules. And according to @Bigcheese, this should be helpful for clang explicit modules too.

On the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. There is no significant change in compilation time. The size of the .pcm files grows from 200M to 208M. I think the trade-off is pretty fair.

Some low level details

I didn't use another slot to record the module file index. Instead, I use the higher bits of the existing source location encoding to store that information. This design should be safe: we use unsigned to store source locations in memory but uint64_t in serialization, and unsigned is 32 bits wide on most platforms. So it should not be a safety problem.

Another reason to reuse the same slot is to reduce the impact of the patch. A lot of places assume we can store and get a source location from a single slot, and when I tried to add another slot, a lot of code broke. I don't feel it is worth it.

Another impact of this decision is that the existing small optimizations for encoding source locations may be invalidated. The key to these optimizations is turning large values into small values so that the VBR6 format can reduce the size. But once we put the module file index into the higher bits, they may simply stop working. An example is the SourceLocationSequence optimization.

This will only affect the size of on-disk .pcm files. I don't expect it to impact the speed or memory use of compilations. And given my small experiment above, I feel this trade-off is worth it. I don't remove optimizations such as SourceLocationSequence in the current patch, to avoid increasing its size. I'll try to remove that in an NFC patch after this lands.

Correctness

The mental model for handling source location offsets is not so complex, and I believe we can solve it by adding the module file index to each stored source location. On the practical side, since source locations are pretty sensitive, and the patch passes all the in-tree tests and a small-scale project, I feel it may be correct.

Future Plans

I'll continue to work on no transitive decl change and no transitive identifier change (if it matters) to achieve the goal of stopping the propagation of unnecessary changes. But all of this depends on this patch, since source locations are clearly the most sensitive thing.

Patch is 42.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86912.diff

15 Files Affected:

(modified) clang/include/clang/Basic/SourceLocation.h (+1)
(modified) clang/include/clang/Serialization/ASTBitCodes.h (+25-31)
(modified) clang/include/clang/Serialization/ASTReader.h (+34-20)
(modified) clang/include/clang/Serialization/ASTWriter.h (+4)
(modified) clang/include/clang/Serialization/ModuleFile.h (-4)
(modified) clang/include/clang/Serialization/SourceLocationEncoding.h (+53-25)
(modified) clang/lib/Frontend/ASTUnit.cpp (-2)
(modified) clang/lib/Serialization/ASTReader.cpp (+33-53)
(modified) clang/lib/Serialization/ASTReaderDecl.cpp (+1-1)
(modified) clang/lib/Serialization/ASTWriter.cpp (+34-7)
(modified) clang/lib/Serialization/ASTWriterDecl.cpp (+5-3)
(modified) clang/lib/Serialization/ModuleFile.cpp (-1)
(added) clang/test/Modules/no-transitive-source-location-change.cppm (+69)
(modified) clang/test/Modules/pr61067.cppm (-25)
(modified) clang/unittests/Serialization/SourceLocationEncodingTest.cpp (+5-61)

diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 00b1e0fa855b7a..7a0f5ba8d1270b 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -90,6 +90,7 @@ class SourceLocation {
   friend class ASTWriter;
   friend class SourceManager;
   friend struct llvm::FoldingSetTrait<SourceLocation, void>;
+  friend class SourceLocationEncoding;
 
 public:
   using UIntTy = uint32_t;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index f31efa5117f0d1..628ce03572fea6 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -22,6 +22,7 @@
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/OperatorKinds.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
 #include <cassert>
@@ -175,45 +176,38 @@ const unsigned int NUM_PREDEF_SUBMODULE_IDS = 1;
 
 /// Source range/offset of a preprocessed entity.
 struct PPEntityOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
 
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
   /// Offset in the AST file relative to ModuleFile::MacroOffsetsBase.
   uint32_t BitOffset;
 
-  PPEntityOffset(SourceRange R, uint32_t BitOffset)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()),
-        BitOffset(BitOffset) {}
-
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
+  PPEntityOffset(RawLocEncoding Begin, RawLocEncoding End, uint32_t BitOffset)
+      : Begin(Begin), End(End), BitOffset(BitOffset) {}
 
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Source range of a skipped preprocessor region
 struct PPSkippedRange {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location of beginning of range.
-  SourceLocation::UIntTy Begin;
+  RawLocEncoding Begin;
   /// Raw source location of end of range.
-  SourceLocation::UIntTy End;
+  RawLocEncoding End;
 
-  PPSkippedRange(SourceRange R)
-      : Begin(R.getBegin().getRawEncoding()), End(R.getEnd().getRawEncoding()) {
-  }
+  PPSkippedRange(RawLocEncoding Begin, RawLocEncoding End)
+      : Begin(Begin), End(End) {}
 
-  SourceLocation getBegin() const {
-    return SourceLocation::getFromRawEncoding(Begin);
-  }
-  SourceLocation getEnd() const {
-    return SourceLocation::getFromRawEncoding(End);
-  }
+  RawLocEncoding getBegin() const { return Begin; }
+  RawLocEncoding getEnd() const { return End; }
 };
 
 /// Offset in the AST file. Use splitted 64-bit integer into low/high
@@ -239,8 +233,10 @@ struct UnderalignedInt64 {
 
 /// Source location and bit offset of a declaration.
 struct DeclOffset {
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Raw source location.
-  SourceLocation::UIntTy Loc = 0;
+  RawLocEncoding RawLoc = 0;
 
   /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
   /// structure alignment 32-bit and avoid padding gap because undefined
@@ -248,17 +244,15 @@ struct DeclOffset {
   UnderalignedInt64 BitOffset;
 
   DeclOffset() = default;
-  DeclOffset(SourceLocation Loc, uint64_t BitOffset,
-             uint64_t DeclTypesBlockStartOffset) {
-    setLocation(Loc);
+  DeclOffset(RawLocEncoding RawLoc, uint64_t BitOffset,
+             uint64_t DeclTypesBlockStartOffset)
+      : RawLoc(RawLoc) {
     setBitOffset(BitOffset, DeclTypesBlockStartOffset);
   }
 
-  void setLocation(SourceLocation L) { Loc = L.getRawEncoding(); }
+  void setRawLoc(RawLocEncoding Loc) { RawLoc = Loc; }
 
-  SourceLocation getLocation() const {
-    return SourceLocation::getFromRawEncoding(Loc);
-  }
+  RawLocEncoding getRawLoc() const { return RawLoc; }
 
   void setBitOffset(uint64_t Offset, const uint64_t DeclTypesBlockStartOffset) {
     BitOffset.setBitOffset(Offset - DeclTypesBlockStartOffset);
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 370d8037a4da17..017c6b76a91495 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -696,7 +696,7 @@ class ASTReader
   /// Mapping from global submodule IDs to the module file in which the
   /// submodule resides along with the offset that should be added to the
   /// global submodule ID to produce a local ID.
-  GlobalSubmoduleMapType GlobalSubmoduleMap;
+  mutable GlobalSubmoduleMapType GlobalSubmoduleMap;
 
   /// A set of hidden declarations.
   using HiddenNames = SmallVector<Decl *, 2>;
@@ -942,6 +942,12 @@ class ASTReader
   /// Sema tracks these to emit deferred diags.
   llvm::SmallSetVector<serialization::DeclID, 4> DeclsToCheckForDeferredDiags;
 
+  /// The module files imported by different module files. Indirectly imported
+  /// module files are included too. The information comes from
+  /// ReadModuleOffsetMap(ModuleFile&).
+  mutable llvm::DenseMap<ModuleFile *, llvm::SmallVector<ModuleFile *>>
+      ImportedModuleFiles;
+
 private:
   struct ImportedSubmodule {
     serialization::SubmoduleID ID;
@@ -1761,6 +1767,7 @@ class ASTReader
 
   /// Retrieve the module manager.
   ModuleManager &getModuleManager() { return ModuleMgr; }
+  const ModuleManager &getModuleManager() const { return ModuleMgr; }
 
   /// Retrieve the preprocessor.
   Preprocessor &getPreprocessor() const { return PP; }
@@ -2170,8 +2177,8 @@ class ASTReader
 
   /// Retrieve the global submodule ID given a module and its local ID
   /// number.
-  serialization::SubmoduleID
-  getGlobalSubmoduleID(ModuleFile &M, unsigned LocalID);
+  serialization::SubmoduleID getGlobalSubmoduleID(ModuleFile &M,
+                                                  unsigned LocalID) const;
 
   /// Retrieve the submodule that corresponds to a global submodule ID.
   ///
@@ -2184,7 +2191,7 @@ class ASTReader
 
   /// Retrieve the module file with a given local ID within the specified
   /// ModuleFile.
-  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID);
+  ModuleFile *getLocalModuleFile(ModuleFile &M, unsigned ID) const;
 
   /// Get an ID for the given module file.
   unsigned getModuleFileID(ModuleFile *M);
@@ -2220,40 +2227,47 @@ class ASTReader
     return Sema::AlignPackInfo::getFromRawEncoding(Raw);
   }
 
+  using RawLocEncoding = SourceLocationEncoding::RawLocEncoding;
+
   /// Read a source location from raw form and return it in its
   /// originating module file's source location space.
-  SourceLocation ReadUntranslatedSourceLocation(SourceLocation::UIntTy Raw,
-                                                LocSeq *Seq = nullptr) const {
+  std::pair<SourceLocation, unsigned>
+  ReadUntranslatedSourceLocation(RawLocEncoding Raw,
+                                 LocSeq *Seq = nullptr) const {
     return SourceLocationEncoding::decode(Raw, Seq);
   }
 
   /// Read a source location from raw form.
-  SourceLocation ReadSourceLocation(ModuleFile &ModuleFile,
-                                    SourceLocation::UIntTy Raw,
-                                    LocSeq *Seq = nullptr) const {
-    SourceLocation Loc = ReadUntranslatedSourceLocation(Raw, Seq);
-    return TranslateSourceLocation(ModuleFile, Loc);
+  SourceLocation ReadRawSourceLocation(ModuleFile &MF, RawLocEncoding Raw,
+                                       LocSeq *Seq = nullptr) const {
+    if (!MF.ModuleOffsetMap.empty())
+      ReadModuleOffsetMap(MF);
+
+    auto [Loc, ModuleFileIndex] = ReadUntranslatedSourceLocation(Raw, Seq);
+    ModuleFile *ModuleFileHomingLoc =
+        ModuleFileIndex ? ImportedModuleFiles[&MF][ModuleFileIndex - 1] : &MF;
+    return TranslateSourceLocation(*ModuleFileHomingLoc, Loc);
   }
 
   /// Translate a source location from another module file's source
   /// location space into ours.
   SourceLocation TranslateSourceLocation(ModuleFile &ModuleFile,
                                          SourceLocation Loc) const {
-    if (!ModuleFile.ModuleOffsetMap.empty())
-      ReadModuleOffsetMap(ModuleFile);
-    assert(ModuleFile.SLocRemap.find(Loc.getOffset()) !=
-               ModuleFile.SLocRemap.end() &&
-           "Cannot find offset to remap.");
-    SourceLocation::IntTy Remap =
-        ModuleFile.SLocRemap.find(Loc.getOffset())->second;
-    return Loc.getLocWithOffset(Remap);
+    if (Loc.isInvalid())
+      return Loc;
+
+    // It implies that the Loc is already translated.
+    if (SourceMgr.isLoadedSourceLocation(Loc))
+      return Loc;
+
+    return Loc.getLocWithOffset(ModuleFile.SLocEntryBaseOffset - 2);
   }
 
   /// Read a source location.
   SourceLocation ReadSourceLocation(ModuleFile &ModuleFile,
                                     const RecordDataImpl &Record, unsigned &Idx,
                                     LocSeq *Seq = nullptr) {
-    return ReadSourceLocation(ModuleFile, Record[Idx++], Seq);
+    return ReadRawSourceLocation(ModuleFile, Record[Idx++], Seq);
   }
 
   /// Read a FileID.
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index 3ed9803fa3745b..70bf204ed598ef 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -648,6 +648,10 @@ class ASTWriter : public ASTDeserializationListener,
   void AddSourceLocation(SourceLocation Loc, RecordDataImpl &Record,
                          LocSeq *Seq = nullptr);
 
+  /// Return the raw encodings for source locations.
+  SourceLocationEncoding::RawLocEncoding
+  getRawSourceLocationEncoding(SourceLocation Loc, LocSeq *Seq = nullptr);
+
   /// Emit a source range.
   void AddSourceRange(SourceRange Range, RecordDataImpl &Record,
                       LocSeq *Seq = nullptr);
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index bc0aa89966c2b4..ea24b44e5e411b 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -295,10 +295,6 @@ class ModuleFile {
   /// AST file.
   const uint32_t *SLocEntryOffsets = nullptr;
 
-  /// Remapping table for source locations in this module.
-  ContinuousRangeMap<SourceLocation::UIntTy, SourceLocation::IntTy, 2>
-      SLocRemap;
-
   // === Identifiers ===
 
   /// The number of identifiers in this AST file.
diff --git a/clang/include/clang/Serialization/SourceLocationEncoding.h b/clang/include/clang/Serialization/SourceLocationEncoding.h
index 9bb0dbe2e4d6f6..3925b08c1837ed 100644
--- a/clang/include/clang/Serialization/SourceLocationEncoding.h
+++ b/clang/include/clang/Serialization/SourceLocationEncoding.h
@@ -6,23 +6,26 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// Source locations are stored pervasively in the AST, making up a third of
-// the size of typical serialized files. Storing them efficiently is important.
+// We wish to encode the SourceLocation from other module file not dependent
+// on the other module file. So that the source location changes from other
+// module file may not affect the contents of the current module file. Then the
+// users don't need to recompile the whole project due to a new line in a module
+// unit in the root of the dependency graph.
 //
-// We use integers optimized by VBR-encoding, because:
-//  - when abbreviations cannot be used, VBR6 encoding is our only choice
-//  - in the worst case a SourceLocation can be ~any 32-bit number, but in
-//    practice they are highly predictable
+// To achieve this, we need to encode the index of the module file into the
+// encoding of the source location. The encoding of the source location may be:
 //
-// We encode the integer so that likely values encode as small numbers that
-// turn into few VBR chunks:
-//  - the invalid sentinel location is a very common value: it encodes as 0
-//  - the "macro or not" bit is stored at the bottom of the integer
-//    (rather than at the top, as in memory), so macro locations can have
-//    small representations.
-//  - related locations (e.g. of a left and right paren pair) are usually
-//    similar, so when encoding a sequence of locations we store only
-//    differences between successive elements.
+//      |-----------------------|-----------------------|
+//      |          A            |         B         | C |
+//
+//  * A: 32 bit. The index of the module file in the module manager + 1. The +1
+//  here
+//       is necessary since we wish 0 stands for the current module file.
+//  * B: 31 bit. The offset of the source location to the module file containing
+//  it.
+//  * C: The macro bit. We rotate it to the lowest bit so that we can save some
+//  space
+//       in case the index of the module file is 0.
 //
 //===----------------------------------------------------------------------===//
 
@@ -52,11 +55,20 @@ class SourceLocationEncoding {
   friend SourceLocationSequence;
 
 public:
-  static uint64_t encode(SourceLocation Loc,
-                         SourceLocationSequence * = nullptr);
-  static SourceLocation decode(uint64_t, SourceLocationSequence * = nullptr);
+  using RawLocEncoding = uint64_t;
+
+  static RawLocEncoding encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence * = nullptr);
+  static std::pair<SourceLocation, unsigned>
+  decode(RawLocEncoding, SourceLocationSequence * = nullptr);
 };
 
+/// TODO: Remove SourceLocationSequence since it is not used now.
+/// Since we will put the index for ModuleFile in the high bits in the encodings
+/// for source locations, it is meaningless to reduce the size of source
+/// locations.
+///
 /// Serialized encoding of a sequence of SourceLocations.
 ///
 /// Optimized to produce small values when locations with the sequence are
@@ -149,14 +161,30 @@ class SourceLocationSequence::State {
   operator SourceLocationSequence *() { return &Seq; }
 };
 
-inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc,
-                                               SourceLocationSequence *Seq) {
-  return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding());
+inline SourceLocationEncoding::RawLocEncoding
+SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset,
+                               unsigned BaseModuleFileIndex,
+                               SourceLocationSequence *Seq) {
+  if (Loc.isInvalid())
+    return 0;
+
+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));
+
+  assert(BaseModuleFileIndex < ((RawLocEncoding)1 << 32));
+  Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32;
+  return Encoded;
 }
-inline SourceLocation
-SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) {
-  return Seq ? Seq->decode(Encoded)
-             : SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+inline std::pair<SourceLocation, unsigned>
+SourceLocationEncoding::decode(RawLocEncoding Encoded,
+                               SourceLocationSequence *Seq) {
+  unsigned ModuleFileIndex = Encoded >> 32;
+  Encoded &= ((RawLocEncoding)1 << 33) - 1;
+  SourceLocation Loc = SourceLocation::getFromRawEncoding(decodeRaw(Encoded));
+
+  return {Loc, ModuleFileIndex};
 }
 
 } // namespace clang
diff --git a/clang/lib/Frontend/ASTUnit.cpp b/clang/lib/Frontend/ASTUnit.cpp
index 3610a08831e79a..1c655260b09eb5 100644
--- a/clang/lib/Frontend/ASTUnit.cpp
+++ b/clang/lib/Frontend/ASTUnit.cpp
@@ -2373,8 +2373,6 @@ bool ASTUnit::serialize(raw_ostream &OS) {
   return serializeUnit(Writer, Buffer, getSema(), OS);
 }
 
-using SLocRemap = ContinuousRangeMap<unsigned, int, 2>;
-
 void ASTUnit::TranslateStoredDiagnostics(
                           FileManager &FileMgr,
                           SourceManager &SrcMgr,
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 28e8d27fef08c6..625666ead0fb49 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -1645,7 +1645,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
     if (!File)
       return true;
 
-    SourceLocation IncludeLoc = ReadSourceLocation(*F, Record[1]);
+    SourceLocation IncludeLoc = ReadRawSourceLocation(*F, Record[1]);
     if (IncludeLoc.isInvalid() && F->Kind != MK_MainFile) {
       // This is the module's main file.
       IncludeLoc = getImportLocation(F);
@@ -1687,7 +1687,7 @@ bool ASTReader::ReadSLocEntry(int ID) {
     unsigned Offset = Record[0];
     SrcMgr::CharacteristicKind
       FileCharacter = (SrcMgr::CharacteristicKind)Record[2];
-    SourceLocation IncludeLoc = ReadSourceLocation(*F, Record[1]);
+    SourceLocation IncludeLoc = ReadRawSourceLocation(*F, Record[1]);
     if (IncludeLoc.isInvalid() && F->isModule()) {
       IncludeLoc = getImportLocation(F);
     }
@@ -1707,9 +1707,9 @@ bool ASTReader::ReadSLocEntry(int ID) {
 
   case SM_SLOC_EXPANSION_ENTRY: {
     LocSeq::State Seq;
-    SourceLocation SpellingLoc = ReadSourceLocation(*F, Record[1], Seq);
-    SourceLocation ExpansionBegin = ReadSourceLocation(*F, Record[2], Seq);
-    SourceLocation ExpansionEnd = ReadSourceLocation(*F, Record[3], Seq);
+    SourceLocation SpellingLoc = ReadRawSourceLocation(*F, Record[1], Seq);
+    SourceLocation ExpansionBegin = ReadRawSourceLocation(*F, Record[2], Seq);
+    SourceLocation ExpansionEnd = ReadRawSourceLocation(*F, Record[3], Seq);
     SourceMgr.createExpansionLoc(SpellingLoc, ExpansionBegin, ExpansionEnd,
                                  Record[5], Record[4], ID,
                                  BaseOffset + Record[0]);
@@ -3038,8 +3038,10 @@ ASTReader::ReadControlBlock(ModuleFile &F,
         // The import location will be the local one for now; we will adjust
         // all import locations of module imports after the global source
         // location info are setup, in ReadAST.
-        SourceLocation ImportLoc =
+        auto [ImportLoc, ImportModuleFileIndex] =
             ReadUntranslatedSourceLocation(Record[Idx++]);
+        // The import location must belong to the current module file itself.
+        assert(ImportModuleFileIndex == 0);
         off_t StoredSize = !IsImportingStdCXXModule ? (off_t)Record[Idx++] : 0;
         time_t StoredModTime =
             !IsImportingStdCXXModule ? (time_t)Record[Idx++] : 0;
@@ -3658,13 +3660,6 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
           std::make_pair(SourceManager::MaxLoadedOffset - F.SLocEntryBaseOffset
                            - SLocSpaceSize,&F));
 
-      // Initialize the remapping table.
-      // Invalid stays invalid.
-      F.SLocRemap.insertOrReplace(std::make_pair(0U, 0));
-      // This module. Base was 2 when being compiled.
-      F.SLocRemap.insertOrReplace(std::make_pair(
-          2U, static_cast<SourceLocation::IntTy>(F.SLocEntryBaseOffset - 2)));
-
       TotalNumSLocEntries += F.LocalNumSLocEntries;
       break;
     }
@@ -3941,7 +3936,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       if (Record.size() != 1)
         return llvm::createStringError(std::errc::illegal_byte_sequence,
                                        "invalid pragma optimize record");
-      OptimizeOffPragmaLocation = ReadSourceLocation(F, Record[0]);
+      OptimizeOffPragmaLocation = ReadRawSourceLocation(F, Record[0]);
       break;
 
     case MSSTRUCT_PRAGMA_OPTIONS:
@@ -3957,7 +3952,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             std::errc::illegal_byte_sequence,
             "invalid pragma pointers to members record");
       PragmaMSPointersToMembersState = Record[0];
-      PointersToMembersPragmaLocation = ReadSourceLocation(F, Record[1]);
+      PointersToMembersPragmaLocation = ReadRawSourceLocation(F, Record[1]);
       break;
 
     case UNUSED_LOCAL_TYPEDEF_NAME_CA...
[truncated]

ChuanqiXu9 · 2024-03-28T08:39:40Z

BTW, after this patch, with reduced BMI (#85050), we can already do something more interesting than reformatting:

//--- A.cppm
export module A;
int funcA0();
int funcA1();
export int funcA() {
    return funcA0();
}

//--- A.v1.cppm
export module A;

int funcA0();
int funcA1();
export int funcA() {
    return funcA0() + funcA1();
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

Now B.pcm stays unchanged whether it is built against A.pcm (from A.cppm) or A.v1.pcm (from A.v1.cppm). We changed the implementation of funcA() from return funcA0(); to return funcA0() + funcA1();, and B.pcm still has the same content.

jyknight · 2024-03-28T14:32:24Z

+1 on the high-level plan. Switching from a linear offset to a {local-module-index, offset-within-module} pair sounds great!

jansvoboda11 · 2024-03-28T18:21:37Z

By default, SourceLocation is 32 bits. One bit is used to distinguish macro expansions. Looking at libc++'s module map, it contains 999 top-level modules at this moment. That's 10 bits just to be able to import the (entire) standard library. That leaves 21 bits, restricting local SourceLocation space to 2 MB. This doesn't sound feasible. Did I misunderstand?

ChuanqiXu9 · 2024-03-29T01:28:55Z

By default, SourceLocation is 32 bits. One bit is used to distinguish macro expansions. Looking at libc++'s module map, it contains 999 top-level modules at this moment. That's 10 bits just to be able to import the (entire) standard library. That leaves 21 bits, restricting local SourceLocation space to 2 MB. This doesn't sound feasible. Did I misunderstand?

Yes, I explained this in the "Some low level details" section. The size of a source location won't be affected: a source location is stored as unsigned (in practice 32 bits on most platforms), but the serializer uses uint64_t as its unit, so the upper 32 bits of each slot are completely unused. The plan is to store the module file index in those upper 32 bits, which shouldn't be a safety problem. Maybe the original wording was not clear enough; I've updated it.
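A minimal sketch of the slot layout described above, assuming a 32-bit local location and a 64-bit serialized slot. The names `RawLoc`, `packLoc`, and `unpackLoc` are hypothetical and are not the identifiers used in the patch:

```cpp
#include <cstdint>
#include <utility>

// Hypothetical alias for the 64-bit unit the serializer writes.
using RawLoc = std::uint64_t;

// Pack a 32-bit local location and a module file index into one slot.
// Index 0 is reserved for "the module file being written".
inline RawLoc packLoc(std::uint32_t LocalLoc, std::uint32_t ModuleFileIndex) {
  return (static_cast<RawLoc>(ModuleFileIndex) << 32) | LocalLoc;
}

// Split a slot back into {local location, module file index}.
inline std::pair<std::uint32_t, std::uint32_t> unpackLoc(RawLoc Encoded) {
  auto ModuleFileIndex = static_cast<std::uint32_t>(Encoded >> 32);
  auto LocalLoc = static_cast<std::uint32_t>(Encoded & 0xFFFFFFFFu);
  return {LocalLoc, ModuleFileIndex};
}
```

With index 0 the packed value equals the local location itself, so locations owned by the current module file keep their small encoding.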

The only trade-off I see in this change is that it may increase the size of on-disk .pcm files, because we rely on the VBR6 format to shrink small numbers. On the one hand, we need to pay for more space anyway if we want to use a {local-module-index, offset-within-module} pair (thanks for the good name suggestion). On the other hand, the experiment shows the overhead is acceptable.

jansvoboda11 · 2024-03-29T02:50:18Z

Yes, I explained this in Some low level details section. The size of source location won't be affected. Since the size of source location is unsigned (practically, it is 32 bits in most platforms). And we use uint64_t as a unit in the serializer. So there are 32 bit not used completely. The plan is to store the module file index in the higher 32 bits and it shouldn't be a safe problem. Maybe the original wording is not so clear. I've updated it.

Thank you, using 64 bits in the serialization format makes sense! This also means that whenever Clang is configured with 64 bit SourceLocation, we should be using 96 bits for serialization: 32 bits for the module file index and 64 bits for the offset itself, correct?

The only trade-off I saw about this change is that it may increase the size of on-disk .pcm files due to we use VBR6 format to decrease the size of small numbers. But on the one side, we still need to pay for more spaces if we want to use {local-module-index, offset-within-module} pair (Thanks for the good name suggestion). On the other hand, from the experiment, it shows the overhead is acceptable.

Sorry, I don't quite understand. Are you saying you did or did not try to encode this as two separate 32bit values?

ChuanqiXu9 · 2024-03-29T03:02:51Z

Yes, I explained this in Some low level details section. The size of source location won't be affected. Since the size of source location is unsigned (practically, it is 32 bits in most platforms). And we use uint64_t as a unit in the serializer. So there are 32 bit not used completely. The plan is to store the module file index in the higher 32 bits and it shouldn't be a safe problem. Maybe the original wording is not so clear. I've updated it.

Thank you, using 64 bits in the serialization format makes sense! This also means that whenever Clang is configured with 64 bit SourceLocation, we should be using 96 bits for serialization: 32 bits for the module file index and 64 bits for the offset itself, correct?

If Clang is configured with 64 bit SourceLocation, we can't use 96 bits for serialization. We can use at most 64 bits for a slot. In that case, we can only assume that the offset of a source location within its own module (not the global offset!) is not larger than 2^32. I feel this assumption is already made in a lot of places.

Otherwise we can use fewer bits for the module file index (32 bits seems to be too much, honestly): we can use the higher 16 bits to store the module file index and leave the lower 48 bits for the source location. In this case, the assumption becomes "the offset of a source location is not larger than 2^48". It is hard to believe we would ever reach such an extreme case.
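The 16/48 split suggested above could look roughly like the following. This is only a sketch under the stated assumption that offsets fit in 48 bits; `packWide`, `offsetOf`, and `moduleFileIndexOf` are hypothetical names, not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned OffsetBits = 48;
constexpr std::uint64_t OffsetMask = (std::uint64_t{1} << OffsetBits) - 1;

// Upper 16 bits: module file index. Lower 48 bits: offset within that file.
inline std::uint64_t packWide(std::uint64_t Offset,
                              std::uint16_t ModuleFileIndex) {
  assert(Offset <= OffsetMask && "offset must fit in 48 bits");
  return (static_cast<std::uint64_t>(ModuleFileIndex) << OffsetBits) | Offset;
}

inline std::uint64_t offsetOf(std::uint64_t Encoded) {
  return Encoded & OffsetMask;
}

inline std::uint16_t moduleFileIndexOf(std::uint64_t Encoded) {
  return static_cast<std::uint16_t>(Encoded >> OffsetBits);
}
```

A 16-bit index still allows 65535 imported module files, which comfortably covers the 999 libc++ modules mentioned earlier in the thread.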

The only trade-off I saw about this change is that it may increase the size of on-disk .pcm files due to we use VBR6 format to decrease the size of small numbers. But on the one side, we still need to pay for more spaces if we want to use {local-module-index, offset-within-module} pair (Thanks for the good name suggestion). On the other hand, from the experiment, it shows the overhead is acceptable.

Sorry, I don't quite understand. Are you saying you did or did not try to encode this as two separate 32bit values?

I tried to encode this as two separate 32bit values, but it breaks too much code, since a lot of places assume that we can encode a source location as a single uint64_t.

What I mean is that, with the VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we save space for small integers in on-disk .pcm files (the in-memory representation stays the same). For example, for the 64-bit unsigned value 1, VBR6 needs only 6 bits, 000001, in the on-disk representation. So even though I don't use an extra slot to store the module file index, the size of the .pcm files will still increase.
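To make the size effect concrete, here is a small sketch that counts how many bits a value occupies under VBR6, where each 6-bit chunk carries 5 payload bits plus a continuation flag. The helper name `vbr6Bits` is hypothetical; it models the format described in the linked LLVM documentation rather than calling any LLVM API:

```cpp
#include <cstdint>

// Bits needed to emit Value as VBR6: one 6-bit chunk per 5 payload bits,
// with at least one chunk even for zero.
inline unsigned vbr6Bits(std::uint64_t Value) {
  unsigned Chunks = 1;
  for (Value >>= 5; Value != 0; Value >>= 5)
    ++Chunks;
  return Chunks * 6;
}
```

Under this model a local offset of 45 takes 12 bits, while the same offset with module file index 1 in bit 32 takes 42 bits, which is why only locations from imported module files grow.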

jansvoboda11 · 2024-03-29T16:32:48Z

If Clang is configured with 64 bit SourceLocation, we can't use 96 bits for serialization. We can at most use 64 bits for a slot. In that case, we can only assume the offset of source location in its own module (not the global offset!) is not large than 2^32. I feel this assumption may be valid in a lot of places.

Or otherwise we can use less bits for module file index (32 bits seems to be too much honestly), then we can use higher 16 bits to store the module file index, and leave the lower 48 bits to store the source location. In this case, the assumption becomes to "the offset of the source location may not large than 2^48". But it is slightly hard to believe we can reach such extreme cases.

Let's see if @statham-arm (who introduced the SourceLocation::[U]IntTy typedefs) wants to weigh in here.

I tried to encode this as two separate 32bit values. But it will break too many codes. Since a lot of places assume that we can encode the source location as an uint64_t.

What I mean is, with VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can save more space for small integers in on-disk .pcm files (the memory representation should be the same). For example, for a 64 bits unsigned int 1, VBR6 can use only 6 bits to store that 000001 to represent the 64 bits value 1 in the on-disk representations. So that even if I don't use more slots to store the module file index, the size of the .pcm files will increase after all.

Right. My thinking was that single 64bit value with the module file index in the upper 32 bits would basically disable VBR6 encoding for the lower 32 bits. If we split this thing into two separate 32bit values, we are more likely to VBR6 encode both of them. But this would actually increase size for (what I assume is the most common case) local source locations. Still, I think having a rough idea of how alternative implementations compare would be great.

Do you have any data on how much recompilation this can save for real world projects?

ChuanqiXu9 · 2024-04-01T01:54:39Z

If Clang is configured with 64 bit SourceLocation, we can't use 96 bits for serialization. We can at most use 64 bits for a slot. In that case, we can only assume the offset of source location in its own module (not the global offset!) is not large than 2^32. I feel this assumption may be valid in a lot of places.
Or otherwise we can use less bits for module file index (32 bits seems to be too much honestly), then we can use higher 16 bits to store the module file index, and leave the lower 48 bits to store the source location. In this case, the assumption becomes to "the offset of the source location may not large than 2^48". But it is slightly hard to believe we can reach such extreme cases.

Let's see if @statham-arm (who introduced the SourceLocation::[U]IntTy typedefs) wants to weigh in here.

I tried to encode this as two separate 32bit values. But it will break too many codes. Since a lot of places assume that we can encode the source location as an uint64_t.
What I mean is, with VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer), we can save more space for small integers in on-disk .pcm files (the memory representation should be the same). For example, for a 64 bits unsigned int 1, VBR6 can use only 6 bits to store that 000001 to represent the 64 bits value 1 in the on-disk representations. So that even if I don't use more slots to store the module file index, the size of the .pcm files will increase after all.

Right. My thinking was that single 64bit value with the module file index in the upper 32 bits would basically disable VBR6 encoding for the lower 32 bits. If we split this thing into two separate 32bit values, we are more likely to VBR6 encode both of them. But this would actually increase size for (what I assume is the most common case) local source locations.

Yes, this is the trade-off.

Still, I think having a rough idea of how alternative implementations compare would be great.

Do you have any data on how much recompilation this can save for real world projects?

I don't have specific data, though. But I think it is clear that the feature is very important, and that it can save many unnecessary recompilations.

After the patch, for reduced BMI, we're already able to avoid changing the BMI if we only change the definitions of non-inline function bodies in the module unit.

Furthermore, based on this patch, we can do further optimizations. E.g., if we split declaration IDs like we did for source locations in this patch, we may be able to avoid recompilations when an unreferenced module unit adds or deletes declarations. This may be pretty common in practice:

export module interface:part1;
...

//--- 
export module interface:part2;
...

//---
export module interface;
export import :part1;
export import :part2;

//--- consumer.cppm
export module consumer;
import interface;
// only used declarations in interface:part1;

If the user only changes the module unit interface:part2, we can keep the BMI for consumer.cppm the same. That implies every user of consumer.cppm can avoid recompilation.

I see this as a killer feature for C++20 modules, and I think it is very valuable. So I really want to get this (including the later optimizations) into clang19.

statham-arm · 2024-04-02T16:09:44Z

Let's see if @statham-arm (who introduced the SourceLocation::[U]IntTy typedefs) wants to weigh in here.

I'm afraid my knowledge of C++ modules is very close to zero. They were mentioned in a training course I did last year, but not in much detail.

On 64-bit SourceLocation in general: our patch series to implement those as an option in clang was never fully landed, because the second half of it stalled in review. I'd still like to see it finished off, though I'm sure it would need some vigorous rebasing and retesting by now. The 32-bit SourceLocation limit is a problem for at least some of our users, apparently because there's a library of header files that break the limit all by themselves. (I'm not sure how; I haven't seen them. Maybe by including each other multiple times with different #defines?)

But I have no idea whether the same considerations would apply to modules, because I don't really know enough about modules, sorry!

ChuanqiXu9 · 2024-04-03T02:17:28Z

Let's see if @statham-arm (who introduced the SourceLocation::[U]IntTy typedefs) wants to weigh in here.

I'm afraid my knowledge of C++ modules is very close to zero. They were mentioned in a training course I did last year, but not in much detail.

On 64-bit SourceLocation in general: our patch series to implement those as an option in clang was never fully landed, because the second half of it stalled in review. I'd still like to see it finished off, though I'm sure it would need some vigorous rebasing and retesting by now. The 32-bit SourceLocation limit is a problem for at least some of our users, apparently because there's a library of header files that break the limit all by themselves. (I'm not sure how; I haven't seen them. Maybe by including each other multiple times with different #defines?)

But I have no idea whether the same considerations would apply to modules, because I don't really know enough about modules, sorry!

Hi Statham, thanks for the quick reply. I guess the more abstract question, beyond modules, is: is it safe to use 48 bits to store the source location? Then we can use the higher 16 bits to represent module-specific information, e.g., the module file index.

statham-arm · 2024-04-03T09:24:25Z

Yes, we think that should be safe – we don't think any of our users is generating 2^48 bytes of preprocessed output. Parsing it afterwards would take prohibitively long if they did! Probably they aren't breaking the 2^32 barrier by much.

ChuanqiXu9 · 2024-04-03T09:26:46Z

Yes, we think that should be safe – we don't think any of our users is generating 2^48 bytes of preprocessed output. Parsing it afterwards would take prohibitively long if they did! Probably they aren't breaking the 2^32 barrier by much.

Thanks, that makes sense. I have never heard of a compilation crashing due to overly long source files before : )

ChuanqiXu9 · 2024-04-07T02:58:33Z

Rebased onto main and enabled the sequence optimization when the module file index is 0. Now the size of the PCMs goes to 204M (originally it was 200M). It looks better.

@jansvoboda11 @Bigcheese ping

Bigcheese

I think the general approach makes sense. I'll take a closer look at the specific changes.

jansvoboda11 · 2024-04-09T23:37:34Z

clang/include/clang/Serialization/SourceLocationEncoding.h

+  assert(Loc.getOffset() >= BaseOffset);
+  Loc = Loc.getLocWithOffset(-BaseOffset);
+  RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding());
+  assert(Encoded < ((RawLocEncoding)1 << 32));


My compiler says:

warning: result of comparison of constant 4294967296 with expression of type 'unsigned int' is always true [-Wtautological-constant-out-of-range-compare]

I guess you may be referring to the next line; that one was indeed comparing an unsigned with (1<<32), and I fixed that. This line should be fine since Encoded has type RawLocEncoding.

No, I'm referring to this line. I assume the compiler sees through RawLocEncoding and knows it's being initialized from SourceLocation::UIntTy.

Oh, the compiler may be too smart : )

My intention was that, if someday we turn 64-bit source locations on, this assertion would remind us to update this code.

But given the warning, I don't know how to write the assertion here. So I just removed it.
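For illustration only: one way to keep that reminder without a runtime comparison the compiler can prove always true is a compile-time check on the width of the location type. This is not what the patch did (the assertion was simply removed). `UIntTy` below is a stand-in for SourceLocation::UIntTy and is assumed to be 32 bits wide; `withModuleFileIndex` is a hypothetical helper:

```cpp
#include <cstdint>
#include <limits>

// Stand-in for SourceLocation::UIntTy; assumed 32 bits wide in this sketch.
using UIntTy = std::uint32_t;
using RawLocEncoding = std::uint64_t;

// Fails to compile if UIntTy is ever widened past 32 bits, so the packing
// code below has to be revisited at that point.
static_assert(std::numeric_limits<UIntTy>::digits <= 32,
              "local location must fit in the low 32 bits of RawLocEncoding");

inline RawLocEncoding withModuleFileIndex(UIntTy Local, std::uint32_t Index) {
  return (static_cast<RawLocEncoding>(Index) << 32) | Local;
}
```

Because the check depends only on the type, it does not trigger -Wtautological-constant-out-of-range-compare and costs nothing at run time.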

jansvoboda11 · 2024-04-09T23:42:50Z

clang/include/clang/Serialization/ASTReader.h

+  mutable llvm::DenseMap<ModuleFile *, llvm::SmallVector<ModuleFile *>>
+      ImportedModuleFiles;


Why does this live in ASTReader? If we moved it to ModuleFile, we could remove mutable and the map lookups.

Done in the new commit.

There was a similar member, Imports, in ModuleFile, which records the directly imported module files as written in the code. But yes, it is good to remove the map lookups; let's handle the ambiguity with comments.

jansvoboda11 · 2024-04-09T23:45:32Z

clang/lib/Serialization/ASTWriter.cpp

+    // 0 means the location is not loaded. So we need to add 1 to the index to
+    // make it clear.
+    ModuleFileIndex = F->Index + 1;
+    assert(&getChain()->getModuleManager()[F->Index] == F);


This seems a bit tautological.

Yes, but I'd like to keep it. The assertions make me feel better.

I don't think ASTWriter should be re-checking ModuleManager invariants either way.

This reverts commit 6c31104. Required by the post commit comments: #86912

ChuanqiXu9 · 2024-04-30T14:34:11Z

I'll revert this, since I can't reproduce it. Once the bot is stable, please tell me whether this is the real problem.

You can reproduce this: the GCC compile farm does have a Solaris/sparcv9 system (cfarm215) which is perfectly equipped to run LLVM builds (I've tried).

I think the stack traces from the bot are a pretty strong indication that your patch is the culprit:

Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-19  0x00000001076d87b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 36
1  clang-19  0x00000001076d910c SignalHandler(int) + 896
2  libc.so.1 0x00007fffff0c62a8 __sighndlr + 12
3  libc.so.1 0x00007fffff0b8b50 call_user_handler + 1024
4  libc.so.1 0x00007fffff0b8f10 sigacthandler + 160
5  clang-19  0x00000001083824e0 clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, clang::SourceLocation&) + 168
6  clang-19  0x000000010838aca0 clang::ASTReader::ReadDeclRecord(clang::GlobalDeclID) + 48
7  clang-19  0x00000001082fb4ec clang::ASTReader::GetDecl(clang::GlobalDeclID) + 232
8  clang-19  0x00000001082cb820 clang::ASTReader::SetGloballyVisibleDecls(clang::IdentifierInfo*, llvm::SmallVectorImpl<clang::GlobalDeclID> const&, llvm::SmallVectorImpl<clang::Decl*>*) + 252
9  clang-19  0x00000001083144a0 clang::ASTReader::finishPendingActions() + 572
10 clang-19  0x0000000108319e10 clang::ASTReader::FinishedDeserializing() + 92
11 clang-19  0x000000010830dbf4 clang::ASTReader::get(llvm::StringRef) + 680
12 clang-19  0x00000001078a84fc clang::IdentifierTable::get(llvm::StringRef) + 84
13 clang-19  0x000000010a130fcc clang::Sema::Initialize() + 1208
14 clang-19  0x0000000109fd1814 clang::Parser::Initialize() + 1260
15 clang-19  0x0000000109fccb68 clang::ParseAST(clang::Sema&, bool, bool) + 556
16 clang-19  0x00000001081b10d8 clang::ASTFrontendAction::ExecuteAction() + 248
17 clang-19  0x00000001081b06f8 clang::FrontendAction::Execute() + 92
18 clang-19  0x00000001081196c8 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1572
19 clang-19  0x00000001082b87b8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 684
20 clang-19  0x00000001048a2980 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 4296
21 clang-19  0x000000010489f6f8 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) + 1184
22 clang-19  0x000000010489e018 clang_main(int, char**, llvm::ToolContext const&) + 4424
23 clang-19  0x00000001048aee0c main + 60
24 clang-19  0x000000010489c904 _start + 100
/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.script: line 2: 12701 Bus Error               /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang -cc1 -internal-isystem /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include -nostdsysteminc -include-pch /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/PCH/Output/opencl-extensions.cl.tmp -fsyntax-only /vol/llvm/src/llvm-project/dist/clang/test/PCH/opencl-extensions.cl -triple spir-unknown-unknown

One thing I see immediately is that this uses a triple the bot is not configured to handle. Nonetheless, clang shouldn't die with SIGBUS in such a case.

Reverted. It looks like the configuration isn't in our bots actually. I can't open that site. I need to take another look at the code though.

ChuanqiXu9 · 2024-04-30T14:40:52Z

Oh, maybe I found the reason. It is because my patch breaks the alignment of DeclOffset:

llvm-project/clang/include/clang/Serialization/ASTBitCodes.h

Lines 237 to 240 in 8d28e58

    
  /// Offset relative to the start of the DECLTYPES_BLOCK block. Keep
  /// structure alignment 32-bit and avoid padding gap because undefined
  /// value in the padding affects AST hash.
  UnderalignedInt64 BitOffset;

That explains why it works on some platforms but not on others. I'll fix this.

rorth · 2024-05-01T08:19:44Z

This is certainly a case of unaligned access. In a local build, I've run the first failing clang invocation under truss (the Solaris syscall tracer). For

/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/bin/clang -cc1 -internal-isystem /var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/lib/clang/19/include -nostdsysteminc -fmodules -Wno-private-module -fimplicit-module-maps -fmodules-cache-path=/var/llvm/dist-sparcv9-release-stage2-A-flang-clang18/tools/clang/stage2-bins/tools/clang/test/APINotes/Output/availability.m.tmp/ModulesCache -fapinotes-modules -fsyntax-only -I /vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Headers -F /vol/llvm/src/llvm-project/dist/clang/test/APINotes/Inputs/Frameworks /vol/llvm/src/llvm-project/dist/clang/test/APINotes/availability.m -verify

this reveals

14552:      Incurred fault #5, FLTACCESS  %pc = 0x1083824E0
14552:        siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF7F5E7B6C
14552:      Received signal #10, SIGBUS [caught]
14552:        siginfo: SIGBUS BUS_ADRALN addr=0xFFFFFFFF7F5E7B6C

gdb shows exactly that:

Thread 2 received signal SIGBUS, Bus error.
[Switching to Thread 1 (LWP 1)]
0x00000001083824e0 in clang::ASTReader::DeclCursorForID(clang::GlobalDeclID, clang::SourceLocation&) ()
1: x/i $pc
=> 0x1083824e0 <_ZN5clang9ASTReader15DeclCursorForIDENS_12GlobalDeclIDERNS_14SourceLocationE+168>:	ldx  [ %i5 + %i1 ], %o2
(gdb) p/x $i5
$1 = 0xffffffff7f5e7b4c
(gdb) p/x $i1
$2 = 0x20

The ldx insn (Load Extended Word) takes a doubleword address with natural (64-bit) alignment.
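The faulting address is %i5 + %i1 = 0xffffffff7f5e7b4c + 0x20 = 0xffffffff7f5e7b6c, which is a multiple of 4 but not of 8, matching the BUS_ADRALN report. The following is a minimal sketch of two common ways to avoid such a load. `SplitOffset` and `loadU64` are hypothetical names; `SplitOffset` is similar in spirit to the UnderalignedInt64 member quoted above, but it is not its actual definition:

```cpp
#include <cstdint>
#include <cstring>

// A 64-bit value stored as two 32-bit halves, so the containing record only
// needs 4-byte alignment and never triggers an 8-byte aligned load.
struct SplitOffset {
  std::uint32_t Low;
  std::uint32_t High;

  void set(std::uint64_t V) {
    Low = static_cast<std::uint32_t>(V);
    High = static_cast<std::uint32_t>(V >> 32);
  }
  std::uint64_t get() const {
    return Low | (static_cast<std::uint64_t>(High) << 32);
  }
};
static_assert(alignof(SplitOffset) == alignof(std::uint32_t),
              "record must not require 8-byte alignment");

// Alignment-safe read of a 64-bit value from an arbitrary byte position.
inline std::uint64_t loadU64(const unsigned char *P) {
  std::uint64_t V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}
```

Platforms such as x86-64 tolerate the misaligned load, which is consistent with the failure showing up only on the SPARC bot.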

ChuanqiXu9 · 2024-05-01T09:04:15Z

thanks, it is pretty helpful.

@Bigcheese

This relands 6c31104. The patch was reverted due to an incorrectly introduced alignment, and it was re-committed after fixing the alignment issue. The original message follows:

This is part of the "no transitive change" patch series, "no transitive source location change". I discussed this with @Bigcheese at the WG21 meeting in Tokyo.

The idea comes from a post by @jyknight on LLVM Discourse. For:

```
// A.cppm
export module A;
...

// B.cppm
export module B;
import A;
...

//--- C.cppm
export module C;
import B;
```

Almost every time A.cppm changes, we need to recompile `B`, because we treat source locations as significant to the semantics. But it would be good if we could avoid recompiling `C` when the change to `A` doesn't change the BMI of B.

This patch only cares about source locations, so let's focus on a source location example. The full example is in the attached test.

```
//--- A.cppm
export module A;
export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- A.v1.cppm
export module A;

export template <class T>
struct C {
    T func() {
        return T(43);
    }
};
export int funcA() {
    return 43;
}

//--- B.cppm
export module B;
import A;

export int funcB() {
    return funcA();
}

//--- C.cppm
export module C;
import A;
export void testD() {
    C<int> c;
    c.func();
}
```

Here the only difference between `A.cppm` and `A.v1.cppm` is that `A.v1.cppm` has an additional blank line. The test then shows that the two BMIs of `B.cppm`, one built with `-fmodule-file=A=A.pcm` and the other with `-fmodule-file=A=A.v1.pcm`, should have bit-wise identical contents.

However, it is a different story for C: C instantiates templates from A, and the instantiation records source information from module A, which differs between `A` and `A.v1`, so it is expected that the BMIs `C.pcm` and `C.v1.pcm` can and should differ.
To fully understand the patch, we need to understand how we encode source locations and how we serialize and deserialize them.

For source locations, we encode them as:

```
|
|
| _____ base offset of an imported module
|
|
|
|_____ base offset of another imported module
|
|
|
|
| ___ 0
```

As the diagram shows, we encode the local (unloaded) source locations from 0 toward higher values, and we allocate the space for source locations from loaded modules from high values toward 0. The source locations from the loaded modules are then mapped into our source location space according to the allocated offset.

For example, for:

```
// a.cppm
export module a;
...

// b.cppm
export module b;
import a;
...
```

Assume the offset of a source location (let's call it `S`) in a.cppm is 45, so we record the value `45` in the BMI `a.pcm`. Then in b.cppm, when we import a, the source manager allocates space for module 'a' (according to the recorded number of source locations) and uses it as the base offset of module 'a' in the current source location space. Let's assume the allocated base offset is 90 in this example. To get the location of `S` in the current source location space, we simply add `45` to `90`, which gives `135`. So the source location for `S` in module B is `135`.

When we write module `b`, we also write the source location of `S` as `135` directly in the BMI. And to make clear that the location `S` comes from module `a`, we also need to record the base offset of module `a`, 90, in the BMI of `b`.

This is where the problem comes from. The base offset of module 'a' is computed from the number of source locations in module 'a', so in module 'b' the recorded base offset of module 'a' changes every time the number of source locations in module 'a' increases or decreases. In other words, the contents of the BMI of B change every time the number of locations in module 'a' changes. This is pretty sensitive.
Almost every change will change the number of locations. This is the problem this patch wants to solve.

Let's continue with the existing design to understand what's going on. Another interesting case is:

```
// c.cppm
export module c;
import whatever;
import a;
import b;
...
```

In `c.cppm`, when we import `a`, we still need to allocate a base location offset for it; let's say the value becomes `200` somehow. Then when we reach the location `S` recorded in module `b`, we need to translate it into the current source location space. The solution is quite simple: `135 + (200 - 90) = 245`. In other words, the offset of a source location in the current module can be computed as `Recorded Offset + Base Offset of its module file - Recorded Base Offset`. That is almost all there is to how the serializers handle source location offsets.

At the abstract level, what we want is to remove the hardcoded base offsets of imported modules while retaining the ability to calculate the source location in a new module unit. To achieve this, we need to be able to find the module file owning a source location from the encoding of the source location.

So in this patch, for each source location, we store the local offset of the location and the module file index. For the above example, in the existing design the source location of `S` is recorded in `b.pcm` as `135` directly. In the new design, the source location of `S` is recorded as `<1, 45>`. Here `1` stands for the module file index of `a` in module `b`, and `45` is the offset of `S` relative to the base offset of module `a`.

So the trade-off is that, to make the BMI more independent, we need to record more abstract information. I feel it is worth it. The recompilation problem of modules is really annoying and people are still complaining about it. If we can make this work (including stopping other changes from propagating transitively), I think this may be a killer feature for modules.

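
The offset arithmetic described above can be sketched in a few lines of C++. This is only an illustration using the numbers from the example (45, 90, 135, 200); the type and function names are hypothetical and are not the ones used in the patch, and the lookup from module file index to base offset is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Status quo: the BMI stores a location as an offset in the writer's own
// address space, plus the base offset the writer had assigned to the module
// that owns the location.
struct RecordedLoc {
  std::uint32_t Offset;     // 135 for S as written into b.pcm
  std::uint32_t BaseOffset; // 90: base of module 'a' while writing b.pcm
};

// Recorded Offset + Base Offset of its module file - Recorded Base Offset.
constexpr std::uint32_t translateRecorded(RecordedLoc L,
                                          std::uint32_t CurrentBase) {
  return L.Offset + CurrentBase - L.BaseOffset;
}

// New design: the BMI stores which module file owns the location and the
// offset relative to that module file's base.
struct IndexedLoc {
  std::uint32_t ModuleFileIndex; // 1 stands for 'a' among b's module files
  std::uint32_t LocalOffset;     // 45 for S
};

// The caller resolves ModuleFileIndex to the base it assigned to that file.
constexpr std::uint32_t translateIndexed(IndexedLoc L,
                                         std::uint32_t CurrentBaseOfOwner) {
  return CurrentBaseOfOwner + L.LocalOffset;
}

// What b.pcm would contain for S if b had assigned BaseOfA to module 'a'.
constexpr RecordedLoc recordOld(std::uint32_t BaseOfA) {
  return {BaseOfA + 45, BaseOfA};
}
constexpr IndexedLoc recordNew(std::uint32_t /*BaseOfA*/) { return {1, 45}; }

inline void checkExample() {
  // Both schemes place S at 245 when c.cppm assigns base 200 to module 'a'.
  assert(translateRecorded({135, 90}, 200) == 245);
  assert(translateIndexed({1, 45}, 200) == 245);
  // If 'a' changes size and b sees base 80 instead of 90, the old record
  // changes while the new record stays <1, 45>.
  assert(recordOld(80).Offset != recordOld(90).Offset);
  assert(recordNew(80).LocalOffset == recordNew(90).LocalOffset);
}
```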
According to @Bigcheese, the change should also be helpful for Clang explicit modules.

On the benchmarking side, I tested this patch against https://github.com/alibaba/async_simple/tree/CXX20Modules. There was no significant change in compilation time. The size of the .pcm files grew from 200M to 204M. I think the trade-off is pretty fair.

I didn't use another slot to record the module file index. Instead, I use the higher 32 bits of the existing source location encoding to store that information. This design should be safe: we use `unsigned` to store source locations in memory but `uint64_t` in serialization, and `unsigned` is generally 32 bits wide on most platforms, so the bits now used to store the module file index were not used before. The new encoding is:

```
|-----------------------|-----------------------|
|           A           |         B         | C |

* A: 32 bit. The index of the module file in the module manager + 1.
*    The +1 here is necessary since we wish 0 stands for the current module file.
* B: 31 bit. The offset of the source location to the module file
*    containing it.
* C: The macro bit. We rotate it to the lowest bit so that we can save
*    some space in case the index of the module file is 0.
```

(B and C are the existing raw encoding for source locations.)

Another reason to reuse the same slot of the source location is to reduce the impact of the patch. A lot of places assume we can store and get a source location from a single slot, and when I tried to add another slot, a lot of code broke. I don't feel it is worth it.

Another impact of this decision is that the existing small optimizations for encoding source locations may be invalidated. The key to those optimizations is turning large values into small values so that the VBR6 format can reduce the size. But if we put the module file index into the higher bits, they may simply stop working. An example is the `SourceLocationSequence` optimization.
This will only affect the size of on-disk .pcm files. I don't expect it to impact the speed or memory use of compilations. And given my small experiments above, I feel this trade-off is worth it.

The mental model for handling source location offsets is not so complex, and I believe we can solve the problem by adding the module file index to each stored source location.

On the practical side, since source locations are pretty sensitive, and the patch passes all the in-tree tests and a small-scale project, I feel it should be correct.

I'll continue to work on no transitive decl change and no transitive identifier change (if it matters) to achieve the goal of stopping the propagation of unnecessary changes. But all of this depends on this patch, since source locations are clearly the most sensitive thing.

---

The release notes and documentation will be added separately.
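
The serialized encoding described in the commit message (module file index + 1 in the higher 32 bits, the 31-bit offset and the macro bit in the lower 32 bits, with the macro bit rotated to the lowest position) can be sketched as follows. This is a minimal sketch under assumptions: it takes the macro bit to be the top bit of the 32-bit raw encoding, and the names `encode`, `decode`, and `Decoded` are made up for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: in the 32-bit raw encoding the macro bit is the top bit and the
// remaining 31 bits are the offset.
constexpr std::uint32_t MacroBit = 1u << 31;

// A: ModuleFileIndexPlusOne (0 stands for the current module file).
// B|C: the raw encoding rotated left by one, so the macro bit ends up lowest.
constexpr std::uint64_t encode(std::uint32_t ModuleFileIndexPlusOne,
                               std::uint32_t Raw) {
  return (static_cast<std::uint64_t>(ModuleFileIndexPlusOne) << 32) |
         static_cast<std::uint32_t>((Raw << 1) | (Raw >> 31));
}

struct Decoded {
  std::uint32_t ModuleFileIndexPlusOne;
  std::uint32_t Raw;
};

constexpr Decoded decode(std::uint64_t Encoded) {
  return {static_cast<std::uint32_t>(Encoded >> 32),
          // Rotate the lower 32 bits right by one to restore the raw value.
          static_cast<std::uint32_t>(
              (static_cast<std::uint32_t>(Encoded) >> 1) |
              (static_cast<std::uint32_t>(Encoded) << 31))};
}

inline void checkExample() {
  // A local, non-macro location stays a small number, which suits VBR6.
  assert(encode(0, 45) == 90);
  // The same offset owned by module file 0 (stored as 0 + 1).
  assert(encode(1, 45) == ((1ULL << 32) | 90));
  // Round trip, including a macro location.
  assert(decode(encode(1, 45)).Raw == 45);
  assert(decode(encode(3, MacroBit | 45)).Raw == (MacroBit | 45));
  assert(decode(encode(3, MacroBit | 45)).ModuleFileIndexPlusOne == 3);
}
```

Rotating the macro bit into the lowest position keeps locations from the current module file numerically small, which is what the "save some space" remark in the diagram refers to.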

ChuanqiXu9 · 2024-05-06T05:36:54Z

I've relanded this in 947b062.
Let's see what happens.

into the current module

Follow-up of #86912. After #86912, with reduced BMI, the BMI can remain unchanged if the modules it depends on only change their implementation (without introducing new decls). However, this is not strictly correct. For example:

```
// a.cppm
export module a;
export inline int a() { ... }

// b.cppm
export module b;
import a;
export inline int b() { return a(); }
```

Since both `a()` and `b()` are inline, we need to make sure the BMI `b.pcm` changes after the implementation of `a()` changes.

We can't get that naturally, since we don't record the body of `a()` while writing `b.pcm`. We can't reuse ODRHash here either, since ODRHash doesn't hash called functions recursively, so it would be problematic if `a()` calls other inline functions.

We could probably solve this with a new hash mechanism, but its safety and efficiency may be a problem too. Here we just conservatively combine the hash values of the used modules.
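
The conservative approach can be illustrated with a small sketch: the signature of `b` is derived from its own content hash and from the signatures of every module it uses, so any change in a used module's signature changes `b`'s signature as well. The mixing function below is a plain FNV-1a step chosen only for illustration; it is not the hash Clang uses, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative 64-bit mixing step (FNV-1a over the bytes of Value).
inline std::uint64_t mix(std::uint64_t Seed, std::uint64_t Value) {
  for (int I = 0; I < 8; ++I) {
    Seed ^= (Value >> (8 * I)) & 0xffu;
    Seed *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return Seed;
}

// Signature of a module = its own hash combined with the signatures of all
// modules it uses. This is conservative: it changes whenever a used module's
// signature changes, whether or not the change is actually relevant.
inline std::uint64_t
combinedSignature(std::uint64_t OwnHash,
                  const std::vector<std::uint64_t> &UsedModuleSignatures) {
  std::uint64_t Sig = mix(14695981039346656037ULL, OwnHash); // FNV offset basis
  for (std::uint64_t Used : UsedModuleSignatures)
    Sig = mix(Sig, Used);
  return Sig;
}

inline void checkExample() {
  // Same inputs give the same signature for b.
  assert(combinedSignature(1, {10}) == combinedSignature(1, {10}));
  // A new signature for a (10 -> 11) changes b's signature even though b's
  // own hash (1) is unchanged.
  assert(combinedSignature(1, {10}) != combinedSignature(1, {11}));
}
```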

Follow-up of #86912.

#### Motivation Example

The motivation of the patch series is that, for a module interface unit `X`, when the modules that `X` depends on change, and the changes are not relevant to `X`, we hope the BMI of `X` won't change. For this specific patch, we hope that irrelevant declaration changes won't change the BMI of `X`.

**However**, I found that the patch by itself is not very useful in practice, since adding or removing declarations changes the state of identifiers and types in most cases.

That said, for the simplest example,

```
// partA.cppm
export module m:partA;

// partA.v1.cppm
export module m:partA;
export void a() {}

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

the BMI of `onlyUseB` will change after we change the implementation of `partA.cppm` to `partA.v1.cppm`, since `partA.v1.cppm` introduces new identifiers and types (the function prototype).

So in this patch, we have to write the tests as:

```
// partA.cppm
export module m:partA;
export int getA() { ... }
export int getA2(int) { ... }

// partA.v1.cppm
export module m:partA;
export int getA() { ... }
export int getA(int) { ... }
export int getA2(int) { ... }

// partB.cppm
export module m:partB;
export void b() {}

// m.cppm
export module m;
export import :partA;
export import :partB;

// onlyUseB;
export module onlyUseB;
import m;
export inline void onluUseB() {
    b();
}
```

so that the newly introduced declaration `int getA(int)` doesn't introduce new identifiers or types, and the BMI of `onlyUseB` can remain unchanged.

While this doesn't look great, the patch should be the basis for the patches that erase the transitive change for identifiers and types, since I don't know how we can introduce new types and identifiers without introducing new declarations.
Given how tightly declarations, types and identifiers are related, I think we can only reach the ideal state after we have done the series for all three entities.

#### Design details

The design of the patch is similar to #86912: it extends the 32-bit DeclID to 64 bits and uses the higher bits to store the module file index and the lower bits to store the local Decl ID.

A slight difference is that we only use 48 bits to store the new DeclID, since we use the higher 16 bits to store the module ID in the prefix of the Decl class. Previously, we used 32 bits to store the module ID and 32 bits to store the DeclID. I didn't want to allocate additional space, so I kept the total at the same 64 bits. A potentially interesting point here is the relationship between the module ID and the module file index. I feel we can get the module file index from the module ID, but I didn't prove or implement it, since I want to keep the patch itself as small as possible. We can do it in the future if we want.

Another change in the patch is the new concept of a Decl Index, which is the index into the very big array `DeclsLoaded` in ASTReader. Previously, the index of a loaded declaration was simply the Decl ID minus PREDEFINED_DECL_NUMs, so there are some places where the two were used interchangeably. This patch splits the two concepts.

#### Overhead

As with #86912, the change increases the on-disk PCM file sizes. As declaration IDs may be the most numerous IDs in the PCM file, this can have the biggest impact on the size. In my experiments, this change brings a 6.6% increase in the on-disk PCM size. No compile-time performance regression was observed. Given the benefits in the motivation example, I think the cost is worthwhile.
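
The bit budget described under "Design details" can be sketched like this. The exact field positions are an assumption made for illustration (local Decl ID in bits 0..31, module file index in bits 32..47, and the top 16 bits left for the module ID kept in the Decl prefix); the function names are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, for illustration only:
//   bits 48..63: left free so the Decl prefix can hold the owning module ID
//   bits 32..47: module file index
//   bits  0..31: local Decl ID
constexpr std::uint64_t makeDeclID(std::uint16_t ModuleFileIndex,
                                   std::uint32_t LocalDeclID) {
  return (static_cast<std::uint64_t>(ModuleFileIndex) << 32) | LocalDeclID;
}

constexpr std::uint16_t moduleFileIndexOf(std::uint64_t ID) {
  return static_cast<std::uint16_t>(ID >> 32);
}

constexpr std::uint32_t localDeclIDOf(std::uint64_t ID) {
  return static_cast<std::uint32_t>(ID);
}

// Pack a 16-bit owning module ID and a 48-bit DeclID into one 64-bit word,
// mirroring the "16 + 48 instead of 32 + 32" budget from the text.
constexpr std::uint64_t packPrefix(std::uint16_t OwningModuleID,
                                   std::uint64_t DeclID) {
  return (static_cast<std::uint64_t>(OwningModuleID) << 48) |
         (DeclID & 0xFFFFFFFFFFFFULL);
}

inline void checkExample() {
  constexpr std::uint64_t ID = makeDeclID(2, 7);
  assert(moduleFileIndexOf(ID) == 2);
  assert(localDeclIDOf(ID) == 7);
  // The DeclID survives being stored next to the owning module ID.
  assert((packPrefix(5, ID) >> 48) == 5);
  assert((packPrefix(5, ID) & 0xFFFFFFFFFFFFULL) == ID);
}
```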

Follow-up of #92083.

The motivation is still cutting off unnecessary changes in the dependency chain. See the above link (recursively) for details.

After this patch (and the above patch), we can already do something pretty interesting. For example,

#### Motivation example

```
//--- m-partA.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

export class A {
public:
    int getMem();
};

export template <typename T>
class ATempl {
public:
    T getT();
};

//--- m-partA.v1.cppm
export module m:partA;

export inline int getA() {
    return 43;
}

// Now we add a new declaration without introducing a new type.
// The consuming module which didn't use m:partA completely is expected to be
// not changed.
export inline int getA2() {
    return 88;
}

export class A {
public:
    int getMem();
    // Now we add a new declaration without introducing a new type.
    // The consuming module which didn't use m:partA completely is expected to be
    // not changed.
    int getMem2();
};

export template <typename T>
class ATempl {
public:
    T getT();
    // Add a new declaration without introducing a new type.
    T getT2();
};

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

In this example, module `m` exports two partitions, `:partA` and `:partB`, and the consumer `useBOnly` only uses entities from `:partB`. So we hope the BMI of `useBOnly` doesn't change if only `:partA` changes. After this patch, we can achieve that as long as the change to `:partA` doesn't introduce new types. (And we can get rid of this restriction once we have no-transitive-type-change.)

As the example shows, when we change the implementation of `:partA` from `m-partA.cppm` to `m-partA.v1.cppm`, we add a new function declaration `getA2()` at the global namespace, a new member function `getMem2()` to class `A`, and a new member function `getT2()` to class template `ATempl`. Since `:partA` is not used by `useBOnly` at all, the BMI of `useBOnly` won't change after we make these changes.

#### Design details

The method used in this patch is similar to #92083 and #86912. It extends the 32-bit IdentifierID to 64 bits and uses the higher 32 bits to store the module file index, so that the encoding of an identifier is not affected by other modules.

#### Overhead

Similar to #92083 and #86912, the change is only expected to increase the size of the on-disk .pcm files and not to affect compile-time performance. In my experiment, the on-disk size increased by only a bit more than 1%, and I observed no compile-time impact.

#### Future Plans

I'll try to do the same thing for type IDs. IIRC, it won't change the dependency graph if we add a new type in an unused unit. I do think this is a significant win, and it will be a pretty good answer to "why modules are better than headers."

(Some background, not required reading: https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755) This is the documentation part of the no-transitive-change series (#86912, #92083, #92085, #92511), which gives build systems the ability to skip some unnecessary recompilations. See the patch for examples.

ChuanqiXu9 force-pushed the no-transitve-source-location-change branch from 9046d21 to 8d4e349 Compare March 28, 2024 06:48

ChuanqiXu9 marked this pull request as ready for review March 28, 2024 08:01

ChuanqiXu9 requested a review from jansvoboda11 March 28, 2024 08:01

llvmbot added the clang, clang:frontend, and clang:modules labels Mar 28, 2024

ChuanqiXu9 requested review from Bigcheese and dwblaikie March 28, 2024 08:01

ChuanqiXu9 requested review from jyknight and zygoloid March 28, 2024 08:02

ChuanqiXu9 force-pushed the no-transitve-source-location-change branch from 8d4e349 to 8e07cbd Compare April 7, 2024 02:56

Bigcheese reviewed Apr 9, 2024

jansvoboda11 reviewed Apr 9, 2024

ChuanqiXu9 added a commit that referenced this pull request Apr 30, 2024

Revert "[Modules] No transitive source location change (#86912)"

d333a0d

This reverts commit 6c31104. Required by the post commit comments: #86912

This was referenced May 13, 2024

[serialization] no transitive decl change #91914

Closed

[serialization] no transitive decl change #92083

Merged

[Serialization] No transitive identifier change #92085

Merged

ChuanqiXu9 mentioned this pull request Jun 24, 2024

[Doc] Update documentation for no-transitive-change #96453

Merged

		mutable llvm::DenseMap<ModuleFile *, llvm::SmallVector<ModuleFile *>>
		ImportedModuleFiles;
