Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serialization] Storing DeclID separately #95897

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ChuanqiXu9
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 commented Jun 18, 2024

In #92083, there are some unaddressed question about the increased size of PCMs. And I prepared this draft to try to address this.

This is patch is not ready for review yet. There are many places to polish. But I believe it is functional correct for how sensitive the DeclID is. So this is majorly for testing and discussing higher level ideas. (I may land some cleanup codes directly. So this may have conflict with main)

To understand the problem better, we need to understand how we encodes values in PCM. We're using VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer) now to record values except blobs.

In VBR6 format, the value '1' is stored as 0x1 directly no matter its type. And the value 0xFF will be stored as 0b000111'111111. The first 1 in the right hand side of ' indicates the higher bits still contirbute to the current value. And the 12th bit is 0 so that the higher value doesn't relate to the current value. Then we can combine the lower 5 bits and the higher 6 bits to get the value 0xFF.

And for my patch, it changes the DeclID from consecutive numbers to <ModuleFileIndex, LocalDeclID> pairs. So that when we record a reference to an external declaration, the serialized data in the disk may be much larger. But it is not a 100% regression, when we record a reference to a local declaration (where ModuleFileIndex is 0), the serialized data in the disk may be smaller since the LocalDeclID won't be based on the sum of number of imported declarations.

So for the question, the un-strict upper bound in theory may be the case that we have many modules (more than 2^16) but super less declarations (less than 2^5). Then the estimated upper bound may be near 10x. Since we may need more than 50 bits to store the index now but previously we can only use 6 bits to do that. But I guess this won't be true in practice.

In practice, how the size changes with the last patch should be highly dependent to the code bases.

Then for this patch, what I did is, to store the <ModuleFileIndex, LocalDeclID> pair separately instead of putting them into the one slot. This should align to the design more naturally. But from I mentioned above, we can see this may not be 100% pure win. Since for local declarations, now we need additional 6 bits to present the index. And for my local use cases, since we don't have so many modules, I see the size of PCMs increases very slightly after this patch.

For landing plans, this patch doesn't have a lot priority in my side, I am chasing #92085 #92511 more now. And from practice, I feel the actual increased size doesn't affect us practically. So I'd like to reduce the size more after landing above patches. Also we need to do the similar things for source locations, identifier IDs and type IDs. But all of them may not have such a strong impact as DeclID has.

ChuanqiXu9 added a commit that referenced this pull request Jun 19, 2024
Now we can create a LocalDeclID directly with an integer without
verifying. It may be hard to refactor if we want to change the way we
serialize DeclIDs (See #95897).
Also it is hard for us to debug if someday someone construct a
LocalDeclID with an incorrect value.

So in this patch, I tried to unify the way we can construct a
LocalDeclID in ASTReader, where we will construct the LocalDeclID from
the serialized data. Also, now we can verify the constructed LocalDeclID
sooner in the new interface.
@ChuanqiXu9 ChuanqiXu9 marked this pull request as ready for review June 24, 2024 06:21
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Jun 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 24, 2024

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Chuanqi Xu (ChuanqiXu9)

Changes

In #92083, there are some unaddressed question about the increased size of PCMs. And I prepared this draft to try to address this.

This is patch is not ready for review yet. There are many places to polish. But I believe it is functional correct for how sensitive the DeclID is. So this is majorly for testing and discussing higher level ideas. (I may land some cleanup codes directly. So this may have conflict with main)

To understand the problem better, we need to understand how we encodes values in PCM. We're using VBR6 format (https://llvm.org/docs/BitCodeFormat.html#variable-width-integer) now to record values except blobs.

In VBR6 format, the value '1' is stored as 0x1 directly no matter its type. And the value 0xFF will be stored as 0b000111'111111. The first 1 in the right hand side of ' indicates the higher bits still contirbute to the current value. And the 12th bit is 0 so that the higher value doesn't relate to the current value. Then we can combine the lower 5 bits and the higher 6 bits to get the value 0xFF.

And for my patch, it changes the DeclID from consecutive numbers to &lt;ModuleFileIndex, LocalDeclID&gt; pairs. So that when we record a reference to an external declaration, the serialized data in the disk may be much larger. But it is not a 100% regression, when we record a reference to a local declaration (where ModuleFileIndex is 0), the serialized data in the disk may be smaller since the LocalDeclID won't be based on the sum of number of imported declarations.

So for the question, the un-strict upper bound in theory may be the case that we have many modules (more than 2^16) but super less declarations (less than 2^5). Then the estimated upper bound may be near 10x. Since we may need more than 50 bits to store the index now but previously we can only use 6 bits to do that. But I guess this won't be true in practice.

In practice, how the size changes with the last patch should be highly dependent to the code bases.

Then for this patch, what I did is, to store the &lt;ModuleFileIndex, LocalDeclID&gt; pair separately instead of putting them into the one slot. This should align to the design more naturally. But from I mentioned above, we can see this may not be 100% pure win. Since for local declarations, now we need additional 6 bits to present the index. And for my local use cases, since we don't have so many modules, I see the size of PCMs increases very slightly after this patch.

For landing plans, this patch doesn't have a lot priority in my side, I am chasing #92085 #92511 more now. And from practice, I feel the actual increased size doesn't affect us practically. So I'd like to reduce the size more after landing above patches. Also we need to do the similar things for source locations, identifier IDs and type IDs. But all of them may not have such a strong impact as DeclID has.


Patch is 34.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95897.diff

11 Files Affected:

  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+2-3)
  • (modified) clang/include/clang/Serialization/ASTRecordReader.h (+19)
  • (modified) clang/include/clang/Serialization/ASTRecordWriter.h (+27)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+23-21)
  • (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+8-10)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+5-4)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+19-9)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+60-52)
  • (modified) clang/test/Modules/codegen-nodep.test (+1-1)
  • (modified) clang/test/Modules/decl-params-determinisim.m (+8-8)
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 38502a23f805e..977fa7359ef1b 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -285,6 +285,9 @@ using unaligned_decl_id_t =
         serialization::DeclID, llvm::endianness::native,
         llvm::support::unaligned>;
 
+/// The number of slots needed to record a DeclID in bitstreams.
+const unsigned int DeclIDSerialiazedSize = 2;
+
 /// The number of predefined preprocessed entity IDs.
 const unsigned int NUM_PREDEF_PP_ENTITY_IDS = 1;
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index f41c473c97cd9..ab257314e2c74 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -592,7 +592,7 @@ class ASTReader
 
   /// An array of lexical contents of a declaration context, as a sequence of
   /// Decl::Kind, DeclID pairs.
-  using LexicalContents = ArrayRef<serialization::unaligned_decl_id_t>;
+  using LexicalContents = ArrayRef<uint32_t>;
 
   /// Map from a DeclContext to its lexical contents.
   llvm::DenseMap<const DeclContext*, std::pair<ModuleFile*, LexicalContents>>
@@ -945,8 +945,7 @@ class ASTReader
   SmallVector<uint64_t, 8> DelayedDeleteExprs;
 
   // A list of late parsed template function data with their module files.
-  SmallVector<std::pair<ModuleFile *, SmallVector<uint64_t, 1>>, 4>
-      LateParsedTemplates;
+  SmallVector<std::pair<ModuleFile *, RecordData>, 4> LateParsedTemplates;
 
   /// The IDs of all decls to be checked for deferred diags.
   ///
diff --git a/clang/include/clang/Serialization/ASTRecordReader.h b/clang/include/clang/Serialization/ASTRecordReader.h
index 2561418b78ca7..ae0254214d0c9 100644
--- a/clang/include/clang/Serialization/ASTRecordReader.h
+++ b/clang/include/clang/Serialization/ASTRecordReader.h
@@ -86,6 +86,11 @@ class ASTRecordReader
   /// Skips the specified number of values.
   void skipInts(unsigned N) { Idx += N; }
 
+  /// Skips the specified number of DeclIDs.
+  void skipDeclRefs(unsigned N) {
+    Idx += N * serialization::DeclIDSerialiazedSize;
+  }
+
   /// Retrieve the global submodule ID its local ID number.
   serialization::SubmoduleID
   getGlobalSubmoduleID(unsigned LocalID) {
@@ -187,12 +192,26 @@ class ASTRecordReader
   /// Reads a declaration from the given position in a record in the
   /// given module, advancing Idx.
   Decl *readDecl() {
+#ifndef NDEBUG
+    unsigned OldIdx = Idx;
+    Decl *D = Reader->ReadDecl(*F, Record, Idx);
+    assert(Idx - OldIdx == serialization::DeclIDSerialiazedSize);
+    return D;
+#endif
     return Reader->ReadDecl(*F, Record, Idx);
   }
   Decl *readDeclRef() {
     return readDecl();
   }
 
+  template <class DeclKind, class Func> void readDeclArray(Func &&ConsumeFunc) {
+    unsigned LengthOfArray = readInt();
+    unsigned End = Idx + LengthOfArray;
+
+    while (Idx < End)
+      ConsumeFunc(readDeclAs<DeclKind>());
+  }
+
   /// Reads a declaration from the given position in the record,
   /// advancing Idx.
   ///
diff --git a/clang/include/clang/Serialization/ASTRecordWriter.h b/clang/include/clang/Serialization/ASTRecordWriter.h
index 0c8ac75fc40f4..30e786510ac14 100644
--- a/clang/include/clang/Serialization/ASTRecordWriter.h
+++ b/clang/include/clang/Serialization/ASTRecordWriter.h
@@ -234,12 +234,39 @@ class ASTRecordWriter
 
   /// Emit a reference to a declaration.
   void AddDeclRef(const Decl *D) {
+#ifndef NDEBUG
+    unsigned OldSize = size();
+    Writer->AddDeclRef(D, *Record);
+    assert(size() - OldSize == serialization::DeclIDSerialiazedSize);
+    return;
+#endif
     return Writer->AddDeclRef(D, *Record);
   }
   void writeDeclRef(const Decl *D) {
     AddDeclRef(D);
   }
 
+  void writeNullDeclRef() {
+#ifndef NDEBUG
+    unsigned OldSize = size();
+#endif
+
+    push_back(0);
+    push_back(0);
+
+#ifndef NDEBUG
+    assert(size() - OldSize == serialization::DeclIDSerialiazedSize);
+#endif
+  }
+
+  template <class DeclKind> void writeDeclArray(ArrayRef<DeclKind *> Array) {
+    unsigned ElementNum = Array.size();
+    push_back(ElementNum * serialization::DeclIDSerialiazedSize);
+
+    for (DeclKind *D : Array)
+      AddDeclRef(D);
+  }
+
   /// Emit a declaration name.
   void AddDeclarationName(DeclarationName Name) {
     writeDeclarationName(Name);
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 079ac3f0e3545..b2bf0f3a46194 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -1306,9 +1306,8 @@ bool ASTReader::ReadLexicalDeclContextStorage(ModuleFile &M,
   auto &Lex = LexicalDecls[DC];
   if (!Lex.first) {
     Lex = std::make_pair(
-        &M, llvm::ArrayRef(
-                reinterpret_cast<const unaligned_decl_id_t *>(Blob.data()),
-                Blob.size() / sizeof(DeclID)));
+        &M, llvm::ArrayRef(reinterpret_cast<const uint32_t *>(Blob.data()),
+                           Blob.size() / sizeof(uint32_t)));
   }
   DC->setHasExternalLexicalStorage(true);
   return false;
@@ -3422,8 +3421,8 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
     case TU_UPDATE_LEXICAL: {
       DeclContext *TU = ContextObj->getTranslationUnitDecl();
       LexicalContents Contents(
-          reinterpret_cast<const unaligned_decl_id_t *>(Blob.data()),
-          static_cast<unsigned int>(Blob.size() / sizeof(DeclID)));
+          reinterpret_cast<const uint32_t *>(Blob.data()),
+          static_cast<unsigned int>(Blob.size() / sizeof(uint32_t)));
       TULexicalDecls.push_back(std::make_pair(&F, Contents));
       TU->setHasExternalLexicalStorage(true);
       break;
@@ -3696,7 +3695,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case VTABLE_USES:
-      if (Record.size() % 3 != 0)
+      if (Record.size() % (DeclIDSerialiazedSize + 1 + 1) != 0)
         return llvm::createStringError(std::errc::illegal_byte_sequence,
                                        "Invalid VTABLE_USES record");
 
@@ -3714,8 +3713,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case PENDING_IMPLICIT_INSTANTIATIONS:
-
-      if (Record.size() % 2 != 0)
+      if (Record.size() % (DeclIDSerialiazedSize + 1) != 0)
         return llvm::createStringError(
             std::errc::illegal_byte_sequence,
             "Invalid PENDING_IMPLICIT_INSTANTIATIONS block");
@@ -3728,7 +3726,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case SEMA_DECL_REFS:
-      if (Record.size() != 3)
+      if (Record.size() != 3 * serialization::DeclIDSerialiazedSize)
         return llvm::createStringError(std::errc::illegal_byte_sequence,
                                        "Invalid SEMA_DECL_REFS block");
       for (unsigned I = 0, N = Record.size(); I != N; /*in loop*/)
@@ -3786,7 +3784,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
     }
 
     case DECL_UPDATE_OFFSETS:
-      if (Record.size() % 2 != 0)
+      if (Record.size() % (DeclIDSerialiazedSize + 1) != 0)
         return llvm::createStringError(
             std::errc::illegal_byte_sequence,
             "invalid DECL_UPDATE_OFFSETS block in AST file");
@@ -3803,7 +3801,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case DELAYED_NAMESPACE_LEXICAL_VISIBLE_RECORD: {
-      if (Record.size() % 3 != 0)
+      if (Record.size() % (DeclIDSerialiazedSize + 2) != 0)
         return llvm::createStringError(
             std::errc::illegal_byte_sequence,
             "invalid DELAYED_NAMESPACE_LEXICAL_VISIBLE_RECORD block in AST "
@@ -3898,7 +3896,7 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
       break;
 
     case UNDEFINED_BUT_USED:
-      if (Record.size() % 2 != 0)
+      if (Record.size() % (DeclIDSerialiazedSize + 1) != 0)
         return llvm::createStringError(std::errc::illegal_byte_sequence,
                                        "invalid undefined-but-used record");
       for (unsigned I = 0, N = Record.size(); I != N; /* in loop */) {
@@ -7893,7 +7891,10 @@ GlobalDeclID ASTReader::ReadDeclID(ModuleFile &F, const RecordDataImpl &Record,
     return GlobalDeclID(0);
   }
 
-  return getGlobalDeclID(F, LocalDeclID::get(*this, F, Record[Idx++]));
+  uint32_t ModuleFileIndex = Record[Idx++];
+  uint32_t LocalDeclIndex = Record[Idx++];
+  return getGlobalDeclID(
+      F, LocalDeclID::get(*this, F, ModuleFileIndex, LocalDeclIndex));
 }
 
 /// Resolve the offset of a statement into a statement.
@@ -7922,25 +7923,26 @@ void ASTReader::FindExternalLexicalDecls(
     SmallVectorImpl<Decl *> &Decls) {
   bool PredefsVisited[NUM_PREDEF_DECL_IDS] = {};
 
-  auto Visit = [&] (ModuleFile *M, LexicalContents LexicalDecls) {
-    assert(LexicalDecls.size() % 2 == 0 && "expected an even number of entries");
-    for (int I = 0, N = LexicalDecls.size(); I != N; I += 2) {
+  auto Visit = [&](ModuleFile *M, LexicalContents LexicalDecls) {
+    assert(LexicalDecls.size() % 3 == 0 && "incorrect number of entries");
+    for (int I = 0, N = LexicalDecls.size(); I != N; I += 3) {
       auto K = (Decl::Kind)+LexicalDecls[I];
       if (!IsKindWeWant(K))
         continue;
 
-      auto ID = (DeclID) + LexicalDecls[I + 1];
+      LocalDeclID ID =
+          LocalDeclID::get(*this, *M, LexicalDecls[I + 1], LexicalDecls[I + 2]);
 
       // Don't add predefined declarations to the lexical context more
       // than once.
       if (ID < NUM_PREDEF_DECL_IDS) {
-        if (PredefsVisited[ID])
+        if (PredefsVisited[ID.getRawValue()])
           continue;
 
-        PredefsVisited[ID] = true;
+        PredefsVisited[ID.getRawValue()] = true;
       }
 
-      if (Decl *D = GetLocalDecl(*M, LocalDeclID::get(*this, *M, ID))) {
+      if (Decl *D = GetLocalDecl(*M, ID)) {
         assert(D->getKind() == K && "wrong kind for lexical decl");
         if (!DC->isDeclInLexicalTraversal(D))
           Decls.push_back(D);
@@ -8837,7 +8839,7 @@ void ASTReader::ReadLateParsedTemplates(
         &LPTMap) {
   for (auto &LPT : LateParsedTemplates) {
     ModuleFile *FMod = LPT.first;
-    RecordDataImpl &LateParsed = LPT.second;
+    RecordData &LateParsed = LPT.second;
     for (unsigned Idx = 0, N = LateParsed.size(); Idx < N;
          /* In loop */) {
       FunctionDecl *FD = ReadDeclAs<FunctionDecl>(*FMod, LateParsed, Idx);
diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 4b8b515c02c70..0e55c831ce34c 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -1020,9 +1020,8 @@ void ASTDeclReader::VisitFunctionDecl(FunctionDecl *FD) {
   case FunctionDecl::TK_DependentFunctionTemplateSpecialization: {
     // Templates.
     UnresolvedSet<8> Candidates;
-    unsigned NumCandidates = Record.readInt();
-    while (NumCandidates--)
-      Candidates.addDecl(readDeclAs<NamedDecl>());
+    Record.readDeclArray<NamedDecl>(
+        [&Candidates](NamedDecl *ND) { Candidates.addDecl(ND); });
 
     // Templates args.
     TemplateArgumentListInfo TemplArgsWritten;
@@ -1152,11 +1151,9 @@ void ASTDeclReader::VisitFunctionDecl(FunctionDecl *FD) {
   FD->setIsPureVirtual(Pure);
 
   // Read in the parameters.
-  unsigned NumParams = Record.readInt();
   SmallVector<ParmVarDecl *, 16> Params;
-  Params.reserve(NumParams);
-  for (unsigned I = 0; I != NumParams; ++I)
-    Params.push_back(readDeclAs<ParmVarDecl>());
+  Record.readDeclArray<ParmVarDecl>(
+      [&Params](ParmVarDecl *ParmD) { Params.push_back(ParmD); });
   FD->setParams(Reader.getContext(), Params);
 }
 
@@ -2309,7 +2306,7 @@ void ASTDeclReader::VisitCXXMethodDecl(CXXMethodDecl *D) {
   } else {
     // We don't care about which declarations this used to override; we get
     // the relevant information from the canonical declaration.
-    Record.skipInts(NumOverridenMethods);
+    Record.skipDeclRefs(NumOverridenMethods);
   }
 }
 
@@ -4354,8 +4351,9 @@ void ASTReader::loadPendingDeclChain(Decl *FirstLocal, uint64_t LocalOffset) {
   // FIXME: We have several different dispatches on decl kind here; maybe
   // we should instead generate one loop per kind and dispatch up-front?
   Decl *MostRecent = FirstLocal;
-  for (unsigned I = 0, N = Record.size(); I != N; ++I) {
-    unsigned Idx = N - I - 1;
+  for (unsigned I = 0, N = Record.size(); I != N;
+       I += serialization::DeclIDSerialiazedSize) {
+    unsigned Idx = N - I - serialization::DeclIDSerialiazedSize;
     auto *D = ReadDecl(*M, Record, Idx);
     ASTDeclReader::attachPreviousDecl(*this, D, MostRecent, CanonDecl);
     MostRecent = D;
diff --git a/clang/lib/Serialization/ASTReaderStmt.cpp b/clang/lib/Serialization/ASTReaderStmt.cpp
index e23ceffb10bfe..975f03391935f 100644
--- a/clang/lib/Serialization/ASTReaderStmt.cpp
+++ b/clang/lib/Serialization/ASTReaderStmt.cpp
@@ -351,14 +351,15 @@ void ASTStmtReader::VisitDeclStmt(DeclStmt *S) {
   S->setStartLoc(readSourceLocation());
   S->setEndLoc(readSourceLocation());
 
-  if (Record.size() - Record.getIdx() == 1) {
+  unsigned NumDecls =
+      (Record.size() - Record.getIdx()) / serialization::DeclIDSerialiazedSize;
+  if (NumDecls == 1) {
     // Single declaration
     S->setDeclGroup(DeclGroupRef(readDecl()));
   } else {
     SmallVector<Decl *, 16> Decls;
-    int N = Record.size() - Record.getIdx();
-    Decls.reserve(N);
-    for (int I = 0; I < N; ++I)
+    Decls.reserve(NumDecls);
+    for (unsigned I = 0; I < NumDecls; ++I)
       Decls.push_back(readDecl());
     S->setDeclGroup(DeclGroupRef(DeclGroup::Create(Record.getContext(),
                                                    Decls.data(),
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 5b39055cf9f27..ef358ac9dca55 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3321,7 +3321,7 @@ uint64_t ASTWriter::WriteDeclContextLexicalBlock(ASTContext &Context,
     return 0;
 
   uint64_t Offset = Stream.GetCurrentBitNo();
-  SmallVector<DeclID, 128> KindDeclPairs;
+  SmallVector<uint32_t, 128> KindDeclPairs;
   for (const auto *D : DC->decls()) {
     if (DoneWritingDeclsAndTypes && !wasDeclEmitted(D))
       continue;
@@ -3336,7 +3336,9 @@ uint64_t ASTWriter::WriteDeclContextLexicalBlock(ASTContext &Context,
       continue;
 
     KindDeclPairs.push_back(D->getKind());
-    KindDeclPairs.push_back(GetDeclRef(D).getRawValue());
+    LocalDeclID ID = GetDeclRef(D);
+    KindDeclPairs.push_back(ID.getModuleFileIndex());
+    KindDeclPairs.push_back(ID.getLocalDeclIndex());
   }
 
   ++NumLexicalDeclContexts;
@@ -4451,8 +4453,9 @@ void ASTWriter::WriteDeclContextVisibleUpdate(const DeclContext *DC) {
     DC = cast<DeclContext>(Chain->getKeyDeclaration(cast<Decl>(DC)));
 
   // Write the lookup table
-  RecordData::value_type Record[] = {UPDATE_VISIBLE,
-                                     getDeclID(cast<Decl>(DC)).getRawValue()};
+  LocalDeclID ID = getDeclID(cast<Decl>(DC));
+  RecordData::value_type Record[] = {UPDATE_VISIBLE, ID.getModuleFileIndex(),
+                                     ID.getLocalDeclIndex()};
   Stream.EmitRecordWithBlob(UpdateVisibleAbbrev, Record, LookupTable);
 }
 
@@ -5243,9 +5246,10 @@ void ASTWriter::WriteSpecialDeclRecords(Sema &SemaRef) {
   RecordData SemaDeclRefs;
   if (SemaRef.StdNamespace || SemaRef.StdBadAlloc || SemaRef.StdAlignValT) {
     auto AddEmittedDeclRefOrZero = [this, &SemaDeclRefs](Decl *D) {
-      if (!D || !wasDeclEmitted(D))
+      if (!D || !wasDeclEmitted(D)) {
         SemaDeclRefs.push_back(0);
-      else
+        SemaDeclRefs.push_back(0);
+      } else
         AddDeclRef(D, SemaDeclRefs);
     };
 
@@ -5679,7 +5683,7 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
   const TranslationUnitDecl *TU = Context.getTranslationUnitDecl();
   // Create a lexical update block containing all of the declarations in the
   // translation unit that do not come from other AST files.
-  SmallVector<DeclID, 128> NewGlobalKindDeclPairs;
+  SmallVector<uint32_t, 128> NewGlobalKindDeclPairs;
   for (const auto *D : TU->noload_decls()) {
     if (D->isFromASTFile())
       continue;
@@ -5689,7 +5693,9 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
       continue;
 
     NewGlobalKindDeclPairs.push_back(D->getKind());
-    NewGlobalKindDeclPairs.push_back(GetDeclRef(D).getRawValue());
+    LocalDeclID ID = GetDeclRef(D);
+    NewGlobalKindDeclPairs.push_back(ID.getModuleFileIndex());
+    NewGlobalKindDeclPairs.push_back(ID.getLocalDeclIndex());
   }
 
   auto Abv = std::make_shared<llvm::BitCodeAbbrev>();
@@ -5704,6 +5710,7 @@ void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
   Abv = std::make_shared<llvm::BitCodeAbbrev>();
   Abv->Add(llvm::BitCodeAbbrevOp(UPDATE_VISIBLE));
   Abv->Add(llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::VBR, 6));
+  Abv->Add(llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::VBR, 6));
   Abv->Add(llvm::BitCodeAbbrevOp(llvm::BitCodeAbbrevOp::Blob));
   UpdateVisibleAbbrev = Stream.EmitAbbrev(std::move(Abv));
 
@@ -6195,7 +6202,10 @@ void ASTWriter::AddEmittedDeclRef(const Decl *D, RecordDataImpl &Record) {
 }
 
 void ASTWriter::AddDeclRef(const Decl *D, RecordDataImpl &Record) {
-  Record.push_back(GetDeclRef(D).getRawValue());
+  LocalDeclID ID = GetDeclRef(D);
+
+  Record.push_back(ID.getModuleFileIndex());
+  Record.push_back(ID.getLocalDeclIndex());
 }
 
 LocalDeclID ASTWriter::GetDeclRef(const Decl *D) {
diff --git a/clang/lib/Serialization/ASTWriterDecl.cpp b/clang/lib/Serialization/ASTWriterDecl.cpp
index 59d94c3d79824..3a64f831755a4 100644
--- a/clang/lib/Serialization/ASTWriterDecl.cpp
+++ b/clang/lib/Serialization/ASTWriterDecl.cpp
@@ -243,12 +243,15 @@ namespace clang {
         assert(D->isCanonicalDecl() && "non-canonical decl in set");
         AddFirstDeclFromEachModule(D, /*IncludeLocal*/true);
       }
-      Record.append(
-          DeclIDIterator<GlobalDeclID, DeclID>(LazySpecializations.begin()),
-          DeclIDIterator<GlobalDeclID, DeclID>(LazySpecializations.end()));
+
+      for (GlobalDeclID LazyID : LazySpecializations) {
+        Record.push_back(LazyID.getModuleFileIndex());
+        Record.push_back(LazyID.getLocalDeclIndex());
+      }
 
       // Update the size entry we added earlier.
-      Record[I] = Record.size() - I - 1;
+      Record[I] =
+          (Record.size() - I - 1) / serialization::DeclIDSerialiazedSize;
     }
 
     /// Ensure that this template specialization is associated with the specified
@@ -695,9 +698,7 @@ void ASTDeclWriter::VisitFunctionDecl(FunctionDecl *D) {
       DFTSInfo = D->getDependentSpecializationInfo();
 
     // Candidates.
-    Record.push_back(DFTSInfo->getCandidates().size());
-    for (FunctionTemplateDecl *FTD : DFTSInfo->getCandidates())
-      Record.AddDeclRef(FTD);
+    Record.writeDeclArray(DFTSInfo->getCandidates());
 
     // Templates args.
     Record.push_back(DFTSInfo->TemplateArgumentsAsWritten != nullptr);
@@ -769,9 +770,7 @@ void ASTDeclWriter::VisitFunctionDecl(FunctionDecl *D) {
     }
   }
 
-  Record.push_back(D->param_size());
-  for (auto *P : D->parameters())
-    Record.AddDeclRef(P);
+  Record.writeDeclArray(D->parameters());
   Code = serialization::DECL_FUNCTION;
 }
 
@@ -1527,7 +1526,7 @@ void ASTDeclWriter::VisitCXXRecordDecl(CXXRecordDecl *D) {
       Record.AddDeclRef(Context);
       Record.push_back(D->getLambdaIndexInContext());
     } else {
-      Record.push_back(0);
+      Record.writeNullDeclRef();
     }
   } else {
     Record.push_back(CXXRecNotTemplate);
@@ -2060,7 +2059,8 @@ void ASTDeclWriter::VisitRedeclarable(Redeclarable<T> *D) {
       if (Writer.Chain)
         AddFirstDeclFromEachModule(DAsT, /*IncludeLocal*/false);
       // This is the number of imported first declarations + 1.
-      Record[I] = Record.size() - I;
+      Record[I] =
+          ((Record.size() - I - 1) / serialization::DeclIDSerialiazedSize) + 1;
 
       // Collect the set of local redeclarations of this declaration, from
       // newest to oldest.
@@ -2091,8 +2091,8 @@ void ASTDeclWriter::VisitRedeclarable(Redeclarable<T> *D) {
     (void)Writer.GetDeclRef(D->getPreviousDecl());
     (void)Writer.GetDeclRef(MostRecent);
   } else {
-    // We use the sentinel value 0 to indicate an only declaration.
-    Record.push_back(...
[truncated]

@ChuanqiXu9
Copy link
Member Author

@ilya-biryukov I think this is ready to review and test. I think this can mitigate the size increase with the DeclID change.

@ilya-biryukov
Copy link
Contributor

Sorry for not getting to it today, I'll send a review tomorrow.

@ilya-biryukov
Copy link
Contributor

Thanks for this patch, I will try it on our codebase to see the effects of it and report back to you.
I am also trying to understand in which cases this would be a win and when not.
IIUC, this is always a win when ModuleFileIndex != 0 and a pessimization otherwise.

When is ModuleFileIndex == 0? Does this only happen for predefined decls (that we never deserialize) or also for the declarations owned by the currently written module?

@ChuanqiXu9
Copy link
Member Author

Thanks for this patch, I will try it on our codebase to see the effects of it and report back to you. I am also trying to understand in which cases this would be a win and when not. IIUC, this is always a win when ModuleFileIndex != 0 and a pessimization otherwise.

Yes, I think.

When is ModuleFileIndex == 0? Does this only happen for predefined decls (that we never deserialize) or also for the declarations owned by the currently written module?

Yes, this is what I thought.

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Now we can create a LocalDeclID directly with an integer without
verifying. It may be hard to refactor if we want to change the way we
serialize DeclIDs (See llvm#95897).
Also it is hard for us to debug if someday someone construct a
LocalDeclID with an incorrect value.

So in this patch, I tried to unify the way we can construct a
LocalDeclID in ASTReader, where we will construct the LocalDeclID from
the serialized data. Also, now we can verify the constructed LocalDeclID
sooner in the new interface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants