-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deterministic MVID and add PdbChecksum #31
Conversation
Deterministic MVID So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers. This means that if there's an assembly which only differs in the PE architecture it would have the same MVID. This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically identical across platforms, possibly except for the architecture. The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and the strong name signature (if presenet) zeroed out. This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes, compute the hash, write the MVID and then compute the strong name. PdbChecksum In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19. This change implements that behavior. Symbols are now written explicitly once all metadata is processed but before the final assembly image is written. For portable PDBs the full file is written with PDB ID set to all zeroes. Then the crypto hash is calculated (using SHA256). The hash is then written into the PdbChecksum debug header entry of the assembly image. Deterministic Portable PDB ID So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash with the PDB embedded, as the PDB would now contain the MVID (cycle). The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum and use the first 20 bytes of the has as the PDB ID. This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last change of the portable PDB. The calculated PDB ID is also then set into the CodeView debug header entry of the assembly. Added tests for reading and writing both portable and embedded portable PDBs with checksums. Added test to validate stability of PDB ID across multiple writes. Added test to validate stability of MVID across multiple writes and the fact that it changes when just the PE architecture is changed.
Mono.Cecil.Cil/PortablePdb.cs
Outdated
@@ -171,7 +171,7 @@ public ISymbolReader GetSymbolReader (ModuleDefinition module, string fileName) | |||
throw new InvalidOperationException (); | |||
|
|||
return new EmbeddedPortablePdbReader ( | |||
(PortablePdbReader) new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry))); | |||
(PortablePdbReader)new PortablePdbReaderProvider ().GetSymbolReader (module, GetPortablePdbStream (entry))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo any line damage or formating changes, they make upstream merges pain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I hit something in VS which reformatted the entire file. I'm so glad we have lint in dotnet/linker 😉.
Anyway - hopefully I was able to revert all of these now.
This should mostly get rid of any breaking changes made by this change.
This should be fully ready now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to make upstream PR after you merge it here
Ported this to jbevain/cecil as jbevain#810. |
The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31.
The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31.
* Fix deterministic MVID and add PdbChecksum (#31) * Fix how pdb path is calculated in the tests * Fix portable PDB stamp in CodeView header (#32) * Introduce ISymbolWriter.Write This mostly cleans up the code to make it easier to understand. `ISymbolWriter.GetDebugHeader` no longer actually writes the symbols, there's a new `Write` method for just that. The assembly writer calls `Write` first and then the image writer calls `GetDebugHeader` when it's needed. This is partially taken from jbevain#617.
This reverts commit ff616bf.
…otnet/linker#2384) The new Cecil fixes the deterministic MVID and PDB checksum. Details in dotnet/cecil#31. Commit migrated from dotnet/linker@4f4a670
Make deterministic MVID unique even when the only change is in PE headers. Implement deterministic PDB ID generation independent of MVID and implement writing of PdbCheckum debug header entry.
Observable differences (other than the above):
GetDebugHeader
(well except for MPDB writer which I didn't change as it's really tricky). Callers should not notice much of a difference if they assumed onlyDispose
would actually write.Module.Write(stream
now requires the stream to be read/write. It already did so for cases where strong name was written, now it requires it also when deterministic MVID is turned on.Deterministic MVID
So far the deterministic MVID was computed as a hash of the IL metadata. It didn't include PE headers.
This means that if there's an assembly which only differs in the PE architecture it would have the same MVID.
This can happen relatively easily for facades (assemblies with only type forwarders) as those are typically
identical across platforms, possibly except for the architecture.
The correct way to calculate MVID is to hash the content of the entire file with the MVID itself and
the strong name signature (if presenet) zeroed out.
This change modified the MVID computation to work that way: Write the entire image with MVID all zeroes,
compute the hash, write the MVID and then compute the strong name.
PdbChecksum
In order for the produced PDB to be fully verifiable the assembly must contain a PdbChecksum
debug header entry which has a cryptographic hash of the associated PDB. The wayt to calculate the checksum
is prescribed as other programs must be able to validate it. See https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#pdb-checksum-debug-directory-entry-type-19.
This change implements that behavior. Symbols are now written explicitly once all metadata is processed
but before the final assembly image is written. For portable PDBs the full file is written with PDB ID
set to all zeroes. Then the crypto hash is calculated (using SHA256).
The hash is then written into the PdbChecksum debug header entry of the assembly image.
Deterministic Portable PDB ID
So far the PDB ID was set to the same value as MVID. With the fix for MVID described above, this doesn't
work anymore. In case of an embedded PDB, it's not possible to calculate MVID using the hash
with the PDB embedded, as the PDB would now contain the MVID (cycle).
The recommended way to calculate PDB ID is to use the same mechanism as for calculating PDB checksum
and use the first 20 bytes of the has as the PDB ID.
This change implements this by using the first 20 bytes of the hash and sets the PDB ID as the last
change of the portable PDB.
The calculated PDB ID is also then set into the CodeView debug header entry of the assembly.
Added tests for reading and writing both portable and embedded portable PDBs with checksums.
Added test to validate stability of PDB ID across multiple writes.
Added test to validate stability of MVID across multiple writes and the fact that it changes
when just the PE architecture is changed.
This is a fix for dotnet/linker#2203.