Why does the sha256sum change when compiling the same program twice? #7385

Closed
guoyejun opened this issue Nov 14, 2022 · 9 comments
Labels
bug (Something isn't working), Stale

Comments

@guoyejun
Contributor

The sha256sum of 'a.out' changes when the same program is compiled twice with the same options.

This happens no matter what the program is:
$ dpcpp -O2 -g trydpcpp.cpp
$ sha256sum a.out
70069a95e68870cdab5ba915b0d91d3492a3c7592a3befa1b060b4a57703fa0b a.out
$ dpcpp -O2 -g trydpcpp.cpp
$ sha256sum a.out
a13654be008b447edb4b518831bcad3a30fae7fb37bb8b9b223b1e25a953b8b9 a.out

I tried a normal C++ program with g++, and the sha256sum of a.out is always the same.

@guoyejun guoyejun added the bug Something isn't working label Nov 14, 2022
@zjin-lcf
Contributor

They are also different with the nvcc compiler

@guoyejun
Contributor Author

could you share why it is different? thanks.

@AlexeySachkov
Contributor

Thanks for the observation, @guoyejun!

I don't know the exact reason why the hash changes, but the DPC++ compilation flow is more complicated than a regular C++ compilation flow, and most likely we insert something non-deterministic at some step.

My best guess is that this is caused by the integration footer we add: we generate a temporary file with a random name and compile that file instead of your original input. When debug info is enabled, the file name is incorporated into the binary and changes the hash. The integration footer was introduced to implement features like SYCL 2020 specialization constants and the sycl_ext_oneapi_device_global extension. You can find the technical design description here
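A toy model of the suspected mechanism, not the real DPC++ pipeline: the sketch below appends a randomly named path to otherwise identical bytes and hashes the result. The `fake_compile` helper, the `/tmp/trydpcpp-<random>.cpp` naming pattern, and the payload layout are all assumptions made up for illustration.

```python
import hashlib
import secrets


def fake_compile(source: bytes, embed_name: bool = True) -> bytes:
    """Toy stand-in for a compiler (hypothetical, not DPC++ behaviour).

    The "binary" is the source bytes plus, optionally, the name of a
    randomly named temporary file, mimicking a path recorded in debug info.
    """
    # Assumed naming scheme: fixed prefix and suffix, 8 random hex characters.
    temp_name = f"/tmp/trydpcpp-{secrets.token_hex(4)}.cpp".encode()
    return source + (b"\0" + temp_name if embed_name else b"")


def digest(blob: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(blob).hexdigest()


source = b"int main() { return 0; }\n"
first, second = fake_compile(source), fake_compile(source)

# The lengths match because the random part of the name has a fixed width.
print("same size:  ", len(first) == len(second))
# The digests differ unless the two random names happen to collide.
print("same digest:", digest(first) == digest(second))
# Without the embedded name the output is reproducible.
plain_a, plain_b = fake_compile(source, False), fake_compile(source, False)
print("same digest without name:", digest(plain_a) == digest(plain_b))
```

If the guess is right, the real binaries would behave like `first` and `second` here: identical in every byte except the recorded file name.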

To confirm or disprove that, we can try the following experiments:

  • try disabling the integration footer by passing the -fno-sycl-use-footer option to the compiler
  • try disabling debug info generation by omitting the -g flag

My current guess is that only the combination of integration footer and debug info triggers non-deterministic binaries. Disabling either one should work around the problem. There was a proposal to redesign that part of the implementation to remove the integration footer, but it is postponed for now (#5910).
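The experiments above can be automated with a small script. This is a minimal sketch, assuming `dpcpp` is on the PATH and writes `a.out` into the current directory; the helper names (`sha256_of`, `is_reproducible`, `run_all`) and the `EXPERIMENTS` table are illustrative, while the flags are the ones mentioned in this thread.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_reproducible(build_cmd: list[str], artifact: Path, runs: int = 2) -> bool:
    """Run build_cmd `runs` times; True if the artifact hash never changes."""
    digests = set()
    for _ in range(runs):
        subprocess.run(build_cmd, check=True)
        digests.add(sha256_of(artifact))
    return len(digests) == 1


# Flag combinations from the thread; trydpcpp.cpp is the reporter's file name.
EXPERIMENTS = {
    "footer + debug info": ["dpcpp", "-O2", "-g", "trydpcpp.cpp"],
    "no footer": ["dpcpp", "-O2", "-g", "-fno-sycl-use-footer", "trydpcpp.cpp"],
    "no debug info": ["dpcpp", "-O2", "trydpcpp.cpp"],
    "neither": ["dpcpp", "-O2", "-fno-sycl-use-footer", "trydpcpp.cpp"],
}


def run_all(experiments: dict[str, list[str]], artifact: Path) -> dict[str, bool]:
    """Map each experiment label to whether its build output was stable."""
    results = {}
    for label, cmd in experiments.items():
        results[label] = is_reproducible(cmd, artifact)
        print(f"{label}: {'stable' if results[label] else 'hash changed'}")
    return results
```

Calling `run_all(EXPERIMENTS, Path("a.out"))` from the directory that contains the source file would print one line per flag combination.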

@guoyejun
Contributor Author

thanks @AlexeySachkov !

I found that both of them are needed; the checksum is the same with:
dpcpp -O2 -fno-sycl-use-footer trydpcpp.cpp

@AlexeySachkov
Contributor

> I found that both of them are needed, the checksum is the same with:
> dpcpp -O2 -fno-sycl-use-footer trydpcpp.cpp

That's interesting. It means that even when no debug info is requested, we still emit some non-deterministic data that ends up encoded in the binaries.

@guoyejun
Contributor Author

yes, and the file size is still the same even though the checksum is different.
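Because the two binaries have the same size, a byte-by-byte comparison can show exactly where they diverge. The following is a rough sketch of one way to do that; `differing_ranges` and `show_differences` are hypothetical helper names, and the file names in the usage note are placeholders for two saved copies of the build output.

```python
from pathlib import Path


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open [start, end) ranges where two equal-length blobs differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i  # a run of differing bytes begins here
        elif x == y and start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(a)))  # the run extends to the end
    return ranges


def show_differences(first: Path, second: Path, context: int = 16) -> None:
    """Print each differing range with a few surrounding bytes from both files."""
    a, b = first.read_bytes(), second.read_bytes()
    for start, end in differing_ranges(a, b):
        lo, hi = max(0, start - context), min(len(a), end + context)
        print(f"offset {start:#x}..{end:#x}")
        print("  first :", a[lo:hi])
        print("  second:", b[lo:hi])
```

For example, after copying the output of each build to `a.out.1` and `a.out.2`, `show_differences(Path("a.out.1"), Path("a.out.2"))` would list the offsets that changed. Any readable text around them, such as a file name, hints at the source of the non-determinism.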

@Bach234

Bach234 commented Nov 24, 2022

This difference comes from SourceFileName in llvm/lib/IR/Module.cpp; we can delete SourceFileName to solve this problem.

@guoyejun
Contributor Author

thanks, it is useful when we compare the .so file on different systems for debugging.

@github-actions
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

4 participants