-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary module interface #4545
Binary module interface #4545
Conversation
f51bac6
to
2d1f543
Compare
2d1f543
to
b27698c
Compare
@mgsloan would you recommend a reviewer, please? @haitlahcen Thanks for the PR! I do have one major concern. The interface changes for every major GHC version, and so this will need to be revisited for every major version; it surely means old stack versions will not work with new GHC versions? I may be wrong, however — I don't know how stack handles this at the moment. Will a change to the dumped output format mean this is an issue already?
The page you linked refers to the GHC wiki. GHC is available as a library on hackage. I don't know whether current GHC can read interface files from old GHC versions. I suspect there's a package which handles this (like for template-haskell) — perhaps one of these packages does something. In any case, I'm glad it works, for now. Are you able to use your fork, then? |
What is the behaviour if you use it with an unsupported (new) GHC version? We do need some kind of reasonable fallback to support new versions, especially prerelease GHC versions, without having to release a new |
The older version will still work but the new interface will not be deserialized (which is only used to check for dirtiness btw).
Yes, this already happened when they added a hash after the path of the TH dependencies.
GHC is not supporting older versions of the protocol. I think the main reason is that interface files are generated at compile time and were meant to be used temporarily (destroyed right after compilation).
I did not found any package (other than GHC) implementing the protocols, but I might have missed something !
Indeed, I've successfully compiled Stack using the patched version !
You are right, the current behavior just ignore the deserialization failure, hence not following deps dirtiness. |
@haitlahcen Thanks for the detailed responses — they've certainly addressed my concerns. We should be able to review it, soon.
I take it then that you are able to use your patched version of stack in production. Thanks for contributing back! |
See https://neilmitchell.blogspot.com/2019/02/announcing-ghc-lib.html — it may be worth considering from a maintenance perspective. Good timing! I wonder whether @ndmitchell saw this issue… |
@dbaynard I did not. So From what I saw, the use of the information obtained by parsing the .hi files is purely for warnings? If that's the case, shouldn't the failure to parse the .hi files be a warning, and then just ignore the other warnings, and it would work for pre-releases just fine. Do not that Weeder also piggy backs off the .hi files that Stack creates, so if Stack could still create them with some flag that would be helpful for Weeder. Alternatively, Weeder could move to dumping the .hi-dump files itself on demand. As a consequence, this change is a major change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to see a unit test or two added showing that this parsing works, ideally including .hi
files from a few different versions of GHC. Overall, I like this direction, thank you!
@snoyberg Thanks for the review, I'll take some time to add the according tests. |
Hey @haitlahcen, just bumping. Are you still planning on adding the tests here? |
@snoyberg indeed, just busy right now but will definitely do it |
Cool, thanks for the quick response 👍 |
I'm going to recommend we move ahead with merging this as-is, even without the additional tests, and rely on integration tests to catch bugs. A future PR to improve the unit tests will definitely be appreciated. |
…ull-compiler No null compiler in snapshot output
…ve-excess-space Remove excess space to extra-deps recommendations
…rruptible-file-locks Switch to interruptible branch of filelock
@haitlahcen Did you notice my last comments? |
…-lock-dist-dir Lock the dist directory (fixes commercialhaskell#2730)
Specifically, we're adding in this commit: snoyberg/filelock@4f08049 Without this change, very often on large builds we end up in a situation where: * Multiple threads are working in parallel * Thread A needs to work with the database * Thread A takes a file lock on the database * Thread B forks a child process, which inherits the database lock * Thread A releases the file lock * Thread C attempts to take the file lock, but is blocked by B's child As a demonstration, lsof showed me this on my system: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME stack 20216 michael 12w REG 1,4 0 8613331527 stack.sqlite3.pantry-write-lock Cabal-sim 22736 michael 12w REG 1,4 0 8613331527 stack.sqlite3.pantry-write-lock ghc 22800 michael 12w REG 1,4 0 8613331527 stack.sqlite3.pantry-write-lock With this change in place, I witnessed 0 cases of the file lock not acquired message from Stack on a large build.
…-files-2 Lock files attempt 2
…xec-filelock Use patched filelock with CLOEXEC
Instead of FromJSON instances for Repo and PackageMetadata, use the Data.Aeson.Extended mechanisms to properly track which fields are used.
The completed package information already contains the package name, bypassing the need for a package completion call if the cache (lock file) already contains that information. This bypasses a spurious warning about lacking cryptographic hashes, and likely improves performance, for the repo and archive cases. This wasn't discovered initially because the Hackage use case never had an overhead: the specification of a RPLIHackage already contains the package name in all cases.
This reduces the amount of recompilation that has to happen when iterating on the code base.
…-improved-from-json-instances Fix some misplaced parse warnings commercialhaskell#4789
…ve-th-cpp-in-main Remove CPP and TH in Main module
…-symlinks-to-dirs Support symlinks to directories (fixes commercialhaskell#4776)
…-package-name-from-completed Get package name from completed information commercialhaskell#4789
…en/stack into haitlahcen-binary-module-interface
Just noticed your comments @ndmitchell.
No, the primary purpose is to determine which files are depended upon by Template Haskell, so that proper dirtiness checking can occur.
Yes, failure to parse should be a warning, agreed. I just checked the current code, and from what I can tell that's exactly what occurs.
Do you mean the
Also alternatively: we could take this
Can you elaborate on what you mean here by "major change?" |
Weeder parses the
Major in the sense that there is some fallout. But it's not an issue. |
Thanks for the clarification @ndmitchell. I've created an updated PR including the changes here, as well as:
|
This PR try to fix #4541 by implementing the binary format of the module interface files.
This is mostly reverse engineering of https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/IfaceFiles.
I did not found any strong documentation that would give the complete binary spec.
I have tested the implementation with https://gist.github.com/haitlahcen/de33a593added86f86fc9a126d5b4164 and
test.sh
:Using the extracted implementation https://gist.github.com/haitlahcen/6232e957608d742a16581789e31dc918
GHC versions passing
./test.sh && nix-shell --pure --run "cabal run ./Test.hi"
I would also like to know if you guys consider this major change so that I update the changelog acordingly!