Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary module interface #4545

Closed

Conversation

haitlahcen
Copy link
Contributor

@haitlahcen haitlahcen commented Jan 26, 2019

This PR try to fix #4541 by implementing the binary format of the module interface files.
This is mostly reverse engineering of https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/IfaceFiles.
I did not found any strong documentation that would give the complete binary spec.

I have tested the implementation with https://gist.github.com/haitlahcen/de33a593added86f86fc9a126d5b4164 and test.sh:

#! /usr/bin/env nix-shell
#! nix-shell --pure -i bash -p haskell.compiler.{GHC_VERSION_TESTED}
ghc -fforce-recomp Test.hs && ghc --show-iface Test.hi

Using the extracted implementation https://gist.github.com/haitlahcen/6232e957608d742a16581789e31dc918

GHC versions passing ./test.sh && nix-shell --pure --run "cabal run ./Test.hi"

  • ghc7103
  • ghc802
  • ghc822
  • ghc843
  • ghc844
  • ghc861
  • ghc862
  • ghc863

I would also like to know if you guys consider this major change so that I update the changelog acordingly!

@haitlahcen haitlahcen force-pushed the binary-module-interface branch 2 times, most recently from f51bac6 to 2d1f543 Compare January 26, 2019 16:33
@dbaynard
Copy link
Contributor

dbaynard commented Feb 4, 2019

@mgsloan would you recommend a reviewer, please?

@haitlahcen Thanks for the PR!

I do have one major concern. The interface changes for every major GHC version, and so this will need to be revisited for every major version; it surely means old stack versions will not work with new GHC versions? I may be wrong, however — I don't know how stack handles this at the moment. Will a change to the dumped output format mean this is an issue already?

I did not found any strong documentation that would give the complete binary spec.

The page you linked refers to the GHC wiki. GHC is available as a library on hackage. I don't know whether current GHC can read interface files from old GHC versions. I suspect there's a package which handles this (like for template-haskell) — perhaps one of these packages does something.


In any case, I'm glad it works, for now. Are you able to use your fork, then?

@borsboom
Copy link
Contributor

borsboom commented Feb 4, 2019

What is the behaviour if you use it with an unsupported (new) GHC version? We do need some kind of reasonable fallback to support new versions, especially prerelease GHC versions, without having to release a new stack version. If I'm reading the code right, I think it'll just not track TH dependent files in this case, which I think is fine.

@haitlahcen
Copy link
Contributor Author

haitlahcen commented Feb 5, 2019

@dbaynard

I do have one major concern. The interface changes for every major GHC version, and so this will need to be revisited for every major version; it surely means old stack versions will not work with new GHC versions?

The older version will still work but the new interface will not be deserialized (which is only used to check for dirtiness btw).

Will a change to the dumped output format mean this is an issue already?

Yes, this already happened when they added a hash after the path of the TH dependencies.

I don't know whether current GHC can read interface files from old GHC versions.

GHC is not supporting older versions of the protocol. I think the main reason is that interface files are generated at compile time and were meant to be used temporarily (destroyed right after compilation).

The page you linked refers to the GHC wiki. GHC is available as a library on hackage. I don't know whether current GHC can read interface files from old GHC versions.

I did not found any package (other than GHC) implementing the protocols, but I might have missed something !
The GHC package is not usable because:

  • It only deserialize the latest version
  • Loading a file through the package requires an effectful "compilation context" (here we just want to read the dependencies with no further processing).

In any case, I'm glad it works, for now. Are you able to use your fork, then?

Indeed, I've successfully compiled Stack using the patched version !

@borsboom

What is the behaviour if you use it with an unsupported (new) GHC version? We do need some kind of reasonable fallback to support new versions, especially prerelease GHC versions, without having to release a new stack version. If I'm reading the code right, I think it'll just not track TH dependent files in this case, which I think is fine.

You are right, the current behavior just ignore the deserialization failure, hence not following deps dirtiness.

@dbaynard
Copy link
Contributor

dbaynard commented Feb 5, 2019

@haitlahcen Thanks for the detailed responses — they've certainly addressed my concerns. We should be able to review it, soon.

Indeed, I've successfully compiled Stack using the patched version !

I take it then that you are able to use your patched version of stack in production. Thanks for contributing back!

@dbaynard
Copy link
Contributor

dbaynard commented Feb 5, 2019

See https://neilmitchell.blogspot.com/2019/02/announcing-ghc-lib.html — it may be worth considering from a maintenance perspective. Good timing! I wonder whether @ndmitchell saw this issue…

@ndmitchell
Copy link
Contributor

@dbaynard I did not. So ghc-lib would certainly allow you to load one particular .hi file format from one particular GHC version across many GHC versions. But you need the slightly harder problem of leading N different versions across many GHC versions - which you might be able to depend on multiple ghc-lib versions simultaneously. However, it's likely a lot more complex than the code above, and still wouldn't work with pre-releases.

From what I saw, the use of the information obtained by parsing the .hi files is purely for warnings? If that's the case, shouldn't the failure to parse the .hi files be a warning, and then just ignore the other warnings, and it would work for pre-releases just fine.

Do not that Weeder also piggy backs off the .hi files that Stack creates, so if Stack could still create them with some flag that would be helpful for Weeder. Alternatively, Weeder could move to dumping the .hi-dump files itself on demand. As a consequence, this change is a major change.

Copy link
Contributor

@snoyberg snoyberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to see a unit test or two added showing that this parsing works, ideally including .hi files from a few different versions of GHC. Overall, I like this direction, thank you!

src/Stack/Build/Execute.hs Outdated Show resolved Hide resolved
src/Stack/Package.hs Outdated Show resolved Hide resolved
src/Stack/ModuleInterface.hs Outdated Show resolved Hide resolved
src/Stack/ModuleInterface.hs Show resolved Hide resolved
@haitlahcen
Copy link
Contributor Author

@snoyberg Thanks for the review, I'll take some time to add the according tests.

@snoyberg
Copy link
Contributor

snoyberg commented Mar 6, 2019

Hey @haitlahcen, just bumping. Are you still planning on adding the tests here?

@haitlahcen
Copy link
Contributor Author

@snoyberg indeed, just busy right now but will definitely do it

@snoyberg
Copy link
Contributor

snoyberg commented Mar 6, 2019

Cool, thanks for the quick response 👍

@snoyberg
Copy link
Contributor

snoyberg commented Apr 9, 2019

I'm going to recommend we move ahead with merging this as-is, even without the additional tests, and rely on integration tests to catch bugs. A future PR to improve the unit tests will definitely be appreciated.

@snoyberg
Copy link
Contributor

@haitlahcen Did you notice my last comments?

snoyberg and others added 17 commits April 30, 2019 06:42
Specifically, we're adding in this commit:

snoyberg/filelock@4f08049

Without this change, very often on large builds we end up in a situation
where:

* Multiple threads are working in parallel
* Thread A needs to work with the database
* Thread A takes a file lock on the database
* Thread B forks a child process, which inherits the database lock
* Thread A releases the file lock
* Thread C attempts to take the file lock, but is blocked by B's child

As a demonstration, lsof showed me this on my system:

COMMAND     PID    USER   FD   TYPE DEVICE SIZE/OFF       NODE NAME
stack     20216 michael   12w   REG    1,4        0 8613331527 stack.sqlite3.pantry-write-lock
Cabal-sim 22736 michael   12w   REG    1,4        0 8613331527 stack.sqlite3.pantry-write-lock
ghc       22800 michael   12w   REG    1,4        0 8613331527 stack.sqlite3.pantry-write-lock

With this change in place, I witnessed 0 cases of the file lock not
acquired message from Stack on a large build.
…xec-filelock

Use patched filelock with CLOEXEC
Instead of FromJSON instances for Repo and PackageMetadata, use the
Data.Aeson.Extended mechanisms to properly track which fields are used.
The completed package information already contains the package name,
bypassing the need for a package completion call if the cache (lock
file) already contains that information. This bypasses a spurious
warning about lacking cryptographic hashes, and likely improves
performance, for the repo and archive cases.

This wasn't discovered initially because the Hackage use case never had
an overhead: the specification of a RPLIHackage already contains the
package name in all cases.
This reduces the amount of recompilation that has to happen when
iterating on the code base.
…-improved-from-json-instances

Fix some misplaced parse warnings commercialhaskell#4789
…ve-th-cpp-in-main

Remove CPP and TH in Main module
…-symlinks-to-dirs

Support symlinks to directories (fixes commercialhaskell#4776)
…-package-name-from-completed

Get package name from completed information commercialhaskell#4789
@snoyberg
Copy link
Contributor

snoyberg commented May 3, 2019

Just noticed your comments @ndmitchell.

From what I saw, the use of the information obtained by parsing the .hi files is purely for warnings?

No, the primary purpose is to determine which files are depended upon by Template Haskell, so that proper dirtiness checking can occur.

If that's the case, shouldn't the failure to parse the .hi files be a warning, and then just ignore the other warnings, and it would work for pre-releases just fine.

Yes, failure to parse should be a warning, agreed. I just checked the current code, and from what I can tell that's exactly what occurs.

Do not that Weeder also piggy backs off the .hi files that Stack creates, so if Stack could still create them with some flag that would be helpful for Weeder.

Do you mean the .hi files, or the .hi-dump files? If the former: we're definitely still generating them. If the latter: this change does break things for weeder currently, though it's relatively trivial to add back the behavior yourself with:

--ghc-options "-ddump-to-file -ddump-hi"

Alternatively, Weeder could move to dumping the .hi-dump files itself on demand.

Also alternatively: we could take this .hi parsing code, extract it to a separate library, and both Weeder and Stack could depend upon it. That would bypass the need for either tool to deal with the slow text file generation and parsing.

As a consequence, this change is a major change.

Can you elaborate on what you mean here by "major change?"

@ndmitchell
Copy link
Contributor

Weeder parses the .hi-dump files. Given the domain of Weeder, I am less concerned about slowness, and the information I grab from .hi files may be a superset of what Stack uses. If a nice .hi file parsing library appears I'll use it, otherwise not a big deal.

As a consequence, this change is a major change.

Major in the sense that there is some fallout. But it's not an issue.

@snoyberg
Copy link
Contributor

snoyberg commented May 5, 2019

Thanks for the clarification @ndmitchell. I've created an updated PR including the changes here, as well as:

  • Separating the .hi file parsing code into a new sub-package called hi-file-parser
  • Removing the extraneous whitespace changes in Stack.Package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compilation not terminating due to giant .dump-hi (4gb) file when using type families
8 participants