
Design discussion: for implicit snapshots, unify snapshot and local DB? #3783

Closed
mgsloan opened this issue Jan 16, 2018 · 3 comments

mgsloan commented Jan 16, 2018

Previously, the discussion of implicit snapshots has mostly been around moving dependency packages (things from Hackage, git repos) into the snapshot DB so that there can be more package sharing. The primary motivation, however, is making it so that build options can reliably be applied to all snapshot packages without much hassle (see #3782).

However, what if we went even further, and possibly simpler, and put everything in one package DB (except for the global and extra DBs)? Local project packages in the filesystem would then also be installed to this DB, unifying the local and snapshot DBs.

I'm thinking this would work something like this:

  • Each stack.yaml would get its own package DB.

  • All packages, even local file ones, get installed somewhere in the stack root (~/.stack). The ones involved in the current project get registered to the project's package DB. For local file packages, the install path is based on hashing the local file package's path.

    • I considered hashing the contents of the package's files and using this for the path, but this would lead to lots of garbage. It could potentially mean a lot fewer rebuilds when switching branches, though. Hmm! Could make this an option.
  • When possible, this means that multiple project configurations that reference the same local package can share compilation results! There is an issue with only hashing the local package's path, though: switching configurations would often require recompilation when sharing is not possible. (A sketch of the path-hashing idea follows this list.)
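
To make the path-hashing idea concrete, here is a minimal sketch. The function name, the `installed-local` directory, and the choice of SHA-256 are all illustrative assumptions, not existing Stack internals:

```haskell
import qualified Crypto.Hash.SHA256 as SHA256      -- cryptohash-sha256
import qualified Data.ByteString.Base16 as B16     -- base16-bytestring
import qualified Data.ByteString.Char8 as B8
import System.FilePath ((</>))

-- | Derive an install prefix under the stack root from the canonical path of
-- a local file package. Two project configurations that reference the same
-- path end up with the same install location, so they could share
-- compilation results.
installDirForLocalPackage :: FilePath  -- ^ stack root, e.g. ~/.stack
                          -> FilePath  -- ^ canonical path of the local package
                          -> FilePath
installDirForLocalPackage stackRoot pkgPath =
    stackRoot </> "installed-local"
              </> B8.unpack (B16.encode (SHA256.hash (B8.pack pkgPath)))
```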

@snoyberg @borsboom Am I overlooking something that makes this infeasible? This could be both an optimization and a simplification.

@snoyberg

Overall -1:

  • I doubt in practice it will ever work to share results of local packages. I don't see how the dirtiness detection will work as desired there.
  • Overall, this seems like a recipe for creating even more disk space garbage, by moving more ephemeral files from .stack-work into ~/.stack.
  • I'm also worried that it will more easily allow infecting the ~/.stack databases with modified local packages.
  • I'm not really convinced that this brings any real saving of complexity. We still need to perform all of the calculations of which packages depend on which other packages, so the package promotion concept isn't really lost.
  • It seems that this is likely to complicate the story around the precompiled cache, which relies on files in snapshot database having a predetermined set of flags/options/dependencies.


mgsloan commented Jan 16, 2018

I doubt in practice it will ever work to share results of local packages. I don't see how the dirtiness detection will work as desired there.

How's this for an implementation approach: the build cache would also get copied when installing. This way we can know if the local package has changed since it was installed.
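
As a rough sketch of what that dirtiness check could look like (the `BuildCache` representation and the function name are assumptions here, not Stack's actual build-cache format):

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical representation: the build cache maps each source file of the
-- package to the digest it had when the package was installed.
type FileDigest = String
type BuildCache = Map.Map FilePath FileDigest

-- | The installed copy of a local package is dirty if any tracked file's
-- digest differs from the one recorded in the copied build cache, or if
-- files were added or removed since installation.
isInstalledCopyDirty :: BuildCache  -- ^ cache copied alongside the install
                     -> BuildCache  -- ^ cache computed from the files on disk now
                     -> Bool
isInstalledCopyDirty installed current = installed /= current
```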

Overall, this seems like a recipe for creating even more disk space garbage, by moving more ephemeral files from .stack-work into ~/.stack.

True, but I think implicit snapshots will absolutely need GC. So, I don't see this as a problem.

I'm also worried that it will more easily allow infecting the ~/.stack databases with modified local packages.

There wouldn't be any ~/.stack databases, except for the global DBs associated with compilers. The installed packages would live there, though. I think this would actually greatly reduce the risk of the kind of infectiousness we have today, where "$everything" ghc-options permanently affect snapshot packages.

This might require not using Cabal's copy / register step, which wouldn't work so well for packages with a custom Setup.hs, since with that step packages need to be installed into a DB rather than just being referenceable from one. I think I recall hearing about a hack where each package gets installed into a DB containing only that single package. Seems gnarly.

I'm not really convinced that this brings any real saving of complexity. We still need to perform all of the calculations of which packages depend on which other packages, so the package promotion concept isn't really lost.

Package promotion is computed separately from plan construction. I believe that package promotion logic could be removed if it's all in one DB. We would unfortunately still need promotion from the global DB to the project DB.

It seems that this is likely to complicate the story around the precompiled cache, which relies on files in snapshot database having a predetermined set of flags/options/dependencies.

The precompiled cache hash will include the path to the local package, and, as usual, all of its dependencies. In order to determine when a local package in the DB needs to be rebuilt, we'd compare the installed build cache with the state of the local files.
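
For illustration, a hypothetical key construction might look like the following. This is not the existing precompiled-cache code; the name and hashing choices are assumptions:

```haskell
import qualified Crypto.Hash.SHA256 as SHA256
import qualified Data.ByteString.Base16 as B16
import qualified Data.ByteString.Char8 as B8
import Data.List (sort)

-- | Hypothetical precompiled-cache key for a local package: a hash over its
-- path plus the sorted installed ids of all of its dependencies, so changing
-- any dependency (or moving the package) yields a different key.
precompiledCacheKey :: FilePath   -- ^ path to the local package
                    -> [String]   -- ^ installed ids of its dependencies
                    -> String
precompiledCacheKey pkgPath depIds =
    B8.unpack (B16.encode (SHA256.hash (B8.pack (unlines (pkgPath : sort depIds)))))
```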

There is a tricky thing, though; take the following case:

  • project A and project B share local package Z (and have identical deps for it)
  • user modifies Z and rebuilds project A. This modifies the installed package referenced by both DBs. This causes package unregistering / rebuilding in project A's DB, for everything that depended on Z
  • user rebuilds project B. Now it needs to have stored metadata that lets it know to unregister / rebuild everything that depended on Z (a sketch of such metadata follows this list).
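
One way to picture that stored metadata (purely a hypothetical sketch; the types and names are not existing Stack structures):

```haskell
import qualified Data.Map.Strict as Map

type PackageName = String
type InstalledDigest = String

-- Hypothetical per-project metadata: for each shared local package the
-- project's DB was last built against, record the digest of the installed
-- copy that was registered at that time.
type ProjectLocalDeps = Map.Map PackageName InstalledDigest

-- | When rebuilding project B, any package whose installed copy no longer
-- matches the recorded digest (or has disappeared) must be unregistered and
-- rebuilt, along with everything in B's DB that depends on it.
packagesNeedingRebuild :: ProjectLocalDeps                     -- ^ recorded at B's last build
                       -> Map.Map PackageName InstalledDigest  -- ^ digests of the current installed copies
                       -> [PackageName]
packagesNeedingRebuild recorded current =
    [ name
    | (name, oldDigest) <- Map.toList recorded
    , Map.lookup name current /= Just oldDigest
    ]
```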

I realize this is a major change in how things work, but I think it is a more elegant and functional design. I certainly haven't thought through every detail.

@snoyberg

Closing
