use ghc-pkg recache #5639

Closed
juhp opened this issue Nov 24, 2021 · 10 comments

@juhp
Contributor

juhp commented Nov 24, 2021

ghc-pkg [un]register is very slow, especially when many packages are involved.
Would it not be possible for Stack to use a single ghc-pkg recache instead,
which is much faster when a large number of packages are being added or removed?

The Stackage build typically unregisters hundreds of packages, and this takes a very long time with unregister (often 30 minutes or more).
(In Fedora we use recache instead of unregister/register, and it has sped up package updates considerably.)
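
A minimal sketch of the 'recache' approach described above, not Stack's or Fedora's actual code: delete the registration files of the unwanted packages, then run ghc-pkg recache once. The function names are hypothetical, and the assumption that each registration file is named `<unit-id>.conf` inside the package database directory is an illustration only. This sketch performs no consistency check on the remaining packages.

```haskell
import Control.Monad (forM_, when)
import System.Directory (doesFileExist, removeFile)
import System.FilePath ((</>), (<.>))
import System.Process (callProcess)

-- | Hypothetical helper: path of a package's registration file,
-- ASSUMING the file is named after the unit id.
confFile :: FilePath -> String -> FilePath
confFile dbDir unitId = dbDir </> unitId <.> "conf"

-- | Arguments for a single recache of one package database.
recacheArgs :: FilePath -> [String]
recacheArgs dbDir = ["recache", "--package-db=" ++ dbDir]

-- | Remove many registrations, then rebuild package.cache once.
-- No check for newly broken packages is made here.
bulkUnregister :: FilePath -> [String] -> IO ()
bulkUnregister dbDir unitIds = do
  forM_ unitIds $ \uid -> do
    let file = confFile dbDir uid
    exists <- doesFileExist file
    when exists (removeFile file)
  callProcess "ghc-pkg" (recacheArgs dbDir)
```

The point of the design is that the cost of rebuilding the cache is paid once per batch rather than once per package.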

@juhp
Contributor Author

juhp commented Aug 18, 2023

Revisiting this, since it would be a big win for curator, which often unregisters hundreds or thousands of packages.

I think the code in question is https://github.com/commercialhaskell/stack/blob/master/src/Stack/Build/Execute.hs#L863

@mpilgrem
Member

@juhp, based on https://downloads.haskell.org/~ghc/9.2.8/docs/html/users_guide/packages.html#package-management-the-ghc-pkg-command for ghc-pkg recache, I assume the 'recache' alternative (a) removes the relevant files from the relevant package database directory and then (b) uses ghc-pkg recache to re-create the binary cache file package.cache for the selected database.

I am (perhaps naively) assuming that ghc-pkg unregister <list_of_packages> (a) removes the relevant files from the package database directory and then (b) amends (as required) the binary cache file.

I am also assuming that when a small number of packages are being removed - presumably the usual case for most users - ghc-pkg unregister is quicker than the 'recache' alternative (a 'small amend' versus a 're-creation'). Is that right? That is, would Stack switch to the 'recache' alternative from 'unregister' only if a sufficiently large number of packages were being unregistered at the same time?

@juhp
Contributor Author

juhp commented Aug 19, 2023

Thanks @mpilgrem

> @juhp, based on https://downloads.haskell.org/~ghc/9.2.8/docs/html/users_guide/packages.html#package-management-the-ghc-pkg-command for ghc-pkg recache, I assume the 'recache' alternative (a) removes the relevant files from the relevant package database directory and then (b) uses ghc-pkg recache to re-create the binary cache file package.cache for the selected database.

I think so yes

> I am (perhaps naively) assuming that ghc-pkg unregister <list_of_packages> (a) removes the relevant files from the package database directory and then (b) amends (as required) the binary cache file.

I believe so; I suspect it updates the database package by package.

I don't know exactly why unregister is so slow, but I think it is a well-known ghc-pkg issue.
It would be great if someone could improve it.

> I am also assuming that when a small number of packages are being removed - presumably the usual case for most users - ghc-pkg unregister is quicker than the 'recache' alternative (a 'small amend' versus a 're-creation'). Is that right?

I don't believe that is true. AFAIK the performance of recache is never worse than register/unregister.
And for multiple packages, running recache once is significantly faster.

For example, if I install or remove, say, 500 Fedora Haskell packages, I don't even notice recache being run at the end.

@mpilgrem
Member

It seems to me that the problem with ghc-pkg may be (in GHC's utils/ghc-pkg/Main.hs)

"unregister" : pkgarg_strs@(_:_) -> do
        forM_ pkgarg_strs $ \pkgarg_str -> do
          pkgarg <- readPackageArg as_arg pkgarg_str
          unregisterPackage pkgarg verbosity cli force

That is, 'unregister packages' is 'unregister one package after another', and (if I understand correctly) each 'unregister' also checks for newly-broken packages and performs a recache - and each 'remove a package' also filters a (possibly long) list of packages. Extracts:

unregisterPackage :: PackageArg -> Verbosity -> [Flag] -> Force -> IO ()
unregisterPackage = modifyPackage RemovePackage

modifyPackage
  :: (InstalledPackageInfo -> DBOp)
  -> PackageArg
  -> Verbosity
  -> [Flag]
  -> Force
  -> IO ()
modifyPackage fn pkgarg verbosity my_flags force = do
...
      -- ...but do consistency checks with regards to the full stack
      old_broken = brokenPackages (allPackagesInStack db_stack)
      rest_of_stack = filter ((/= db_name) . location) db_stack
      new_stack = new_db_ro : rest_of_stack
      new_broken = brokenPackages (allPackagesInStack new_stack)
      newly_broken = filter ((`notElem` map installedUnitId old_broken)
                            . installedUnitId) new_broken
...
  when (not (null newly_broken)) $
      dieOrForceAll force ("unregistering would break the following packages: "
              ++ unwords (map displayQualPkgId newly_broken))

  changeDB verbosity cmds db db_stack

recache :: Verbosity -> [Flag] -> IO ()
recache verbosity my_flags = do
...
  changeDB verbosity [] db_to_operate_on _db_stack

I suppose you don't want to lose the checking for newly-broken packages, but you only want to do it once for each 'bulk' unregister.
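
A toy model of that 'check once per bulk unregister' idea, as a sketch under stated assumptions rather than ghc-pkg's actual implementation. The `PkgDb` type and both functions are hypothetical, and this `brokenPackages` is simplified: it only looks for directly missing dependencies.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Toy package database (hypothetical): unit id -> ids it depends on.
type PkgDb = Map.Map String [String]

-- | Simplified check: packages with at least one dependency that is
-- missing from the database (direct dependencies only).
brokenPackages :: PkgDb -> [String]
brokenPackages db =
  [ p | (p, deps) <- Map.toList db, any (`Map.notMember` db) deps ]

-- | Remove all requested packages first, then check ONCE for newly
-- broken packages. 'Left' lists the packages that would break.
bulkUnregister :: [String] -> PkgDb -> Either [String] PkgDb
bulkUnregister toRemove db
  | null newlyBroken = Right newDb
  | otherwise        = Left newlyBroken
  where
    oldBroken   = Set.fromList (brokenPackages db)
    newDb       = foldr Map.delete db toRemove
    newlyBroken = filter (`Set.notMember` oldBroken) (brokenPackages newDb)
```

In this toy model, with a database where `c` depends on `b` and `b` depends on `a`, removing `["b", "c"]` together succeeds, while removing only `["a"]` reports `b` as newly broken.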

@mpilgrem
Member

mpilgrem commented Aug 24, 2023

It seems to me that what would be ideal is:

  • future versions of GHC shipping with an 'efficient' ghc-pkg (a version that efficiently bulk unregisters - the incentive for GHC #12637 and the complaint of GHC #16324);
  • future versions of Stack using the 'efficient' ghc-pkg if it is available, or an effective back-port of the 'efficient' ghc-pkg functionality if it is not.

I think I have a pull request (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11142) for an 'efficient' ghc-pkg (at least, my code compiles - I need to work out a way to test it).

@juhp
Contributor Author

juhp commented Aug 25, 2023

Thanks @mpilgrem for the analysis and the MR - this sounds great!!

BTW I'm curious about --force though (one would expect it to "speed things up", but it seemingly doesn't?)

@mpilgrem
Member

@juhp, I think ghc-pkg --force (which sets ForceAll) features here:

  when (not (null newly_broken)) $
      dieOrForceAll force ("unregistering would break the following packages: "
              ++ unwords (map displayQualPkgId newly_broken))

dieOrForceAll :: Force -> String -> IO ()
dieOrForceAll ForceAll s = ignoreError s
dieOrForceAll _other s   = dieForcible s

That is, --force does not avoid the check (null newly_broken) being evaluated, it only avoids the consequences of the check failing.
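
The same point as a small, self-contained sketch. This is a simplified toy, not ghc-pkg's code: the two-constructor `Force` type, the `Outcome` type and the `finish` function are all hypothetical. The list of newly broken packages is an argument, so it has already been computed before the force setting is consulted.

```haskell
-- | Simplified stand-in for ghc-pkg's force setting (hypothetical).
data Force = NoForce | ForceAll
  deriving (Eq, Show)

-- | Outcome of a modification in this toy model: either it aborts
-- with a message, or it proceeds with a list of warnings.
data Outcome = Aborted String | Proceeded [String]
  deriving (Eq, Show)

-- | The force setting only decides what happens AFTER the (already
-- computed) list of newly broken packages is known.
finish :: Force -> [String] -> Outcome
finish force newlyBroken
  | null newlyBroken  = Proceeded []
  | force == ForceAll = Proceeded [msg]
  | otherwise         = Aborted msg
  where
    msg = "unregistering would break the following packages: "
            ++ unwords newlyBroken
```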

@mpilgrem
Member

Notes for myself on Stack's unregistering generally:

@juhp
Contributor Author

juhp commented Sep 13, 2023

Really big thank you for this, Mike! ❤️

mpilgrem added a commit that referenced this issue Sep 13, 2023
Fix #5639 Backport efficient ghc-pkg unregister