Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global source-repository-packages #9314

Open
TeofilC opened this issue Oct 6, 2023 · 23 comments
Open

Global source-repository-packages #9314

TeofilC opened this issue Oct 6, 2023 · 23 comments

Comments

@TeofilC
Copy link
Collaborator

TeofilC commented Oct 6, 2023

Summary

When overriding dependencies using source-repository-package, the overridden packages become local packages.

This means that neither the source code or build results get cached between projects.
It also makes it difficult for nix tooling (or similar) to provide pre-cached packages as cabal-install will be reluctant to use a version of a package coming from a package database provided by this tooling.

Proposal

Add a setting to source-repository-package stanzas that allow setting it to be local or global.

Default this to local if it depends on a local package, ie, to enforce a local package transitive closure property.

I think this default would work well for I think the two main use cases:

  • overriding an external dependency (global)
  • bringing in a tightly coupled sibling package (local)
@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 6, 2023

I think this is somewhat related to #5444. My impression is that a lot of the issues that people run into there could be fixed/alleviated by this proposal.

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 6, 2023

I think this is one of the haskell.nix issues related to this: input-output-hk/haskell.nix#1367

@Mikolaj
Copy link
Member

Mikolaj commented Oct 6, 2023

I don't know enough about this to comment, but I wonder if there are any hidden pitfalls with allowing that. But if there are, perhaps that would lead to a beneficial revision of the assumption and a rewrite.

If the proposal is accepted, who is going to implement that?

@andreabedini
Copy link
Collaborator

Thanks for thinking about this. I think you mention separate problems that are unreleated to the local/global distinction.

This means that neither the source code or build results get cached between projects.

A global cache of source-repository-packages could be implemented without any change in how source-repository-package are considered by the solver. It does require designing how to get it to work well though. Projects enjoy a cabal clean once in a while, while a global cache without gc can be forever.

The cabal store already has this issue so perhaps a common solution is needed.

It also makes it difficult for nix tooling (or similar) to provide pre-cached packages as cabal-install will be reluctant to use a version of a package coming from a package database provided by this tooling.

The problem with getting cabal to work well inside nix is mostly due to the fact inside the nix build sandbox you cannot do arbitrary IO; and that it is hard to get cabal to not do IO. Nix could prefect the repositories for example but there is no simple way to get cabal to use the prefetched repositories rather than doing the checkout itself.

bringing in a tightly coupled sibling package (local)

I don't understand how this situation can arise. A tighly coupled sibiling package ... but living in a separate repository?

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 6, 2023

If the proposal is accepted, who is going to implement that?

I'd be happy to give it a go. Though I can't make any promises about when I'll have capacity

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 6, 2023

A global cache of source-repository-packages could be implemented without any change in how source-repository-package are considered by the solver

Yes this is a great point. I agree that caching and solving are distinct concerns. Though I think the difficulties that haskell.nix runs into suggests that some changes would need to made to the solver for cabal-install to accept source-repository-packages that aren't built in place.

I agree that implementing this feature would make cabal store bloat worse, but that's orthogonal to this.

Nix could prefect the repositories for example but there is no simple way to get cabal to use the prefetched repositories rather than doing the checkout itself.

Just to be clear I'm not suggesting we should be calling cabal-install inside a nix sandbox. The issues I have in mind are about calling cabal-install from a nix shell where we want to provide pre-compiled dependencies. But, really nix stuff is orthogonal to this issue. It just happens to be the case that solving would, I think, make it easier to solve some issues haskell.nix faces.

I don't understand how this situation can arise. A tighly coupled sibiling package ... but living in a separate repository?

Perhaps I shouldn't have mentioned this. I was trying to come up with a scenario where someone truly wanted a local source-repository-package. I agree that this doesn't really happen in the wild: people just use mono-repos for tightly coupled packages. But I could imagine someone doing this.

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 6, 2023

Putting this another way.
If I want to override the source of one of my dependencies. I can do one of two things:

  1. provide a source-repository-package
  2. run my own hackage-server and upload a version there

(2) is a lot of work but nothing else about cabal-install's behaviour will change other than where the source code is coming from.
Whereas (1) will change how solving and caching works.

I would like to avoid having to set up my own hackage-server but still be able to override source code without changing behaviour.

@andreabedini
Copy link
Collaborator

If I want to override the source of one of my dependencies.

[without that package being considered local to the project]

Apologies if I sound repetitive. In this use case, I don't see the importance of the package not being considered local (beside caching, which has we discussed above is an idepedent problem).

  • The override is local to the project anyway.
  • As you notice, we cannot guarantee it can be considered global because it could depend on local packages.
  • By considering it local, you are guaranteed it will replace any other package with the same name (if you added it to a repository, you would need to use active-repositories: to make sure the solver picks that exact package and not another version from another repo).

I can do one of two things:

You can also use local no-index repository, which is much easier to set up.

One could think of caching some local packages but then we would have another "mode" to take into account (local cached / local not cached). I doubt this last idea would be worth the complexity.

@andreabedini
Copy link
Collaborator

The issues I have in mind are about calling cabal-install from a nix shell where we want to provide pre-compiled dependencies. But, really nix stuff is orthogonal to this issue. It just happens to be the case that solving would, I think, make it easier to solve some issues haskell.nix faces.

Wherever these issues are captured on GitHub, could you ping me on the respective GitHub issues?

The tl;dr is that haskell.nix needs to turn a source-repository-package into a package:. This is conceptually simple but a bit fragile when it's implemented by parsing and rewriting the cabal.project file. If there was a better API to do this, there would be no issues (and now there is, in the cabal-install library). Little steps and we will get there ☺️

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 9, 2023

beside caching, which has we discussed above is an independent problem

Sorry I think I was unclear above. While I do think clearing things from the cabal store is independent, caching is the main appeal of this feature request.

My problem with source-repository-packages is that they mean that whenever I check out a new worktree of a repository, I then have to download and build external packages that I already have built in another worktree.

You can also use local no-index repository, which is much easier to set up.

Thanks for this link! I agree that this is easier to set up than a hackage-server, but it's still a lot of work compared to just having a set of source-repository-packages in a cabal.project and also it's less declarative.

Wherever these issues are captured on GitHub, could you ping me on the respective GitHub issues?

Done that now.

On haskell.nix stuff, I agree with you that a proper API is necessary for these two projects to interact. Yet, I think this feature request would solve this specific problem with a lot less work than setting that up. haskell.nix does the right thing for stack projects in this case and that's because stack deals with this in the way I'm suggesting.

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 9, 2023

This comment from one of the linked issues seems to be calling for this feature as well: #5444 (comment)

@andreabedini
Copy link
Collaborator

@TeofilC Thank you for the link and the ping.

Let me emphasize that I didn't mean to diminish the problems with the current DX!

I should have pointed out right away that it was my experience with source-repository-packages in the cardano ecosystem that led me to write the tool that Michael mentions in that thread!

I think at some point I had a git-wrapper script that would share the checkouts between projects, it was saving multiple gigabytes of downloads (and time!).

Somehow, in my mind this is orthogonal from the local/global distinction but maybe it should not be (ew.g. tests enabled by default comes up a lot).

The problem cabal-install rebuilding things is a little bit more complicated though.

@andreabedini
Copy link
Collaborator

Oh well, it looks like Oleg is right (I should not be surprised 😂).

❯ cat cabal.project
source-repository-package
  type: git
  location: https://github.com/phadej/urakka.git
  tag: 86c83ad2454e0e74a045c16c9a4db5bb10a1d06b
❯ cabal build --dry-run urakka
Warning: There are no packages or optional-packages in the project
Build profile: -w ghc-9.4.7 -O1
In order, the following would be built (use -v for more details):
 - selective-0.5 (lib) (requires build)
 - urakka-0.1 (lib) (requires build)
❯ jq < dist-newstyle/cache/plan.json
{
  "cabal-version": "3.10.1.0",
  "cabal-lib-version": "3.10.1.0",
  "compiler-id": "ghc-9.4.7",
  "os": "linux",
  "arch": "x86_64",
  "install-plan": [
...
    {
      "type": "configured",
      "id": "urakka-0.1-364609ffd7fd88c1045153481bf8a14ab984f26c9697498a89487d5b89d64980",
      "pkg-name": "urakka",
      "pkg-version": "0.1",
      "flags": {},
      "style": "global",
      "pkg-src": {
        "type": "source-repo",
        "source-repo": {
          "type": "git",
          "location": "https://github.com/phadej/urakka.git",
          "tag": "86c83ad2454e0e74a045c16c9a4db5bb10a1d06b"
        }
      },
      "pkg-src-sha256": "a7dd01fae58367640979787ec241ddb1e2bb11a39e71e458351b2572340e3df2",
      "depends": [
        "async-2.2.4-eff2480a8af6cc85ce8a6a7dab595347df7c241b4f4aeefbc1a3723a7e0385d3",
        "base-4.17.2.0",
        "containers-0.6.7",
        "deepseq-1.4.8.0",
        "selective-0.5-a98c427228670a8c59520515817f843e0da9463e847bc07897e274223e7c9660",
        "stm-2.5.1.0",
        "transformers-0.5.6.2",
        "unliftio-core-0.2.1.0-8aed257eb4f08c6c33391e09fac0d910bc5052fbe9ba1705e4d448b47ce5deb7"
      ],
      "exe-depends": [],
      "component-name": "lib"
    }
...
  ]
}

So, it looks like source-repository-packages can automatically considered global. The checkout is not shared but the build is.

@andreabedini
Copy link
Collaborator

Another data point as I rediscover the codebase:

packages: https://github.com/phadej/urakka/tarball/86c83ad2454e0e74a045c16c9a4db5bb10a1d06b

works and it is considered global, and the build is cached in the store. The source distribution is not shared but downloading a tarball might be more efficient that a git checkout.

On other hand, extra-packages: only supports a package ids and optional-packages: only supports path locations.

At least the user guide clear about the use case for each option.
https://cabal.readthedocs.io/en/stable/cabal-project.html#specifying-the-local-packages

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 10, 2023

The problem of cabal-install rebuilding is a bit more complicated though.

Yes, very fair! Rebuilding logic is always surprisingly tricky

So, it looks like source-repository-packages can automatically considered global. The checkout is not shared but the build is.

The build being shared is good to hear! I'm not sure how I've missed this thanks for pointing it out!

So, I guess this reduces this issue to just putting the checkouts into the cabal store?

[adding an entry to packages with an url] is considered global, and the build is cached in the store

This seems like a great workaround! I'll give this a try. Though I still think having a proper fix for the source-repository-package case is still the way to go, since that's the main thing I see people use.

One thing about this that really confuses me is why this is considered global at all. All of the *package options are documented as introducing local packages.

@michaelpj
Copy link
Collaborator

So a point of confusion about the local/global thing is the particular behaviour we see with haskell.nix shells, namely:

  1. We have a global package-db with foo-X in it
  2. And yet, when running cabal build, cabal chooses to build foo-X from the source-repository-package specification instead.

I think our hypothesis was that this was caused by source-repository-packages being considered as "local" packages. If that isn't true, then there's something else going on there. Something during planning

@andreabedini
Copy link
Collaborator

This seems like a great workaround! I'll give this a try.

This is the script I was using https://gist.github.com/andreabedini/9647abb8b3d647a4020fd07e568a85c4, of course use it at your own risk :)

Though I still think having a proper fix for the source-repository-package case is still the way to go, since that's the main thing I see people use.

I don't know if I can call it "a fix". We might have identified some wishes but nothing that is broken or wrong. There seem to be a desire for:

  1. git fetched packages
  2. cached globally (source and build)
  3. considered non project-local for tests/benchmarks (and I guess for target resolution too)

which cabal does (may) not have a solution for. This use case needs to be understood in the context of the current code and behaviour if we want to make progress.

One thing about this that really confuses me is why this is considered global at all. All of the *package options are documented as introducing local packages.

 extra-packages: package list with version bounds (comma separated)

    Specifies a list of external packages from Hackage, which should be considered local packages. The motivation for extra-packages is making libraries that are not dependencies of any package in the project available for use in ghci.

Either the documentation is wrong or we are totally confused about what is local and what is global.

~/Scratchpad/temp
❯ echo "extra-packages: aeson" > cabal.project
❯ cabal build --dry-run aeson
Warning: There are no packages or optional-packages in the project
Resolving dependencies...
Build profile: -w ghc-9.4.7 -O1
In order, the following would be built (use -v for more details):
 - vector-0.13.1.0 (lib) (requires download & build)
 - indexed-traversable-instances-0.1.1.2 (lib) (requires build)
 - witherable-0.4.2 (lib) (requires build)
 - semialign-1.3 (lib) (requires build)
 - aeson-2.2.1.0 (lib) (requires build)
❯ jq '."install-plan"[] | select(."pkg-name" == "aeson") | { "pkg-name", style }' < dist-newstyle/cache/plan.json
{
  "pkg-name": "aeson",
  "style": "global"
}

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 10, 2023

This is the script I used

Thanks! I'll take a look

I don't know if I can call it "a fix". We might have identified some wishes but nothing that is broken or wrong. ...

Yes well put! I would also add the following requirement to your list:
4. easy transition: users should have to make no/minimal changes to their cabal.project to make use of this.

This use case needs to be understood in the context of the current code and behaviour if we want to make progress.

Definitely! It feels to me like someone needs to spend a bit of time figuring out all the details of how all these parts fit together, before we can figure out how to proceed? I'd be happy to tackle this at this at some point, but I can't promise when I'll have time to do that

@andreabedini
Copy link
Collaborator

andreabedini commented Oct 10, 2023

Notes for our future selves: this seems to be relevant bit


    -- For this local build policy, every package that lives in a local source
    -- dir (as opposed to a tarball), or depends on such a package, will be
    -- built inplace into a shared dist dir. Tarball packages that depend on
    -- source dir packages will also get unpacked locally.
    shouldBuildInplaceOnly :: SolverPackage loc -> Bool
    shouldBuildInplaceOnly pkg = Set.member (packageId pkg)
                                            pkgsToBuildInplaceOnly

    pkgsToBuildInplaceOnly :: Set PackageId
    pkgsToBuildInplaceOnly =
        Set.fromList
      $ map packageId
      $ SolverInstallPlan.reverseDependencyClosure
          solverPlan
          (map PlannedId (Set.toList pkgsLocalToProject))

    isLocalToProject :: Package pkg => pkg -> Bool
    isLocalToProject pkg = Set.member (packageId pkg)
                                      pkgsLocalToProject

    pkgsLocalToProject :: Set PackageId
    pkgsLocalToProject =
        Set.fromList (catMaybes (map shouldBeLocal localPackages))
        --TODO: localPackages is a misnomer, it's all project packages
        -- here is where we decide which ones will be local!

-- ...

shouldBeLocal :: PackageSpecifier (SourcePackage (PackageLocation loc)) -> Maybe PackageId
shouldBeLocal NamedPackage{}              = Nothing
shouldBeLocal (SpecificSourcePackage pkg) = case srcpkgSource pkg of
    LocalUnpackedPackage _ -> Just (packageId pkg)
    _                      -> Nothing

It's in Distribution.Client.ProjectPlanning, part of the project elaboration phase.

@andreabedini
Copy link
Collaborator

And local packages are defined in rebuildProject plan above:


distDirLayout

    -- Look for all the cabal packages in the project
    -- some of which may be local src dirs, tarballs etc
    --
    phaseReadLocalPackages :: ProjectConfig
                           -> Rebuild [PackageSpecifier UnresolvedSourcePackage]
    phaseReadLocalPackages projectConfig@ProjectConfig {
                               projectConfigShared,
                               projectConfigBuildOnly
                             } = do

      pkgLocations <- findProjectPackages distDirLayout projectConfig
      -- Create folder only if findProjectPackages did not throw a
      -- BadPackageLocations exception.
      liftIO $ do
        createDirectoryIfMissingVerbose verbosity True distDirectory
        createDirectoryIfMissingVerbose verbosity True distProjectCacheDirectory

      fetchAndReadSourcePackages verbosity distDirLayout
                                 projectConfigShared
                                 projectConfigBuildOnly
                                 pkgLocations

Looking at findProjectPackages it looks like everything was called "local package" (as the comment in the elaboration phase above seem to confirm).

@gbaz
Copy link
Collaborator

gbaz commented Oct 10, 2023

Here's an issue with pervasive global caching of src-repository-packages.

Suppose I point to HEAD of a repo. Now, with every build where that repo is updated, I will get a new checkout and a new compilation artifact, and these will go in the global store where they are never garbage collected. I think this could drive users somewhat nuts. We may want to limit to packages that point to a specific commit alone, for example.

@mpickering
Copy link
Collaborator

Using a local noindex repository sounds like the right thing to do for this use case.

@TeofilC
Copy link
Collaborator Author

TeofilC commented Oct 12, 2023

Suppose I point to HEAD of a repo. Now, with every build where that repo is updated, I will get a new checkout and a new compilation artifact, and these will go in the global store where they are never garbage collected. I think this could drive users somewhat nuts. We may want to limit to packages that point to a specific commit alone, for example.

This is a great point. I feel like it deserves its own ticket, since it can be a problem irrespective of this feature. Even if you only cache these locally, you'd still have to redownload with each build, which would be quite slow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants