Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow running a reproducible lock #250

Merged
merged 26 commits into from
Feb 1, 2022
Merged

Conversation

NathanReb
Copy link
Contributor

@NathanReb NathanReb commented Jan 24, 2022

This PR adds the ability to configure a list of repositories URLs that the solver will use, instead of the local opam configuration.
It is complemented with the ability to set opam global variables making lock fully reproducible and independent from local configuration.

This is a first step towards the full feature as it only supports local filesystem URLs for repositories but this will allow use to write tests for the solver that can be run from the CI or any machine and this has been long awaited!

How does this work

We now look for 2 extensions in the opam files of the target packages:

  1. x-opam-monorepo-opam-repositories
  2. x-opam-monorepo-global-opam-vars

If 1 is present we use the "reproducible" solver instead of the usual one. It will ignore the local opam configuration in terms of repositories and pins and will only use the repos listed in that field, along with any pin-depends it finds there.
The content of the field is simply a list of URLs.

2 can be used to specify the values of opam global variables, such as arch, os, etc. It's ignored if the solver runs in local config mode but will be used if there in reproducible mode. Since those variables must be set for the solver to return anything sensible, if 2 is missing in reproducible mode, we use the global vars set in the user's opam install, assuming they want a lockfile that will work on their machine at the very least. This is obviously not the prefered way for reproducibility but since setting those variables correctly can be a bit tedious I thought it would be good to have such a default behaviour.

Those two fields are saved to the lockfile itself so it is easy to know how the lockfile was computed.

When there are multiple target packages, those fields are extracted from each of them and merged together.

Current limitations

Support for remote URL

At the moment it only handles file:// URLs and we will have to support remote URLs before releasing this obviously. This means downloading the repo from the remote source so the solver can use it.
When we had this we will also have to work on properly caching repos. Opam supports this already and our current mechanism for downloading through opam already uses a git cache we should be able to use for repos as well.
We discussed what we should support, both with Marek and the Opam team and it seems sensible to only support VCS URLs (or local ones) in the repositories field as this is the only way to ensure reproducibility. Tarballs URLs, such as ocaml.org which is used for the default opam repo, are not very reliable as updated tarballs are uploaded there every now and then. The opam team is also considering ruling out http for repositories so this is potentially aligned with this goal. If explicit need for http URLs arise, we can reconsider at that point.

Properly detecting known repos

I ported the code detecting that dune-universe/opam-overlays wasn't set to work with the reproducible solver as well but it will obviously need to be polished to better recognize the repo if it's pinned down to a commit for instance.

Priorities between repos

At the moment there is no way to set priorities between repositories but in practice there is one as the repos are looked up in order, the first repo that defines a pkg.version wins and it will shadow other such packages from the other repos. This order is based on how the URLs are sorted in a set and the user has no way to control this. We will need to add some mechanism. Using the list order from the field seems sensible but it makes merging the repos field slightly harder and we need to define this properly. Setting a priority explicitly with a integer is also something we could look into eventually.

Extracting reproducible solver config from the regular one

It would be useful to extract the opam global vars extracted from the opam config and used by the reproducible solver when the global-opam-vars field isn't set, into the lockfile. That way, even if the lock step is not strictly speaking reproducible per-se, the lockfile contains the information to make it so.

With the same goal in mind, we could also extract the repositories used by the regular solver. This is slightly harder to achieve as the opam libraries API doesn't allow it, or at least, I couldn't find how to do this easily.

Using it for testing

The time of testing has come!

You'll find in this PR a simple solver test that uses the reproducible solver to make the test, well, reproducible!
It's just a regular cram test. The folder contains an opam-repository in repo/ and our target package opam file, setting the opam-repositories field, pointing to the local repository:

x-opam-monorepo-opam-repositories: [
  "file://$OPAM_MONOREPO_CWD/repo"
]

You'll note the $OPAM_MONOREPO_CWD in the URL. file:// URLs only allow absolute paths so I added a feature, directly in the tool to rewrite this var to the project root it uses (by default the current working directory or the value of --root <path>). This allows to write the tests and can potentially be useful to users that would want to use locally available repos for the solver. I don't think we should document this for now but if anybody needs it, we can mention this feature.

I also extracted the base repo into test/bin/minimal-repo. It contains a realistic subset of base packages such as dune, ocaml and all associated metadata packages, see:

test/bin/minimal-repo/
├── packages
│   ├── base-bigarray
│   │   └── base-bigarray.base
│   │       └── opam
│   ├── base-threads
│   │   └── base-threads.base
│   │       └── opam
│   ├── base-unix
│   │   └── base-unix.base
│   │       └── opam
│   ├── dune
│   │   └── dune.2.9.1
│   │       └── opam
│   ├── ocaml
│   │   └── ocaml.4.13.1
│   │       └── opam
│   ├── ocaml-base-compiler
│   │   └── ocaml-base-compiler.4.13.1
│   │       └── opam
│   ├── ocaml-config
│   │   └── ocaml-config.2
│   │       └── opam
│   └── ocaml-options-vanilla
│       └── ocaml-options-vanilla.1
│           └── opam
└── repo

The opam files were copied from upstream opam-repository.

I simply copied it around, I think it's okay for now. In the future we might consider more advanced techniques such as dune rules copying the base set of packages but it honestly feels overkill at the moment.

We could also have dummy package metadata for dune to simplify this structure but I thought it would be good to have the test use a somewhat realistic repo.

@NathanReb
Copy link
Contributor Author

I'm aware that some parts, especially the parsing of the new extensions, could and should be unit tested. I'm planning on adding this before we merge but I think the rest can be reasonably reviewed in the meantime!

@NathanReb
Copy link
Contributor Author

I also need to add documentation for this btw!

Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a substantial PR with very useful functionality. 👍

For now these are my observations/questions after a first read-through of the code. I think it might be better to post them now (if only to prevent Github from discarding my comments and give you time to answer) and continue reviewing it more in depth tomorrow.

cli/lock.ml Outdated Show resolved Hide resolved
cli/lock.ml Outdated Show resolved Hide resolved
cli/lock.ml Outdated Show resolved Hide resolved
cli/lock.ml Show resolved Hide resolved
cli/lock.ml Show resolved Hide resolved
lib/opam_solve.ml Show resolved Hide resolved
lib/opam_solve.ml Show resolved Hide resolved
lib/source_opam_file.ml Outdated Show resolved Hide resolved
lib/source_opam_file.ml Outdated Show resolved Hide resolved
lib/source_opam_file.ml Outdated Show resolved Hide resolved
@NathanReb NathanReb force-pushed the solve-using-explicit-repos branch from 35c733d to 746d259 Compare January 25, 2022 14:25
@NathanReb NathanReb force-pushed the solve-using-explicit-repos branch from 7373afc to 84c9a82 Compare January 27, 2022 11:11
Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the detection is a good thing, but maybe just checking for the repo files would make it more robust and reliable?

lib/project.ml Outdated Show resolved Hide resolved
lib/project.ml Outdated Show resolved Hide resolved
lib/project.ml Outdated Show resolved Hide resolved
stdext/result.mli Show resolved Hide resolved
@Leonidas-from-XIV Leonidas-from-XIV self-requested a review February 1, 2022 14:01
Copy link
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Some final tidbits that just make it slightly nicer but I thing it is ready to merge!

lib/opam_solve.ml Outdated Show resolved Hide resolved
lib/opam_solve.ml Outdated Show resolved Hide resolved
@NathanReb NathanReb force-pushed the solve-using-explicit-repos branch from 1143a9f to c6da55c Compare February 1, 2022 14:39
@NathanReb NathanReb merged commit f26027b into main Feb 1, 2022
@Leonidas-from-XIV Leonidas-from-XIV deleted the solve-using-explicit-repos branch February 1, 2022 15:05
| Some path when Opam.Url.is_local_filesystem url ->
Ok (OpamFilename.Dir.to_string OpamFilename.Op.(path / "packages"))
| _ ->
(* TODO before release *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this comment about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an actual todo, to add support for git URLs

l "Could not retrieve opam file for %a: %s" Opam.Pp.package pkg msg);
assert false

(* TODO catch exceptions and turn to error *)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already present before this PR, this is just moved code.

let open Re in
compile (seq [ str "file://"; str opam_monorepo_cwd_var ])

let rewrite_url_in ~pos ~opam_monorepo_cwd url_str =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this rewrite should be a bit more strict. If $OPAM_MONOREPO_CWD is not part of the URL, there might be security implications since opam-monorepo can read anywhere in the filesystem.

NathanReb added a commit to NathanReb/opam-repository that referenced this pull request Apr 19, 2022
CHANGES:

### Added

- Add opam extensions `x-opam-monorepo-opam-repositories` and
  `x-opam-monorepo-global-opam-vars` to make `lock` fully reproducible.
  (tarides/opam-monorepo#250, tarides/opam-monorepo#253, @NathanReb)
- Show an error message when the solver can't find any version that satisfies
  the requested version constraint in the user's OPAM file (tarides/opam-monorepo#215, tarides/opam-monorepo#248,
  @Leonidas-from-XIV)
- Allow packages to be marked as being provided by Opam and not to be pulled by
  `opam-monorepo`. To control this a new optional Opam file field,
  `x-opam-monorepo-opam-provided` is introduced. Its value is a list of package
  names that are to be excluded from being pulled (tarides/opam-monorepo#234, @Leonidas-from-XIV)
- Show an error message when the OCaml version of the lock file does not match
  the OCaml version of the switch (tarides/opam-monorepo#267, tarides/opam-monorepo#268, @Leonidas-from-XIV)
- Generate a `duniverse/README.md` file to explain the basics of
  `opam-monorepo` in the vendored directory (tarides/opam-monorepo#272, tarides/opam-monorepo#274, @Leonidas-from-XIV)
- Add a `--prefer-cross-compile` flag for the solver to select cross-compiling
  versions of packages when available. This is determined through the presence
  of the `"cross-compile"` tag in the opam metadata.

### Changed

- Bump lockfile version to 0.3 (tarides/opam-monorepo#285, @NathanReb)
- Mark packages to be pulled by opam-monorepo with the `vendor` variable so
  using OPAM with `opam install --deps-only --locked .` will not install
  packages that will be installed with `opam-monorepo pull` (tarides/opam-monorepo#237,
  @Leonidas-from-XIV)

### Deprecated

### Fixed

- Fix a bug where a package which had a single version that built with dune and got selected by the solver
  would be reported has having no version building with dune. (tarides/opam-monorepo#245, @Leonidas-from-XIV)
- Fix the solver so it does not select beta versions of the compiler unless
  forced to by version constraints or `--ocaml-version`. (tarides/opam-monorepo#269, @NathanReb)

### Removed

- Drop support for lockfile versions 0.2 and lower (tarides/opam-monorepo#285, @NathanReb)

### Security
NathanReb added a commit to NathanReb/opam-repository that referenced this pull request Apr 20, 2022
CHANGES:

### Added

- Add opam extensions `x-opam-monorepo-opam-repositories` and
  `x-opam-monorepo-global-opam-vars` to make `lock` fully reproducible.
  (tarides/opam-monorepo#250, tarides/opam-monorepo#253, @NathanReb)
- Show an error message when the solver can't find any version that satisfies
  the requested version constraint in the user's OPAM file (tarides/opam-monorepo#215, tarides/opam-monorepo#248, tarides/opam-monorepo#290
  @Leonidas-from-XIV)
- Allow packages to be marked as being provided by Opam and not to be pulled by
  `opam-monorepo`. To control this a new optional Opam file field,
  `x-opam-monorepo-opam-provided` is introduced. Its value is a list of package
  names that are to be excluded from being pulled (tarides/opam-monorepo#234, @Leonidas-from-XIV)
- Show an error message when the OCaml version of the lock file does not match
  the OCaml version of the switch (tarides/opam-monorepo#267, tarides/opam-monorepo#268, @Leonidas-from-XIV)
- Generate a `duniverse/README.md` file to explain the basics of
  `opam-monorepo` in the vendored directory (tarides/opam-monorepo#272, tarides/opam-monorepo#274, @Leonidas-from-XIV)
- Add a `--prefer-cross-compile` flag for the solver to select cross-compiling
  versions of packages when available. This is determined through the presence
  of the `"cross-compile"` tag in the opam metadata.

### Changed

- Bump lockfile version to 0.3 (tarides/opam-monorepo#285, @NathanReb)
- Mark packages to be pulled by opam-monorepo with the `vendor` variable so
  using OPAM with `opam install --deps-only --locked .` will not install
  packages that will be installed with `opam-monorepo pull` (tarides/opam-monorepo#237,
  @Leonidas-from-XIV)

### Fixed

- Fix a bug where a package which had a single version that built with dune and got selected by the solver
  would be reported has having no version building with dune. (tarides/opam-monorepo#245, @Leonidas-from-XIV)
- Fix the solver so it does not select beta versions of the compiler unless
  forced to by version constraints or `--ocaml-version`. (tarides/opam-monorepo#269, @NathanReb)

### Removed

- Drop support for lockfile versions 0.2 and lower (tarides/opam-monorepo#285, @NathanReb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants