Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.2] Build occurrences index for Merlin #10422

Merged
merged 21 commits into from
May 27, 2024
Merged

Conversation

voodoos
Copy link
Collaborator

@voodoos voodoos commented Apr 15, 2024

This PR adds new rules that drive the ocaml-index tool to aggregate a workspace index. The rules are a bit simpler as the ones in #6762 since now part of the indexation will is done by the compiler in 5.2.

Building the index involves three steps:

  • Use the -bin-annot-occurrences flag to have the compiler store partial indexes in the cmt files.
  • Run the ocaml-index tool to make an index for each stanza (actually each compilation-context).
  • Run the ocaml-index tool to aggregate every stanza's index into a global project index.

I ran simple benchmarks on Dune itself. These are cold builds ran after ./dune.exe clean:

Command Mean [s] Min [s] Max [s] Relative
./dune.exe build @check (without -bin-annot-occurrences) 8.632 ± 0.167 8.454 8.950 1.00
./dune.exe build @check 8.637 ± 0.108 8.433 8.796 1.00
./dune.exe build @check @ocaml-index 13.597 ± 0.173 13.384 13.904 1.00
  • There are no perceptible performance cost of using the -bin-annon-occurrences flag when writing cmt files, so the best option is probably to always use that flag to simplify the rules.
  • Actually building and gathering the index is costly (especially from a cold build), so it should be opt-in. (We have several optimizations in mind that should improve that situation in the future.)

There are several moving parts that I would like to discuss in the next dune-dev meeting:

  1. The indexing rules have specific dependencies that I am unsure how best to check/enforce:
    • The external ocaml-index tool must be installed
    • The -bin-annot-occurrences flag requires OCaml 5.2
  2. I am unsure of the best way to make indexation opt-in (it can be expensive on massive projects)
    • I initially added a dune-workspace-level option that I remove in cd34ae1 following a discussion with @emillon
    • I currently think we could simply keep the @ocaml-index alias but remove it from @default so that users have to explicitly use it.

Other issues to consider:

  • The Melange compiler is not yet accepting the -bin-annot-occurrences flag
image

@voodoos voodoos added the requires-team-discussion This topic requires a team discussion label Apr 15, 2024
@emillon
Copy link
Collaborator

emillon commented Apr 16, 2024

For a first release, I would consider the following strategy to select whether this feature is enabled:

  • attach these rules to @ocaml-index
  • if ocaml-index is available, and OCaml >= 5.2 (which I suppose is an opam constraint already), attach @ocaml-index to @check

(This is similar to how we set up sherlodoc, though the situation is a bit different)

That way:

  • you can test this in dune's test suite, by providing a fake ocaml-index binary (check how sherlodoc is tested)
  • people can use it in an opt-in way in their workspaces
  • we can find later a way to configure this in a more precise way.

Makefile Outdated Show resolved Hide resolved
@@ -1,5 +1,4 @@
[@@@alert unstable "The API of this library is not stable and may change without notice."]
[@@@alert "-unstable"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just so we don't forget: you can work around that by passing -alert -unstable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with alerts, I tried to add -alert but it still fails:

File "otherlibs/stdune/src/stdune.ml", line 2, characters 0-29:
2 | [@@@alert "-alert -unstable"]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error (warning 47 [attribute-payload]): illegal payload for attribute 'alert'.
Ill-formed list of alert settings

File "otherlibs/stdune/src/stdune.ml", line 1, characters 4-9:
1 | [@@@alert unstable "The API of this library is not stable and may change without notice."]
        ^^^^^
Error (warning 53 [misplaced-attribute]): the "alert" attribute cannot appear in this context

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking a step back: why did you make that change? in other words, what breaks when you leave these attributes?

Copy link
Collaborator Author

@voodoos voodoos May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this error when I build with 5.2~beta2:

File "otherlibs/stdune/src/stdune.ml", line 1, characters 4-9:
1 | [@@@alert unstable "The API of this library is not stable and may change without notice."]
        ^^^^^
Error (warning 53 [misplaced-attribute]): the "alert" attribute cannot appear in this context

I found that related change in the compiler: ocaml/ocaml#12451

@ccasin I am not sure if our usage of [@@@alert deprecated here is indeed wrong or if the warning is triggering unexpectedly?

src/dune_rules/alias0.ml Show resolved Hide resolved
@voodoos
Copy link
Collaborator Author

voodoos commented Apr 17, 2024

For a first release, I would consider the following strategy to select whether this feature is enabled:

* attach these rules to `@ocaml-index`

* if `ocaml-index` is available, and OCaml >= 5.2 (which I suppose is an opam constraint already), attach `@ocaml-index` to `@check`

(This is similar to how we set up sherlodoc, though the situation is a bit different)

That way:

* you can test this in dune's test suite, by providing a fake `ocaml-index` binary (check how sherlodoc is tested)

* people can use it in an opt-in way in their workspaces

* we can find later a way to configure this in a more precise way.

I wonder whether we should attach @ocaml-index to @check to make it really "opt-in" ?

@voodoos
Copy link
Collaborator Author

voodoos commented Apr 17, 2024

@rgrinberg removing the aggregation step does have an impact on cold build time, which is probably more noticeable on rebuilds. @check @ocaml-index is still 46% slower than @check, but it was 57% before.

Command Mean [s] Min [s] Max [s] Relative
./dune.exe build @check 9.090 ± 0.089 8.959 9.222 1.00
./dune.exe build @check @ocaml-index 13.302 ± 0.174 12.964 13.449 1.00

But now I realize that my measurement are wrong: the @ocaml-index appears to build much more targets (and not only indexes) than @check so it is not a good comparison. (and maybe I do something wrong in my rules ?)

If I compare with @check @install the results are much "better", only a 16% increase in built time:

Command Mean [s] Min [s] Max [s] Relative
./dune.exe build @check @install 18.552 ± 0.441 18.206 19.745 1.00
./dune.exe build @check @install @ocaml-index 21.517 ± 0.219 21.224 21.976 1.00

But even then, there is some additional building happening (in vendored dirs notably). I find this a bit surprising since the dependencies of @ocaml-index are "every cmt of every stanza of the project" which I would expect to be the result of building @check ?

Additionally, is there some way to list the targets built for some given aliases ? And to have Dune output time spent running commands (ideally total time per tool) ?

In any case, the more I think about it the more I am convinced that the "don't merge" strategy is the best. I will update the merlin config and see how it goes.

@rgrinberg rgrinberg removed the requires-team-discussion This topic requires a team discussion label Apr 19, 2024
@rgrinberg
Copy link
Member

As discussed, the performance penalty currently is too large to build this by default. A separate alias sounds like a good start to me. When this feature becomes fast enough for general use, do you think we will still need the ocaml-index alias?

Also, could you comment on the stability of this feature for the users? When this feature is shipped, is it recommended that everyone start using it right away? Or is it more for wider experimentation?

@voodoos
Copy link
Collaborator Author

voodoos commented Apr 25, 2024

Thanks for your review @rgrinberg. I was focused on experimenting with performance improvement but I will answer your comments soon.

We spent some time with @emillon to make the indexer faster and we did found some important optimizations.

However indexing only the libraries/executables which are part of @check still takes increases cold-build time by 18% .
Moreover most of the additional time is spent at the end of the build when indexing the dune_rules library which features a large number of modules. I notably tried things such as indexing each cmt independently to get finer grained dependencies. This improves partial re-build time a lot but also increase significantly cold-build time because it doesn't allow the indexer to cache results that are the same for every cmt.

In short, I don't think we can afford to enable the feature by default right now. I propose to add two additional targets:

  • @check-index which only indexes cmt generated by the @check target
  • @index-all which perform full indexation of the project, which might trigger the build of additionnal targets like unused vendored dependencies.

Eventually, our best option is probably to have LSP ask dune via RPC to update the index when needed.

Command Mean [s] Min [s] Max [s] Relative
./dune.exe build @check 8.789 ± 0.094 8.673 8.920 1.00
./dune.exe build @check-index 10.422 ± 0.174 10.227 10.838 1.00
./dune.exe build @index-all 11.520 ± 0.239 11.177 11.877 1.00

Results are actually better on a codebase such as Irmin where there is no single very large library such as dune_rules: the time increase between @check ad @check-indexis 9.5%.

Command Mean [s] Min [s] Max [s] Relative
dune build @check 15.718 ± 0.091 15.618 15.928 1.00
dune build @check-index 17.220 ± 0.129 17.000 17.420 1.00
dune build @ocaml-index 17.437 ± 0.120 17.228 17.601 1.00

There is a tension between: more granularity for smaller incremental rebuild VS less granularity that allow for greater use of cache which speeds up cold builds but slow down incremental rebuilds. We could probably adapt the indexer to cache on the disk results for each cmt so that we could get the best of both worlds, but I don't think the benefit would be big enough to make it a priority.

@rgrinberg
Copy link
Member

Additionally, is there some way to list the targets built for some given aliases ? And to have Dune output time spent running commands (ideally total time per tool) ?

Yes, there's the --trace-file flag

@rgrinberg
Copy link
Member

Could you stick to check-index or index-all for now? The only reason vendored sources aren't included in @check is because of a bug.

@voodoos voodoos force-pushed the ocaml-index branch 2 times, most recently from 6da9b18 to 3173f97 Compare May 7, 2024 13:23
@voodoos voodoos marked this pull request as ready for review May 7, 2024 13:23
@voodoos
Copy link
Collaborator Author

voodoos commented May 7, 2024

I rebased the PR and made a few more changes like adding documentation and removing @check-index.

I think the PR is now ready for another round of review.

@rgrinberg I still wonder about the hidden transitive deps issue. I think now that the compiler supports the -H flag and Merlin has corresponding configuration directives we should pass it the hidden deps using these directives Right now to get the set of hidden deps I use requires_link \ requires_compile and it appears to work, but is that correct ? Is there a better way to get these ?

@rgrinberg
Copy link
Member

Right now to get the set of hidden deps I use requires_link \ requires_compile and it appears to work, but is that correct ? Is there a better way to get these ?

I think it's correct though not ideal. I can't think of a better way

@voodoos
Copy link
Collaborator Author

voodoos commented May 15, 2024

Right now to get the set of hidden deps I use requires_link \ requires_compile and it appears to work, but is that correct ? Is there a better way to get these ?

I think it's correct though not ideal. I can't think of a better way

Right, I reworked the update of Merlin's configuration to use this set and also rely on the new BH and SH directives that take advantage of the new "hidden deps" mechanism of the compiler.

I moved these changes to a separated PR since they are a bit orthogonal: #10535

List.rev_append cmts acc)
in
Command.run_dyn_prog
~dir:context_dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you not run ocaml-index from dir? We only run compilation commands from context_dir for good error messages. Does ocaml-index output error messages that refer to source code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ocaml-index relies on the load-path stored in the cmt files. These paths are relative to the directory in which the compiler was called.

If you would prefer the indexer to run in dir we can have dune pass the complete load-path via the CLI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's fine to leave it this way. It makes sense to leave a comment about this though.

voodoos and others added 17 commits May 27, 2024 09:26
We use a mock binary in tests

Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
They should all be relative to the root of the build directory

Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Ulysse Gérard <[email protected]>
@voodoos voodoos merged commit 38163a6 into ocaml:main May 27, 2024
27 of 28 checks passed
MA0010 pushed a commit to MA0010/dune that referenced this pull request Jun 5, 2024
- `dune ocaml-merlin`: communicate additional directives `SOURCE_ROOT`,
  `UNIT_NAME` (the actual name with wrapping) and `INDEX` with the paths to the
  index(es).

- Add a new alias `@ocaml-index` that uses the `ocaml-index` binary to generate
  indexes that can be read by tools such as Merlin to provide project-wide
  references search.

Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Co-authored-by: Rudi Grinberg <[email protected]>
gridbugs added a commit to gridbugs/opam-repository that referenced this pull request Jun 5, 2024
CHANGES:

### Added

- allow libraries with the same `(name ..)` in projects as long as they don't
  conflict during resolution (via `enabled_if`). (ocaml/dune#10307, @anmonteiro,
  @jchavarri)

- `dune describe pp` now finds the exact module and the stanza it belongs to,
  instead of guessing the name of the preprocessed file. (ocaml/dune#10321, @anmonteiro)

- Print the result of `dune describe pp` with the respective dialect printer.
  (ocaml/dune#10322, @anmonteiro)

- Add new flag `--context` to `dune ocaml-merlin`, which allows to select a
  Dune context when requesting Merlin config. Add `dune describe contexts`
  subcommand. Introduce a field `generate_merlin_rules` for contexts declared
  in the workspace, that allows to optionally produce Merlin rules for other
  contexts besides the one selected for Merlin (ocaml/dune#10324, @jchavarri)

- melange: add include paths for private library `.cmj` files during JS
  emission. (ocaml/dune#10416, @anmonteiro)

- `dune ocaml-merlin`: communicate additional directives `SOURCE_ROOT`,
  `UNIT_NAME` (the actual name with wrapping) and `INDEX` with the paths to the
  index(es). (ocaml/dune#10422, @voodoos)

- Add a new alias `@ocaml-index` that uses the `ocaml-index` binary to generate
  indexes that can be read by tools such as Merlin to provide project-wide
  references search. (ocaml/dune#10422, @voodoos)

- merlin: add optional `(merlin_reader CMD)` construct to `(dialect)` stanza to
  configure a merlin reader (ocaml/dune#8567, @andreypopp)

### Changed

- melange: treat private libraries with `(package ..)` as public libraries,
  fixing an issue where `import` paths were wrongly emitted. (ocaml/dune#10415,
  @anmonteiro)

- install `.glob` files for Coq theories too (ocaml/dune#10602, @ejgallego)

### Fixed

- Don't try to document non-existent libraries in doc-new target (ocaml/dune#10319, fixes
  ocaml/dune#10056, @jonludlam)

- Make `dune-site`'s `load_all` function look for `META` files so that it
  doesn't fail on empty directories in the plugin directory (ocaml/dune#10458, fixes
  ocaml/dune#10457, @shym)

- Fix incorrect warning for libraries defined inside non-existant directories
  using `(subdir ..)` and used by executables using `dune-build-info` (ocaml/dune#10525,
  @rgrinberg)

- Don't try to take build lock when running `coq top --no-build` (ocaml/dune#10547, fixes
  ocaml/dune#7671, @lzy0505)

- Make sure to truncate dune's lock file after locking and unlocking so that
  users cannot observe incorrect pid's (ocaml/dune#10575, @rgrinberg)

- mdx: link mdx binary with `byte_complete`. This fixes `(libraries)` with
  foreign archives on Linux. (ocaml/dune#10586, fixes ocaml/dune#10582, @anmonteiro)
emillon added a commit to emillon/opam-repository that referenced this pull request Jun 17, 2024
CHANGES:

### Added

- allow libraries with the same `(name ..)` in projects as long as they don't
  conflict during resolution (via `enabled_if`). (ocaml/dune#10307, @anmonteiro,
  @jchavarri)

- `dune describe pp` now finds the exact module and the stanza it belongs to,
  instead of guessing the name of the preprocessed file. (ocaml/dune#10321, @anmonteiro)

- Print the result of `dune describe pp` with the respective dialect printer.
  (ocaml/dune#10322, @anmonteiro)

- Add new flag `--context` to `dune ocaml-merlin`, which allows to select a
  Dune context when requesting Merlin config. Add `dune describe contexts`
  subcommand. Introduce a field `generate_merlin_rules` for contexts declared
  in the workspace, that allows to optionally produce Merlin rules for other
  contexts besides the one selected for Merlin (ocaml/dune#10324, @jchavarri)

- melange: add include paths for private library `.cmj` files during JS
  emission. (ocaml/dune#10416, @anmonteiro)

- `dune ocaml-merlin`: communicate additional directives `SOURCE_ROOT`,
  `UNIT_NAME` (the actual name with wrapping) and `INDEX` with the paths to the
  index(es). (ocaml/dune#10422, @voodoos)

- Add a new alias `@ocaml-index` that uses the `ocaml-index` binary to generate
  indexes that can be read by tools such as Merlin to provide project-wide
  references search. (ocaml/dune#10422, @voodoos)

- merlin: add optional `(merlin_reader CMD)` construct to `(dialect)` stanza to
  configure a merlin reader (ocaml/dune#8567, @andreypopp)

### Changed

- melange: treat private libraries with `(package ..)` as public libraries,
  fixing an issue where `import` paths were wrongly emitted. (ocaml/dune#10415,
  @anmonteiro)

- install `.glob` files for Coq theories too (ocaml/dune#10602, @ejgallego)

### Fixed

- Don't try to document non-existent libraries in doc-new target (ocaml/dune#10319, fixes
  ocaml/dune#10056, @jonludlam)

- Make `dune-site`'s `load_all` function look for `META` files so that it
  doesn't fail on empty directories in the plugin directory (ocaml/dune#10458, fixes
  ocaml/dune#10457, @shym)

- Fix incorrect warning for libraries defined inside non-existant directories
  using `(subdir ..)` and used by executables using `dune-build-info` (ocaml/dune#10525,
  @rgrinberg)

- Don't try to take build lock when running `coq top --no-build` (ocaml/dune#10547, fixes
  ocaml/dune#7671, @lzy0505)

- Make sure to truncate dune's lock file after locking and unlocking so that
  users cannot observe incorrect pid's (ocaml/dune#10575, @rgrinberg)

- mdx: link mdx binary with `byte_complete`. This fixes `(libraries)` with
  foreign archives on Linux. (ocaml/dune#10586, fixes ocaml/dune#10582, @anmonteiro)

- virtual libraries: fix an issue where linking an executable involving several
  virtual libries would cause an error. (ocaml/dune#10581, fixes ocaml/dune#10460, @rgrinberg)
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

### Added

- allow libraries with the same `(name ..)` in projects as long as they don't
  conflict during resolution (via `enabled_if`). (ocaml/dune#10307, @anmonteiro,
  @jchavarri)

- `dune describe pp` now finds the exact module and the stanza it belongs to,
  instead of guessing the name of the preprocessed file. (ocaml/dune#10321, @anmonteiro)

- Print the result of `dune describe pp` with the respective dialect printer.
  (ocaml/dune#10322, @anmonteiro)

- Add new flag `--context` to `dune ocaml-merlin`, which allows to select a
  Dune context when requesting Merlin config. Add `dune describe contexts`
  subcommand. Introduce a field `generate_merlin_rules` for contexts declared
  in the workspace, that allows to optionally produce Merlin rules for other
  contexts besides the one selected for Merlin (ocaml/dune#10324, @jchavarri)

- melange: add include paths for private library `.cmj` files during JS
  emission. (ocaml/dune#10416, @anmonteiro)

- `dune ocaml-merlin`: communicate additional directives `SOURCE_ROOT`,
  `UNIT_NAME` (the actual name with wrapping) and `INDEX` with the paths to the
  index(es). (ocaml/dune#10422, @voodoos)

- Add a new alias `@ocaml-index` that uses the `ocaml-index` binary to generate
  indexes that can be read by tools such as Merlin to provide project-wide
  references search. (ocaml/dune#10422, @voodoos)

- merlin: add optional `(merlin_reader CMD)` construct to `(dialect)` stanza to
  configure a merlin reader (ocaml/dune#8567, @andreypopp)

### Changed

- melange: treat private libraries with `(package ..)` as public libraries,
  fixing an issue where `import` paths were wrongly emitted. (ocaml/dune#10415,
  @anmonteiro)

- install `.glob` files for Coq theories too (ocaml/dune#10602, @ejgallego)

### Fixed

- Don't try to document non-existent libraries in doc-new target (ocaml/dune#10319, fixes
  ocaml/dune#10056, @jonludlam)

- Make `dune-site`'s `load_all` function look for `META` files so that it
  doesn't fail on empty directories in the plugin directory (ocaml/dune#10458, fixes
  ocaml/dune#10457, @shym)

- Fix incorrect warning for libraries defined inside non-existant directories
  using `(subdir ..)` and used by executables using `dune-build-info` (ocaml/dune#10525,
  @rgrinberg)

- Don't try to take build lock when running `coq top --no-build` (ocaml/dune#10547, fixes
  ocaml/dune#7671, @lzy0505)

- Make sure to truncate dune's lock file after locking and unlocking so that
  users cannot observe incorrect pid's (ocaml/dune#10575, @rgrinberg)

- mdx: link mdx binary with `byte_complete`. This fixes `(libraries)` with
  foreign archives on Linux. (ocaml/dune#10586, fixes ocaml/dune#10582, @anmonteiro)

- virtual libraries: fix an issue where linking an executable involving several
  virtual libries would cause an error. (ocaml/dune#10581, fixes ocaml/dune#10460, @rgrinberg)
anmonteiro pushed a commit to anmonteiro/dune that referenced this pull request Nov 17, 2024
- `dune ocaml-merlin`: communicate additional directives `SOURCE_ROOT`,
  `UNIT_NAME` (the actual name with wrapping) and `INDEX` with the paths to the
  index(es). 

- Add a new alias `@ocaml-index` that uses the `ocaml-index` binary to generate
  indexes that can be read by tools such as Merlin to provide project-wide
  references search.

Signed-off-by: Ulysse Gérard <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Co-authored-by: Rudi Grinberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants