Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support saving and loading the rule index between Gazelle runs #1181

Open
eric-skydio opened this issue Feb 25, 2022 · 3 comments · May be fixed by #1182
Open

Support saving and loading the rule index between Gazelle runs #1181

eric-skydio opened this issue Feb 25, 2022 · 3 comments · May be fixed by #1182

Comments

@eric-skydio
Copy link
Contributor

eric-skydio commented Feb 25, 2022

As discussed in various issues and in-person at the Gazelle maintainers sync, indexing can become a dominant part of the Gazelle runtime for large repositories, so even "incremental" gazelle runs that only update a couple directories still end up being slow. The current best solution is to turn off indexing, but that doesn't work for some uses, particularly non-Go languages without strong target naming conventions.

I propose that we add a pair of command line flags to Gazelle that enable it to save the state of the index at the end of a run, and load that saved state at the beginning of a future invocation. The second invocation would then be able to process just a limited set of files, but still resolve imports correctly using the cached index.

There's a few ways this could be used:

  • Autogazelle, or a similar tool, could run the full Gazelle once, save the index, then watch for file modifications to trigger limited reruns
  • An editor auto-save hook could run Gazelle in the current directory, loading, updating, and saving the index each time
  • The index could be checked into the repo, with unit tests verifying that it is up to date.

For this to work...

  • Something external needs to decide which directories to re-run on. Using the cached index won't necessarily yield the correct result if a package that needs updating isn't passed to the Gazelle run.
  • The index needs to, on load, invalidate any entries that were originally created by packages which will be re-run in this invocation, so that the obsolete index entries don't stick around.

I've put together a prototype (#1182) which seems to decrease runtime on our repo by ~80% when only re-processing a single directory, compared to having indexing enabled.

@blico
Copy link
Contributor

blico commented Mar 7, 2022

I agree this feature would be valuable, but I worry about the introduced complexity and new types of error scenarios that come with managing a cached index. Especially given the limited bandwidth the maintainers have for this repo.

The current best solution is to turn off indexing, but that doesn't work for some uses, particularly non-Go languages without strong target naming conventions.

In our repo with ~8 extra language extensions, we were able to turn off indexing and rely on naming conventions. We did add some extra functionality to Gazelle to support encoding naming conventions for each extension, and to detect "unconventional imports" and automatically add # gazelle:resolve directives for them. This functionality has been enabled for over a year without any issues, so if there is interest we could upstream it.

Could you provide examples of non-Go language extensions that don't have strong target naming conventions?

In order to justify adding this feature, I would need to see that it has a very high demand, and the naming convention approach is proven to be unfeasible for a significant portion of users.

@alex-torok
Copy link
Contributor

alex-torok commented Mar 11, 2022

We did add some extra functionality to Gazelle to support encoding naming conventions for each extension, and to detect "unconventional imports" and automatically add # gazelle:resolve directives for them. This functionality has been enabled for over a year without any issues, so if there is interest we could upstream it.

Would you mind elaborating about the kinds of extra functionality that you added and the kinds of naming conventions that they aim to support? I've been doing some cursory investigation in enabling gazelle for other languages in our monorepo where we don't use indexing. Per-language naming conventions is something that I think that we'll need.

One thing in particular that I've been wanting to look into adding to gazelle is something like a # gazelle:resolve_prefix <lang> <import prefix> <target prefix> to handle rules in our repo that give certain generated files a specific prefixed import location.

Also, if you'd prefer to move this discussion to another issue, please let me know and I'd be happy to open one.

@alex-torok
Copy link
Contributor

@blico - Friendly ping on my comment above ^

HALtheWise pushed a commit to HALtheWise/bazel-gazelle that referenced this issue Nov 1, 2023
This is intended to serve as a foundation for external tools,
or a future version of autogazelle, that detects what files have changed
and only re-runs gazelle in those directories, while still also
supporting the full power of the rule index.

See bazel-contrib#1181 for more details.
HALtheWise pushed a commit to HALtheWise/bazel-gazelle that referenced this issue Nov 1, 2023
This is intended to serve as a foundation for external tools,
or a future version of autogazelle, that detects what files have changed
and only re-runs gazelle in those directories, while still also
supporting the full power of the rule index.

See bazel-contrib#1181 for more details.
fmeum pushed a commit that referenced this issue Aug 18, 2024
**What type of PR is this?**
refactor

**What package or component does this PR mostly affect?**
all

**What does this PR do? Why is it needed?**
Refactor working toward serializing+caching of the rule index.

The `ruleRecord` should be as close as possible to what a language
returns. All computed data such as that for indexing should be computed
from the `ruleRecord` and persisted elsewhere.

**Which issues(s) does this PR fix?**

Ref: #1181
Some ideas are similar to #1182 while trying to reduce the scope of the
change and diff.

**Other notes for review**

IMO this is a reasonable cleanup on its own, without even taking
serializing+caching into account 🤷
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants