
Faster module resolution #3452

Closed
eysi09 opened this issue Dec 15, 2022 · 12 comments · Fixed by #4642

Comments

@eysi09
Collaborator

eysi09 commented Dec 15, 2022

Users who have hundreds of modules in their Garden project are running into performance issues. Starting most Garden commands takes a long time, because Garden currently runs multiple git ls-files calls for each module.

Technical background

Garden relies heavily on Git for scanning module/action source trees, to determine the scope of build contexts and versions of each node in the Stack Graph.

Right now the tree scanning is done in a fairly naive way: we call out to the git ls-files CLI command for each module in a project (often multiple times per module), with limited optimizations when different modules/actions share a directory, overlap, etc.
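
To illustrate the scaling problem, here is a simplified sketch of the current per-module pattern. This is not Garden's actual code; the Module shape and the exact ls-files flags are illustrative.

```typescript
// Simplified sketch: one `git ls-files` subprocess per module.
import { execFile } from "node:child_process"
import { promisify } from "node:util"

const execFileAsync = promisify(execFile)

// Hypothetical module shape, for illustration only.
interface Module {
  name: string
  path: string
}

async function scanModules(modules: Module[]): Promise<Map<string, string[]>> {
  const files = new Map<string, string[]>()
  for (const module of modules) {
    // With hundreds of modules, these subprocess calls dominate command startup.
    const { stdout } = await execFileAsync("git", [
      "ls-files",
      "--cached",
      "--others",
      "--exclude-standard",
      "--",
      module.path,
    ])
    files.set(module.name, stdout.split("\n").filter(Boolean))
  }
  return files
}
```

With N modules this means at least N subprocess spawns, each of which re-walks overlapping parts of the repo.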

Proposed solution

  1. Instead of a single instance of VcsHandler attached to the Garden instance, we create a handler for the context of each scan, i.e. when resolving actions/modules, and pass it along. This allows the handler to hold state and caches during that context and then be freed.
  2. When calling GitHandler.getTreeVersion() for a given action/path, instead of scanning the targeted path directly via git ls-files calls, we:
     1. Ensure that the repo that the given path belongs to has been scanned and that the tree is in memory.
        1. This should respect the dotIgnoreFile(s) and modules.* (to be renamed to scan.*) fields in the project config.
        2. Effectively this should be what currently happens in VcsHandler.getFiles(), except that it streams into a tree structure instead of a list.
        3. A simple data structure would be a basic trie, organized by path components instead of string characters (see the sketch after this list). This would be a good place to start at least, and we can see how it scales.
     2. Extract the subtree for the given path.
     3. Apply any filters, i.e. include/exclude on the module/action.
     4. Return the list of files.
  3. Initially roll this mechanism out via a feature flag, e.g. GARDEN_GIT_SCANNER=repo, which toggles whether we use the current GitHandler or the new class that houses the new implementation.
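
A minimal sketch of such a path-component trie. All names here (PathTrie, insert, subtree, list) are illustrative, not Garden's actual API:

```typescript
// Trie keyed by path components ("modules", "api", ...) instead of characters.
interface TrieNode {
  children: Map<string, TrieNode>
  isFile: boolean
}

class PathTrie {
  private root: TrieNode = { children: new Map(), isFile: false }

  // Stream each path from a single repo-wide `git ls-files` call into the trie.
  insert(filePath: string) {
    let node = this.root
    for (const segment of filePath.split("/")) {
      let child = node.children.get(segment)
      if (!child) {
        child = { children: new Map(), isFile: false }
        node.children.set(segment, child)
      }
      node = child
    }
    node.isFile = true
  }

  // Extract the subtree node for a given module/action path, e.g. "modules/api".
  subtree(dirPath: string): TrieNode | undefined {
    if (dirPath === "") return this.root
    let node: TrieNode | undefined = this.root
    for (const segment of dirPath.split("/")) {
      node = node.children.get(segment)
      if (!node) return undefined
    }
    return node
  }

  // List all files under a path, applying an include/exclude-style predicate.
  list(dirPath: string, include: (filePath: string) => boolean = () => true): string[] {
    const result: string[] = []
    const start = this.subtree(dirPath)
    if (!start) return result
    const walk = (node: TrieNode, prefix: string) => {
      if (node.isFile && include(prefix)) {
        result.push(prefix)
      }
      for (const [segment, child] of node.children) {
        walk(child, prefix === "" ? segment : `${prefix}/${segment}`)
      }
    }
    walk(start, dirPath)
    return result
  }
}
```

With this in place, one repo-wide git ls-files call populates the trie, and each subsequent getTreeVersion() becomes an in-memory subtree lookup plus filtering.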

Benefits of the approach

  1. The only required changes outside of VcsHandler and GitHandler are to pass an instance of the handler around during use, instead of using garden.vcs. This limits the blast radius of the changes dramatically and avoids major conflict with graph V2 changes.
  2. We basically drop the number of git ls-files calls from up to hundreds down to a handful per project scan.

Potential concerns

  1. We may in certain cases be indexing a lot of files in memory, so we might need aggressive optimizations on that front. We need to stress-test this and see.
  2. One potentially highly scalable approach, since we already bundle it, is using an in-memory SQLite instance, with a single table per repo and a B-tree index on the path (see the sketch after this list). Might be too heavy, not quite sure.
  3. Another approach could be a WebWorker for the scanning, written in Rust or Go and compiled to WASM, which we would then query.
  4. This will need quite a bit of testing, both automated and in practical use.
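
For concern 2, a rough sketch of the in-memory SQLite variant, assuming a synchronous binding such as better-sqlite3; the `files` table and its schema are illustrative, not an actual Garden schema:

```typescript
import Database from "better-sqlite3"

const db = new Database(":memory:")
// The PRIMARY KEY on path is the B-tree index mentioned above.
db.exec("CREATE TABLE files (path TEXT PRIMARY KEY, hash TEXT NOT NULL)")

const insert = db.prepare("INSERT OR REPLACE INTO files (path, hash) VALUES (?, ?)")
// Batch inserts in a transaction when streaming repo-wide scan results in.
const insertAll = db.transaction((rows: Array<[string, string]>) => {
  for (const [path, hash] of rows) insert.run(path, hash)
})

// An indexed range query replaces a per-module `git ls-files` call. The upper
// bound is a simplification that covers typical path characters.
const listUnder = db.prepare("SELECT path FROM files WHERE path >= ? AND path < ?")

function filesUnder(dir: string): string[] {
  const prefix = dir.endsWith("/") ? dir : dir + "/"
  return (listUnder.all(prefix, prefix + "\uffff") as Array<{ path: string }>).map((row) => row.path)
}

// Example usage with made-up data:
insertAll([["modules/api/src/handler.ts", "deadbeef"]])
console.log(filesUnder("modules/api"))
```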

Alternative approaches

  1. We could instead employ more limited caching improvements, e.g. caching file lists per scanned directory (pre-filtered); see the sketch after this list. In that case we would still implement step 1 in the above solution. This would be a much more marginal improvement, but would still likely reduce the number of git CLI calls.
  2. We might also go for a different flow, essentially pre-optimizing calls by gathering upfront which filesystem queries need to be run and planning accordingly. This is likely a more complex approach ultimately, and would certainly have a far bigger blast radius in terms of the code modules affected.
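
A minimal sketch of the per-directory caching from alternative 1; getFilesUncached stands in for the existing git-ls-files-based lookup and is hypothetical:

```typescript
// Memoize (pre-filtered) file lists per scanned directory.
const fileListCache = new Map<string, Promise<string[]>>()

async function getFilesCached(
  dir: string,
  getFilesUncached: (dir: string) => Promise<string[]>
): Promise<string[]> {
  let cached = fileListCache.get(dir)
  if (!cached) {
    // The first lookup for a directory pays the git CLI cost; later modules
    // that share the directory reuse the same promise, including in-flight scans.
    cached = getFilesUncached(dir)
    fileListCache.set(dir, cached)
  }
  return cached
}
```
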
@mdragilev

Hi. Are there any plans to implement this, or anything else to speed up resolution of hundreds of modules? We are running into these performance issues as well.

@vvagaytsev
Collaborator

@mdragilev we're working on the next big release now, 0.13. We plan to revisit this after 0.13.

@mdragilev

@vvagaytsev Thank you Vladimir. Is there a place to track when 0.13 will come out?

@vvagaytsev
Collaborator

@mdragilev yes, the 0.13 project GH board is https://github.com/orgs/garden-io/projects/5

The goal is to release it in Q1 2023.

@tedchang77

Any update on timelines for this feature/0.13 release?

@vvagaytsev
Collaborator

@tedchang77 we're planning to do a pre-release for public testing by the end of March. The production release is planned for Q2 2023.

@tedchang77

We're planning on migrating to 0.13. Is there a timeline for when this will be implemented?

@edvald
Collaborator

edvald commented Jun 14, 2023

We're working on this right now, @tedchang77! Can't quite commit to a timeline until we've tested it a bit, but expect no more than a couple of weeks.

@edvald
Collaborator

edvald commented Jul 6, 2023

We've only partially addressed this, so leaving it open still.

@vvagaytsev
Collaborator

@eysi09 @edvald should we split this into a few more specific issues?

@eysi09
Collaborator Author

eysi09 commented Oct 27, 2023

Yeah I think this needs to be revisited now that we have a partial implementation.

We may want to open a new issue that's more specific.

@TimBeyer
Contributor

Can we close this now that 0.13.20 is released, with the new scan mode as the default and the tree-based performance optimization?

@10ko closed this as completed Jan 31, 2024