Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(git): use sub-tree scan mode for config files scan #6483

Merged
merged 2 commits into from
Oct 18, 2024

Conversation

vvagaytsev
Copy link
Collaborator

What this PR does / why we need it:

This idea was discussed in #6461.

Use sub-tree scan mode for config files scanning. Now Garden has the second VCS handler - a dedicated SubTreeGitHandler that only performs config files scan. Both VCS handlers share the same TreeCache.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

@stefreak
Copy link
Member

The big question for me here is: what are the benchmark results? is it faster, slower, same performance?

@edvald
Copy link
Collaborator

edvald commented Sep 25, 2024

To evaluate that we need to look at various cases, even different commands etc. I can't quite imagine how this would turn out to be slower outside of very rare cases and even then marginally, but I don't object to running a few performance tests to see how it shakes out.

@vvagaytsev
Copy link
Collaborator Author

@stefreak sure, sorry, I accidentally created it before I completed the description.

With sub-tree scan mode it doesn't matter if we enable or disable the untracked files hashing implemented in #6461, it's equally fast in either way. The Garden repo scan example from #6461 (with node_modules included) gets scanned in about 0.6 sec.

With repo scan mode and disabled hashing for untracked files, the results are almost the same: about ~0.6 - 0.9 sec.

I think that's because repo does process more files than sub-tree. Processing without hash calculation is very fast, but processing less files is even faster.

We can keep both changes:

  1. Keep a dedicated VcsHandler for config scan and use sub-tree - very fast.
  2. Keep an option to disable hashing of untracked files - then we can also use repo mode for config scan if we have to do that for some reason.

@vvagaytsev vvagaytsev marked this pull request as draft September 25, 2024 13:39
@vvagaytsev
Copy link
Collaborator Author

@edvald @stefreak thanks for the comments!

Sure, let's do more performance tests and benchmarks with different types of projects. I've converted this to draft.

@edvald
Copy link
Collaborator

edvald commented Sep 25, 2024

No need to overdo it @vvagaytsev :) Just good to see a couple of different variations.

@stefreak
Copy link
Member

I agree, if y'all think this is definitely an improvement I'm all for it, I just don't understand the vcs code to be honest so that's why I wanted to see proof that this is better than before :)

@vvagaytsev
Copy link
Collaborator Author

Just tested this against main branch with 100k files project (used https://github.com/garden-io/garden-large-repo-generator to generate that). All files were tracked.

This PR did the initial scan in 0.2-0.3 sec against 0.5-0.6 sec in main. So, it looks it's ready to go.

@vvagaytsev vvagaytsev marked this pull request as ready for review October 18, 2024 13:04
@vvagaytsev vvagaytsev enabled auto-merge October 18, 2024 13:04
@vvagaytsev vvagaytsev added this pull request to the merge queue Oct 18, 2024
Merged via the queue into main with commit 1316631 Oct 18, 2024
40 checks passed
@vvagaytsev vvagaytsev deleted the perf/use-sub-tree-scan-mode-for-config-scan branch October 18, 2024 13:45
thsig added a commit that referenced this pull request Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants