new VCS repo scanning: file exclusion not working properly #4783

Closed
hnicke opened this issue Jul 10, 2023 · 8 comments

@hnicke
Contributor

hnicke commented Jul 10, 2023

Bug

Current Behavior

I have tried the new VCS repo scanning algorithm (scan.git.mode: repo) that ships with v0.13.7 (#4642) since I hoped it would help with the stack resolution performance problems described in #4763.

Something is off with the file exclusion.
E.g., one module with ~1100 files (without the new algorithm) now has ~10000, according to the log.

Another module (with a node_modules folder) normally has ~1000 files (without the new algorithm); with the new algorithm, the log reports ~200000 files.

I have defined all exclusions via the exclude array in the module config.
They don't seem to be applied properly.

As a result, with the new VCS algorithm garden validate takes >5 minutes.

Expected behavior

Both repo scanning algorithms detect the same number of files for each module.
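
As a rough sanity check of what "the same number of files" should look like, the file count of a module directory can be compared with and without node_modules using plain find. This is only a sketch: the function names are hypothetical, it is not how Garden scans, and it ignores .gardenignore and any other filtering Garden applies, so the numbers will only approximate what the log reports.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Garden.

# Count all regular files under the given directory.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Count regular files, skipping everything inside any node_modules directory.
count_files_without_node_modules() {
  find "$1" -type d -name node_modules -prune -o -type f -print | wc -l | tr -d ' '
}

# Assumption: the module root is passed as the first argument (default: current directory).
module_root=${1:-.}
echo "all files:            $(count_files "$module_root")"
echo "without node_modules: $(count_files_without_node_modules "$module_root")"
```

A large gap between the two numbers suggests that a log line reporting the larger count is including files the exclude was meant to drop.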

Workaround

Use .gardenignore files for file exclusions instead of exclude in the module/action configuration.

Your environment

  • OS: arch linux
  • How I'm running Kubernetes: GKE

garden version
0.13.7

@edvald might be interesting for you

@hnicke hnicke changed the title new vcs module scanning: file exclusion not working properly new VCS repo scanning: file exclusion not working properly Jul 10, 2023
@hnicke
Contributor Author

hnicke commented Jul 10, 2023

I have migrated our stack from using exclude in module config to using .gardenignore files which fixed the issue for me.

@stefreak stefreak moved this to Candidate in Core Weekly Jul 10, 2023
@stefreak
Member

Good catch @hnicke – thanks for the detailed report! 👍
I am adding this as a candidate to our weekly board so someone will have a look at this soon.

@stefreak
Member

@hnicke If you could provide a minimal example to reproduce this in the meantime, that would be perfect (e.g. a simple one-file garden yaml with an exclude that counts 2 files instead of one – do excludes work with some module / action kinds, but not others?)

@hnicke
Contributor Author

hnicke commented Jul 10, 2023

I currently don't have the time to provide a full example, but here's the gist:

Assume the node_modules folder is huge, as is almost always the case.

The project config is configured to ignore node_modules:

scan:
  git:
    mode: repo
  include:
    - "*.garden.yml"
  exclude:
    - "**/node_modules/**/*"

The following module (without further ignorefiles) will take ages to scan (i.e., garden validate):

# example.garden.yml
kind: Module
type: container
name: example
exclude:
  - node_modules/**

The logs (.garden/logs/validate.debug..) will show that the exclusion didn't work: ...Found <huge number> files in module example....


However, when changing the module like this, the performance issue is gone:

# example.garden.yml
kind: Module
type: container
name: example

# .gardenignore
node_modules

By the way - in my case the module is located in a remote source - not sure if that is a factor that has to be taken into account.
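
To compare the per-module file counts mentioned above, the "Found N files in module NAME root ..." lines can be pulled out of a debug log. The sketch below assumes the log line format quoted in this thread; the function name is hypothetical, and the log file is passed as an argument because the exact log path is not spelled out here.

```shell
#!/bin/sh
# Hypothetical helper, not part of Garden: prints "<module> <file count>" for every
# "Found N files in module NAME root ..." line in the given debug log.
module_file_counts() {
  awk '/Found [0-9]+ files in module [^ ]+ root/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "Found") {
        # Fields after "Found": N, "files", "in", "module", NAME
        print $(i + 5), $(i + 1)
        break
      }
    }
  }' "$1"
}

# Usage (the path is a placeholder):
#   module_file_counts path/to/debug.log
```

Running it once with the exclude-based config and once with .gardenignore makes the difference in counts easy to see side by side.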

@stefreak
Member

stefreak commented Jul 11, 2023

Thank you for the more detailed information.

We were able to reproduce the issue.

Steps to reproduce

  1. Create garden.yml
# garden.yml
kind: Project
apiVersion: garden.io/v1
name: repro
environments:
 - name: default
providers:
 - name: local-kubernetes

scan:
  git:
    mode: repo
  include:
    - "garden.yml"
    - "*.garden.yml"
  exclude:
    - "**/node_modules/**/*"

---
kind: Module
type: container
name: example
exclude:
  - node_modules/**
  2. Create Dockerfile
# Dockerfile
FROM scratch
  3. Create fake node_modules
$ mkdir node_modules && touch node_modules/file{1..10000} && touch node_modules/file{10000..20000}
  4. Run garden build

Observed behaviour

With scan.git.mode: repo the files in node_modules count into the number of files of the module, despite being excluded:

ℹ graph [debug]        → Found 20003 files in module example root /Users/steffen/repro

Expected behaviour

I would expect the same number of files as without scan.git.mode: repo:

ℹ graph [debug]        → Found 3 files in module example root /Users/steffen/repro
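
The reproduction steps above can be collected into one script. This is a sketch under a few assumptions: the target directory name (default "repro") is arbitrary, a portable loop replaces the brace expansion so it also runs under plain sh, and the thread does not say whether the directory was a git repository, so the script does not initialise one. The final step, garden build, is left to run by hand.

```shell
#!/bin/sh
set -eu

# Assumption: directory name is arbitrary; "repro" mirrors the project name above.
repro_dir=${1:-repro}
mkdir -p "$repro_dir/node_modules"

# Project and module config, copied from the steps above.
cat > "$repro_dir/garden.yml" <<'EOF'
kind: Project
apiVersion: garden.io/v1
name: repro
environments:
 - name: default
providers:
 - name: local-kubernetes

scan:
  git:
    mode: repo
  include:
    - "garden.yml"
    - "*.garden.yml"
  exclude:
    - "**/node_modules/**/*"

---
kind: Module
type: container
name: example
exclude:
  - node_modules/**
EOF

printf 'FROM scratch\n' > "$repro_dir/Dockerfile"

# Create 20000 empty files: node_modules/file1 .. node_modules/file20000.
i=1
while [ "$i" -le 20000 ]; do
  : > "$repro_dir/node_modules/file$i"
  i=$((i + 1))
done

echo "Created $repro_dir; now run 'garden build' inside it."
```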

@vvagaytsev
Collaborator

@hnicke is this still an issue in 0.13.13?

@TimBeyer
Contributor

I tried the repro from @stefreak in 0.13.20 and these are the logs I see:

ℹ git [debug]          → Scanning module example root at /Users/tim/Development/garden/support/repo-scan-exclusion-repro
  → Includes: (none)
  → Excludes: **/.garden/**/*
ℹ git [debug]          → Found 20002 files in module example root /Users/tim/Development/garden/support/repo-scan-exclusion-repro
ℹ graph [debug]        → Found 20002 files in module path, filtering by 1 include and 6 exclude globs
ℹ graph [debug]        → Found 2 files in module path after glob matching

I assume we can consider this fixed then?

@stefreak
Member

Sounds like this is solved then. Which PR introduced the fix, @TimBeyer? #5364?
