perf(misconf): High memory usage (9.5 GB) and long scan time (45 min) on some repos #6557

Closed
2 tasks done
simar7 opened this issue Apr 24, 2024 Discussed in #6549 · 6 comments · Fixed by #6586
Labels: kind/bug, scan/misconfiguration

Comments

@simar7
Member

simar7 commented Apr 24, 2024

Discussed in #6549 and #6517

Originally posted by ptupitsyn April 24, 2024

Description

Some repos, like https://github.com/kubernetes/minikube, take a very long time to scan (45 minutes on t3.xlarge) and consume up to 9.5 GB of RAM.

Desired Behavior

Memory consumption below 1 GB, scan time under 5 minutes.

Actual Behavior

Memory consumption of 9.5 GB, scan time 45 minutes.

Reproduction Steps

1. git clone https://github.com/kubernetes/minikube.git
2. cd minikube
3. docker run -v $PWD:/myapp --entrypoint "trivy" aquasec/trivy --timeout 60m --quiet filesystem --scanners vuln,config --format json  /myapp

Target

Filesystem

Scanner

Vulnerability

Output Format

JSON

Mode

Standalone

Debug Output

No output.

Operating System

Ubuntu 22.04

Version

0.50.2

Checklist

simar7 added the kind/bug and scan/misconfiguration labels on Apr 24, 2024
@simar7
Member Author

simar7 commented Apr 25, 2024

It looks like the issue lies here:

We spend a lot of time retrieving the underlying metadata for the code snippets shown in the results. This requires reading each file that has a misconfiguration, which is expensive for large repos with many files.


Results

Input: https://github.com/kubernetes/minikube.git

Before

Doesn't finish in a reasonable time

After

./trivy --debug config ~/repos/trivy-issues/6557/minikube/

  194.62s user 5.54s system 152% cpu 2:11.61 total

Possible solutions

  1. Maybe we should disable code snippets for such large repos (many files), as it is very expensive to read each file individually to locate the source of a misconfiguration.

To be clear, misconfigurations are still shown, just not the code snippets. They will look as follows:

  2. Another idea is to introduce a new flag that lets users disable code snippets. Code snippets would stay on by default (current behavior) but could be turned off if the user wants better performance or simply prefers not to see them, as seen here.
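The opt-out flag in option 2 could look roughly like the sketch below. It uses Go's standard `flag` package only to stay self-contained and does not reflect how Trivy's CLI actually defines flags; the flag name `--no-code-snippets` and the `scanOptions` type are hypothetical, not existing Trivy options.

```go
package main

import (
	"flag"
	"fmt"
)

// scanOptions holds the one setting relevant to this sketch.
type scanOptions struct {
	// CodeSnippets controls whether source files are read to attach
	// code snippets to misconfiguration results.
	CodeSnippets bool
}

// parseOptions parses args with a dedicated FlagSet. The flag name is
// hypothetical; snippets stay enabled unless the user opts out.
func parseOptions(args []string) (scanOptions, error) {
	fs := flag.NewFlagSet("config", flag.ContinueOnError)
	noSnippets := fs.Bool("no-code-snippets", false, "skip reading files to render code snippets")
	if err := fs.Parse(args); err != nil {
		return scanOptions{}, err
	}
	return scanOptions{CodeSnippets: !*noSnippets}, nil
}

func main() {
	def, _ := parseOptions(nil)
	off, _ := parseOptions([]string{"--no-code-snippets"})
	fmt.Println(def.CodeSnippets, off.CodeSnippets)
}
```

Keeping the default as "on" preserves the current output for existing users, while the negative flag makes the performance trade-off an explicit choice.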

@DmitriyLewen
Contributor

I also have some thoughts:

  • I don't have much experience with trivy/iac, so I could be wrong.
    Why do we need to read the files again to get the offending code? Could we store that code when we find the misconfiguration, as we do for secrets (we store the previous and next lines of a secret as soon as we find it)? My idea: scan file -> detect misconfiguration -> save the offending lines in Result. @simar7, correct me if this is not possible.
    This should save time, but we still need to double-check memory usage.
  • Do we need to read files if the check passes?
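A rough Go sketch of the capture-at-detection idea, assuming the file content is already in memory while checks run. `Result`, `check`, and `scanFile` are hypothetical names, not Trivy's types. Failed checks copy their offending lines into the result immediately, and passed checks get no snippet, so no file has to be re-read afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical, simplified misconfiguration result.
type Result struct {
	CheckID   string
	Passed    bool
	StartLine int // 1-based, inclusive
	EndLine   int // 1-based, inclusive
	Code      []string
}

// check is a hypothetical rule: it returns the failing line range,
// or ok=true when the content passes.
type check struct {
	ID   string
	Eval func(lines []string) (start, end int, ok bool)
}

// scanFile runs every check on content already in memory and stores the
// offending lines in the Result at detection time. Passed checks get no Code.
func scanFile(content string, checks []check) []Result {
	lines := strings.Split(content, "\n")
	var results []Result
	for _, c := range checks {
		start, end, ok := c.Eval(lines)
		r := Result{CheckID: c.ID, Passed: ok}
		if !ok && start >= 1 && end >= start && end <= len(lines) {
			r.StartLine, r.EndLine = start, end
			for _, l := range lines[start-1 : end] {
				// strings.Clone (Go 1.20+) copies the line so the Result
				// does not keep the whole file content alive.
				r.Code = append(r.Code, strings.Clone(l))
			}
		}
		results = append(results, r)
	}
	return results
}

func main() {
	hostNet := check{
		ID: "HOSTNET",
		Eval: func(lines []string) (int, int, bool) {
			for i, l := range lines {
				if strings.TrimSpace(l) == "hostNetwork: true" {
					return i + 1, i + 1, false
				}
			}
			return 0, 0, true
		},
	}
	src := "kind: Pod\nspec:\n  hostNetwork: true\n"
	for _, r := range scanFile(src, []check{hostNet}) {
		fmt.Println(r.CheckID, r.Passed, r.StartLine, r.Code)
	}
}
```

Cloning only the failing lines also bears on the memory question: results hold small copies rather than references into every scanned file.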

@ptupitsyn

It would be great to be able to toggle code snippets with a CLI flag. Disable them when not needed to improve performance.

@DmitriyLewen
Contributor

DmitriyLewen commented Apr 25, 2024

@simar7 I agree with @ptupitsyn.
I would also go with a new flag, as it gives users more options in Trivy.

@knqyf263
Collaborator

I'm still wondering why it consumes 9.5 GB. If it reads each file individually, it shouldn't use that much memory. Or does it keep all the file contents in memory?

@kaypee90

It would be great to be able to toggle code snippets with a CLI flag. Disable them when not needed to improve performance.

I agree with @ptupitsyn's approach

Projects
Archived in project

5 participants