-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
external-sources: throttle requests to maven central to avoid being rate limited for large sets of java dependencies #2384
Conversation
There's more details in #2383 - this is one potential solution, figured it might be helpful, the issue above also includes a reproducer for the current behaviour. There may be better alternatives, which if so great. this approach has proved reliable after a fair amount of testing with the integration test included in the PR. |
return &mavenSearch{ | ||
client: client, | ||
baseURL: baseURL, | ||
rateLimiter: rate.NewLimiter(rate.Every(300*time.Millisecond), 1), // 1 request per 300ms, using the integration test to determine the best rate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just out of curiosity, why 1 request per 300 ms? from experimentation? or did you find any useful maven docs on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah through experimenting, I found that it needed a few iterations sometimes to get the 403 but using 300ms I've not seen one failure. There could be other tuning, I tried a bunch of http client settings too but this was the lowest I found that reliably worked.
if strings.Contains(err.Error(), "no artifact found") { | ||
log.Debugf("no upstream maven artifact found for %s", p.Name) | ||
} | ||
log.WithFields("package", p.Name, "error", err).Warn("failed to resolve package details with maven") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wagoodman was chatting to @luhring and he mentioned that this could still mean that errors could occur and mean results get missed. During my testing I saw 0 failures but am wondering about the semantics of how Grype handled failures like this. Is logging the right way to go as opposed to returning the err
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyways, not to distract from this PR, maybe it's a followup thing. Thanks for all the feedback so far!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since we're just logging the error and not returning it, I don't think this completely prevents the "silent failure" case, where vuln matches are missing from scan to scan.
My two cents is that we shouldn't let that stop us from merging this. If we start returning an error instead, we'll have a different kind of problem, even though we'd have solved silent failures.
From what I can tell, this PR is forward progress on the issue, and other work will be needed to make other kinds of improvements to the Maven lookups, like #2216. And then the ideal scenario would be a mechanism where Grype could do these lookups offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kzantow and I were chatting this through -- we agree that there should be some conditions that halt and force grype to return 1 (e.g. http 429 / 503). We also think there are common cases that should be notable but not change execution (keep on matching... e.g. http 404). All of these cases should probably end up as evidence in a new errors
section in the JSON document, and in the table output be reported as a footer noting any issues found.
I'm ok with merging this PR as is since it alleviates a direct problem and iterating towards these other more impactful suggestions as follow ons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi -- new issue created #2393
…ate limited for large sets of java depenencies When [external-sources](https://github.com/anchore/grype?tab=readme-ov-file#external-sources) are enabled if an Image contains a large number of Java dependencies Grype can get rate limited by maven central. This change will: - add a rate limiter to throttle requests at 300ms per second to produce reliable results - if a normal artifact not found error is returned by maven central the existing debug logging happens - any other error from maven central will result in an error being logged - adds an integration test that can be used to verify the rate limiter so we can verify against the real external api settings Related to issue anchore#2383 Signed-off-by: James Rawlings <[email protected]>
…aviour Signed-off-by: James Rawlings <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Many thanks @wagoodman! |
When external-sources are enabled if an Image contains a large number of Java dependencies Grype can get rate limited by maven central.
This change will:
artifact not found
error is returned by maven central, the existing debug logging happensRelated to issue #2383