-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-63539] Use additional repo URL variants to find cache #947
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, while creating decideAndUseCache
I didn't think about the semantics of the URLs. Thanks for doing this, it is a great addition.
@rishabhBudhouliya I realized that I need more tests of the newly added method. I'll add those tests before we merge this change. The statement coverage of the functions looks reasonable but the number of assertions for those statements is far too low. |
The tests lead me to #948 while trying to understand the branch coverage report. |
No need to require command line git on the master for this case
16de1ef
to
b7f1c62
Compare
Estimation of repository size uses a cache key based on the repository URL. There are several different forms of repository URL that can represent the same git repository. For example, the following URL's all point to the same repository * git://github.com/jenkinsci/git-plugin * [email protected]:jenkinsci/git-plugin.git * https://github.com/jenkinsci/git-plugin * https://github.com/jenkinsci/git-plugin.git * ssh://[email protected]/jenkinsci/git-plugin.git This pull request permutes the repository URL provided by the user into several other forms in hopes that one of the forms already has a cached copy that can be used for local estimation of repository size. This assumes that local Java processing of strings is much faster than a REST API call to the hosting provider to request the size of the repository. Includes tests to confirm remote alternatives The alternatives work well for repository URL formats for GitHub, GitLab, Gitea, and most other providers. The alternatives do not work well with Bitbucket URL formats because Bitbucket applies the user name differently than the other providers. Use random values in locations where the value of the random will have no affect on the assertions. Do not return null from remote alternatives Cleaner interface if it always returns a value
b7f1c62
to
410f70e
Compare
@rishabhBudhouliya I reworked the commit to better match the cases that are being tested. I believe the new implementation is simpler, clearer, and more accurate. It also has many more tests than the first implementation I created. |
No need for protocolPatterns array outside the method.
Consistent with the logging of final exit
If there are multiple cached copies of the repository, be pessimistic and assume that the largest cache is the best approximation of repository size.
Reduce risk of test dependency on specific sizes
The default initial capacity is 16 with 75% fill factor. This code will insert roughly 10 entries.
Filter the trailing slash in the pattern matcher since any number of slashes on the end of the repository URL do not change the URL.
Repository size estimate is a very coarse decision criteria. Outdated information is unlikely to cause serious harm. Caching the repository URL and the size in memory seems like a very low cost way to avoid disc access (for local cache) and network access (for REST API calls).
The job will generally fail when no remote configs are defined, but it should not throw a Java exception due to a user configuration error.
Job will usually fail in other ways, but a null pointer exception is not a friendly way to show a configuration error to a user.
Also use isWindows() rather than the SystemUtils check
sizeOfRepo = FileUtils.sizeOfDirectory(cacheDir); | ||
sizeOfRepo = (sizeOfRepo/1000); // Conversion from Bytes to Kilo Bytes | ||
long clientRepoSize = FileUtils.sizeOfDirectory(cacheDir) / 1024; | ||
if (clientRepoSize > sizeOfRepo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case would we have multiple caches for the same git repository?
Would it happen if two different multi branch projects work on a different version of the same git repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that one case that might happen is two jobs that refer to a remote repository with different URLs and with different update frequencies. If one of the caches is only updated once a week and the other is updated once an hour, the "once an hour" repository may have a larger size than the "once a week" repository. This assumes the larger of the two values is the better approximation of repository size.
I've realized this is implemented incorrectly. It needs rework to use much simpler code. |
JENKINS-63539 Use additional repo URL variants to find cache
Estimation of repository size uses a cache key based on the repository URL. There are several different forms of repository URL that can represent the same git repository.
For example, the following URL's all point to the same repository
This pull request permutes the repository URL provided by the user into several other forms in hopes that one of the forms already has a cached copy that can be used for local estimation of repository size. This assumes that local Java processing of strings is much faster than a REST API call to the hosting provider to request the size of the repository.
Checklist
Types of changes