WIP: Optimize for big repository on repo home page to make it load less than 1s and /main less than 8s #6045
5175afd to f17fa2d
Codecov Report
@@ Coverage Diff @@
## master #6045 +/- ##
==========================================
- Coverage 38.83% 38.69% -0.14%
==========================================
Files 354 343 -11
Lines 50183 49128 -1055
==========================================
- Hits 19488 19010 -478
+ Misses 27870 27366 -504
+ Partials 2825 2752 -73
Why not just reuse the existing caching functionality, the same way commit counts are cached?
I think the cache infrastructure is too complicated, and I think the commit count cache should also use a separate cache service.
Thanks a lot @lunny for trying to improve this usability concern. I can confirm that using boltdb improves performance the most for me.
URL to try is http://51.15.253.24:3000/clandmeter/aports/src/branch/master/main What would really make a difference on repo layouts like this is a pager with a commit list limit set in app.ini. Or, if that's not possible, a hard limit like GitHub has.
@clandmeter have you enabled
Do you have another PR for `code.gitea.io/git`?
err := c.db.View(func(tx *bolt.Tx) error {
	b := tx.Bucket(c.bucket)
	v := b.Get([]byte(getKey(repoPath, ref, entryPath)))
	if v == nil || len(v) <= 0 {
You can simplify that whole thing with something like `if len(v) == 0 {`: if `v` is `nil`, `len(v)` will return `0`.
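To illustrate the suggestion above, here is a minimal sketch showing that a single `len` check covers both the nil and the empty case. The helper name `isEmpty` is made up for this example and is not part of the PR.

```go
package main

import "fmt"

// isEmpty reports whether a value read from the cache is missing or empty.
// len of a nil slice is 0 in Go, so no separate nil check is needed.
// (isEmpty is a hypothetical helper used only for illustration.)
func isEmpty(v []byte) bool {
	return len(v) == 0
}

func main() {
	var missing []byte // nil, like the result of a lookup for an absent key

	fmt.Println(isEmpty(missing))       // true
	fmt.Println(isEmpty([]byte{}))      // true
	fmt.Println(isEmpty([]byte("abc"))) // false
}
```

So `v == nil || len(v) <= 0` and `len(v) == 0` behave the same here; the shorter form is simply easier to read.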
@lunny I enabled cache (redis), as this was a vanilla setup. The load time is now around 30-40 seconds. Yes, a hard limit could be an option. GitHub does the same: https://github.com/alpinelinux/aports/tree/master/main This loads in a few seconds (1-3). I don't think Gitea can do that even with a 1k limit. The number of objects is now around 2k, so I would assume that at best it would load in 15 seconds. Even with caching I see a lot of git commands (git rev-list -1 xxx) being executed in the background. I guess the cache doesn't cover all of the git lookups.
@clandmeter yes. I need to send several PRs for all the optimizations, since not all the git commands are cached yet. I will send another PR to limit the number of displayed files according to a user setting.
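As a rough sketch of what such a display limit could look like (this is not code from the PR; `limitEntries` and its "zero means unlimited" convention are assumptions made for illustration):

```go
package main

import "fmt"

// limitEntries returns at most limit entries and the number left out.
// A limit <= 0 is treated as "no limit". Both the name and this
// convention are hypothetical, not taken from Gitea.
func limitEntries(entries []string, limit int) ([]string, int) {
	if limit <= 0 || len(entries) <= limit {
		return entries, 0
	}
	return entries[:limit], len(entries) - limit
}

func main() {
	entries := []string{"abuild", "acct", "acf-core", "acl", "acpi"}
	shown, omitted := limitEntries(entries, 3)
	fmt.Println(shown, omitted) // [abuild acct acf-core] 2
}
```

Returning the omitted count lets the page tell the user that the listing was truncated instead of silently dropping entries.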
@lunny I wonder why you chose to use boltdb but did not add a Redis option? Redis is already used for caching and would scale much better as your instance grows.
@lunny IMHO it would be better to implement a type that internally uses the existing caching methods. That way we would also support Redis etc.
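A minimal sketch of that idea, assuming a small key/value contract that any backend could satisfy. The `Store` interface, `memoryStore`, `LastCommitCache`, and the key format are all hypothetical and are not Gitea's actual cache API; the `lookup` callback stands in for a git command.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical key/value contract that a memory, Redis, or
// BoltDB backend could each implement.
type Store interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// memoryStore is an in-process backend used here only for illustration.
type memoryStore struct {
	mu sync.RWMutex
	m  map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{m: make(map[string]string)}
}

func (s *memoryStore) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (s *memoryStore) Put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// LastCommitCache maps (repo, ref, entry path) to a commit ID on top of any Store.
type LastCommitCache struct {
	store Store
}

// cacheKey joins the three parts with a separator; the format is an assumption.
func cacheKey(repoPath, ref, entryPath string) string {
	return repoPath + ":" + ref + ":" + entryPath
}

// Get returns the cached commit ID and calls lookup only on a cache miss.
func (c *LastCommitCache) Get(repoPath, ref, entryPath string, lookup func() string) string {
	key := cacheKey(repoPath, ref, entryPath)
	if v, ok := c.store.Get(key); ok {
		return v
	}
	v := lookup()
	c.store.Put(key, v)
	return v
}

func main() {
	cache := &LastCommitCache{store: newMemoryStore()}
	calls := 0
	lookup := func() string { calls++; return "5175afd" }

	cache.Get("clandmeter/aports", "master", "main/abuild", lookup)
	cache.Get("clandmeter/aports", "master", "main/abuild", lookup)
	fmt.Println(calls) // 1: the second call is served from the cache
}
```

With this shape, swapping backends only means providing another `Store`. One caveat of the sketch: keying on a branch name goes stale when the branch moves, so a real implementation would likely key on a resolved commit ID instead.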
@clandmeter It's easy to add Redis.
## Git - LastCommitCache settings (`git.last_commit_cache`)

- `TYPE`: **none**: Cache type, could be empty, `memory` or `boltdb`.
- `DATA_PATH`: ****: Cache dir when type is boltdb.
Can we have a default path?
The default type is none, which means no cache is used. For small repositories that is enough, so the default path is empty.
f17fa2d to 9c33992
@lafriks added support for the old cache configuration.
@clandmeter added Redis support and halved the number of git commands run when listing last-commit information.
@lunny nice, thx.
Sorry, I changed the config option
@lunny could you update the ini sample to show how to use the new options? I am unable to make Redis work. I can only make memory work.
modules/cache/ls_tree/cache.go (outdated)
@@ -0,0 +1,30 @@
package ls_tree
Copyright header, please.
modules/cache/ls_tree/init.go (outdated)
@@ -0,0 +1,28 @@
package ls_tree
Same for this file
@clandmeter I sent another commit today and I think that will make aport
@lunny I'm not able to make this work anymore.
which results in
@lunny yes, after retesting boltdb it seems broken. Regarding Redis, it looks like it doesn't cache anything. With memory caching you can see that the second run does no, or almost no, git lookups anymore. With Redis the lookups keep happening, making it much slower. The funny thing is that it does check for Redis availability, because if I turn Redis off it returns error 500. I have some small remarks for your reference. To my understanding a Redis lookup should be much faster than a git lookup. So if you enable the global cache with Redis, why not automatically enable it for the other cache options? Also, what does the following mean:
Does this mean that repos with more than 1000 commits (the default setting) have caching disabled? Thanks for looking into it, and I'll keep following this PR.
After re-running my Bloom filter experiment from December, based on this article, I think that's a better approach for solving the problem. It offers a way to accelerate the listing for any branch or history point. I realise the burden of proof is on me, and unfortunately I may not be able to deliver it in a timely manner. This PR may still be the way to go short-term, but if someone decides to invest time into it, there are viable alternatives. I'm now going through the discussion on the Git mailing list to read up on the various attempts to implement it and whether it has gained any traction since last year.
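For readers unfamiliar with the technique, here is a minimal sketch of a Bloom filter over the paths a commit changed. It is not the experiment's code: `pathFilter`, its sizes, and the FNV-based double hashing are assumptions chosen for illustration. The point is that a negative answer is always correct, so a history walk can skip commits whose filter says the path was not touched.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pathFilter is a fixed-size Bloom filter over the paths changed by one commit.
// The type and its parameters are hypothetical. nbits must be > 0 and k >= 1.
type pathFilter struct {
	bits []uint64
	k    uint32 // number of bit positions probed per path
}

func newPathFilter(nbits int, k uint32) *pathFilter {
	return &pathFilter{bits: make([]uint64, (nbits+63)/64), k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
func (f *pathFilter) positions(path string) []uint32 {
	h1 := fnv.New32a()
	h1.Write([]byte(path))
	h2 := fnv.New32()
	h2.Write([]byte(path))
	a, b := h1.Sum32(), h2.Sum32()|1 // force an odd step so it is never zero
	n := uint32(len(f.bits) * 64)
	pos := make([]uint32, f.k)
	for i := range pos {
		pos[i] = (a + uint32(i)*b) % n
	}
	return pos
}

// Add records that the commit changed path.
func (f *pathFilter) Add(path string) {
	for _, p := range f.positions(path) {
		f.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain returns false only if path was definitely not added.
// A true result can be a false positive and must be verified with a real diff.
func (f *pathFilter) MayContain(path string) bool {
	for _, p := range f.positions(path) {
		if f.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Paths changed by one hypothetical commit.
	f := newPathFilter(512, 3)
	f.Add("main/abuild/APKBUILD")

	fmt.Println(f.MayContain("main/abuild/APKBUILD")) // true: there are no false negatives
	fmt.Println(f.MayContain("main/zlib/APKBUILD"))   // usually false; true would be a false positive
}
```

Because the filter belongs to the commit rather than to a branch, the same data can serve listings at any ref or point in history, which matches the advantage described above.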
I still don't have much to share about my Bloom filter experiments in terms of code, but I do have some preliminary numbers from testing them on the Linux repository. Since this PR was started, Gitea has undergone some changes that altered its performance characteristics, mostly for the better. Migrating to This is all driven by the GitHub engineers' philosophy:
Listing the recent commits in the Linux repository has a few challenges:
How can this be addressed?
Overall, with all the above performance enhancements I was able to get the load time of the repository listing page down to a respectable 1.2 seconds. This general performance improvement carries over to other listings, older points in history, and other branches.
@lunny any idea when this is going to be finalized/implemented?
@clandmeter I will continue working on this PR. I hope it can be released in v1.9.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.
@lunny any updates on this?
This PR is outdated and I will send another PR to do some optimization.
We are evaluating Gitea, and already love the project very much so thank you for that!
Which Gitea version are you using? @michaelshiel
@typeless when you have many subdirectories in a directory and many commits, it is still slow.
Version: Gitea v1.10.0+dev-226-g256b17817 built with GNU Make 4.2.1, go1.12.9 : bindata, sqlite, sqlite_unlock_notify 10,145 commits, 517 branches, and 1.9GB total repo size.
To update, the fault appears to have been on my end. I was initially running the latest image in Docker on OSX (with delegated volume mappings); running on Ubuntu 18.04 has shown no performance problems so far, and any page load is sub 1 second. Sorry for the noise, and thanks again!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.
@lunny any updates on this?
@kolaente I will start another PR after some time. This PR is too far behind current master.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.
This has been replaced by #10069
I tested it locally. For a recent Linux repository, the home page load time drops from 8s to 2s with the memory last commit cache enabled, and from 8s to 2.8s with the boltdb last commit cache enabled.
Update: it now takes only 1.9s even with boltdb on my macOS machine, since we only fetch simple commit information.
Update (2019-02-22): added an ls-tree cache and a file limit of 1000; the aports /main directory now loads in less than 8s on my MacBook Pro. I will add a log cache, and more work is needed to clean up the code.
Blocked by go-gitea/git#145