analysis on performance bottlenecks in repo migration #14772

Open
noerw opened this issue Feb 22, 2021 · 3 comments
Labels
performance/bigrepo (Performance Issues affecting Big Repositories), performance/speed (performance issues with slow downs), topic/repo-migration (Migrate repos from other platforms to Gitea, or from Gitea to them)

Comments

@noerw
Member

noerw commented Feb 22, 2021

Repo migration from GitHub can take a long time (e.g. migrating this Gitea repo takes more than 24h on a small VPS (Hetzner CPX11)).
It's unclear (to me) whether this is caused by rate limits of external services (the GitHub API) or by inefficiencies in Gitea's migration module.

The aim of this issue is to identify the bottlenecks involved. For now, the primary method is to collect pprof profiles to find out in which routines most time is spent.
This also gives some insight into which network, disk, and DB operations take a long time, but only indirectly. For that, analyzing DB query times specifically might be more helpful. (If somebody can outline a good process for that, a comment here would be appreciated ;)

I sampled pprof activity during a migration of https://github.com/go-gitea/gitea (all entities except releases) on Gitea 1.14.0+dev-713-gec06eb112, which ran over several hours. Several pprof profiles, each sampled for between 30 seconds and 30 minutes, are attached:
pprof.gitea.samples.cpu.00.zip

  • To inspect them, run go tool pprof -http :8080 <path to profile>
  • To collect your own profiles, set ENABLE_PPROF = true under [server] in app.ini, then call go tool pprof -seconds 1800 0.0.0.0:6060

Actual analysis of these samples will follow in the coming days.

Server utilization graphs for the middle 12 hours of the migration:
[Image: server utilization graphs]
To me, it looks like the higher-utilization phases each hour coincide with a reset of the GitHub rate-limit window, so we're down to ~33% of potential performance just because of GitHub rate limits.

@noerw noerw added the performance/bigrepo and performance/speed labels Feb 22, 2021
@lunny
Member

lunny commented Feb 22, 2021

A possible way around the rate limit is to create multiple accounts and have migrations support multiple tokens.

@zeripath
Contributor

A quick and dirty look through those samples suggests that just downloading and storing the releases is what's taking the time.

There's some delay due to encoding/json, and using jsoniter here might be a little quicker, but I'm not certain the gain would be large.

@noerw noerw added the topic/repo-migration label May 16, 2021
@lunny
Member

lunny commented Jun 7, 2021

#16070 may help with this issue.

Projects
None yet
Development

No branches or pull requests

3 participants