Analysis of performance bottlenecks in repo migration #14772
Labels: performance/bigrepo (Performance Issues affecting Big Repositories), performance/speed (performance issues with slow downs), topic/repo-migration (Migrate repos from other platforms to Gitea, or from Gitea to them)
Repo migration from GitHub can take a long time (e.g. migrating this Gitea repo takes more than 24 h on a small VPS, a Hetzner CPX11).
It's unclear (to me) whether this is caused by rate limits of external services (the GitHub API) or by inefficiencies in Gitea's migration module.
The aim of this issue is to identify the bottlenecks involved. For now, the primary method is to collect pprof profiles to find out which routines most of the time is spent in.
This also gives some insight into which network, disk, and DB operations take a lot of time, but only indirectly. For that, analyzing DB query times specifically might be more helpful. (If somebody can outline a good process for that, a comment here would be appreciated ;) )
I sampled pprof activity during a migration of https://github.com/go-gitea/gitea (all entities except releases) on Gitea 1.14.0+dev-713-gec06eb112, which ran for several hours. Several pprof profiles, each sampled over 30 seconds to 30 minutes, are attached: pprof.gitea.samples.cpu.00.zip
To inspect a profile, run `go tool pprof -http :8080 <path to profile>`.
To collect profiles yourself, set `ENABLE_PPROF = true` under `[server]` in `app.ini`, then call `go tool pprof -seconds 1800 0.0.0.0:6060`.
Actual analysis of these samples will follow in the coming days.
Server utilization graphs for the middle 12 hours of the migration:
To me, it looks like the higher-utilization phase each hour coincides with a reset of the GitHub rate-limit window, so GitHub rate limits alone bring us down to ~33% of potential performance.