Repo not shrinking, number of commits doubles #595

Open
buusqualia opened this issue Aug 21, 2024 · 4 comments

@buusqualia

buusqualia commented Aug 21, 2024

I must be doing something wrong, but can't suss out what that might be. Steps taken:

$ git clone --mirror https://dev.azure.com/company/Playground/_git/SizeTest R
Cloning into bare repository 'R'...
remote: Azure Repos
remote: Found 1512904 objects to send. (20646 ms)
Receiving objects: 100% (1512904/1512904), 28.49 GiB | 22.01 MiB/s, done.
Resolving deltas: 100% (1054450/1054450), done.

$ cd R

$ git filter-repo --paths-from-file ../pathsToRemove.txt --invert-paths
Parsed 172371 commits
New history written in 403.99 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 1997320, done.
Counting objects: 100% (1997320/1997320), done.
Delta compression using up to 8 threads
Compressing objects: 100% (605875/605875), done.
Writing objects: 100% (1997320/1997320), done.
Selecting bitmap commits: 328059, done.
Building bitmaps: 100% (370/370), done.
Total 1997320 (delta 1515440), reused 1857890 (delta 1376055), pack-reused 0
Expanding reachable commits in commit graph: 330509, done.
Completely finished after 992.60 seconds.

At this point, doing a du -sk shows that the repo hasn't shrunk at all. Running the same command again shows:

$ git filter-repo --paths-from-file ../pathsToRemove.txt --invert-paths
Parsed 330509 commits
New history written in 727.61 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 2003932, done.
Counting objects: 100% (2003932/2003932), done.
Delta compression using up to 8 threads
Compressing objects: 100% (473105/473105), done.
Writing objects: 100% (2003932/2003932), done.
Selecting bitmap commits: 333060, done.
Building bitmaps: 100% (371/371), done.
Total 2003932 (delta 1522063), reused 1997306 (delta 1515437), pack-reused 0
Expanding reachable commits in commit graph: 337121, done.
Completely finished after 1232.85 seconds.

Notice that the number of commits above has nearly doubled for some reason. Running the command a third time results in slightly more commits, but not doubling (maybe 7k additional commits). The file "pathsToRemove.txt" contains lines like the following, which were copied and pasted from some of the --analyze output files:

R/RC/help/R.chm
R/CW/help/R.chm
RSQL/RSchema.vsd
R/Tools/RDM/release
R/packages
R/lib/Aspose.Pdf.dll
R/lib/Aspose.Words.dll
R/Server/bin/Debug/*.dll
R/Server/bin/Debug/*.pdb
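
For reference, git-filter-repo treats each line of a `--paths-from-file` file as a literal path unless the line starts with `glob:` or `regex:`, so an entry meant as a wildcard needs the prefix to be matched as one. Below is a minimal sketch that flags entries which look like wildcards but carry no prefix. `find_unprefixed_wildcards` is a hypothetical helper, not part of git-filter-repo, the "looks like a wildcard" test is a heuristic assumption, and the sample entries are illustrative only.

```python
import re

# Matching-style prefixes accepted in a paths file; "literal:" is the default.
KNOWN_PREFIXES = ("literal:", "glob:", "regex:")


def find_unprefixed_wildcards(lines):
    """Return (line_number, text) pairs for entries that contain glob-style
    wildcard characters but have no explicit matching-style prefix."""
    suspicious = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        # Blank lines and '#' comments are ignored by the paths file format.
        if not text or text.startswith("#"):
            continue
        if text.startswith(KNOWN_PREFIXES):
            continue
        # Heuristic (an assumption): *, ? or [ suggests a glob was intended.
        if re.search(r"[*?\[]", text):
            suspicious.append((number, text))
    return suspicious


if __name__ == "__main__":
    # Illustrative entries only, not the real contents of pathsToRemove.txt.
    sample = [
        "R/packages",
        "R/lib/Aspose.Pdf.dll",
        "R/Server/bin/Debug/*.dll",
        "glob:R/Server/bin/Debug/*.pdb",
    ]
    for number, text in find_unprefixed_wildcards(sample):
        print(f"line {number}: '{text}' is read literally; maybe 'glob:{text}'?")
```

This would not by itself explain the commit count doubling; it only rules out one reason why fewer paths are removed than expected.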

I've tried using --path on the command line as well, with the same results. This repo lives on Azure DevOps. Any ideas? Thanks!

Bryan

@newren
Owner

newren commented Sep 5, 2024

Receiving objects: 100% (1512904/1512904), 28.49 GiB | 22.01 MiB/s, done.

That is a huge repository. There's a significant risk that the repack is failing outright, leaving the rewrite of various refs incomplete. How much memory do you have available on the machine you are doing this rewrite on? Can you retry with a newer version of git-filter-repo, one with commit 44ecf0c (filter-repo: notice and signal when cleanup commands fail, 2024-08-01), which is not yet part of any release? That commit won't fix this problem, but it would at least give you an error message when the intermediate steps fail instead of ignoring errors coming from those other commands.

@buusqualia
Author

Thanks for the response. Yes, the repo is huge - hence the reason I'm trying desperately to shrink it! :-)

The version I ran did have that commit (I downloaded the copy of git-filter-repo from the homepage). It contained the lines changed in the commit:

    for cmd in cleanup_cmds:
      if show_debuginfo:
        print("[DEBUG] Running{}: {}".format(location_info, ' '.join(cmd)))
>      ret = subproc.call(cmd, cwd=repo)
>      if ret != 0:
>        raise SystemExit("fatal: running '%s' failed!" % ' '.join(cmd))
      if cmd[0:3] == 'git reflog expire'.split():
        self._write_stash()
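
The pattern in that excerpt can be sketched on its own: run each cleanup command in turn and stop at the first non-zero exit status. This is only a sketch. The command list is my assumption about what the cleanup step runs, and `run_cleanup` with its injectable `call` parameter is a hypothetical helper, there so the logic can be exercised without touching a real repository.

```python
import subprocess

# Assumed cleanup commands; the authoritative list lives inside git-filter-repo.
CLEANUP_CMDS = [
    ["git", "reflog", "expire", "--expire=now", "--all"],
    ["git", "gc", "--prune=now"],
]


def run_cleanup(repo, cmds=CLEANUP_CMDS, call=subprocess.call):
    """Run each command with `repo` as the working directory.

    Raises SystemExit on the first command that exits non-zero, mirroring
    the check in the excerpt. Returns the number of commands that ran.
    """
    for cmd in cmds:
        ret = call(cmd, cwd=repo)
        if ret != 0:
            raise SystemExit("fatal: running '%s' failed!" % " ".join(cmd))
    return len(cmds)
```

Running `run_cleanup("R")` by hand after a filter run would show whether either step fails outside of git-filter-repo.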

I just reran it. Some details:

  • Windows Server 2019 Standard with 18GB of memory
  • Running git-filter-repo from a Cygwin bash prompt (running in a command window had the same result)
  • As it's processing commits, git.exe is using between 200MB and 250MB of RAM, and python uses between 180MB and 436MB of RAM
  • During repacking, enumerating objects, git grows to 550MB
  • During compressing, git grows to 670MB
  • During writing, git grows to 678MB but then about 80% of the way done git goes down to 126MB and python down to 226MB. As it gets up to 94% done, they shrink further; python is 18MB to 34MB and git 39MB to 110MB.
  • Building bitmaps, git is between 580MB and 625MB

After finishing, the pack is about 25MB smaller, which isn't anywhere near what I'm expecting it should be, and the number of commits is still doubling.
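
One way to narrow down where the extra commits come from is to count the commits reachable from each ref namespace separately. Below is a rough sketch, assuming `git` is on PATH and every ref resolves to a commit or to a tag of one. `group_refs_by_namespace`, `list_refs`, `count_commits` and `report` are hypothetical helper names.

```python
import subprocess
from collections import defaultdict


def group_refs_by_namespace(refnames):
    """Group full ref names by their first two components, e.g. 'refs/heads'."""
    groups = defaultdict(list)
    for ref in refnames:
        parts = ref.split("/")
        groups["/".join(parts[:2])].append(ref)
    return dict(groups)


def list_refs(repo):
    """Return every ref name in the repository via `git for-each-ref`."""
    out = subprocess.check_output(
        ["git", "for-each-ref", "--format=%(refname)"], cwd=repo, text=True)
    return out.split()


def count_commits(repo, refs):
    """Count commits reachable from `refs`, fed to `git rev-list` on stdin."""
    out = subprocess.check_output(
        ["git", "rev-list", "--count", "--stdin"],
        cwd=repo, input="\n".join(refs) + "\n", text=True)
    return int(out.strip())


def report(repo):
    """Print the ref count and reachable-commit count for each namespace."""
    for namespace, refs in sorted(group_refs_by_namespace(list_refs(repo)).items()):
        print(namespace, len(refs), count_commits(repo, refs))
```

Calling `report("R")` before and after a filter run would show whether one namespace accounts for the second copy of the history.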

Is there anything else that I can do to help debug what might be going wrong? Thanks again for the assistance,

Bryan

@newren
Owner

newren commented Oct 19, 2024

Do you have a background job (git maintenance maybe?) which is forcibly fetching the repository and thus updating it with the old history while git-filter-repo is writing the new?
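
A quick way to check the `git maintenance` case is to see whether the repository is listed under `maintenance.repo` in the global git config, which is where `git maintenance register` records it. This sketch covers only that case; any other scheduled fetch would not show up here. `registered_maintenance_repos` and `is_registered` are hypothetical helper names.

```python
import os
import subprocess


def registered_maintenance_repos():
    """Return the paths listed under `maintenance.repo` in the global config."""
    result = subprocess.run(
        ["git", "config", "--global", "--get-all", "maintenance.repo"],
        capture_output=True, text=True)
    # A non-zero exit with empty output simply means the key is not set.
    return [line for line in result.stdout.splitlines() if line]


def is_registered(repo, registered):
    """True if `repo` matches one of the registered paths after normalisation."""
    target = os.path.normcase(os.path.abspath(repo))
    return any(os.path.normcase(os.path.abspath(path)) == target
               for path in registered)
```

For example, `is_registered("R", registered_maintenance_repos())` returning `False` would suggest background maintenance is not touching the clone.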

@buusqualia
Author

Thanks for the question, but no, there are no background jobs. This is running on my local development machine.
