DataFusion repo got 40MB larger #10422
Comments
That's unfortunate, but fixing this requires rewriting git history.
I suggest we use this particular issue for fixing the history -- I recommend a separate ticket for ways to prevent a similar mistake in the future.
if we want to fix this, this will require a force push to Fixing i prepared a
Once we do the above, we then need to delete (replace?) the release tags mentioned above.
I think the branch protections would need to be updated temporarily too -- I don't think we can force push to main.
I also think that if we force push to main, all outstanding PRs will become quite messed up until after a rebase.
Thank you @findepi for preparing this.
@andygrove / @comphead / @jayzhan211 do you have any thoughts on this matter?
I think there was a pre-commit GitHub hook checking large files, please allow me some time to investigate it. https://github.com/pre-commit/pre-commit-hooks?tab=readme-ov-file#check-added-large-files
that's correct. also, editing release tags is something to think about. i'd be tempted not to solve this problem.
Is this something we can configure in the repo itself?
A GitHub Action looks even more promising. As you rightly say, the pre-commit checks are not reliable because they run locally.
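To make the idea of a server-side size check concrete, here is a minimal sketch of the kind of test a CI job could run over the files touched by a PR. The function name `find_large_files` and the 1 MB limit are assumptions for illustration only; the thread does not specify a threshold, and this is not the implementation of the pre-commit hook linked above.

```python
import os

# Hypothetical limit; the thread does not name a threshold.
MAX_BYTES = 1_000_000


def find_large_files(paths, max_bytes=MAX_BYTES):
    """Return (path, size) pairs for existing regular files larger than max_bytes."""
    offenders = []
    for path in paths:
        # Skip paths that do not exist (e.g. files deleted by the change).
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > max_bytes:
                offenders.append((path, size))
    return offenders
```

A CI step could pass the list of changed files to this function and fail the job when the result is non-empty, which does not depend on contributors having a local hook installed.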
Talking about the size and "cleanliness" of the repo, should we do a |
@ozankabak good question.
Yep,
My understanding is that what comes from github is automatically So I don't think there is anything we need to do with |
I keep seeing notifications on commits with the notice "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.", which basically says that this is an unreachable commit/garbage. I think these commits accumulate because people fork the repo to try things, and when the fork repo is deleted, we get into this state. This prompts me to think GH doesn't do
It's more complicated. It seems that all forks' objects are stored together on the GitHub backend.
Yes, there is actually no problem until the fork is deleted. When the fork is then deleted, it seems like the commit still stays on the "mother" repo as an unreachable commit and these things accumulate over time. Googling it suggests gc would solve this, but that may be wrong.
Describe the bug
We accidentally checked in a 40MB binary docs.tgz file in #10407.
#10416 removed the file, but it is still in git history.
Thus the DataFusion repo is significantly larger than it used to be
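One way to confirm that a deleted file still lives in history is to list the largest blobs reachable from any ref. The sketch below only parses text, assuming it comes from `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`; the helper name `largest_blobs` is hypothetical.

```python
def largest_blobs(batch_check_lines, top=5):
    """Return the `top` largest blobs as (size, path, object_id) tuples.

    Each input line is expected to look like:
        "<type> <object_id> <size> <path>"
    Lines for commits, trees, and tags are ignored.
    """
    blobs = []
    for line in batch_check_lines:
        # Split into at most 4 fields so paths containing spaces stay intact.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path, parts[1]))
    blobs.sort(reverse=True)
    return blobs[:top]
```

If docs.tgz still appears near the top of this list after the file was removed from the working tree, the blob is still stored in history and will continue to be downloaded by fresh clones.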
To Reproduce
No response
Expected behavior
No response
Additional context
No response