[RFC] Change repo settings to only allow 'squash' commits #594
Comments
@gregschohn, I know you had opinions against this when I joined. Do those still hold with the increase in contributions?
I usually do squash commits myself, so this would not change my workflow. I also prefer them as a general rule, as they package all the work for a feature/bug fix into a single, easily visible unit in the mainline history. I'm open to arguments against the approach though.
For your consideration:
OpenSearch-Migrations: here is what today's git log looks like:
OpenSearch repository: OpenSearch has been using this setting for a considerable time; here is what today's git log looks like:
In comparison, I see much better information density out of OpenSearch: no commits titled 'addressed pr comments', and each commit has a PR number associated with it where the nuance is available if I want to follow up in detail. BTW, I mean no shade toward any of the commit authors shown above; I am lazy about commit details because I expect them to be squashed.
What problem is this trying to solve? From the link, it seems that there's a hope that a git log can form a narrative. I see the git log as one of the best forensic tools that I have at my disposal. I personally don't expect it to read like release notes, but instead as a record of developer intent. Unfortunately, I cannot recover that stream of intent when branches from other forks are squashed or rebased into the main repo, especially if you consider that the source repo that the changes came from could go away. Since GitHub is managed by a single entity and the data within GitHub isn't in the single block-chained source of truth that is replicated in every cloned repo, I chafe at the idea of having different durability guarantees for my source control metadata and history (the git log in my clone will always be in my control).

On the tightness of histories: previously, I've used ffwd-merges, and it is unfortunate that GitHub doesn't provide that natively and that pushes are disabled for our repos. That deficiency also contorts people into having discussions like this one. A ffwd-merge eliminates the superfluous "extra merge" commits that pop up on every GitHub "merge-commit" and a lot of the arrowing/lines that you see. So, I'll flip the question around and ask: why are we worrying about how to make the git logs pretty if we don't even have the right tools to do that?

Given the workflow where devs might be working on long-lived branches, CI on that development branch may be the norm, with mainline merged into the dev branch repeatedly (maybe over months). When it's time to merge that feature, does it make sense to destroy the granular history and checkpoints that show the progression of an idea through its maturation? Here are a couple of PRs that, if squashed, would erase critical information.
The first is a medium-sized PR that purposefully separates commits so that file refactoring (renames, moving functions, but not changing business logic) is separate from the rest of the changes. Should those have been separate PRs? Probably - but they were. Should they have been immediately merged/closed? Maybe - but they weren't because development time outpaced review time and other changes were right behind the refactoring PRs. They also weren't merged because they may have, at the time, provided negative value. If the subsequent changes needed to make further modifications to the newly refactored abstractions, the commit logs would have been even worse, since effectively experimental commits that the author didn't think were truly ready would be in the git log at the same level as more polished commits that had more confidence.

With file changes, it is MUCH harder to review renames, content moves, and content updates together in a flattened commit than in separate commits. Merge commits always look innocuous upon first pass and can be terrifying when pored over more carefully. When dealing with squashes, we lose the ability to easily cherry-pick cross-file commits and run tests/analysis.

Lastly, I can be lazy about my commit logs too. I see that as a valuable signal later on. Here's a snippet of some of my commits (
Notice the "WIP" entry. That code ended up working better than I had originally expected. It took a couple more generations to get it cleaned up. Weeks from now, it could be useful to know if some errant code was thrown in within the first day of the project or at the very end, especially if something was broken and the wrong fix seemed to put things right. Granular histories help us get deeper into the 'whys'.

GitHub and git use similar, partially overlapping workflows, but one is completely open-source and in the control of everybody holding a copy of the repo, and the other requires major leaps of faith. The more open-source-safe way to keep this information around would seem to be to stick with the one that can work without an internet connection.

In summary:
1) I'd love to see linear histories when possible w/ ffwd-merges, or even with rebases when the changes are very small.
2) However, to me, removing the forensic evidence of how code evolved is a big loss if the only gain is to engineer a prettier history.
3) There are better ways to natively manage narratives like release notes (which can be automatically created via PR descriptions) and releases.
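For readers weighing the trade-off described above, here is a minimal sketch (not taken from this repository) that builds a throwaway repo and compares what survives a squash merge versus a fast-forward merge. The branch names, the file `app.txt`, the commit messages, and the `#NNN` placeholder are all hypothetical; only standard git commands are used.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name below is hypothetical and for illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch regardless of local defaults
git config user.email "dev@example.com"
git config user.name "Example Dev"

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"

# A feature branch with three granular commits, including a "WIP" one.
git checkout -q -b feature
for msg in "Refactor: rename helper" "WIP" "Address PR comments"; do
  echo "$msg" >> app.txt
  git commit -q -am "$msg"
done

# Option A: squash merge. The three commits collapse into one new commit.
git checkout -q -b squashed main
git merge -q --squash feature
git commit -q -m "Add feature (#NNN)"
squash_count=$(git rev-list --count squashed)

# Option B: fast-forward merge. Every commit is kept and no merge commit is added.
git checkout -q -b fastforward main
git merge -q --ff-only feature
ff_count=$(git rev-list --count fastforward)

echo "squash: $squash_count commits, fast-forward: $ff_count commits"
```

In this sketch the squashed branch ends up with 2 commits (the initial one plus the squash), while the fast-forwarded branch keeps all 4, so the "WIP" checkpoint is still reachable with `git log` in any clone.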
Have you considered running
If we wanted to enforce something for better details, we could require descriptive one-line merge commits or that branch names are descriptive. Even if squashes are required, the messages could still be of diminished utility, so I'm not sure why it's worth the penalty/one-way door of reducing the information in the logs.
@gregschohn Not really, I'd like
It's been nearly a week. Project maintainers, what happens next? It looks like votes are in
Taking the option for merge commits away is a one-way door. The repository becomes less expressive than it was before, and we will end up with 10K-line commits and no way within git to untangle them. If we go that way, I'd like to understand how we should proceed for those. Likewise, for PRs that are being completed in waves (feature work), what should the SOP be for squashing those and dealing with any inadvertent conflicts?

Before solutionizing, what problem are you trying to solve? Is there a rubric that you'd like to see the team adopt for what the history should look like? For some commits, I'll interactively rebase to squash some commits together before pushing. I'm happy to do that more often, but we do lose something there when correlating to the original PR review. Should branch names be more descriptive, or should we make sure that they're accurate? (Personally, I use the branch name to set the PR commit line.)

It's a dumb question, but having used git for over a decade and GitHub for a relatively short time: why do people like "clean histories" for a project doing CI/CD with multiple branches and merge points? Aren't there other ways to make a history clean later without needing to lose information? How do people use git histories? Personally, I use them when doing detective work to find out why specific changes did or didn't happen. I can't recall ever wanting less context or granularity in those situations. Full disclosure: I use Sourcetree, which does a reasonable job of helping me grok the history.
This decision isn't mine to make, but I'd like it to be made carefully and to consider alternatives and implications. If the goal is a cleaner history, we could keep this and determine when we should be squashing commits or merges. In other words, when is it "acceptable" to end up with non-linear histories? What should commits look like in the history log? Should we be maintaining a separate CHANGELOG with every merge? Of the people that voted, can we get more details on how we'd deal with the concerns above and whether there are any other solutions that would satisfy those? As the question is phrased in binary terms, it's leading and doesn't naturally lend itself to improving processes for the project or the artifacts for the repo.
@gregschohn you are reaching here - your comment is hyperbolic. 1) Maintainers already squash their history. 2) Commit history in forks and pull requests is not erased by enabling this setting.
@gregschohn is it? There are a number of ways different projects run, what does this project use? The ones that I'm familiar with are:
@gregschohn reflecting on your concerns: I don't think my desire for clean history in my use case should force merge commits out of the branch history. As I haven't really used merge commits, maybe this is a good chance for me to try them out and see if I come away with any specific concerns. With that said, I don't see a need to push for this change, and I suggest that the issue be closed out without any action taken. As we get more data, we can always revisit this if we feel it's worth revisiting.
It's been a couple of weeks; closing this out. Thanks!
I don't see much value in seeing individual commits at the repo level. With the number of unique tools and developers, it makes it harder to look into history to get a sense of what is happening in the code base (to me).
I suggest that we make a change in this repo's pull request settings to the following:
Please vote 👍 or 👎 to help gauge what should be done