Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default action to squash merge in cmssw repo. #2080

Closed
antoniovilela opened this issue Sep 26, 2023 · 45 comments
Closed

Change default action to squash merge in cmssw repo. #2080

antoniovilela opened this issue Sep 26, 2023 · 45 comments

Comments

@antoniovilela
Copy link

As discussed in the ORP meeting (2023-SEP-19), managers and users agree to squash commits when merging to master. This issue is to follow-up on what is needed in cms-bot.

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 26, 2023

A new Issue was created by @antoniovilela .

@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor

makortel commented Sep 26, 2023

I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).

@antoniovilela
Copy link
Author

I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).

Let's maybe discuss first what is needed in order for us to use the "Squash and merge" option? @smuzaffar

@Dr15Jones
Copy link
Contributor

I strongly disagree with this. I personally am very careful about creating commits with meaningful comments to help understand why code was changed. Squashing them all would make using git history far less useful.

@smuzaffar
Copy link
Contributor

There are three ways one can merge the PR

  1. Merge pull request: This is default for all our repositories. It keeps the histroy of the commits and on top it adds a merge commit. This merge commit is used by bot to find out which pull requests has been merged on top of last build. This informtion is then used for generating the IB pages. This merge commit also very help full for tagging the IBs. e.g. one can find the last merge commit before 11h00 to tag the 11h IB. Without this merge commit there could be chance that we tag something incomplete e.g. lets say a PR has two commits one at 10h59 and other at 11h01 and we have merge it at 11h05 (just before starting the IB) then bot will build the IB using 10h59 commit.
  2. Squash and merge: This does not create any merge commit and is squashes all the commits on to one re-write the commit message. Many developers does not like this as it loses the commit history.
  3. Rebase and merge: This also does not create any merge commit. It keeps the commit history of PR , so if a PR has N commits then all of those will be rebased on top of the base branch. Bebasing also means change in commit sha which some developers might not like if they want to keep on working on the local branch.

I prefer to keep using the option 1. Option 2 loses all the commit history and option 3 (though keeps the commits) but it can build inconsistant IBs.

@makortel
Copy link
Contributor

I think we'd need to understand better the motivation(s) behind the proposal. Maybe there could be a different path forward to improve things?

@antoniovilela
Copy link
Author

Do I understand correctly that if one does a Squash and merge the PR will not show up in the IB page. Is there a way around this?
Since it is one commit you do not get the problem of incomplete PRs.
In many (most?) cases the commit history is not very meaningful and what matters is the PR in its entirety it would seem?
Could we simply have the option of the Squash and merge, without unintended consequences in the IB monitoring?

@smuzaffar
Copy link
Contributor

@antoniovilela, I agree with @makortel. Lets first understand the motivation(s) behind the proposal

@smuzaffar
Copy link
Contributor

also note that with Squash and merge, we not only lose the history of commit but also the lose the history who to blame for the code. I did a give test PR where multiple developers provided changes but the blame only shows the user who actually open the PR. Also the actual committer loses the credit of the commit too.

@makortel
Copy link
Contributor

Here are some further downsides on "Squash and merge" model

  • it doesn't allow sharing a branch for PRs in different release cycles (of course the branch must be based on a commit/tag that is common between the release cycles)
  • it doesn't allow (out of the box) basing a "further developments branch" on top of a branch used for a PR
  • for the PRs that have a meaningful commit history, it would make the code archaeology more difficult

Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).

@makortel
Copy link
Contributor

assign core, orp

Let's include @cms-sw/all-l2 for more feedback.

@cmsbuild
Copy link
Contributor

New categories assigned: core,orp

@Dr15Jones,@makortel,@rappoccio,@antoniovilela,@smuzaffar,@sextonkennedy you have been requested to review this Pull request/Issue and eventually sign? Thanks

@tvami
Copy link
Contributor

tvami commented Sep 27, 2023

include all-l2 for more feedback

I think the ability to do squashing w/o having to ask the developers is a welcomed addition. My main argument for asking for squashing from the developers was that when one looks at the diff between releases on github (e.g. cms-sw/cmssw@CMSSW_10_6_28...CMSSW_10_6_30) that shows the commit differences, and not the PR differences. So having one commit per PR will make that much more cleaner.

Of course well constructed commits are good for this too, but many many times we have "fix", "fix typo", "code checks", "answer to reviewers", "code checks again" as commit msgs, that's when the squashing from the reviewers/orp would be nice IMO.

@fwyzard
Copy link
Contributor

fwyzard commented Sep 27, 2023

Also the actual committer loses the credit of the commit too.

Doesn't the squashed commit list all the Co-authored-by people ?

@fwyzard
Copy link
Contributor

fwyzard commented Sep 27, 2023

Having worked with non-CMS repositories that enforce a linear history (like llvm) I noticed that they are a lot easier to browse and follow.

Compare

cmssw$ git log --oneline --graph --no-decorate -n20
*   56711f4be87 Merge pull request #42838 from guitargeek/buildfile_duplicate_entries_1
|\  
| * 7176cc0fce3 Remove duplicate entries from `CondTools/RunInfo/plugins/BuildFile.xml`
* |   213663839a3 Merge pull request #42872 from CMSTrackerDPG/FixFEDErrorGPUvsCPUImbalance
|\ \  
| * | 708c739cffc store errors only for valid DetIds
| * | 951cf6cc81c skip the 2nd error word for the timeout error as in the CPU code
* | |   20c8c100813 Merge pull request #42848 from alejands/noisyGpuValidation_133X
|\ \ \  
| * | | 6ec2b12513f Prevent using getByToken on EcalRawData when skipped and not consumed
| * | | 4d18fd5efba Introduce ability to choose to skip running over collections
| * | | e1408a755c3 Toggle off running over collections missing in EcalOnly configs
| * | | d8d8eb0934f Correct GpuTask flag logic to avoid running over unused collections
* | | |   f3dfef0254f Merge pull request #42728 from nurfikri89/from133X_202309042300_nanoDev_jmeNanoV11
|\ \ \ \  
| * | | | f3b31fa1ee5 Remove unused HF energy fractions from modifiers
| * | | | 3615d26bcf3 Use int16 data type for multiplicity variables
| * | | | 325107b52ed Add energy fractions for AK8 jets. Change data type and dummy value for multiplicity variables.
| * | | | 9f03ad5fd17 Check isPFJet() for multiplicity variables because in MiniAOD, AK8 jets below some pt thresholds have reduced information stored (e.g multiplicities)
| * | | | 9fce07625f5 Make it optional to use other PuppiProducer label used for puppi jets clustering. 'puppi' is set as the default value.
| * | | | 15db740e555 Fixes to variable names and PuppiProducer label
| * | | | f2cbbbe9a0c Fix typo
| * | | | 772b5375506 Rename multiplicity variables in nanojmeDQM_cff.py
| * | | | da915559f8d Make it optional to use a different PuppiProducer label rather than a hard-coded label. 'puppiNoLep' is set as the default value.

with

llvm-project$ git log --oneline --graph --no-decorate -n20
* 97187e127881 [AArch64] update "rm" inline asm test (#67472)
* 4bdec5830bc3 [flang][runtime] Enable more code for offload device builds. (#67489)
* 21c2ba4bdb82 [GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal
* 9eeb0293e27c [SLP]Cleanup MultiNodeScalars when tree deleted.
* cd7f1714dea1 [BOLT][RISCV] Implement R_RISCV_64 (#67558)
* 2e9a04b985e8 [X86]Add NO_REVERSE attribute to X86 RMW instrs in memfold table (#67288)
* 76a5602fd027 [scudo] Always express sizes in terms of element count in BufferPool (#66896)
* 6e46545b9814 Fix warning on align directives with non-zero fill value (#67237)
* eff4ef25b3dc Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
* ccd5b8db48a4 [X86] Change target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2 (#67410)
* 07d08f4eb44e [libc++] Don't add reference to system_category when exceptions disabled (#67504)
* 5109cb28fda8 [mlir][bufferization] Make buffer deallocation pipeline op type independent (#67546)
* 6e1dcc933511 [libc++] Refactor string unit tests to ease addition of new allocators
* 47b7f33b13b6 [IR] Allow llvm.ptrmask of vectors (#67434)
* cf7eac9650f3 [ObjectSizeOffsetVisitor] Bail after visiting 100 instructions (#67479)
* f3c417f34113 [Passes] Add option for LoopVersioningLICM pass. (#67107)
* 918863deb0af [clang][Diagnostics] Make 'note' color CYAN (#66997)
* 30fe8762446c [mlir][cfg-to-scf] Fix invalid transformation when value is used in a subregion (#67544)
* d338acf36edd [Flang][OpenMP] NFC: Version of a few tests with HLFIR
* 7c128f6d0e35 CostModel/RISCV: tweak cost of vector ctpop under ZVBB (#67020)

@fwyzard
Copy link
Contributor

fwyzard commented Sep 27, 2023

Answering myself: yes, the commit message for a squashed PR shows the additional authors as Co-authored-by:

image

@antoniovilela
Copy link
Author

assign core, orp

Let's include @cms-sw/all-l2 for more feedback.

Many thanks, it is very good to hear from @cms-sw/all-l2

@smuzaffar
Copy link
Contributor

Also the actual committer loses the credit of the commit too.

Doesn't the squashed commit list all the Co-authored-by people ?

yes the squashed commit message contains the co-author but does not contain the information about who change which part of the code e.g. 7819e3c is a squashed commit and the actual changed code does tell who change what. So you can not properly blame the right author :-)

@antoniovilela
Copy link
Author

Here are some further downsides on "Squash and merge" model

  • it doesn't allow sharing a branch for PRs in different release cycles (of course the branch must be based on a commit/tag that is common between the release cycles)
  • it doesn't allow (out of the box) basing a "further developments branch" on top of a branch used for a PR
  • for the PRs that have a meaningful commit history, it would make the code archaeology more difficult

Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).

I still feel that in most cases, squashing into a single commit would be the better option.

Could we add a "squash flag", different L2s could add it to flag the PR to be squashed, without needing to ask authors and delaying the approval, and release managers would then use Squash and merge for these PRs.

On a different note, do we already have regular training sessions on project collaboration with Git? This might be a more important outcome of this discussion.

@perrotta
Copy link
Contributor

As past release manager, I would have really liked the possibility to have a "squash" button, that just squashes the commits, without merging them. I would have used it in those (few) cases where the authors ended up with tens of useless commits, and were not able, or did not want, to squash them themselves. I would say that the release manager should use it only in particular cases, and with the agreement of the authors and the reviewers of the PR. Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.

@fwyzard
Copy link
Contributor

fwyzard commented Sep 27, 2023

Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.

This is not a workflow that GitHub supports (squash, then merge).

Also, what is the advantage of keeping the merge commits ?
One of the advantages of always squashing instead of merging is to get rid of the merge commits altogether...

@jhakala
Copy link
Contributor

jhakala commented Sep 27, 2023 via email

@perrotta
Copy link
Contributor

Also, what is the advantage of keeping the merge commits ?

To have them listed in the IBs, as well as in the release notes

@fwyzard
Copy link
Contributor

fwyzard commented Sep 27, 2023

If all PRs were squashed instead of merged, the IBs and release notes could simply list all commits.

And we could make the bot use a standardised format for the squashed commit message, like

Update GPU workflows (#42713)

A longer description of the commit would go here - maybe the text of the PR description when it was merged.

@Martin-Grunewald
Copy link
Contributor

Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...

@tvami
Copy link
Contributor

tvami commented Sep 28, 2023

Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...

This exists, people just usually dont do it (and they have to be pushed which then makes the review process longer)

@Martin-Grunewald
Copy link
Contributor

Well yes I can google it and lots of different recipes come up, interactive work needed, editing a file, blah blah.... lots of work.

We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).

@fwyzard
Copy link
Contributor

fwyzard commented Sep 28, 2023

We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).

Yes, that's a reasonable request.

I think the most complicated part in writing such tool is how to figure out what is the "starting point" to use for the squash.

A possibility could be to use the commit corresponding to the release itself, i.e. if one is working in CMSSW_13_3_0_pre3 or CMSSW_13_3_X_2023-09-27-2300, the script could produce a single commit including all the changes on top of those tags.

@antoniovilela
Copy link
Author

@makortel
Matti, would you like to summarize the proposals?
Many thanks.

@makortel
Copy link
Contributor

makortel commented Oct 2, 2023

Matti, would you like to summarize the proposals? Many thanks.

I'd suggest we discuss this topic in the core software meeting next week (as a first topic, so should have minimal impact to ORP attendees that want to participate but not follow full core software meeting).

@makortel
Copy link
Contributor

Outcome of the discussion in the core software meeting

  • we want to stay with the current merging strategy
  • we will provide an easy-to-use command for developers to squash the commits in their branch (as suggested in Change default action to squash merge in cmssw repo. #2080 (comment), Change default action to squash merge in cmssw repo. #2080 (comment))
  • we will improve cms-bot messages and documentation to point to the exact command to do the squash
  • we will improve cms-bot to detect (earlier) big and/or binary file
  • we will improve cms-bot to detect when the contents were not changed, and to not request new signatures if the PR's diff didn't change
  • we will also look into making cms-bot to not require a new signature of a package A, when that package A was not touched in an update of the PR

@kpedro88
Copy link
Contributor

see cms-sw/cms-git-tools#125 for cms-squash-topic

@jhakala
Copy link
Contributor

jhakala commented Oct 12, 2023 via email

@tvami
Copy link
Contributor

tvami commented Oct 12, 2023

But @jhakala if the user does the above introduced cms-squash-topic, it will be their contribution all along. At least that's my understanding

@makortel
Copy link
Contributor

@kpedro88 What does the cms-squash-topic do if the branch contains commits from different developers? Does it include the others-than-squasher with Co-authored-by:?

@makortel
Copy link
Contributor

As a side comment I'd find the model of directly approaching only the committer of a particular change and not including the assigned maintainers (that would be PdmV in case of runTheMatrix) in the communication suboptimal in general (there are certainly exceptions).

@kpedro88
Copy link
Contributor

@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.

  1. The original branch (mirrored to my fork): https://github.com/kpedro88/cmssw/commits/Run3-sim149Y
  2. The branch with my extra commit: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-plus
  3. After running cms-squash-topic: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash
  4. Instead using git rebase -i to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squash

In 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).

GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.

With a modified version of cms-squash-topic, additional authors can be included:
5. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2

However, if cms-squash-topic is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost:
6. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again

This can be addressed with a further modification (at the cost of a bit of redundancy):
7. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again2

cms-sw/cms-git-tools#125 is updated with the changes.

@kpedro88
Copy link
Contributor

@jhakala I hope #2080 (comment) addresses your concern more directly.

The issues you raised were actually the subject of a long discussion in the core software meeting on Tuesday, and generally there was agreement that we want to attribute changes properly. That led to the plan in #2080 (comment) to provide a tool whose use can be requested at the discretion of the release manager (or for other purposes for which it might be convenient).

@fwyzard
Copy link
Contributor

fwyzard commented Oct 13, 2023 via email

@jhakala
Copy link
Contributor

jhakala commented Oct 13, 2023 via email

@antoniovilela
Copy link
Author

@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.

  1. The original branch (mirrored to my fork): https://github.com/kpedro88/cmssw/commits/Run3-sim149Y
  2. The branch with my extra commit: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-plus
  3. After running cms-squash-topic: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash
  4. Instead using git rebase -i to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squash

In 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).

GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.

With a modified version of cms-squash-topic, additional authors can be included: 5. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2

However, if cms-squash-topic is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost: 6. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again

This can be addressed with a further modification (at the cost of a bit of redundancy): 7. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again2

cms-sw/cms-git-tools#125 is updated with the changes.

Hi Kevin, all, belated thanks!

Let's continue to follow up on the list in #2080 (comment), namely pointing to the squash command in cms-bot messages; check if there is no diff in the PR and if so not reset the tests & signatures; more generally, not reset signatures related to a given package that was not updated in a new commit in the PR.

Thanks again.

@smuzaffar
Copy link
Contributor

FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.

@antoniovilela
Copy link
Author

FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.

Fantastic!

@smuzaffar
Copy link
Contributor

@antoniovilela , can we close this issue now?

@antoniovilela
Copy link
Author

@antoniovilela , can we close this issue now?

Yes, I think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests