Write a guide for doing "Stacked Reviews" with GitHub Pull Requests #56636
Comments
@llvm/issue-subscribers-infrastructure |
I'd like to challenge the idea that we need to "provide similar functionality to Phabricator's "Stacked Reviews"". We need a good way to do stacked reviews in GitHub. It already has a way to review multiple commits in a single PR by, well, just reviewing a branch that has more than one commit. It may not be a good way, and it may need some restraint with force-pushes, but that is the natural way PRs do stacked reviews. Trying to emulate Phab is leading us down the wrong path, trying to force a feature that doesn't exist in GH and that will not solve the problem we're trying to solve: making it easier for external people to put their code up for review in LLVM. If we end up with a convoluted set of rules, being on GH will make no difference to how hard it is to do code review in LLVM, but it will definitely make the lives of current reviewers hard, forcing us to learn (and teach) a whole new way of doing things that is very likely slightly worse than before (because it's more unnatural). |
This does not fit many of the requirements: in particular, sharding the discussion among multiple reviewers who care only about a subset of the contribution, handling this through the lifecycle of the contribution and its review, and the ability to get a subset of the contribution approved and merged independently of the subsequent patches. While I agree that the common trap is wanting something too close to "the way it is done right now", throwing away all requirements is not appropriate either. |
I'm not sure where in my reply you managed to find that meaning. |
Maybe Stacked View is misleading. You want to develop a larger feature that has to be split into smaller pieces. GitHub supports task lists in issues: |
The difference with a "task list" or "tracking issue", as I understand it, is that in that case the PRs can be reviewed and submitted totally disconnected. That is, the two PRs don't have dependencies, and opening one does not rely on the code of the other. Alternatively, you can accomplish a "task list" by waiting for the first PR to be merged before opening the following one that depends on it. Stacked PRs allow dependent changes to be opened and reviewed separately, as a pipeline of contributions. |
The first SPIRV part was a Stacked Review and it took months until it was committed |
I am hopeful that using a single PR (with improvements) for stacked reviews would be workable if GitHub first finds a way to handle re-associating in-line comments on rebases/amends. Maybe if they had a force-push UI where the user has to tell it where the commits with attached comments appear in the new branch. After that, maybe something like using subtopic tags on individual commits would allow for filtering/approval by tag. Once a PR is approved for some tag, the author reorders their commits so the approved part sits on top of the base branch and the UI would be made to provide merge options that land just that part. |
I would not rely on assuming GitHub will do anything :) |
On Discourse there is an indication that graphite.dev handles stacked reviews better. It would be great to investigate this possibility. https://docs.graphite.dev/getting-started/the-graphite-workflow |
Two questions on that doc:
|
That (merging one patch of the stack before others) is the status quo with Phab, right? We don't have a way to block/require an all-or-nothing approval on a patch series that I know of? |
Correct, no way to block, other than asking people nicely. I'm (just a little) concerned with their documentation. If we point developers to that, they'll assume we're also fine with that kind of operation. It would be nice to have a way to block it, but it's not a deal breaker. |
Yeah, given the challenges of this migration already - it's probably important to consider improvements to the workflow as out-of-scope for the transition, since there are enough functionality changes/losses (depending on perspective/particular workflows) that finding workarounds/retraining people for those is enough of a challenge :s - good things to keep in mind for wherever we end up/when we end up there to keep improving things |
@llvm-beanz started another related discussion on Discourse: https://discourse.llvm.org/t/rfc-revisiting-linear-history-vs-merge-commits/64873/2 |
The simplest implementation of this would be to just put links to child pull requests in the initial comment of the parent pull request. My questions are:
|
"links to child" isn't half the problem, here are more interesting questions I think:
|
How does this work in Phabricator? |
It's not clear to me what form the child PRs will take in this case. For example, if they are just another PR based off main, then they'd be the first PR plus extra commits. Links from parent PRs to child PRs would be useful though. I believe you could achieve this by stating in the child something akin to "Depends on #XXX", which would generate a mention in the parent PR. This functionality already exists in Phab (literally adding Depends on DNNNNNN creates a parent/child link).
Ability to create a patch that physically depends on another in-flight patch, whilst being able to review and approve them all separately. The existing GitHub PR system allows you to review multiple or individual commits within a PR, but all the messages end up getting mixed up in the Overview tab, and you can only approve the PR as a whole (plus there are other issues regarding how to handle fixup commits/force pushes etc). To be more precise, the "stack" system itself isn't required for this, since Phab allows you to upload arbitrary diffs; that isn't true in GitHub, because PRs are based on real commits and branches in a repo somewhere. Additionally, tools can read the stack to determine how to handle stacked PRs. For example, Phab pre-commit testing does this to ensure it has applied the patch on top of the correct base patch. I guess with sufficient adaptation, the tooling could be adapted to read these links from the PR descriptions, so this one isn't really an issue.
I think it's entirely up to the contributors involved. I've had cases where I've bothered to update the downstream PRs and cases where I haven't - to some extent it depends on how much of an impact updating PR 2 has on 3 and 4. If I haven't updated 3 and 4, e.g. because the change in 2 doesn't have a direct impact on them, then I usually don't bother ever doing it in the UI, and only rebase locally prior to pushing the sequence. With GitHub, you can do the same, except that you usually have to rebase the later PRs prior to merge at the GitHub end (because GitHub is the thing doing the merging, and landing the previous commit would usually end up with a disjoint history with the later PRs relying on a commit that didn't actually ever land - GitHub rewrites the commit title by appending the PR link to the title). |
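As a rough illustration of the "tooling could read these links from the PR descriptions" idea above, here is a minimal Python sketch. It assumes the convention is a line of the form "Depends on #NNN" in each child PR description; the function names and the dictionary of descriptions are hypothetical, and no GitHub API is called.

```python
import re
from typing import Dict, List, Optional

# Assumed convention: a child PR description contains a line "Depends on #NNN".
DEPENDS_ON = re.compile(r"^Depends on #(\d+)\s*$", re.IGNORECASE | re.MULTILINE)


def parent_of(description: str) -> Optional[int]:
    """Return the PR number this description depends on, or None."""
    match = DEPENDS_ON.search(description)
    return int(match.group(1)) if match else None


def stack_order(descriptions: Dict[int, str], top: int) -> List[int]:
    """Walk the "Depends on" links from the top PR down to the root.

    Returns PR numbers bottom-first, i.e. in the order they would have to
    be applied or landed. A PR that is referenced but missing from
    `descriptions` is treated as the root of the stack.
    """
    order: List[int] = []
    seen = set()
    current: Optional[int] = top
    while current is not None:
        if current in seen:
            raise ValueError(f"dependency cycle at #{current}")
        seen.add(current)
        order.append(current)
        body = descriptions.get(current)
        current = parent_of(body) if body is not None else None
    return order[::-1]
```

A pre-commit job could use such an ordering to apply the parent patches before testing a child, much as the Phab pre-commit testing described above does.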
I think Phabricator would be roughly equivalent of:
So assuming that locally it looks like: You can have 4 pull-requests:
That creates 4 remote branches, and then you open:
Each PR only shows its own commit, focusing the review on it. You can check out any PR locally and you get all the ones it is based on as well. When the first PR is merged, the second one is automatically retargeted. Working through the lifecycle of the stack is all scriptable; for example, skim through the README here: https://github.com/ejoffe/spr (I haven't tried this particular script). I would add that this is even somewhat superior to Phabricator in how the PRs are naturally connected, and checking out the code for one ensures self-consistency. The only part where GitHub is lacking compared to Phab (I think) is the UI and the way it tracks comments across updates. |
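The branch layout described in this comment can be modelled in a few lines of Python. This is only a sketch of the bookkeeping, not of any tool's implementation: the branch names are hypothetical, the trunk is assumed to be `main`, and the retargeting rule (a dependent PR takes over the merged PR's base) is an assumption about the behaviour being described.

```python
from typing import List, Tuple

Plan = List[Tuple[str, str]]  # (head branch, base branch) for each PR


def plan_stack(branches: List[str], trunk: str = "main") -> Plan:
    """Pair each branch with the base its PR should target.

    The first PR targets the trunk; every later PR targets the branch
    just below it, so each PR shows only its own commit.
    """
    bases = [trunk] + branches[:-1]
    return list(zip(branches, bases))


def after_merge(plan: Plan, merged_head: str) -> Plan:
    """Drop the merged PR and retarget PRs that were based on it.

    Assumption: a PR whose base was the merged branch is moved onto the
    merged PR's own base.
    """
    merged_base = dict(plan)[merged_head]
    return [
        (head, merged_base if base == merged_head else base)
        for head, base in plan
        if head != merged_head
    ]
```

For a stack of three hypothetical branches, `plan_stack(["part-1", "part-2", "part-3"])` pairs `part-1` with `main`, `part-2` with `part-1`, and `part-3` with `part-2`; calling `after_merge` on that plan with `"part-1"` leaves `part-2` targeting `main`.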
I have some limited experience with something similar in a different (downstream from LLVM) project where branches in the main repository are not allowed. The key difference is:
It's annoyingly manual, but it does work when you keep a clean commit history (as we should!) because GitHub's diff view allows you to select an individual commit to look at when reviewing. (It doesn't model the idea of approval of an individual commit, so it's a matter of manually writing something like "Patch N ("headline of patch") LGTM.") By the way, to make this workflow a bit more pleasant on the reviewer's side (which is arguably the most important part in all of this, since review bandwidth is a serious bottleneck), I wrote this tool which does take some getting used to but provides reasonably clean diffs after rebases, and can also act as an IMHO nicer alternative to |
Has anyone seen or evaluated ghstack? It seems like it could fit the bill in theory. |
From their README:
Seems they only automate the process, don't bring new features. |
I've seen it used for PyTorch successfully, but yup, it requires the ability to create new branches in the repo you're sending the PR to. Would we consider relaxing the "no branches in the main repo" restriction to enable stacked PRs? |
I believe this is something that was identified as the best tradeoff in one of the issues tracking stacked PRs. |
I played around with https://github.com/ejoffe/spr and https://github.com/ezyang/ghstack. (Disclaimer: ghstack was developed for PyTorch by a Meta engineer, and I also work at Meta, but I'm not affiliated with the ghstack author or PyTorch otherwise.)

ghstack is installed via pip. spr can be installed via Homebrew or apt, and also has precompiled binaries available. I'm not sure if one of those is generally preferred for corporate environments. Neither of them requires root to install, and Python 3.4 and newer guarantee pip being present.

spr relies on the gh CLI for authentication, so you can get going right away if you have that already set up. ghstack requires an OAuth token to be specified manually (though you could use the same token as for gh).

Both of them require branches to be created in the main repository, which seems to be a fundamental GitHub limitation. ghstack creates its commits under

ghstack has a more complex branching scheme, which is described in https://github.com/ezyang/ghstack#structure-of-submitted-pull-requests. The aim is to not lose review comments or context when rebasing, which is a problem that's been discussed here. spr does something simpler and just creates one branch per PR.

ghstack adds lines like the following to each commit:
spr adds the following instead:
Both are a little noisy, though the PR URL is potentially useful. I dunno if we can scrub them out automatically during the land.

The stack that I created with ghstack is:

The stack that I created with spr is:

Each PR has a link to the full stack. The big difference is that when updating a stack, ghstack adds new commits on top, whereas spr just rebases. You can see the difference in https://github.com/smeenai/llvm-project/pull/3/commits vs. https://github.com/smeenai/llvm-project/pull/27/commits. In both cases, I amended an existing commit and then updated the stack. ghstack automatically creates an update commit, which lets reviewers see the changes made in this update (while still having the Files Changed view to review the overall PR). spr just force pushes the updated commit instead.

IMO, although ghstack is a little bit harder to set up, its more complex branching scheme, which preserves individual updates and avoids force pushing, is a pretty killer feature. I can ask about the possibility of creating branches not under |
In my experience, while nothing has landed, this generally works. I have never seen the workflow work properly once the first PR is "merged" using "squash and merge" (which is the easiest way to maintain linear history). The "downstream" PRs then need to be all rebased (and maybe GitHub has gotten better with inline comments and rebasing, but if it has, I haven't noticed). |
Ah, damn, ghstack requires force pushing, so that wouldn't work for us: ezyang/ghstack#50. I'm not really sure why though; I'll ask on the issue. |
GitHub seems to disallow pull requests from refs not under |
Thanks for trying! We should adopt a convention for these branches, and update our docs with a ref spec that would exclude them for normal fetches for day-to-day development ( |
I misread the code here; I think it'll end up working out for LLVM by accident (IIUC we do have branch protection for main set up, but non-admins can't see the branch protection settings, so ghstack will be fine ... I don't understand this super well though).
Both ghstack and spr create their branches under their own |
There's another tool also called I have no idea which is the best, haven't tried any of them. |
This looks pretty nice! They even claim trying to provide a similar experience to arc. @tstellar can we open a namespace like |
@joker-eph Do we know how this will affect the repo size? I thought there was an issue where refs that are present when a repo is forked can't be deleted or garbage collected. |
It technically can't have zero impact: any single line of code has an impact. In practice, though, it shouldn't be noticeable: we only keep branches for open pull requests, so code that is meant to merge anyway. It's also intended to be used only for stacked PRs, so it shouldn't get crowded. As for the overall issue of repo garbage collection through forks, this seems to have no impact, since it is an issue that already affects forks, where branches are pushed today. |
Do the tools support easily customizing their branch namespace, or should we just open up their default branch namespaces (e.g. |
I tried |
@joker-eph I've removed the branch creation restriction for users/**/* |
Stacked Git (StGit) may also be worth consideration, though it currently doesn't support nested stacks (for bulk reordering of patches) or GitHub's API for pull requests. I haven't tested StGit with spr yet. |
I have been trying this one. Things have been going mostly smoothly up until the point where I start to land a stack of commits on |
I had real trouble reviewing a PR created via SPR. Initially it was fine, but at some point they rebased the PR and this made it impossible to sensibly view what had actually changed, because commits in the "Files changed" had become linked in such a way as to make it impossible to view the important one only, and the full diff contained all sorts of completely unrelated changes pulled in from the rebase. Unfortunately, I didn't make a note of the PR number for others to look at. |
This kind of problem is why I wrote diff-modulo-base. The tool is admittedly a bit rough around the edges and it doesn't understand SPR -- I primarily use it with PRs that just natively have a sequence of commits and are rebased. But it could probably be extended. |