-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A build that starts immediately after the completion of a prior build for the same branch sometimes gets the wrong merge #269
Comments
There is an Advanced Project Setting for Quiet period. You can set that for the timeout. Once the build is scheduled the trigger has no more control over it. |
Repeating here the comment I put on the other linked issue. The quiet period does not work to assist with this problem. The quiet period starts immediately after the build is triggered, not when the build actually starts. If there is a build already running, the quiet period would have to be longer than the time to complete a full build in order to be effective. The suggestion I posted about using the pre SCM fetch plugin and putting in a manual sleep is the only way we have found to force a pause between the first build ending and the second build starting when the second build was already queued with the quiet period expired. |
When I first added the merge section I had the same issue. We were committing back a version number to the repo before "pushing the button." I had to add a pause for github to re-merge the PR before trying to send the command. |
Yeah, the pause has been pretty reliable at the front of the next build, although a pause at the end after the merge back to the golden branch would probably work as well - I'm not entirely sure what changed on GitHub's end, but the length of time the pending PRs take to settle was previously rapid enough that we didn't run into it on every stacked build, but is it happening now on every set of stacked builds without the pause we put in - 5 seconds seems to be a reliable buffer. |
I know there really isn't anything that can be done about the specific issue of the pr/x/merge branch returning the previous merge sha, since that is completely out of the plugin's control, but there is one part of it that could be addressed (since getting the older merge sha isn't always a problem for the build itself, depending on the code delta). When the commit notifier is triggered (it appears to be triggered by the plugin directly), the sha used is the merge commit. That causes the "No commit for SHA" exception in the other issue ticket - if that specific call to the commit notifier used the actual commit sha instead, the only failure would be caused by someone force pushing to the pr after the build had started, which is something that I would expect to fail anyway. |
When the trigger runs it queries github for the pull request, and gets the latest comments/commits. If it decides to build it sets up the environment variables, generates the cause, then ships it off to be run. When the build starts, we execute Builds.onStarted which creates a commit status, and sets the build description. Then when the build is completed Builds.onCompleted is run. In between the schedule build and onStarted there is no input from the trigger. The GitSCM plugin handles the actual checkout. I think if there were a pause and then check the status after the merge the job could avoid having any artificial delays. |
So, that won't actually work - here is hopefully a better description of what and when the problem happens: We are letting the PR builder handle the "push" of the merge. The problem isn't the merge of the first job - it is complete when the first job is over, with the correct "merged" status (so checking the status won't help here, the status is merged). The problem is the merges of all of the PRs that are pending for the same branch and have been approved (and thus triggered) and already are queued in Jenkins with pending status waiting to start BEFORE the first job completes. The trigger at that point has already queried github for the pull request, as well as the actual commit (as these are properties passed in as build parameters and used for the build description when the build is shipped off). During both the onStarted and onCompleted phases, one of the actions that the ghprb plugin takes is to notify GitHub (via the commit notifier api) that things are pending (onStarted), and then successful or failed (onCompleted). The problem is that the ghprb is using the MERGE commit and not the ACTUAL commit (it knows both of these at this point) for the notification at the end of the build (onCompleted) The problem is that when build #2 is triggered, build #1 is still in progress (and not yet merged). The pull request builder has the correct pr, and has the correct "actual" commit sha. The timing issue is not here. The timing issue is in between Build 1 finishing and Build 2 (which was already pending when Build 1 completed) starting. Build 1 completes, successfully merges the PR. The status of the PR is merged and all is well for Build 1. But all is not well for Build 2. Build 2 starts immediately after Build 1 completes (because Build 2 was already pending, and thus the quiet period has expired and no longer applies). The very first thing Build 2 does (outside of the ghprb plugin's control) is to fetch from GitHub. The fetch for the PR for Build 2 returns /head and /merge for that PR. The /merge at this point returns the /merge from Build 2's PR prior to Build 1's merge succeeding. GitHub hasn't finished re-merging Build 2's PR into the branch to check mergeability yet (because that is done asynchronously AFTER Build 1's commit is merged successfully). Build 2 starts and fetches the /merge ref for it's PR. GitHub then finishes the asynchronous re-merge of PR 2, and clobbers the old /merge sha with the new /merge sha. It is too late for Build 2 however, because the /merge ref for Build 2 has the older merge sha. This only becomes problematic in the onCompleted phase, when the ghprb plugin tries to notify GitHub that the build is successful. The notification is sent to the MERGE sha, which no longer exists on GitHub, because it is the old sha that was clobbered when the GitHub re-merge finished. This causes the No Commit for SHA exception. If this notification (or, really, ALL of the notifications via the commit notifier) were sent to the ACTUAL sha instead of the MERGE sha, this error would not happen and as a side bonus, we would actually be able to see all of the updates that went to the commit in the PR history. (Since, right now, if it fails and the merge sha changes again, any of the messages that went to the old merge sha now are no longer visible in the PR, since the old sha is gone) |
There are a couple misconceptions here.
I understand what you are saying, I just want to help explain the code. If you look through you will only ever see one commit value passed around. The ${sha1} variable is generated just before shipping off the build to be scheduled, where cause.getCommit() returns the real commit in question:
|
I was just thinking, the plugin gets called on start, it could check that the PR is stabilized or unmerfeable at that time and update/ pause the build |
What we were seeing is that the cause commit was not the actual last commit that the pr was based on - the cause commit was the commit at the tip of whatever ref the build is actually being built off of (which is the sha of the /merge, not the sha of the /head). The cause.getCommit() was returning the "real" commit, that the build was being built off of, but it is being built off of the /merge, not /head. The /head has the actual commit, the /merge has the pre-merged commit done by GitHub. That /merge sha is a real sha, and it is visible by itself in GitHub as long as the merge is valid - that is how we were able to track down what was happening, by looking at the sha that was being pushed to by the commit notifier at the end, and comparing that to the refspec tips from the /merge and /head for that PR in the local Jenkins workspace, and then looking up that sha directly in GitHub. We also noticed that If you run the build off of the ghprbActualCommit instead of the sha1 parameter, the behavior is totally different with the notifications - you actually get a bunch more detail about the build's final status showing up in the PR if it fails the build and it notifies the correct commit at the end. It may appear to work properly because as long as that merge commit is valid GitHub properly attributes the information to the PR, but it is not the correct commit. We were able to grab the commit that the notifier is using at the very end to notify the completed status, and verify on GitHub what commit is it - it is most certainly not the actual last commit of the PR. It is possible that it may appear to be the same if it is a fast forward merge, but ours very often are not (and, in fact, we prefer them not to be) |
With the second thought - is it possible to check with GitHub to see if the PR is stabilized? It should absolutely check if it is no longer mergeable when the build actually starts (since that could have changed) - a fail fast there would be helpful (but it will fail anyway at the end). In our case though, it is just not stabilized, it is almost always still mergeable. |
If you follow the steps to reproduce that I outlined, we had a clear and reproducible test case that would fail the second build every time, always with the no commit for sha error, and always trying to notify the incorrect commit. I understand you are saying it shouldn't work that way, but I can reproduce exactly the behavior I have outlined, every time. |
I have emailed GitHub asking if there is a way to tell when the PR settles. |
Please check out 1.21.1 once it is in the update-center. It should be up in the next couple of hours. |
This appears to be a timing issue with the following steps to reproduce (reproduces fairly reliably for us, but as it is a timing issue it might not always reproduce for every repo)
The problem is, the correct merge isn't the sha being built. The sha being built is the old merge from prior to PR1's build succeeding. We believe this to be a timing issue with GitHub. When build 2 starts, GitHub has not yet completed the re-merge, and so the fetch in the beginning of the build gets the old merge sha. For us, this can either cause the second build to fail at the very end when the commit notifier attempts to notify to the merge sha about the status of the build, or the second build runs and may fail because conditions (db upgrade or code coverage thresholds, for example) were changed by the previous build and this build does not contain those changes.
There is a workaround and some suggestions for a possible fix.
Workaround - Install the pre-scm-buildstep and run a 'sleep 5' prior to fetching from GitHub. This reliably allows GitHub time to process the re-merge so that when the build fetches, the correct new merge sha is retrieved.
Suggestions:
The text was updated successfully, but these errors were encountered: