Failure to check out PR when a push -f is done to the PR while it is already being built #148

Open
leandro-lucarella-sociomantic opened this issue May 13, 2014 · 60 comments

@leandro-lucarella-sociomantic

This issue comes from https://issues.jenkins-ci.org/browse/JENKINS-22537

Please consult that issue for details. Here is the summary of the latest research.

The setup is a job using a configuration matrix (I'm not sure whether this is strictly necessary to reproduce the problem, though) with only one variable and 2 configurations (the variable is just the Ubuntu version used for the build). The ghprb plugin is then activated and configured to trigger a build via the GitHub hook.

Steps to reproduce:

  1. Create a dummy project in GitHub
  2. Add a build script that only runs sleep 60
  3. Configure the Jenkins job to build with the ghprb plugin
  4. Create a PR
  5. Change something in the PR (you can do a git commit --amend and just change the commit message)
  6. While Jenkins is building the job (sleeping), do a git push -f to your PR branch
  7. Wait for Jenkins to finish the current build and start the new one
  8. You should see the error now

The error (this is the console output for one of the configurations in the matrix; the other has the same error) is:

Started by upstream project "Test" build number 40
originally caused by:
 GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
Building in workspace .../jenkins/jobs/Test/workspace/Ubuntu/trusty
Cloning the remote Git repository
Cloning repository git@github.com:someuser/somerepo.git
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 0c2e9e245bd8d7329c3d8b91013d806f4b112687 (detached)
FATAL: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
hudson.plugins.git.GitException: Could not checkout null with start point 0c2e9e245bd8d7329c3d8b91013d806f4b112687
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1448)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:896)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:561)
    at hudson.model.Run.execute(Run.java:1665)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 0c2e9e245bd8d7329c3d8b91013d806f4b112687" returned status code 128:
stdout: 
stderr: fatal: reference is not a tree: 0c2e9e245bd8d7329c3d8b91013d806f4b112687

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1276)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1253)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1249)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1065)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1075)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1443)
    ... 9 more

This is the console output for the whole job (not one configuration in particular):

GitHub pull request #76 of commit 8e9a1ab00c82edf5f37b400f0a98b6476ba63b97 automatically merged.
[EnvInject] - Loading node environment variables.
Building in workspace .../jenkins/jobs/Test/workspace
Fetching changes from the remote Git repository
Fetching upstream changes from git@github.com:someuser/somerepo.git
using GIT_SSH to set credentials GitHub leandro-lucarella-sociomantic
Checking out Revision 142cbdfe90a5e7cf8e2ba44eb6c16b4b93bb2b93 (detached)
Cleaning workspace
Resetting working tree
Triggering trusty
Triggering saucy
trusty completed with result FAILURE
saucy completed with result FAILURE
Finished: FAILURE
@valdisrigdon self-assigned this May 14, 2014
@valdisrigdon
Collaborator

I'm testing this on v1.532.3 with v1.12 of the plugin and do not see it on a non-matrix job.

@valdisrigdon
Collaborator

I've tried this on a matrix build as well, and it seems to work fine.

When you execute the "git push -f", is the build step actually running? Or is the build waiting in a quiet period?

@leandro-lucarella-sociomantic
Author

On Wed, May 14, 2014 at 08:11:53AM -0700, Valdis Rigdon wrote:

I've tried this on a matrix build as well, and it seems to work fine.

When you execute the "git push -f", is the build step actually
running? Or is the build waiting in a quiet period?

It's running; I didn't check with the quiet period. I'll update
all the plugins and try it again, just in case the bug was somewhere
else and has been fixed in the last few weeks.

@valdisrigdon
Collaborator

Note that my testing is being done on Windows, but I can't imagine that would affect this.

@leandro-lucarella-sociomantic
Author

I'm testing doing the push -f while the job is in the pending state, and the old pending job is not cancelled, so the quiet period doesn't seem very useful with the ghprb plugin, but I guess this is just an unrelated feature request :)

Anyway, pushing -f while a job is waiting in the quiet period seems to work.

Now I'll test again, pushing -f while the job is being run, to see if I can reproduce the problem or if it was fixed elsewhere (my plugin update upgraded ghprb from v1.11.2 to v1.12 and, I think, the Git plugin or some other Git-related plugin, but I didn't write down the version, don't remember exactly, and can't find it in the logs :S).

@leandro-lucarella-sociomantic
Author

And I can still reproduce the problem when pushing -f while the job is being built.

@valdisrigdon
Collaborator

So you run a "push -f" while job #1 is running. #1 finishes, and then ghprb will kick off job #2 and that one fails?

@valdisrigdon
Collaborator

Is the pull request you are testing mergeable?

@leandro-lucarella-sociomantic
Author

On Wed, May 14, 2014 at 09:05:05AM -0700, Valdis Rigdon wrote:

So you run a "push -f" while job #1 is running. #1 finishes, and then
ghprb will kick off job #2 and that one fails?

Yes. There is also a 2-minute quiet period for the job, but I have the
feeling we didn't have that before and it failed then too. The previous
job finishes before the new job completes its quiet period, and even
then the new job fails.

Here is a poor man's timing diagram:

job #1   job #2

_ <- push/create PR
:
:
:
:
:
+
|
|        _ <- push -f
|        :
~        :
OK       :
         +
         |
         |
         |
         ~
         FAIL

: is quiet period
| is job running

@leandro-lucarella-sociomantic
Author

On Wed, May 14, 2014 at 09:13:43AM -0700, Valdis Rigdon wrote:

Is the pull request you are testing mergeable?

Yes it is.

@valdisrigdon
Collaborator

Are you able to attach Jenkins logs? Add a logger for "org.jenkinsci.plugins.ghprb" set to DEBUG. In addition, seeing the Git plugin logs would help.

@leandro-lucarella-sociomantic
Author

On Wed, May 14, 2014 at 09:17:52AM -0700, Valdis Rigdon wrote:

Are you able to attach Jenkins logs? Add a logger for
"org.jenkinsci.plugins.ghprb" set to DEBUG. In addition, seeing the
Git plugin logs would help.

Mmm, how can I do that? I added a logger for
"org.jenkinsci.plugins.ghprb", but I don't see a DEBUG level; the levels are
SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, ALL. Is it ALL, or
is there a special way to enable a DEBUG level?

@valdisrigdon
Collaborator

ALL works.
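For anyone else puzzled by the missing DEBUG level: SEVERE through FINEST plus ALL are the java.util.logging levels Jenkins uses, and JUL has no DEBUG; FINE, FINER and FINEST play that role, and ALL admits everything. The sketch below uses plain java.util.logging outside Jenkins. CapturingHandler is a hypothetical stand-in for a Jenkins log recorder, and the message only mimics the FINE-level "New commit. Sha" line that ghprb logs later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch only: plain java.util.logging, run outside Jenkins. CapturingHandler is a
// hypothetical stand-in for a Jenkins log recorder.
class GhprbLoggingSketch {

    /** Keeps every record that passes its own level check, so we can count them. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Sets the ghprb logger to the given level, emits one FINE record, returns how many got through. */
    static int captureFineAt(Level loggerLevel) {
        Logger logger = Logger.getLogger("org.jenkinsci.plugins.ghprb");
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);        // the handler itself filters nothing
        logger.setUseParentHandlers(false); // keep the demo off the console
        logger.addHandler(handler);
        try {
            logger.setLevel(loggerLevel);
            // Mimics ghprb's FINE-level "New commit" message; there is no DEBUG in JUL.
            logger.fine("New commit. Sha: <old> => <new>");
            return handler.records.size();
        } finally {
            logger.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        System.out.println("INFO -> " + captureFineAt(Level.INFO)); // FINE is dropped
        System.out.println("ALL  -> " + captureFineAt(Level.ALL));  // FINE is kept
    }
}
```

With the logger at INFO the FINE record is filtered out; at ALL (or FINE) it gets through, which is why ALL works as a substitute for DEBUG.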


@leandro-lucarella-sociomantic
Author

Here is the log for a push -f to a PR that is already broken (that is, once this happens in a PR, every subsequent push -f fails the same way, even if it wasn't done while the job was being built). I can do the same for a fresh PR and log the first breakage.

Logger:

May 15, 2014 11:50:35 AM INFO org.jenkinsci.plugins.ghprb.GhprbRootAction doIndex
Got payload event: pull_request

May 15, 2014 11:50:35 AM INFO org.jenkinsci.plugins.ghprb.GhprbPullRequest check
Pull request #77 was updated on sociomantic/playground at 5/15/14 10:03 AM by User:leandro-lucarella-sociomantic

May 15, 2014 11:50:36 AM FINE org.jenkinsci.plugins.ghprb.GhprbPullRequest
New commit. Sha: 2c8be0044ff5814bd71b5564695f5afbfd7c67a0 => 853e20559daea977aa6b63d53d378e875258e147

May 15, 2014 11:50:36 AM INFO org.jenkinsci.plugins.ghprb.GhprbRepository createCommitStatus
Setting status of 853e20559daea977aa6b63d53d378e875258e147 to PENDING with url null and message:  Merged build triggered.

May 15, 2014 11:50:36 AM INFO org.jenkinsci.plugins.ghprb.GhprbPullRequest build
 Merged build triggered.

May 15, 2014 11:52:36 AM INFO org.jenkinsci.plugins.ghprb.GhprbRepository createCommitStatus
Setting status of 853e20559daea977aa6b63d53d378e875258e147 to PENDING with url https://ci.sociomantic.com/job/Playground%20PR/50/ and message: Merged build started.

May 15, 2014 11:52:45 AM INFO org.jenkinsci.plugins.ghprb.GhprbRepository createCommitStatus
Setting status of 853e20559daea977aa6b63d53d378e875258e147 to FAILURE with url https://ci.sociomantic.com/job/Playground%20PR/50/ and message: Merged build finished.

The hash used there is OK, 853e20559daea977aa6b63d53d378e875258e147 is the new HEAD for that PR.

Console output for the failed job (the log for one of the matrix configs):

Started by upstream project "Playground PR" build number 50
originally caused by:
 GitHub pull request #77 of commit 853e20559daea977aa6b63d53d378e875258e147 automatically merged.
Building in workspace /srv/jenkins/jobs/Playground PR/workspace/Ubuntu/10.04
Cloning the remote Git repository
Cloning repository git@github.com:sociomantic/playground.git
 > git init /srv/jenkins/jobs/Playground PR/workspace/Ubuntu/10.04
Fetching upstream changes from git@github.com:sociomantic/playground.git
 > git --version
using GIT_SSH to set credentials GitHub admin
 > git fetch --tags --progress git@github.com:sociomantic/playground.git +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url git@github.com:sociomantic/playground.git
 > git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url git@github.com:sociomantic/playground.git
Fetching upstream changes from git@github.com:sociomantic/playground.git
using GIT_SSH to set credentials GitHub jenkins-admin
 > git fetch --tags --progress git@github.com:sociomantic/playground.git +refs/pull/*:refs/remotes/origin/pr/*
Checking out Revision 655e53cff41bffdb724ded7e4a6be3fc2e541284 (detached)
 > git config core.sparsecheckout
 > git checkout -f 655e53cff41bffdb724ded7e4a6be3fc2e541284
FATAL: Could not checkout null with start point 655e53cff41bffdb724ded7e4a6be3fc2e541284
hudson.plugins.git.GitException: Could not checkout null with start point 655e53cff41bffdb724ded7e4a6be3fc2e541284
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1479)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:896)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:561)
    at hudson.model.Run.execute(Run.java:1665)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:246)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 655e53cff41bffdb724ded7e4a6be3fc2e541284" returned status code 128:
stdout: 
stderr: fatal: reference is not a tree: 655e53cff41bffdb724ded7e4a6be3fc2e541284

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1307)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1283)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1279)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1084)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1094)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1474)

The hash 655e53cff41bffdb724ded7e4a6be3fc2e541284 is obviously the wrong one; it is the hash of the previous HEAD of the PR, from before the push -f.

@valdisrigdon
Collaborator

One other difference in my testing is that I'm using polling. I will try that.

@valdisrigdon
Collaborator

Under SCM --> Git, what additional behaviors do you have set? I'm using push notifications now, and a matrix job, and it still seems to do the right thing.

@valdisrigdon
Collaborator

The log for the matrix config that's failing doesn't show a call to git rev-parse. In my configuration I see a call to git.exe rev-parse "origin/pr/8/merge^{commit}"

@valdisrigdon
Collaborator

One more question -- are the matrix builds being run on remote build agents?

@leandro-lucarella-sociomantic
Author

Under SCM --> Git, what additional behaviors do you have set? I'm using push notifications now, and a matrix job, and it still seems to do the right thing.

Right now, "clean after checkout", but I tried disabling it before and it didn't help.

@leandro-lucarella-sociomantic
Author

The log for the matrix config that's failing doesn't show a call to git rev-parse. In my configuration I see a call to git.exe rev-parse "origin/pr/8/merge^{commit}"

Mmm, this looks weird. I have no idea who is in charge of doing that or how I can change the behaviour.

@leandro-lucarella-sociomantic
Author

One more question -- are the matrix builds being run on remote build agents?

No, it's all local, they run in a vagrant VM, but that's in the build script, and the build is broken before that.

@leandro-lucarella-sociomantic
Author

And actually, this test repository doesn't even use vagrant; the build script is just make (and the Makefile is just a sleep 60 command).

@valdisrigdon
Collaborator

What version of Git is on the box?

@leandro-lucarella-sociomantic
Author

On Fri, May 16, 2014 at 05:20:16AM -0700, Valdis Rigdon wrote:

What version of Git is on the box?

git version 1.7.9.5

@valdisrigdon
Collaborator

Have you tried upgrading to the 1.9.x release? I'm trying to eliminate various differences between your setup and what I'm testing on here.

@leandro-lucarella-sociomantic
Author

On Fri, May 16, 2014 at 06:38:16AM -0700, Valdis Rigdon wrote:

Have you tried upgrading to the 1.9.x release? I'm trying to eliminate
various differences between your setup and what I'm testing on here.

I can upgrade, sure.

@leandro-lucarella-sociomantic
Author

Mmm... I upgraded Git to 1.9.3 and Jenkins to the new LTS version 1.554.1, and I can't reproduce the problem with that configuration.

@leandro-lucarella-sociomantic
Author

I'll re-enable the PR builder in other jobs and report back if I find any jobs failing.

@artembochkov

Hi, all,
We also experience the same problem. We have a matrix job building a project for 5 different configurations (platforms).
Sometimes it fails like this for one configuration and works fine for the rest:
FATAL: Could not checkout null with start point f11538533f716841f223abbf0cbe43a60a0590a8
12:53:31 hudson.plugins.git.GitException: Could not checkout null with start point f11538533f716841f223abbf0cbe43a60a0590a8

And if we rerun the job - all works fine.

@DeepDiver1975

reported in Jenkins Jira as well - https://issues.jenkins-ci.org/browse/JENKINS-26748

@martyngigg

We are also experiencing this problem with matrix jobs. The parent has the correct reference but then when it starts the child jobs it looks like it passes along an old sha1 from a previous merge and not the updated one.

Output from parent job

GitHub pull request #219 of commit 096f9835a3de34f1f8713cc0513845ff83f07e30 automatically merged.
Setting status of 096f9835a3de34f1f8713cc0513845ff83f07e30 to PENDING with url http://builds.mantidproject.org/job/pull_requests/509/ and message: Merged build started.
[EnvInject] - Loading node environment variables.
Building on master in workspace /var/lib/jenkins/jobs/pull_requests/workspace@6
Cloning the remote Git repository
Cloning repository git://github.com/mantidproject/mantid.git
 > git init /var/lib/jenkins/jobs/pull_requests/workspace@6 # timeout=10
Fetching upstream changes from git://github.com/mantidproject/mantid.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress git://github.com/mantidproject/mantid.git +refs/heads/*:refs/remotes/origin/* # timeout=30
 > git config remote.origin.url git://github.com/mantidproject/mantid.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url git://github.com/mantidproject/mantid.git # timeout=10
Fetching upstream changes from git://github.com/mantidproject/mantid.git
 > git -c core.askpass=true fetch --tags --progress git://github.com/mantidproject/mantid.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=30
 > git rev-parse refs/remotes/origin/pr/219/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/219/merge^{commit} # timeout=10
Checking out Revision a66d7a5c5fb873ee28462a27cec0b79b131b6530 (refs/remotes/origin/pr/219/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a66d7a5c5fb873ee28462a27cec0b79b131b6530
 > git rev-list 9e5da46e719b86cc4e3679562cdf4bfa386c4b1c # timeout=10
First time build. Skipping changelog.
Triggering pull_requests » win7-build
Triggering pull_requests » ubuntu-14.04-build
Triggering pull_requests » fedora20-build
Triggering pull_requests » rhel6-build
Triggering pull_requests » osx-10.8-build
Configuration pull_requests » win7-build is still in the queue: Waiting for next available executor on win7-build
pull_requests » win7-build completed with result SUCCESS
pull_requests » ubuntu-14.04-build completed with result FAILURE
pull_requests » fedora20-build completed with result FAILURE
pull_requests » rhel6-build completed with result FAILURE
pull_requests » osx-10.8-build completed with result FAILURE
Setting status of 096f9835a3de34f1f8713cc0513845ff83f07e30 to FAILURE with url http://builds.mantidproject.org/job/pull_requests/509/ and message: Merged build finished.
Finished: FAILURE

It is checking out the merge HEAD of the pull request at a66d7a5c5fb873ee28462a27cec0b79b131b6530, which looks correct if I compare it to pulling down the pull/219/merge locally.

However, when the child builds start you see (snipped output)

Started by upstream project "pull_requests" build number 509
originally caused by:
 GitHub pull request #219 of commit 096f9835a3de34f1f8713cc0513845ff83f07e30 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on ndav-builder2 (vm sphinx rhel6-build cades) in workspace /home/builder/jenkins-linode/workspace/pull_requests/label/rhel6-build
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url git://github.com/mantidproject/mantid.git # timeout=10
Fetching upstream changes from git://github.com/mantidproject/mantid.git
 > git --version # timeout=10
 > git fetch --tags --progress git://github.com/mantidproject/mantid.git +refs/pull/*:refs/remotes/origin/pr/* # timeout=30
Checking out Revision 9e5da46e719b86cc4e3679562cdf4bfa386c4b1c (refs/remotes/origin/pr/219/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9e5da46e719b86cc4e3679562cdf4bfa386c4b1c
 > git rev-list 9e5da46e719b86cc4e3679562cdf4bfa386c4b1c # timeout=10

It is trying to checkout 9e5da46e719b86cc4e3679562cdf4bfa386c4b1c, which is not the same as the parent job of the matrix. This seems like quite a significant bug.

@directhex
Contributor

@martyngigg ghprb is innocent; as far as my analysis goes, it's an upstream GitHub bug.

If you still have the workspaces from your failed builds, try "git log" on your build nodes - the tip of the branch they're on should be an auto-generated commit merging the branch and master. This commitid is in refs/remotes/origin/pr/*/merge.

There will be a discrepancy between what your successful nodes believe this commit should be, and what the failures think it should be. If you check the timestamp on the merge commits, they will differ - even though all nodes in the matrix job started simultaneously & git fetch'd simultaneously.

I filed a support issue with GitHub late last week, but have had nothing back on the topic.

@martyngigg

@directhex Thanks for the update. I had spotted the differing commits but not the timestamps.

Hopefully they will respond soon.

@jpsim

jpsim commented Mar 16, 2015

Has anyone found a workaround for this issue? It's been a major problem for us whenever rebasing (which happens fairly often).

@leandro-lucarella-sociomantic
Author

On Mon, Mar 16, 2015 at 10:12:59AM -0700, JP Simard wrote:

Has anyone found a workaround for this issue? It's been a major
problem for us whenever rebasing (which happens fairly often).

I'm not having this problem often anymore, but when it happens, removing
the last job's results (and maybe a previous job's too) and then
re-triggering the job (via a message on GitHub) usually fixes the
problem for me.

@jpsim

jpsim commented Mar 16, 2015

"Rebuild Last" always succeeds, but manually having to trigger builds kind of defeats the purpose of continuous integration.

@directhex
Contributor

I still haven't heard anything from Github, including when I put out feelers via internal contacts. The problem is at their end.

@mskutin

mskutin commented Jun 9, 2015

Yeah, I have the same issue, and it doesn't depend on the Git version.
@jpsim Totally agree with you: the "workaround" is manually clicking the "Rebuild Last" button, and it's definitely killing the CI process.

@DeepDiver1975

@janinko can we add a config switch to disable the whole merge mechanism?

As in we execute the job on the plain branch and leave it to the developers to rebase the branch if needed.

Thanks a lot!

@DavidTanner
Collaborator

Check the readme, but there is an environment variable that gives you the original commit, not the test merge one.

@DeepDiver1975

@DavidTanner I assume you are talking about this?

If you want to use the actual commit in the pull request, use ${ghprbActualCommit} instead of ${sha1}

Will this help with respect to this issue and NOT try to perform the merge at all?

@DavidTanner
Collaborator

The merge happens because you created a pull request on GitHub. This plugin just tells the Git client which commit/branch/etc. to use when it does the checkout.

@DeepDiver1975

Thanks David. I take that as a no, then. Too bad; this bloody issue is almost rendering this plugin unusable. :-(

@directhex
Contributor

It's a problem only with matrix jobs.

Here's what happens:

  • "master" job of matrix gets executed, which determines the sha1 of the target commit (i.e. it turns pr/XXX/merge into abcdef1)
  • the matrix slaves get scheduled to build this sha1sum
  • the matrix slaves eventually execute the build

The race condition exists because the PR merge commit is ephemeral (i.e. it gets recreated every time there's a commit to either the PR or the repo the PR targets). If there's any commit between the matrix master running and the matrix slaves running (e.g. if one of your matrix axis slaves is overloaded, so scheduling takes some time), the problem occurs: the commit ID that the matrix master saw no longer exists by the time the matrix slaves fetch the repo.

The "easy" fix is to make sure matrix slaves build the merge ref, not the sha1. Or, as we did, don't use matrix jobs at all.
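A toy model of the race described above, assuming that a fresh fetch on an agent only receives commits still reachable from some ref, so a superseded merge commit is simply absent. FakeRemote and all method names are hypothetical; nothing here is ghprb or Git plugin code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the matrix race. FakeRemote is hypothetical: it only tracks which
// ref points where and which commits a fresh fetch can still see.
class MergeRefRaceSketch {

    static final class FakeRemote {
        final Map<String, String> refs = new HashMap<>();
        private int counter = 0;

        /** GitHub-style: recreate the ephemeral merge commit and force-move the ref to it. */
        String regenerateMerge(int prId) {
            String sha = "merge" + (++counter);
            refs.put("pr/" + prId + "/merge", sha);
            return sha;
        }

        /** Model assumption: a fresh fetch only brings commits some ref still points at. */
        Set<String> fetch() {
            return new HashSet<>(refs.values());
        }
    }

    /** Child checks out the SHA the parent resolved earlier (the failing behaviour). */
    static boolean childBuildsPinnedSha(FakeRemote remote, String shaFromParent) {
        Set<String> local = remote.fetch();   // fresh workspace on an agent
        return local.contains(shaFromParent); // false corresponds to "reference is not a tree"
    }

    /** Child resolves the merge ref itself after fetching (the suggested "easy" fix). */
    static String childBuildsMergeRef(FakeRemote remote, int prId) {
        return remote.refs.get("pr/" + prId + "/merge");
    }

    public static void main(String[] args) {
        FakeRemote remote = new FakeRemote();
        String parentSha = remote.regenerateMerge(76); // matrix parent resolves pr/76/merge
        remote.regenerateMerge(76);                    // push -f before the children run
        System.out.println(childBuildsPinnedSha(remote, parentSha)); // false
        System.out.println(childBuildsMergeRef(remote, 76));         // merge2
    }
}
```

Note the trade-off of the "easy" fix in this model: resolving the ref in each child avoids the missing-commit failure, but a child may then build a newer merge commit than the one the matrix parent resolved.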

@DeepDiver1975

Great input @directhex thx

Does that easy fix have to be implemented within the plugin? Or is it something we can already configure somehow as of today?

@DavidTanner
Collaborator

@directhex you have it almost correct.
The plugin doesn't actually do any Git work; it just pulls the pull request from the GitHub API. The data returned has a boolean flag that says whether the request is mergeable or not.

If it is mergeable, then ${sha1} = pr/###/merge, else ${sha1} = abcdef1, where abcdef1 is sha of last commit to the pull request when we queried the api.

If ${sha1} is pr/###/merge, then the actual sha hash is resolved by the jenkins git plugin when checking out the workspace.

final String commitSha = cause.isMerged() ? "origin/pr/" + cause.getPullID() + "/merge" : cause.getCommit();
values.add(new StringParameterValue("sha1", commitSha)); 
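A self-contained version of that snippet, to make the two cases concrete. PrCause and Param are hypothetical stand-ins for ghprb's cause object and Jenkins' StringParameterValue; only the ternary and the parameter names sha1 and ghprbActualCommit come from the ghprb code quoted in this thread. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of ghprb's sha1 choice. PrCause and Param are hypothetical stand-ins;
// only the ternary mirrors the quoted ghprb code.
class Sha1ParameterSketch {

    record PrCause(int pullId, String commit, boolean merged) { }

    record Param(String name, String value) { }

    static List<Param> buildParameters(PrCause cause) {
        List<Param> values = new ArrayList<>();
        // Mergeable PR: pass a ref name; the Git plugin turns it into a SHA at checkout.
        // Not mergeable: pass the PR head SHA reported by the GitHub API.
        String commitSha = cause.merged()
                ? "origin/pr/" + cause.pullId() + "/merge"
                : cause.commit();
        values.add(new Param("sha1", commitSha));
        values.add(new Param("ghprbActualCommit", cause.commit()));
        return values;
    }

    public static void main(String[] args) {
        String head = "096f9835a3de34f1f8713cc0513845ff83f07e30";
        System.out.println(buildParameters(new PrCause(219, head, true)));
        System.out.println(buildParameters(new PrCause(219, head, false)));
    }
}
```

In the mergeable case sha1 is only a ref name; the concrete merge SHA is fixed later, when the Git plugin resolves it during checkout.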

@MikeMcQuaid

The "easy" fix is to make sure matrix slaves build the merge ref, not the sha1. Or, as we did, don't use matrix jobs at all.

Nice debugging, folks. In my situation I have to use matrix builds. Is there any way of working around this currently by tweaking settings or will it need a bugfix in a future ghprb release? Thanks!

@thughes

thughes commented Jun 17, 2015

Just to clarify, it sounds like the underlying issue with matrix builds is that GitHub force pushes to the pr/<id>/merge ref when a new commit is added to the pull request. As a result, it's possible that the merge SHA1 that was resolved by ghprb will no longer be available for any git repos that hadn't fetched the original merge commit (e.g., a matrix job on a slave, etc).

Further expanding the code snippet from ghprb in the earlier comment, we have:

        final String commitSha = cause.isMerged() ? "origin/pr/" + cause.getPullID() + "/merge" : cause.getCommit();
        values.add(new StringParameterValue("sha1", commitSha));
        values.add(new StringParameterValue("ghprbActualCommit", cause.getCommit()));

This makes it look like you could avoid this problem by using ${ghprbActualCommit} instead of ${sha1}, since the former is an actual commit in the pull request branch rather than an ephemeral commit that could be force-pushed over. Note that this could have other side effects: the pull request branch could build successfully, but once merged into the target branch, the build could fail (since you're not building the merged branches). I think the only way to safely do a merged build would be to use the pull request commit and have ghprb do a local merge from the target branch. Note that for matrix builds to be consistent, it would need to pass a SHA1 for the target branch as well, so that all matrix jobs doing the local merge use exactly the same commits.
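A sketch of the approach suggested above, under stated assumptions: the matrix parent resolves both the PR head SHA and the target-branch SHA once and hands both to every child, which merges them locally. ghprbActualCommit comes from the ghprb snippet quoted above; the target-SHA parameter (called ghprbTargetSha here) is hypothetical, the target SHA in main is a placeholder, and the method only builds ordinary git command lines rather than running them.

```java
import java.util.List;

// Hypothetical pinned-merge scheme: every matrix child merges the same two SHAs.
// Parameter names other than ghprbActualCommit are assumptions, not ghprb features.
class PinnedMergeSketch {

    /** Git commands a matrix child could run so that every axis merges the same two commits. */
    static List<String> childCommands(String prHeadSha, String targetSha) {
        requireSha(prHeadSha);
        requireSha(targetSha);
        return List.of(
                // Fetch branches and PR refs; both SHAs must still be reachable from some ref.
                "git fetch --tags --progress origin +refs/heads/*:refs/remotes/origin/* +refs/pull/*:refs/remotes/origin/pr/*",
                "git checkout -f " + prHeadSha,
                // Local merge of the exact target commit the parent saw.
                "git merge --no-edit " + targetSha);
    }

    /** Reject ref names such as origin/pr/219/merge: only full SHA-1s are stable. */
    private static void requireSha(String sha) {
        if (sha == null || !sha.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("expected a full 40-char SHA-1, got: " + sha);
        }
    }

    public static void main(String[] args) {
        // Values would come from the parent build: ${ghprbActualCommit} and a
        // hypothetical ${ghprbTargetSha}; the second SHA is a placeholder.
        childCommands("096f9835a3de34f1f8713cc0513845ff83f07e30",
                      "0123456789abcdef0123456789abcdef01234567")
                .forEach(System.out::println);
    }
}
```

This narrows the race rather than eliminating it: a further push -f to the PR branch itself would still make the old head unreachable, just as in the original report.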

@childnode

I'm experiencing a related issue too, on Jenkins with a self-hosted Git repository: after rebasing the development tree, its old tip's tree-ish ID is still used for checkout, even when I prune/delete the workspace completely, including a forced git clone.

My speculation: the plugin seems to use the last successful build to get the ID for calculating a diff?

What struck me as strange, and what led me to a fix/workaround:
this is the log from the last stable build, where the tip of develop was 6bc8861 and swd was the name given to the remote:

Checking out Revision 6bc8861e2f988a8fe92b5dc4fa0a657982b98559 (swd/develop)
 > git.exe config core.sparsecheckout # timeout=10
 > git.exe checkout -f 6bc8861e2f988a8fe92b5dc4fa0a657982b98559
Using 'Changelog to branch' strategy.
fatal: bad revision '^origin/develop'
ERROR: Unable to retrieve changeset
hudson.plugins.git.GitException: Error launching git whatchanged
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:775)
    at hudson.plugins.git.GitSCM.computeChangeLog(GitSCM.java:1121)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1042)
    at hudson.scm.SCM.checkout(SCM.java:484)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1270)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:609)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:531)
    at hudson.model.Run.execute(Run.java:1741)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:374)
No emails were triggered.

This is the log from the failed build after rebasing and force-pushing develop; 6bc8861 is no longer on the remote:

Checking out Revision 6bc8861e2f988a8fe92b5dc4fa0a657982b98559 (swd/develop)
 > git.exe config core.sparsecheckout # timeout=10
 > git.exe checkout -f 6bc8861e2f988a8fe92b5dc4fa0a657982b98559
FATAL: Could not checkout null with start point 6bc8861e2f988a8fe92b5dc4fa0a657982b98559
hudson.plugins.git.GitException: Could not checkout null with start point 6bc8861e2f988a8fe92b5dc4fa0a657982b98559
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1855)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1033)
    at hudson.scm.SCM.checkout(SCM.java:484)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1270)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:609)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:531)
    at hudson.model.Run.execute(Run.java:1741)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:374)
Caused by: hudson.plugins.git.GitException: Command "git.exe checkout -f 6bc8861e2f988a8fe92b5dc4fa0a657982b98559" returned status code 128:
stdout: 
stderr: fatal: reference is not a tree: 6bc8861e2f988a8fe92b5dc4fa0a657982b98559

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1591)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:86)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1850)
    ... 10 more
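
"fatal: reference is not a tree" means the local clone has no commit object for the SHA being checked out. In the log above, 6bc8861 was no longer on the remote after the force push, so it was never fetched. A minimal pre-checkout check, assuming a git executable on the PATH, could ask Git directly whether the object exists. The class and method names are hypothetical and are not part of the Git plugin.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical pre-checkout guard, assuming `git` is on the PATH.
// `git cat-file -e <object>` exits 0 only if the object exists locally;
// the `^{commit}` suffix additionally requires that it peel to a commit.
final class CommitPresence {
    // Builds the argument list separately so it can be inspected without running git.
    static List<String> catFileCommand(String sha) {
        return List.of("git", "cat-file", "-e", sha + "^{commit}");
    }

    // Returns true if the clone in `workTree` already contains `sha` as a commit.
    static boolean hasCommit(File workTree, String sha) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(catFileCommand(sha))
                .directory(workTree)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        return p.waitFor() == 0;
    }
}
```

A false result would let the job re-fetch or fail with a clear message instead of surfacing the raw status-128 error.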

I then checked the Git repository on the Jenkins server:

C:\.jenkins\workspace\Build_Player\rep>git branch -av
  remotes/swd/develop                                                                           aa43a90 comment on develop tip

The error "fatal: bad revision '^origin/develop'" in the log of a "green" build made me check the job configuration again, which revealed the misconfiguration: the remote had been given the name "swd", but the branch specifiers to test were set to "origin/develop" and "origin/feature*". Somehow it still worked fine with this misconfiguration ... until today.

So I changed the name/ID of the remote to "origin", and everything works again! Perhaps this helps someone else too, although I'm not sure it is the real fix; I expect the root cause lies somewhere else.

Unfortunately, all branches were rebuilt, even those that had already been built with the "wrong" configuration.
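The mismatch described above (a remote named "swd" while the branch specifiers start with "origin/") is easy to miss in the job UI. A rough heuristic check is sketched below. SpecifierCheck is hypothetical, and it assumes branch specifiers use the "remote/branch" form; a bare branch name that itself contains "/" (e.g. "feature/foo") would be flagged as a false positive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check: flag branch specifiers whose leading remote
// segment does not match the remote name configured in the job.
final class SpecifierCheck {
    static List<String> mismatched(String remoteName, List<String> specifiers) {
        List<String> bad = new ArrayList<>();
        for (String spec : specifiers) {
            int slash = spec.indexOf('/');
            if (slash <= 0) {
                continue; // bare name such as "develop": no remote prefix to check
            }
            String prefix = spec.substring(0, slash);
            // "*" matches any remote and "refs" introduces a full ref path; skip both.
            if (prefix.equals("*") || prefix.equals("refs")) {
                continue;
            }
            if (!prefix.equals(remoteName)) {
                bad.add(spec);
            }
        }
        return bad;
    }
}
```

With the remote named "swd", both "origin/develop" and "origin/feature*" are reported; after renaming the remote to "origin", the list is empty.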

@mihelich

The "easy" fix is to make sure matrix slaves build the merge ref, not the sha1.

We did exactly that to resolve this issue, which we were hitting probably a third(!) of the time. Patch against the Git plugin 2.4.0 (not ghprb):

diff --git a/src/main/java/hudson/plugins/git/GitSCM.java b/src/main/java/hudson/plugins/git/GitSCM.java
index aca465d..bbddc53 100644
--- a/src/main/java/hudson/plugins/git/GitSCM.java
+++ b/src/main/java/hudson/plugins/git/GitSCM.java
@@ -915,6 +915,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {


         // every MatrixRun should build the same marked commit ID
+        /*
         if (build instanceof MatrixRun) {
             MatrixBuild parentBuild = ((MatrixRun) build).getParentBuild();
             if (parentBuild != null) {
@@ -926,6 +927,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {
                 }
             }
         }
+        */

         // parameter forcing the commit ID to build
         if (candidates.isEmpty() ) {

The patch just bluntly disables the git plugin's special case logic for matrix builds, so each child will use pr/<id>/merge. It introduces a risk of child builds using inconsistent commits, but in practice hasn't caused us any problems.
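Building the merge ref means each matrix child resolves "pr/<id>/merge" itself instead of receiving the parent's SHA. Assuming a fetch refspec of the form "+refs/pull/*:refs/remotes/origin/pr/*" (the mapping commonly used with ghprb), the local ref name for a PR can be derived as below. RefspecMapper is hypothetical and only handles refspecs with at most one "*".

```java
// Hypothetical helper: map a remote ref through a fetch refspec with at most
// one wildcard, e.g. "+refs/pull/*:refs/remotes/origin/pr/*".
final class RefspecMapper {
    // Returns the local ref name, or null if `remoteRef` does not match the source side.
    static String map(String refspec, String remoteRef) {
        String spec = refspec.startsWith("+") ? refspec.substring(1) : refspec; // "+" = force update
        int colon = spec.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("refspec has no destination: " + refspec);
        }
        String src = spec.substring(0, colon);
        String dst = spec.substring(colon + 1);
        int star = src.indexOf('*');
        if (star < 0) {
            return src.equals(remoteRef) ? dst : null; // no wildcard: exact match only
        }
        String before = src.substring(0, star);
        String after = src.substring(star + 1);
        if (remoteRef.length() < before.length() + after.length()
                || !remoteRef.startsWith(before) || !remoteRef.endsWith(after)) {
            return null;
        }
        // Like git, "*" may match several path components (e.g. "76/merge").
        String matched = remoteRef.substring(before.length(), remoteRef.length() - after.length());
        return dst.replace("*", matched);
    }

    // The local merge ref a matrix child would check out for pull request `prId`.
    static String mergeRef(String refspec, int prId) {
        return map(refspec, "refs/pull/" + prId + "/merge");
    }
}
```

For PR #76 this yields "refs/remotes/origin/pr/76/merge", i.e. "origin/pr/76/merge". Because the ref is resolved at checkout time, it follows a force push instead of pointing at a SHA that may have vanished.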

I also posted this on JENKINS-26290, where hopefully a better fix can be found. I suspect an optimal solution would require changes to both plugins.
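Since the patch lets each child resolve the merge ref on its own, a child that starts after a push can build a different commit than its siblings. One way to at least detect that, assuming each child records the SHA it actually checked out somewhere the parent can read (how that is collected is left open), is a simple comparison. ChildConsistency is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical post-build check: given the commit each matrix child actually
// built (child name -> SHA), report whether they all agree.
final class ChildConsistency {
    static Set<String> distinctCommits(Map<String, String> builtByChild) {
        return new TreeSet<>(builtByChild.values());
    }

    static boolean consistent(Map<String, String> builtByChild) {
        return distinctCommits(builtByChild).size() <= 1;
    }
}
```

An inconsistent result could mark the parent build unstable rather than silently reporting mixed commits as one result.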

@jdart

jdart commented Jan 20, 2016

Using ${ghprbActualCommit} instead of ${sha1} seems to be a decent workaround for my team. We were having this issue with the Jenkins MultiJob plugin.
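In a downstream job the workaround amounts to choosing which ghprb variable to check out. The sketch below assumes the job can read its build environment as a map and falls back to sha1 when ghprbActualCommit is absent. CommitChoice is hypothetical. Pinning a concrete SHA can still fail if that commit is force-pushed away, as the next comment reports.

```java
import java.util.Map;

// Hypothetical helper for a downstream job: pick the commit to check out,
// preferring ghprbActualCommit over sha1 (both are set by the ghprb plugin).
final class CommitChoice {
    static String commitToBuild(Map<String, String> env) {
        String actual = env.get("ghprbActualCommit");
        if (actual != null && !actual.isEmpty()) {
            return actual;
        }
        String sha1 = env.get("sha1");
        if (sha1 != null && !sha1.isEmpty()) {
            return sha1;
        }
        throw new IllegalStateException("neither ghprbActualCommit nor sha1 is set");
    }
}
```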

@adelyafatykhova

Hi everyone. We switched to using ${ghprbActualCommit}, but the problem persists. The initial parent-child run works fine, but on the second run the parent has the correct details while the child job fails trying to check out the wrong commit. In our case, new commits are often pushed to the pull request while the first PR job is still running, hence the failure. Are there any further solutions, advice, or experience on what to do in this situation?
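One possible approach, sketched under assumptions: the parent passes the commit it resolved to the child as an explicit parameter, and the child compares it with the PR's current head before checking out. How the current head is obtained (for example "git ls-remote origin refs/pull/<id>/head" or the GitHub API) is left open, and SupersededCheck is hypothetical. Treating any movement of the head as "superseded" is conservative: an older commit that was not force-pushed away would still be fetchable.

```java
// Hypothetical guard for a child job: compare the commit pinned by the parent
// with the PR's current head and decide whether building still makes sense.
final class SupersededCheck {
    enum Decision { BUILD, SUPERSEDED }

    static Decision decide(String pinnedSha, String currentHeadSha) {
        // If the PR head moved after the parent started (new commits or a
        // force push), report the run as superseded instead of letting the
        // checkout fail with "reference is not a tree".
        return pinnedSha.equals(currentHeadSha) ? Decision.BUILD : Decision.SUPERSEDED;
    }
}
```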

@rmmh

rmmh commented Sep 9, 2016

There's a separate, similar-looking bug in the Git plugin that causes a "reference is not a tree" failure, unrelated to matrix builds or push -f. There's a potential fix in JENKINS-2629.

The characteristic symptom is that the build is supposed to test "origin/pr/123/merge", but then fails to check out a different PR's branch, such as "origin/pr/345/merge", with fatal: reference is not a tree. It happens on retests of a merge commit; see the linked bug for more information.
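That symptom can be recognized mechanically by comparing the PR number in the expected ref with the one in the ref actually being checked out. A minimal diagnostic, assuming refs of the form ".../pr/<number>/merge" or ".../pr/<number>/head", might look like this. PrRefCheck is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: extract the PR number from refs like
// "origin/pr/123/merge" and flag checkouts that target a different PR.
final class PrRefCheck {
    private static final Pattern PR_REF = Pattern.compile("(?:^|/)pr/(\\d+)/(?:merge|head)$");

    // Returns the PR number, or null if the ref is not a PR ref.
    static Integer prNumber(String ref) {
        Matcher m = PR_REF.matcher(ref);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    static boolean targetsDifferentPr(String expectedRef, String checkedOutRef) {
        Integer expected = prNumber(expectedRef);
        Integer actual = prNumber(checkedOutRef);
        return expected != null && actual != null && !expected.equals(actual);
    }
}
```

For the example above, targetsDifferentPr("origin/pr/123/merge", "origin/pr/345/merge") is true, which distinguishes this bug from the force-push case, where the PR number matches but the SHA is gone.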
