Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-49757] Remove redundant fetch #904

Merged
merged 36 commits into from
Jul 2, 2020

Conversation

rishabhBudhouliya
Copy link
Contributor

@rishabhBudhouliya rishabhBudhouliya commented Jun 10, 2020

Pull Request-845

The concern of removing the second fetch call is a possible repository data loss or
misconfiguration in terms of the extra behaviors applied during checkout.

If the second fetch call is ignored, CleanBeforeCheckout option will also be ignored
as it doesn't implement the decorateCloneCommand which is used by the git fetch call.
To ensure that it is not ignored, decorateCloneCommand has been implemented for this
particular extension. The unit test GitSCMTest#testCleanBeforeCheckout doesn't fail
after the removal of second fetch call now.

Checklist

  • I have read the CONTRIBUTING doc
  • I have referenced the Jira issue related to my changes in one or more commit messages
  • I have added tests that verify my changes
  • Unit tests pass locally with my changes
  • I have added documentation as necessary
  • No Javadoc warnings were introduced with my changes
  • No spotbugs warnings were introduced with my changes
  • I have interactively tested my changes
  • Any dependent changes have been merged and published in upstream modules (like git-client-plugin)

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

rishabhBudhouliya and others added 14 commits February 25, 2020 02:52
My fork is 91 commits behind master
Co-authored-by: Francisco Javier Fernandez <[email protected]>
Co-authored-by: Francisco Javier Fernandez <[email protected]>
Co-authored-by: Francisco Javier Fernandez <[email protected]>
Co-authored-by: Francisco Javier Fernandez <[email protected]>
The build is supposed to fail as the narrow refspec only
fetches "foo" branch (honor initial refspec) and we are commiting in a branch
which doesnt exist for the project, the master branch.
The concern of removing the second fetch call is a possible repository data loss or
misconfiguration in terms of the extra behaviors applied during checkout.

If the second fetch call is ignored, CleanBeforeCheckout option will also be ignored
as it doesn't implement the decorateCloneCommand which is used by the git fetch call.
To ensure that it is not ignored, decorateCloneCommand has been implemented for this
particular extension. The unit test GitSCMTest#testCleanBeforeCheckout doesn't fail
after the removal of second fetch call now.
@rishabhBudhouliya
Copy link
Contributor Author

@MarkEWaite @fcojfernandez I created a separate PR for this as I think this addition will require separate discussion.

Also, I was thinking of creating separate branches cut from the fix branch (JENKINS-49757) which will represent the changes related to fix in order to achieve a non-breaking solution.

@rishabhBudhouliya
Copy link
Contributor Author

Also, I have to add unit tests confirming the before and after behavior because of this change.

@rishabhBudhouliya
Copy link
Contributor Author

The build is failing for a windows instance while it is passing on my local machine and linux instances of ci.jenkins.io.

Currently, I can't figure out the reason 4 test cases are failing. Any pointers on this would greatly help me.

@MarkEWaite MarkEWaite added the enhancement Improvement or new feature label Jun 12, 2020
@MarkEWaite
Copy link
Contributor

@rishabhBudhouliya thanks for your work on this one!

While staring at the code in the debugger, I realized that the problem is in the existing testCleanBeforeCheckout, not in your implementation of redundant fetch removal.

The test is incorrectly asserting that there should be a clean on the first build. A clean on the first build is not needed because the first build is assured of an empty workspace. The git plugin even has code which will delete files in the workspace before the first build.

The test should assert that there should be a clean on the second build.

The test failed on Windows for a particularly disconcerting reason. The CleanBeforeCheckout extension was being called before the first clone. Before the first clone, there is no git repository in the workspace. The call to git clean -xffd . was being made on a directory which did not contain a git repository. The build method was swallowing and hiding that exception and allowing the job to continue.

It was even more complicated because the tests run in the git-plugin/target directory. That means there is a git repository in one of the parent directories, and command line git happily searches up the parent directories to find that git repository.

We may want to consider setting GIT_CIELING_DIRECTORIES in our tests to avoid this type of surprise in the future.

See my proposed change in https://github.com/MarkEWaite/git-plugin/commits/CleanBeforeCheckout

@rishabhBudhouliya
Copy link
Contributor Author

I'm surprised on how I totally missed the very behavior of this extension!

There's still one thing I don't understand. Why is this happening to Windows alone? If git clean -xffd is being called before the first clone, it is bound to create an error on any platform?

I understand that it searches for a git repository in one of the parent directories and finds them, does it clean the working tree of that particular repository, if it does, why not in the case of Windows?

@MarkEWaite
Copy link
Contributor

MarkEWaite commented Jun 13, 2020 via email

@rishabhBudhouliya
Copy link
Contributor Author

Oh, I see. Adding GIT_CIELING_DIRECTORIES should be the right thing to do, I wonder if we would need additional tests to confirm this behavior.

Would you be willing to merge this commit to master or would you like me to add it to this PR? Both works fine for me.

@MarkEWaite
Copy link
Contributor

This PR still needs the fixes that I proposed in https://github.com/MarkEWaite/git-plugin/commits/CleanBeforeCheckout . Even with those fixes, I'm not ready to merge this until after git client plugin 3.3.0 has been released with JGit 5.8.0 and the other changes that are in the master branch.

I hope to release git client plugin 3.3.0 within the next week or 10 days so that it is available near the release of Jenkins 2.235.1 LTS.

The reason to add a "clean" before checkout behaviour is to clean the working directory
of a repository before checking out a branch. In this case, the remote repository is
being cloned for the first time which means there is no local git repository.
First build is known to have a clean workspace so there is no need to
clean it before checkout.

Second build should clean because the workspace might be cluttered
with extra files from the first build.

Also removes several useless operations in the test (related to buildLog)
@rishabhBudhouliya
Copy link
Contributor Author

rishabhBudhouliya commented Jun 24, 2020

@MarkEWaite @fcojfernandez As discussed on the Gitter channel, I have added a check on refspec and CloneOption to ensure user provided refspec is not missed by the new fix.

I have also interactively tested the new commit, I will share my findings during the GSoC weekly sync.

Comment on lines 1164 to 1168
for (RefSpec ref:initialFetchRefSpecs) {
if (!ref.toString().contains("refs/heads")) {
isDefaultRefspec = false; // if refspec is not of default type, preserve second fetch
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only one of the elements does not contain refs/heads then you are marking that is not a defaultRefspec. Could you explain what is your intention here when there is more than one element? Sorry but it's a bit unclear for me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If honor refspec for initial clone is disabled, the first clone is going to fetch all the branches and references related to them. All of those refspecs contain "refs/heads/xx" in the : format of refspec.

My assumption is that the given refspec contains ""refs/heads" this pattern, there is no need to call the second fetch in any case.

In the case of a narrow refspec which contains any references other than of branches, we will allow the second fetch call to take place, i.e, we will not avoid the redundant fetch call.
For an example, if honorRefSpec == false and refspec = "+refs/heads/master:refs/remotes/origin/master +refs/pull/553/head:refs/remotes/origin/pull/553", we can safely assume that this is not the default refspec because we have a "refs/pull" here.
In the case of finding multiple refspecs, if any refspec is other than the default format pattern, the code assumes that it is worth fetching with the second fetch call because it may have been missed by the first/initial clone.

src/main/java/hudson/plugins/git/GitSCM.java Outdated Show resolved Hide resolved
Copy link
Member

@fcojfernandez fcojfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If honor refspec for initial clone is disabled, the first clone is going to fetch all the branches and references related to them. All of those refspecs contain "refs/heads/xx" in the : format of refspec.

My assumption is that the given refspec contains ""refs/heads" this pattern, there is no need to call the second fetch in any case.

In the case of a narrow refspec which contains any references other than of branches, we will allow the second fetch call to take place, i.e, we will not avoid the redundant fetch call.
For an example, if honorRefSpec == false and refspec = "+refs/heads/master:refs/remotes/origin/master +refs/pull/553/head:refs/remotes/origin/pull/553", we can safely assume that this is not the default refspec because we have a "refs/pull" here.
In the case of finding multiple refspecs, if any refspec is other than the default format pattern, the code assumes that it is worth fetching with the second fetch call because it may have been missed by the first/initial clone.

Thanks for the example! Now I understand well your code :)

}
}
// if initial fetch refspec contains "refs/heads/*" (default refspec), ignore the second fetch call
return removeSecondFetch;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if initialFetchRefSpecs is null, then you're removing the second fetch. At this moment, I cannot think about a situation when it will be null, but ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test that passed in a null refspec but the UserRemoteConfig constructor converts nulls to empty strings. I think that is unreachable code due to the UserRemoteConfig use of fixEmpty().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my assumptions, the need to add the null check was to check for empty refspecs, as Mark as correctly pointed out, the fixEmpty() converts any empty refspec into a null refspec.

If the UserRemoteConfig was not fixing empty refspecs, the logic presented by me would gladly pass empty refspecs as a narrow refspec and will not avoid the second fetch. But since it is there, it should not be a concern for us.

Comment on lines 1172 to 1176
if (!option.isHonorRefspec()) {
removeSecondFetch = isDefaultRefspec;
} else {
removeSecondFetch = true; // avoid second fetch call if honor refspec is enabled
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if clause didn't help me to see clearer what you meant. While having a "!" in an if clause is perfectly fine, the "!" in the if-else might cause confusion. I would have expected

if something is true
   sentence1
else
   sentence2

while this piece of code is

if something is not false
   sentence2
else
   sentence1

My personal preference is to have the first approach in terms of readability. But it's just a matter of personal taste, so take this advice as it is, just an advice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great suggestion @fcojfernandez. I have added this in c2e097d.

Reduce differences to master branch
One of the branches was not being reached by tests prior to this
change.
Also reuses the existing assertion to provide a new assertion.
Code was already assuming that argument is non-null.  The
annotation makes it explicit that the value must be non-null and
can be checked by spotbugs that it is non-null.
@MarkEWaite
Copy link
Contributor

@rishabhBudhouliya I was able to spend more time working through the code in a debugger to assure that the tests covered the exisitng branches in the new code. I added two new tests to increase the coverage of the cases we had detected during interactive testing.

I've confirmed that the UserRemoteConfig constructor will convert a null refspec into a non-null value. However, I did not feel confident enough in all paths to justify removing the null handling in the determine... method. I think it is great as is.

I consider this code ready to merge once the CI jobs confirm it passes tests. I'll plan to merge it within the next 24 hours unless someone else raises an objection.


/* Create a ref for the fake pull in the source repository */
String[] expectedResult = {""};
CliGitCommand gitCmd = new CliGitCommand(testRepo.git, "update-ref", "refs/pull/553/head", "HEAD");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While writing the code, I was thinking how would I proceed with writing a unit test for the same. The solution I had in my mind was to actually clone a remote the git client repository and fetch a pull request reference.

I didn't imagine to update the reference to point to a pull request. Thanks for showing me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I hadn't discovered the technique until just recently myself. Plenty to learn for all of us.

@rishabhBudhouliya
Copy link
Contributor Author

rishabhBudhouliya commented Jun 28, 2020

@MarkEWaite Thank you for adding two very important test cases. I have reviewed the changes and applied a small change in the if-else condition for the purpose of better readability.

While I couldn't write a test case like testRetainRedundantFetch(), I did in fact interactively test these cases:

  • Scenario 1: CloneOption with honor refspec is false

Test#1 -> refspec → +refs/heads/master:refs/remotes/origin/master +refs/pull/553/head:refs/remotes/origin/pull/553
Results -> Second fetch is not skipped.

Test#2 -> refspec → +refs/heads/master:refs/remotes/origin/master
Results -> Second fetch is skipped

Test#3 -> respec is null
Results -> second fetch is skipped

  • Scenario 2: CloneOption is not added by the user, it is null
    Same tests repeated with the same expected results.

if (random.nextBoolean()) {
/* Randomly enable shallow clone, should not alter test assertions */
CloneOption cloneOptionMaster = new CloneOption(false, null, null);
cloneOptionMaster.setDepth(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to understand, why would need a shallow clone to improve coverage when we already have a full normal clone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. One branch was not being reached when a cloneOption extension is detected but does not have honor refspec enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I didn't consider other CloneOptions. If honor refspec is false, with avoiding the second fetch, we will also avoid other clone options like Shallow Clone, Disable tags, Reference a repo and timeout.

The good news is that the first fetch is capable to execute all of these options if we miss the second fetch. It should not make any difference to the user's expectation on git repository information after the checkout.

@rishabhBudhouliya
Copy link
Contributor Author

@MarkEWaite https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fgit-plugin/detail/PR-904/25/pipeline this build is failing for a linux instance for some reason I can't figure out. The maven build has succeeded.

@MarkEWaite
Copy link
Contributor

@MarkEWaite https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fgit-plugin/detail/PR-904/25/pipeline this build is failing for a linux instance for some reason I can't figure out. The maven build has succeeded.

The ci.jenkins.io agents are provisioned by the EC2 plugin. Either the EC2 plugin or the agents have a reliability issue. They seem to randomly disconnect. We have a reconnect script that runs once a minute, but that is just a temporary technique until we can spend more time to identify and resolve the root problem.

@MarkEWaite
Copy link
Contributor

@MarkEWaite I have added the fixes you have proposed and merged the master branch as well.
I promised to help you with the new release but I failed in doing so.

Is there something I can do currently to help and to speed up the process?

That's great. I've built a version of the git plugin and placed it into my test infrastructure. The publicly visible portion of the test infrastructure is in my docker-lfs repository on the lts-with-plugins branch.

I had an unexpected set of pauses and stops from about 12:00 to 8:00 AM my time. GitHub was having some issues during that time, but I can't tell if that was related. I'll run additional tests in that environment over the next day or two.

I think you're doing exactly the right thing with your focus on preparing for your presentation this week. I hope to have completed my exploratory testing by the end of my working day tomorrow.

Copy link
Contributor

@MarkEWaite MarkEWaite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to add an "opt-out" global switch before release, but this has passed all my testing and multiple code reviews. Thanks @rishabhBudhouliya

@rishabhBudhouliya
Copy link
Contributor Author

We'll need to add an "opt-out" global switch before release, but this has passed all my testing and multiple code reviews. Thanks @rishabhBudhouliya

Thanks @MarkEWaite for merging the fix, I will work on the opt-out switch on priority and hopefully raise a pull request soon.

@MarkEWaite
Copy link
Contributor

Thanks @MarkEWaite for merging the fix, I will work on the opt-out switch on priority and hopefully raise a pull request soon.

In case it helps, refer to #924 as a good example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants