Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry when SocketException with some sleep #674

Merged
merged 10 commits into from
Jan 24, 2020
Merged

Conversation

v1v
Copy link
Contributor

@v1v v1v commented Jan 23, 2020

Description

Fixes #539

Use the sleep approach as stated in #373

Edit from @bitwiseman:

Before this change I'm not sure that the retry mechanism was very effective. If an exception is thrown from "getResultCode", we can reuse the same HttpURLConnection and just try again. But if the exception is throw later, the connection will keep the same status and retry will not accomplish what was intended.
Instead of trying to get fancy about figuring that out, this change creates a new connection for each retry. It also moves all fetches to include the same code path including this retry mechanism, so no matter how we get the data it has the retry feature.

Questions

  • How do I test it?
    I've been trying different approaches to test this particular scenario, the main issue is regarding the retryInvalidCached404Response method, that's called from the _fetch method in the Requester class since it does validate the response and therefore I'm not sure how I can mock/reproduce the timeout.
        // I did mock a connection reset only for testing purpose. and changed the condition to be
        // either instanceof SocketException or instanceof SocketTimeoutException 
        this.apiServer().stubFor(get(urlMatching(".*test/timeout"))
                .willReturn(aResponse().withFault(Fault.CONNECTION_RESET_BY_PEER)));

  • Is connection reset by peer a valid use case to retry?

Before submitting a PR:

We love getting PRs, but we hate asking people for the same basic changes every time.

  • Push your changes to a branch other than master. Create your PR from that branch.
  • Add JavaDocs and other comments
  • Write tests that run and pass in CI. See CONTRIBUTING.md for details on how to capture snapshot data.
  • Run mvn -P ci install site locally. This may reformat your code, commit those changes. If this command doesn't succeed, your change will not pass CI.

@bitwiseman
Copy link
Member

bitwiseman commented Jan 23, 2020

@v1v
Thanks for working on this fix!

How do I test it?

You can change retryInvalidCached404Response to have this:

int responseCode = 0;
try {
    uc.getResponseCode();
catch (Exception e) {
}

Then for the mocking, you'll need use the scenario feature of WireMock. It looks like this.

https://github.com/jenkinsci/github-branch-source-plugin/blob/7179854f068f0012b3d1222f6924e8de5d437513/src/test/java/org/jenkinsci/plugins/github_branch_source/GitHubSCMSourceTest.java#L110-L138

Do note you don't have do all this in code, you can record json files and construct scenarios that way too. Do whatever makes the most sense to you.

Finally, if it comes down to it, you could make some of the private methods package accessible and then use Mockito to mock/or spy them. I'm open to ideas.

Is connection reset by peer a valid use case to retry?

I'm not sure. What is your opinion?

@bitwiseman
Copy link
Member

And you can change retryInvalidCached404Response to have this:

int responseCode = 0;
try {
    uc.getResponseCode();
catch (Exception e) {
}

@v1v
Copy link
Contributor Author

v1v commented Jan 23, 2020

Awesome! Thanks for the tips, I think I did manage to have something in place, not sure if the UTs I just coded are good enough, the stacktrace when running locally already shows the retries to work as expected

[INFO] Running org.kohsuke.github.TimeoutRetryTest
Jan 23, 2020 9:06:22 PM org.kohsuke.github.Requester parse
INFO: timed out accessing  http://localhost:65123/repos/github-api-test-org/github-api/branches/test/timeout. Sleeping 500 milliseconds before retrying... ; will try 2 more time(s)
java.net.SocketException: Connection reset
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...
Jan 23, 2020 9:06:23 PM org.kohsuke.github.Requester parse
INFO: timed out accessing  http://localhost:65123/repos/github-api-test-org/github-api/branches/test/timeout. Sleeping 500 milliseconds before retrying... ; will try 1 more time(s)
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210)
....
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.042 s - in org.kohsuke.github.TimeoutRetryTest

Is connection reset by peer a valid use case to retry?
I'm not sure. What is your opinion?

I think being more defensive might help when the third party systems don't behave as expected. So I'd prefer to retry again even if there is a reset by peer issue or something else.

A couple of questions:

  • I didn't read the guideline, so should I close this particular PR and open a new one that uses a different branch? For some reason, I did miss that particular branch creation :_(
  • Any clues what's the test errors related to?
[ERROR] Errors: 
[ERROR]   GitHubCachingTest.OkHttpConnector_Cache_MaxAgeDefault_Zero_GitHubRef_Error:142 » GHFileNotFound
[ERROR]   GitHubCachingTest.OkHttpConnector_Cache_MaxAgeDefault_Zero_GitHubRef_Error:143 » GHFileNotFound

@@ -87,6 +87,9 @@ protected void initializeServers() {
@Override
protected void before() {
super.before();
this.apiServer()
Copy link
Member

@bitwiseman bitwiseman Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather you do this in the test class. Override before and add this code there.

@@ -923,7 +939,11 @@ private void retryInvalidCached404Response() throws IOException {
// scenarios. If GitHub ever fixes their issue and/or begins providing accurate ETags to
// their 404 responses, this will result in at worst two requests being made for each 404
// responses. However, only the second request will count against rate limit.
int responseCode = uc.getResponseCode();
int responseCode = 0;
Copy link
Member

@bitwiseman bitwiseman Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thanks. Take a look at #678. When you get a chance. You don't have to change this, just meant as an FYI.

@bitwiseman
Copy link
Member

@v1v
Thanks for this PR. The tests found some odd behavior in the 404 retry. That's what the GitHubCachingTest failures were. I cleaned up the code path - now all fetches go through one code path by wrapping all supplier calls in in socket exception handling. I also expanded on your test.

Take a look at my changes and tell me what you think.

@bitwiseman bitwiseman marked this pull request as ready for review January 24, 2020 06:22
@v1v
Copy link
Contributor Author

v1v commented Jan 24, 2020

@bitwiseman thanks for your support and guide. I've just reviewed the changes and they do cover all the other cases when there could be a timeout so it looks fabulous. I actually learned how to mock the scenario with retry and success. 💯

@bitwiseman bitwiseman merged commit aeb5e5f into hub4j:master Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Response code -1 and response message null when making github API calls
2 participants