Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedule reconnect in case of HTTP_GONE #1800

Merged
merged 6 commits into from
Oct 9, 2019

Conversation

honnix
Copy link
Contributor

@honnix honnix commented Oct 7, 2019

Reset resource version to null and schedule reconnect in case of HTTP_GHONE

This tries to address #1529

Reset resource version to null and schedule reconnect in case of
HTTP_GHONE
@centos-ci
Copy link

Can one of the admins verify this patch?

@honnix
Copy link
Contributor Author

honnix commented Oct 7, 2019

Couldn't figure out how this could fail PATCH request. 🤔

@rohanKanojia
Copy link
Member

ok to test

@rohanKanojia
Copy link
Member

@honnix : Not your fault. Our tests are flaky.

@rohanKanojia rohanKanojia added the changelog missing A line to changelog.md regarding the change is not added label Oct 7, 2019
@rohanKanojia rohanKanojia requested review from iocanel and oscerd October 7, 2019 09:53
@honnix
Copy link
Contributor Author

honnix commented Oct 7, 2019

I seems I should fix these test cases as they look very relevant.

@rohanKanojia
Copy link
Member

I meant integration tests, not unit tests

@honnix
Copy link
Contributor Author

honnix commented Oct 7, 2019

@rohanKanojia Yeah I'm working on it. Not very familiar with the code base, so I didn't find the test case.

@honnix honnix force-pushed the reconnect-on-http-gone branch from b93e136 to 22d79d1 Compare October 7, 2019 14:22
@honnix honnix changed the title schedule reconnect in case of HTTP_GONE (WIP) schedule reconnect in case of HTTP_GONE Oct 7, 2019
@honnix
Copy link
Contributor Author

honnix commented Oct 7, 2019

Now this makes me wondering whether watching from the beginning of history is a bad idea. Can someone point me how resourceVersion is set initially before creating a WatchConnectionManager?

@@ -92,55 +92,38 @@ public void onClose(KubernetesClientException cause) {}
}

@Test
public void testOutdated() throws InterruptedException {
logger.info("testOutdated");
public void testHttpErrorReconnectOutdatedAndModified() throws InterruptedException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not named properly and doesn't seem to be picked up by junit.

@honnix honnix changed the title (WIP) schedule reconnect in case of HTTP_GONE schedule reconnect in case of HTTP_GONE Oct 8, 2019
@honnix
Copy link
Contributor Author

honnix commented Oct 8, 2019

Maybe it is ok because when this happens, old resource versions got wiped out from history.

@honnix
Copy link
Contributor Author

honnix commented Oct 8, 2019

@rohanKanojia PTAL again. Thanks.

@rohanKanojia
Copy link
Member

@honnix : Could you please add a line to CHANGELOG regarding this change?

@honnix
Copy link
Contributor Author

honnix commented Oct 9, 2019

@rohanKanojia Sure. I saw that label but didn't realise I should update it. Will do.

@rohanKanojia
Copy link
Member

[merge]

@wiggzz
Copy link

wiggzz commented Nov 5, 2019

Thanks - appreciate the quick turn around.

@honnix
Copy link
Contributor Author

honnix commented Nov 5, 2019

Likewise, thanks for pointing out the defect.

@rohanKanojia
Copy link
Member

@honnix @wiggzz : Thanks guys, I will cut a release soon this week and let your guys know

@honnix honnix mentioned this pull request Nov 20, 2019
12 tasks
@xiateapu
Copy link

I'm a novice, so.... could you please tell me when or why kubernetes api returns HTTP_GONE?

@wiggzz
Copy link

wiggzz commented Jan 14, 2020

I'm a novice, so.... could you please tell me when or why kubernetes api returns HTTP_GONE?

In the context of a watch, it will return HTTP_GONE when you ask to see changes for a resourceVersion that is too old - i.e. when it can no longer tell you what has changed since that version, since too many things have changed. In that case, you'll need to start again, by not specifying a resourceVersion in which case the watch will send you the current state of the thing you are watching and then send updates from that point.

@rohanKanojia
Copy link
Member

@xiateapu : ^^

@xiateapu
Copy link

Actually i had this problem as shown below:
image

It seems that resourceVersion has a big gap between last watched and current ?
So i'm very confused that why dose the too many things have changed happened?Can‘t kubernetes tell me these change in time?

@xiateapu
Copy link

I'm a novice, so.... could you please tell me when or why kubernetes api returns HTTP_GONE?

In the context of a watch, it will return HTTP_GONE when you ask to see changes for a resourceVersion that is too old - i.e. when it can no longer tell you what has changed since that version, since too many things have changed. In that case, you'll need to start again, by not specifying a resourceVersion in which case the watch will send you the current state of the thing you are watching and then send updates from that point.

Thanks for your answer ^ ^

@xiateapu
Copy link

@xiateapu : ^^

^ ^

@wind57
Copy link
Contributor

wind57 commented Jul 4, 2022

I'm a novice, so.... could you please tell me when or why kubernetes api returns HTTP_GONE?

In the context of a watch, it will return HTTP_GONE when you ask to see changes for a resourceVersion that is too old - i.e. when it can no longer tell you what has changed since that version, since too many things have changed. In that case, you'll need to start again, by not specifying a resourceVersion in which case the watch will send you the current state of the thing you are watching and then send updates from that point.

what about an informer? ResourceEventHandler has a default method onNothing. Its documentation says the same thing :

Called after an empty list is retrieved on start or after an HTTP GONE when the Store is empty

I am asking this in the context of spring-cloud-kubernetes project. Previously, we had used Watches for detecting changes in config maps/secrets. I am trying to move to informers now. So our code is like this at the moment:

					@Override
					public void onClose(WatcherException exception) {
						log.warn("ConfigMaps watch closed", exception);
						Optional.ofNullable(exception).map(e -> {
							log.debug("Exception received during watch", e);
							return exception.asClientException();
						}).map(KubernetesClientException::getStatus).map(Status::getCode)
								.filter(c -> c.equals(HttpURLConnection.HTTP_GONE)).ifPresent(c -> watch());
					}

If I move this to informers, it seems that I should handle this in the same manner, something like:

				@Override
				public void onNothing() {
					boolean isStoreEmpty = informer.getStore().list().isEmpty();
					if(!isStoreEmpty) {
						// HTTP_GONE, thus re-inform
						inform();
					}
				}

I am a bit lost (due to my limited knowledge) between "this should be handled", or an informer (like a watcher) will know what to do on its own.

Can some of the pros comment on this please? thank you.

@manusa
Copy link
Member

manusa commented Jul 4, 2022

Hi @wind57
There have been a few issues on this repo about Spring Cloud Kubernetes and the HTTP Gone problem. With more recent versions of the client, this should be less of a problem since watchers are started using bookmarks (if the cluster supports it).

In any case, as suggested in some of the reported issues, Spring Cloud should migrate from using raw Watchers to Informers.

Informers are self-healing, so as a client consumer you should not worry about reconnecting, and so on. You can use the onNothing event to log or monitor these incidents, but you shouldn't do any error-recovery stuff.

For your use-case, you can consider Informers as improved Watchers which take care of all the stuff for you.

@wind57
Copy link
Contributor

wind57 commented Jul 4, 2022

thank you @manusa. this is exactly what I am trying to do currently, move to informers, it's almost ready For the time being I will comment onNothing and see if we get any reports in the future.

As usual, much appreciate your answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog missing A line to changelog.md regarding the change is not added
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants