Watch stream should handle HTTP error before unmarshaling event #57
Comments
We are having exactly the issue you described: our watch gets a 401, disconnects, and reconnects automatically, ending in an undesired state. Our case is:
Possible solutions:
Stack trace:
Note that the only place where resource_version is managed inside of the Which option should we take to fix this? I'd be willing to submit a PR to fix the issue. |
The current watch client code is dynamic and assumes that the API server always responds with a correct API object to deserialize. When API server responds with I would suggest that we check whether the response is an error before deserializing, and reason about the proper behavior for each error code. |
Yeah, we have the same issue with a timeout from AWS ELB, resulting in "Invalid value for Was going to put in a PR for unmarshal_event to raise TimeoutError, so as to avoid too many changes affecting existing use cases: Line 82 of kubernetes/watch/watch.py so that any logging is a little clearer, rather than potentially looking like it's an issue elsewhere depending on what is being watched. |
Raise an ApiException for error events that indicate a watch failure despite the HTTP response indicating success. Fixes kubernetes-client#57 Signed-off-by: Fabian Reinartz <[email protected]>
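The idea in the commit message above can be sketched roughly as follows. This is not the actual patch: `WatchError` and `parse_watch_line` are hypothetical names (the real change raises the client's ApiException), and the sketch assumes each line of the stream is a JSON event whose `object` is a Status payload when `type` is `ERROR`.

```python
import json


class WatchError(Exception):
    """Hypothetical stand-in for the client's ApiException."""

    def __init__(self, status, reason):
        super().__init__("(%s) %s" % (status, reason))
        self.status = status
        self.reason = reason


def parse_watch_line(line):
    """Decode one line of a watch stream, raising on ERROR events.

    The HTTP response itself may be 200 even though the stream carries an
    error, so the check has to happen per event, before deserialization.
    """
    event = json.loads(line)
    if event.get("type") == "ERROR":
        status = event.get("object") or {}
        raise WatchError(status.get("code"),
                         status.get("message") or status.get("reason"))
    return event
```

A caller can then catch `WatchError` and decide per status code whether to retry, re-list, or give up, instead of failing later inside the deserializer.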
From API concepts:
It would be great if Python code could correctly support this automatically. |
Based on #102 I made an improved version:
This is almost what the API concepts describe, with one important difference: the API concepts and the Go library provide a function called "list and watch", which ties listing and watching together. This package (to my knowledge) does not provide that. Because of that, we have to throw an exception if the latest event is too old; otherwise we could just retry the whole "list and watch". The problem with the current approach (in my branch) is that if the code was not able to resume automatically (because the last event was too old), then just restarting watch with |
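For illustration, a "list and watch" helper of the kind described above might look roughly like this. It is a sketch under assumptions, not part of this package: `Gone`, `list_and_watch`, and the two callables are hypothetical, and a real implementation would also resume the watch from the last seen version when the stream merely times out.

```python
class Gone(Exception):
    """Hypothetical error: the resource version is too old (HTTP 410)."""


def list_and_watch(list_func, watch_func, max_relists=3):
    """Yield events, re-listing whenever the watch cannot resume.

    list_func() returns (items, resource_version).
    watch_func(resource_version) yields (event, resource_version) pairs and
    raises Gone once the version it was given has expired.
    """
    for _ in range(max_relists):
        items, version = list_func()
        for item in items:
            # Replay the listed state so the consumer can resynchronise.
            yield ("SYNC", item)
        try:
            for event, version in watch_func(version):
                yield event
            return  # the stream ended cleanly
        except Gone:
            continue  # version too old: start over with a fresh list
    raise Gone("gave up after %d re-lists" % max_relists)
```

Tying the two calls together means an expired version never has to surface to the caller; it only triggers another list.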
Is there any reason not to add something like this?

    while True:
        resp = func(*args, **kwargs)
        try:
            for line in iter_resp_lines(resp):
                yield self.unmarshal_event(line, return_type)
                if self._stop:
                    break
        except Exception as e:
            pass
        finally:
            kwargs['resource_version'] = self.resource_version
            resp.close()
            resp.release_conn()
        if timeouts or self._stop:
            break
I do not think so. The only reason why exception is thrown there is because we access wrong fields in the |
I made #133 which should fix this for good. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
I could be mistaken, but it looks like the infinite loop for the watch stream doesn't handle the case when you receive an event that is expired, i.e. an HTTP status code of 410.
Looking at the code it seems that if the event is expired then resp should return something along the lines of
And unmarshal_event should fail to deserialize the object and break. And self.resource_version should just be stuck on the resource_version of the event that was expired.

Am I missing something here?
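One way to avoid the stuck resource_version described above is sketched below. It is only an illustration: `advance_resource_version` is a hypothetical helper, and it assumes events are plain dicts decoded from the stream, with a Status object carrying `code` on `ERROR` events.

```python
def advance_resource_version(current, event):
    """Return the resource version to resume from after `event`.

    ERROR events carry a Status object rather than a watched resource, so
    they must not update the version. A 410 code means the version has
    expired and the caller should re-list, signalled here by None.
    """
    obj = event.get("object") or {}
    if event.get("type") == "ERROR":
        return None if obj.get("code") == 410 else current
    return obj.get("metadata", {}).get("resourceVersion", current)
```

With a helper like this, the retry loop can drop the `resource_version` argument when it gets `None` back, rather than reconnecting forever with the expired value.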