Ambassador should report being unhealthy when unable to ensure its configuration is not stale. #854

jvosantos · 2018-10-05T14:40:08Z

Please describe your use case / problem.
When an ambassador pod is unable to communicate with the Kubernetes API to monitor service changes, it continues to report as healthy and stays in service, even though its configuration might be stale. After some time it should report as unhealthy so that some form of self-healing can be attempted (restarting, becoming unready for a while, etc.). If a mapping is added or changed in the meantime, the faulty ambassador pod will not pick it up, which can cause an outage.

Log from an ambassador pod that cannot communicate with the Kubernetes API but is still in service:

2018-10-02 09:44:36 kubewatch 0.35.2 ERROR: could not watch for Kubernetes service changes
Traceback (most recent call last):
  File "/ambassador/kubewatch.py", line 517, in main
    watch_loop(restarter)
  File "/ambassador/kubewatch.py", line 418, in watch_loop
    for evt in watched:
  File "/usr/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 122, in stream
    resp = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14358, in list_service_for_all_namespaces
    (data) = self.list_service_for_all_namespaces_with_http_info(**kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14455, in list_service_for_all_namespaces_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 02 Oct 2018 09:44:36 GMT', 'Content-Length': '186'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"resourceVersion: Invalid value: \\"None\\": strconv.ParseUint: parsing \\"None\\": invalid syntax","code":500}\n'

Describe the solution you'd like
I would like the liveness or readiness probe of the ambassador deployment to report unhealthy when the pod can't communicate with the Kubernetes API.
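To illustrate what I mean, here is a minimal sketch of a staleness check that a probe endpoint could consult. It is not taken from Ambassador's code: `WatchFreshness`, `record_success`, `probe_status`, and the 300-second default are all hypothetical names and values, and I'm assuming the watch loop can call `record_success()` each time it receives data from the API.

```python
import time


class WatchFreshness:
    """Tracks when the Kubernetes watch last succeeded (hypothetical helper)."""

    def __init__(self, max_staleness_seconds=300.0, clock=time.monotonic):
        # max_staleness_seconds is an assumed threshold, not an Ambassador setting.
        self.max_staleness_seconds = max_staleness_seconds
        self._clock = clock
        self._last_success = clock()

    def record_success(self):
        """Call this whenever the watch loop successfully talks to the API."""
        self._last_success = self._clock()

    def is_healthy(self):
        """True while the last successful watch is recent enough."""
        age = self._clock() - self._last_success
        return age <= self.max_staleness_seconds

    def probe_status(self):
        """HTTP status a liveness/readiness handler could return."""
        return 200 if self.is_healthy() else 503
```

A probe handler would return `probe_status()`, so Kubernetes marks the pod unready (or restarts it) once the configuration may have gone stale for longer than the threshold.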

Describe alternatives you've considered
The ambassador pod can also consider this a fatal error and simply exit, causing the pod to be restarted.
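A rough sketch of that alternative, assuming the watch can be wrapped in a single callable. `run_watch_with_exit`, `watch_once`, `max_failures`, and `retry_delay` are hypothetical names for illustration only and do not correspond to anything in kubewatch.py.

```python
import sys
import time


def run_watch_with_exit(watch_once, max_failures=3, retry_delay=5.0,
                        sleep_fn=time.sleep, exit_fn=sys.exit):
    """Run watch_once repeatedly; exit after max_failures consecutive errors.

    watch_once is a hypothetical callable wrapping one pass of the watch.
    exit_fn and sleep_fn are injectable so the behavior can be tested.
    """
    failures = 0
    while True:
        try:
            watch_once()
            failures = 0  # any success resets the consecutive-failure count
        except Exception:
            failures += 1
            if failures >= max_failures:
                # Treat the situation as fatal so Kubernetes restarts the pod.
                exit_fn(1)
                return failures  # only reached if exit_fn does not exit
            sleep_fn(retry_delay)
```

Requiring several consecutive failures before exiting avoids restarting the pod on a single transient API error.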


samuela · 2018-12-25T05:41:44Z

I'm seeing these errors constantly on GKE and it's certainly troubling. I'm not sure why ambassador is making a request that results in a 500 error in the first place. We're on ambassador version 0.40.2 and kubernetes master version 1.11.3-gke.23.

stale · 2019-06-05T08:43:34Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jvosantos changed the title ~~Ambassador should report being unhealthy when unable to assure its configuration is not stale.~~ Ambassador should report being unhealthy when unable to ensure its configuration is not stale. Oct 5, 2018

MattCollinge mentioned this issue Nov 11, 2018

ERROR: could not watch for Kubernetes service changes #928

Closed

stale bot added the stale Issue is stale and will be closed label Jun 5, 2019

stale bot closed this as completed Jun 12, 2019
