Ambassador should report being unhealthy when unable to ensure its configuration is not stale. #854

Closed
jvosantos opened this issue Oct 5, 2018 · 2 comments
Labels: stale (Issue is stale and will be closed)

Comments

@jvosantos

Please describe your use case / problem.
When an Ambassador pod is unable to communicate with the Kubernetes API to monitor service changes, the pod keeps reporting healthy and stays in service, even though its configuration may be stale. After some time it should report itself as unhealthy so that some form of self-healing can take place (restarting, becoming unready for a while, etc.). If a mapping is added or changed in the meantime, the faulty pod will not pick it up, which can cause an outage.

Log from an Ambassador pod that cannot communicate with the Kubernetes API but is still in service:

2018-10-02 09:44:36 kubewatch 0.35.2 ERROR: could not watch for Kubernetes service changes
Traceback (most recent call last):
  File "/ambassador/kubewatch.py", line 517, in main
    watch_loop(restarter)
  File "/ambassador/kubewatch.py", line 418, in watch_loop
    for evt in watched:
  File "/usr/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 122, in stream
    resp = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14358, in list_service_for_all_namespaces
    (data) = self.list_service_for_all_namespaces_with_http_info(**kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14455, in list_service_for_all_namespaces_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 02 Oct 2018 09:44:36 GMT', 'Content-Length': '186'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"resourceVersion: Invalid value: \\"None\\": strconv.ParseUint: parsing \\"None\\": invalid syntax","code":500}\n'

Describe the solution you'd like
I would like the liveness or readiness probe of the Ambassador deployment to report unhealthy when Ambassador can't communicate with the Kubernetes API.
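
To make the request concrete, here is a minimal sketch of the kind of staleness check a probe handler could consult. It is not Ambassador's actual code: `WatchHealth`, `record_success`, `probe_status`, and the 60-second threshold are hypothetical names and values chosen for illustration. The idea is that the watch loop records each successful contact with the Kubernetes API, and the probe reports 503 once the last success is older than the allowed window.

```python
import threading
import time


class WatchHealth:
    """Hypothetical helper: remembers when the Kubernetes watch last succeeded."""

    def __init__(self, max_staleness_seconds, clock=time.monotonic):
        self._max_staleness = max_staleness_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._lock = threading.Lock()
        self._last_success = clock()

    def record_success(self):
        """Call from the watch loop whenever the API answers successfully."""
        with self._lock:
            self._last_success = self._clock()

    def is_healthy(self):
        """True while the last successful contact is within the allowed window."""
        with self._lock:
            return (self._clock() - self._last_success) <= self._max_staleness

    def probe_status(self):
        """HTTP status a liveness/readiness handler could return."""
        return 200 if self.is_healthy() else 503


if __name__ == "__main__":
    health = WatchHealth(max_staleness_seconds=60)
    print(health.probe_status())
```

Whether this should back the liveness probe (restart the pod) or the readiness probe (take the pod out of rotation) is a design choice; readiness is the gentler option, since the pod keeps serving nothing rather than being killed while the API is down.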

Describe alternatives you've considered
The Ambassador pod could also treat this as a fatal error and simply exit, causing the pod to be restarted.
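
A rough sketch of that alternative, assuming the watch is wrapped in a single callable: retry a few times, then give up with a nonzero exit code so that Kubernetes restarts the container. `watch_until_fatal`, `watch_once`, and the default limits are hypothetical and not taken from kubewatch.py.

```python
import sys
import time


def watch_until_fatal(watch_once, max_failures=5, backoff_seconds=1.0, sleep=time.sleep):
    """Run watch_once in a loop; return exit code 1 after too many consecutive failures.

    watch_once is assumed to block while the watch is working and to raise
    when the Kubernetes API cannot be reached or rejects the request.
    """
    failures = 0
    while True:
        try:
            watch_once()
            failures = 0  # a successful pass resets the counter
        except Exception as exc:
            failures += 1
            print(f"watch failed ({failures}/{max_failures}): {exc}", file=sys.stderr)
            if failures >= max_failures:
                return 1
            sleep(backoff_seconds)
```

The caller would then do something like `sys.exit(watch_until_fatal(my_watch))`, where `my_watch` is whatever function performs one watch pass.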

@jvosantos changed the title from "Ambassador should report being unhealthy when unable to assure its configuration is not stale." to "Ambassador should report being unhealthy when unable to ensure its configuration is not stale." on Oct 5, 2018
@samuela
Contributor

samuela commented Dec 25, 2018

I'm seeing these errors constantly on GKE, and it's certainly troubling. I'm not sure why Ambassador is making a request that results in a 500 error in the first place. We're on Ambassador version 0.40.2 and Kubernetes master version 1.11.3-gke.23.
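
One observation from the response body in the log above: the API server rejected `resourceVersion` with the value `"None"`, which suggests a Python `None` was stringified into the query. Whether that is what kubewatch actually does is an assumption, not something confirmed in this thread. A hypothetical guard that only passes `resource_version` (a real keyword of the Kubernetes Python client's list calls) when a value is known might look like this; `build_watch_kwargs` is an illustrative name.

```python
def build_watch_kwargs(resource_version=None):
    """Build keyword arguments for a list/watch call (hypothetical helper).

    The resource_version key is omitted entirely when no version is known,
    so the string "None" is never sent to the API server.
    """
    kwargs = {}
    if resource_version is not None:
        kwargs["resource_version"] = str(resource_version)
    return kwargs


if __name__ == "__main__":
    print(build_watch_kwargs())         # {}
    print(build_watch_kwargs("12345"))  # {'resource_version': '12345'}
```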

@stale

stale bot commented Jun 5, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale bot added the stale label (Issue is stale and will be closed) on Jun 5, 2019
@stale bot closed this as completed on Jun 12, 2019