[Incident] m2lines hub unavailable for a short period of time #1248

Closed
5 tasks
yuvipanda opened this issue Apr 26, 2022 · 6 comments

@yuvipanda
Member

Summary

From https://2i2c.freshdesk.com/a/tickets/125

Around 4:40 ET, everything stopped. I refreshed my browser and just saw a plain text message "service unavailable". This lasted for about 5 minutes. Then it came back. The notebook and cluster were still alive. I carried on.

In the end, it was not that big of a problem. But it was very disruptive. All of the new hub users who are about to hit this hub will get very frustrated if they have the same experience.

Impact on users

Important information

  • Hub URL: {{ INSERT HUB URL HERE }}
  • Support ticket ref: {{ INSERT SUPPORT REF HERE }}

Tasks and updates

  • Discuss and address incident, leaving comments below with updates
  • Incident has been dealt with or is over
  • Copy/paste the after-action report below and fill in relevant sections
  • Incident title is discoverable and accurate
  • All actionable items in report have linked GitHub Issues
After-action report template
# After-action report

These sections should be filled out once we've resolved the incident and know what happened.
They should focus on the knowledge we've gained and any improvements we should take.

## Timeline

_A short list of dates / times and major updates, with links to relevant comments in the issue for more context._

All times in {{ most convenient timezone}}.

- {{ yyyy-mm-dd }} - [Summary of first update](link to comment)
- {{ yyyy-mm-dd }} - [Summary of another update](link to comment)
- {{ yyyy-mm-dd }} - [Summary of final update](link to comment)


## What went wrong

_Things that could have gone better. Ideally these should result in concrete
action items that have GitHub issues created for them and linked to under
Action items._

- Thing one
- Thing two

## Where we got lucky

_These are good things that happened to us but not because we had planned for them._

- Thing one
- Thing two

## Follow-up actions

_Every action item should have a GitHub issue (even a small skeleton of one) attached to it, so these do not get forgotten. These issues don't have to be in `infrastructure/`, they can be in other repositories._

### Process improvements

1. {{ summary }} [link to github issue]
2. {{ summary }} [link to github issue]

### Documentation improvements

1. {{ summary }} [link to github issue]
2. {{ summary }} [link to github issue]

### Technical improvements

1. {{ summary }} [link to github issue]
2. {{ summary }} [link to github issue]
@yuvipanda
Member Author

Looking at logs with `k -n prod logs hub-7b469478f4-5qlb7 -c hub --previous`, I see that the hub pod restarted with an error message I've never seen before:

ERROR:asyncio:Exception in callback JupyterHub.initialize.<locals>.log_init_time(<Task finishe...piException()>) at /usr/local/lib/python3.8/dist-packages/jupyterhub/app.py:2542
handle: <Handle JupyterHub.initialize.<locals>.log_init_time(<Task finishe...piException()>) at /usr/local/lib/python3.8/dist-packages/jupyterhub/app.py:2542>
Traceback (most recent call last):
  File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2543, in log_init_time
    n_spawners = f.result()
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2245, in init_spawners
    spawner = user.spawners[orm_spawner.name]
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 186, in __getitem__
    self[key] = self.spawner_factory(key)
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 361, in _new_spawner
    spawner = spawner_class(**spawn_kwargs)
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 210, in __init__
    self._start_watching_pods()
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 2291, in _start_watching_pods
    return self._start_reflector(
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 2249, in _start_reflector
    self.__class__.reflectors[key] = ReflectorClass(
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 206, in __init__
    self.start()
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 378, in start
    self._list_and_update()
  File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 227, in _list_and_update
    initial_resources = getattr(self.api, self.list_method_name)(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15302, in list_namespaced_pod
    return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15413, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 239, in GET
    return self.request("GET", url,
  File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '685c57d6-7409-49c5-b38d-913a4123f222', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '', 'X-Kubernetes-Pf-Prioritylevel-Uid': '', 'Date': 'Tue, 26 Apr 2022 20:42:31 GMT', 'Content-Length': '593'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \\"system:serviceaccount:prod:hub\\" cannot list resource \\"pods\\" in API group \\"\\" in the namespace \\"prod\\": RBAC: [clusterrole.rbac.authorization.k8s.io \\"system:service-account-issuer-discovery\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:basic-user\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:public-info-viewer\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:discovery\\" not found]","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'

[E 2022-04-26 20:42:31.271 JupyterHub app:2989]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2986, in launch_instance_async
        await self.initialize(argv)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2558, in initialize
        await gen.with_timeout(
      File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
        self._context.run(self._callback, *self._args)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2543, in log_init_time
        n_spawners = f.result()
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2245, in init_spawners
        spawner = user.spawners[orm_spawner.name]
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 186, in __getitem__
        self[key] = self.spawner_factory(key)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 361, in _new_spawner
        spawner = spawner_class(**spawn_kwargs)
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 210, in __init__
        self._start_watching_pods()
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 2291, in _start_watching_pods
        return self._start_reflector(
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/spawner.py", line 2249, in _start_reflector
        self.__class__.reflectors[key] = ReflectorClass(
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 206, in __init__
        self.start()
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 378, in start
        self._list_and_update()
      File "/home/jovyan/.local/lib/python3.8/site-packages/kubespawner/reflector.py", line 227, in _list_and_update
        initial_resources = getattr(self.api, self.list_method_name)(**kwargs)
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15302, in list_namespaced_pod
        return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 15413, in list_namespaced_pod_with_http_info
        return self.api_client.call_api(
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
        return self.__call_api(resource_path, method,
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
        response_data = self.request(
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 373, in request
        return self.rest_client.GET(url,
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 239, in GET
        return self.request("GET", url,
      File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 233, in request
        raise ApiException(http_resp=r)
    kubernetes.client.exceptions.ApiException: (403)
    Reason: Forbidden
    HTTP response headers: HTTPHeaderDict({'Audit-Id': '685c57d6-7409-49c5-b38d-913a4123f222', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '', 'X-Kubernetes-Pf-Prioritylevel-Uid': '', 'Date': 'Tue, 26 Apr 2022 20:42:31 GMT', 'Content-Length': '593'})
    HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \\"system:serviceaccount:prod:hub\\" cannot list resource \\"pods\\" in API group \\"\\" in the namespace \\"prod\\": RBAC: [clusterrole.rbac.authorization.k8s.io \\"system:service-account-issuer-discovery\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:basic-user\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:public-info-viewer\\" not found, clusterrole.rbac.authorization.k8s.io \\"system:discovery\\" not found]","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'
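
The 403 body above is dense, so here is a minimal sketch of pulling the useful parts out of it. The bytes literal is copied from the log; the variable names and the regular expressions are illustrative assumptions, not part of kubespawner or the Kubernetes client.

```python
import json
import re

# Response body copied from the traceback above (a Python bytes repr,
# so it can be pasted back in as a literal).
body = (
    b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    b'"message":"pods is forbidden: User \\"system:serviceaccount:prod:hub\\" '
    b'cannot list resource \\"pods\\" in API group \\"\\" in the namespace \\"prod\\": '
    b'RBAC: [clusterrole.rbac.authorization.k8s.io \\"system:service-account-issuer-discovery\\" not found, '
    b'clusterrole.rbac.authorization.k8s.io \\"system:basic-user\\" not found, '
    b'clusterrole.rbac.authorization.k8s.io \\"system:public-info-viewer\\" not found, '
    b'clusterrole.rbac.authorization.k8s.io \\"system:discovery\\" not found]",'
    b'"reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'
)

status = json.loads(body)
message = status["message"]

# Which identity was refused, and which cluster roles the API server
# said it could not find while evaluating RBAC.
user = re.search(r'User "([^"]+)"', message).group(1)
missing_roles = re.findall(
    r'clusterrole\.rbac\.authorization\.k8s\.io "([^"]+)" not found', message
)

print(status["code"], status["reason"])
print("user:", user)
for role in missing_roles:
    print("missing clusterrole:", role)
```

All four roles reported as "not found" are default cluster roles that ship with Kubernetes, not anything defined by the hub deployment.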

@yuvipanda
Member Author

Looking at the hub logs from before that (via https://cloudlogging.app.goo.gl/eodshAcsprGxppAp8), I see:

    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.63.240.1', port=443): Max retries exceeded with url: /api/v1/namespaces/prod/pods?fieldSelector=&labelSelector=component%3Dsingleuser-server (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f46bf39b430>: Failed to establish a new connection: [Errno 111] Connection refused'))

This started around 2022-04-26 13:38:37.964 PDT and went on until 2022-04-26 13:42:31.272 PDT, which is exactly the time window in which @rabernat reported the issue.
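
The timestamps in this thread come in three zones: the ticket says "around 4:40 ET", Cloud Logging shows PDT, and the hub traceback carries an HTTP `Date` header in GMT. A small sketch to line them up, assuming fixed daylight-time offsets for April 2022 (PDT = UTC-7, EDT = UTC-4); the variable names are illustrative:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are an assumption that holds for late April 2022 (daylight time).
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# Window taken from the Cloud Logging timestamps quoted above.
start = datetime(2022, 4, 26, 13, 38, 37, 964000, tzinfo=PDT)
end = datetime(2022, 4, 26, 13, 42, 31, 272000, tzinfo=PDT)

for label, ts in (("start", start), ("end", end)):
    print(
        label,
        ts.strftime("%H:%M:%S %Z"),
        "=",
        ts.astimezone(EDT).strftime("%H:%M:%S %Z"),
        "=",
        ts.astimezone(timezone.utc).strftime("%H:%M:%S UTC"),
    )

duration = end - start
print("duration:", duration)
```

The start lands at 16:38 ET, close to the reported "around 4:40 ET", and the end lands at 20:42:31 UTC, the same second as the `Date` header in the 403 response.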

So it looks like the k8s master was down during that window, but somehow this also affected the proxy / nginx. That is unexpected: when the k8s master is down, only the hub itself is supposed to become unavailable.

@yuvipanda
Member Author

A similar set of error messages shows up in the proxy logs around the same time:

20:42:21.596 [ConfigProxy] error: 503 GET /user/rabernat/dask/clusters socket hang up
20:42:21.599 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:21.602 [ConfigProxy] error: 503 GET /user/rabernat/api/metrics/v1 socket hang up
20:42:21.603 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:25.320 [ConfigProxy] error: 503 GET /user/rabernat/api/sessions socket hang up
20:42:25.321 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:25.324 [ConfigProxy] error: 503 GET /user/rabernat/api/kernels socket hang up
20:42:25.325 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:26.392 [ConfigProxy] error: 503 GET /user/rabernat/api/terminals socket hang up
20:42:26.394 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:27.513 [ConfigProxy] error: 503 GET /user/rabernat/api/contents socket hang up
20:42:27.515 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:28.936 [ConfigProxy] error: 503 GET /user/rabernat/lab/tree/Untitled1.ipynb socket hang up
20:42:28.938 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:29.318 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:29.319 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:31.542 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:31.543 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:33.123 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:33.124 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:33.292 [ConfigProxy] error: 503 GET /user/rabernat/lab/tree/Untitled1.ipynb socket hang up
20:42:33.295 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:33.403 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:33.408 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:35.223 [ConfigProxy] error: 503 GET /user/rabernat/lab/tree/Untitled1.ipynb socket hang up
20:42:35.224 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:35.310 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:35.311 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:36.508 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:36.509 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:37.317 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:37.318 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:43.145 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:42:43.146 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:43.604 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:43.606 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:54.792 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:42:54.795 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:55.250 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:55.251 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:57.630 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:42:57.631 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:57.766 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:57.767 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:59.447 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:42:59.448 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:42:59.531 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:59.533 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:43:07.727 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:43:07.729 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
20:43:07.866 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:43:07.867 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.63.248.192',
  port: 8081
}
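
The proxy output above is repetitive, so a short script can summarize it. This is only a sketch: the regular expression is inferred from the lines pasted here rather than from any documented configurable-http-proxy log format, `summarize` is a hypothetical helper, and splitting paths into "user server" and "hub" by the `/user/` prefix is an assumption about routing. `SAMPLE` holds a handful of lines copied from the log; the full paste can be passed in the same way.

```python
import re
from collections import Counter

# A few lines copied from the proxy log above.
SAMPLE = """\
20:42:21.599 [ConfigProxy] error: Failed to get custom error page: Error: connect ECONNREFUSED 10.63.248.192:8081
20:42:21.602 [ConfigProxy] error: 503 GET /user/rabernat/api/metrics/v1 socket hang up
20:42:29.318 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
20:42:43.145 [ConfigProxy] error: 503 GET / connect ECONNREFUSED 10.63.248.192:8081
20:43:07.866 [ConfigProxy] error: 503 GET /favicon.ico connect ECONNREFUSED 10.63.248.192:8081
"""

# Matches only the "<time> [ConfigProxy] error: <status> <method> <path> <reason>" lines.
LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[ConfigProxy\] error: "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<path>\S+) (?P<reason>.+)$"
)


def summarize(text):
    """Count failed requests by reason and by target, and report the time span."""
    by_reason = Counter()
    by_target = Counter()
    times = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # stack-trace lines and "Failed to get custom error page"
        times.append(match["time"])
        by_reason[match["reason"]] += 1
        target = "user server" if match["path"].startswith("/user/") else "hub"
        by_target[target] += 1
    span = (min(times), max(times)) if times else None
    return by_reason, by_target, span


by_reason, by_target, span = summarize(SAMPLE)
print(by_reason)
print(by_target)
print(span)
```

Grouping by reason separates the `socket hang up` failures on `/user/...` paths from the `ECONNREFUSED` failures on `/` and `/favicon.ico`.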

@yuvipanda
Member Author

@rabernat do you know which page gave you 'service unavailable'? Was it the hub home page or the specific notebook you were on?

@yuvipanda
Member Author

This bumps up the priority of #1102, which makes the k8s master far more robust. It comes at some cost, but is probably worth it here.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Apr 27, 2022
Less prone to k8s API failure this way, although it costs about
$70 a month

Ref 2i2c-org#1248
Ref 2i2c-org#1102
@yuvipanda
Member Author

Fixed by #1251
