Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash: toKubernetesError encountered an unknown error unexpectedly during Kub... #5217

Closed
stilist opened this issue Oct 9, 2023 · 6 comments · Fixed by #5326
Closed

Crash: toKubernetesError encountered an unknown error unexpectedly during Kub... #5217

stilist opened this issue Oct 9, 2023 · 6 comments · Fixed by #5326

Comments

@stilist
Copy link

stilist commented Oct 9, 2023

Crash report

Error message

Failed processing Build type=container name=asgard (took 431.41 sec). This is what happened:
Error: getaddrinfo ENOTFOUND aks-[hash].[uuid].privatelink.westus2.azmk8s.io
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:84:26)
    at GetAddrInfoReqWrap.callbackTrampoline (node:internal/async_hooks:130:17)

What did you do?

garden deploy --sync=monolith

I think I had previously run garden cleanup ns, because k9s didn't show anything in my namespace.

Your environment

  • OS: macOS Sonoma (14.0)
  • How I'm running Kubernetes: Azure AKS
  • Garden version: 0.13.16

Frequency

I've only seen this happen once.

Workaround

DNS resolution worked on redeploy.

Additional context

I doubt there's anything you can do about DNS issues, but I think it would be good to have an error handler for this issue.

@stefreak
Copy link
Member

stefreak commented Oct 9, 2023

Thank you for reporting this @stilist

I think it might make sense to just generally handle all instances of OSErrnoException in toGardenError and convert them to a new kind of GardenError, e.g. SystemCallError or something like that; Otherwise it will be impossible to eliminate these kinds of crashes completely as there are so many possible failure modes for almost anything.

WDYT @TimBeyer ?

@stefreak
Copy link
Member

stefreak commented Oct 9, 2023

Hmm, the weird thing is that we already handle OSErrnoException in toKubernetesError. OK, we should definitely be able to fix this particular one but it will require some testing.

@stefreak
Copy link
Member

This line should catch this: https://github.com/garden-io/garden/blob/d460f3f0f4eadacc831c9d189ecc25fbf5239c5a/core/src/plugins/kubernetes/retry.ts#L119C22-L119C22

We need to find a reliable way to reproduce this, and use a debugger to find out why it doesn’t.

@stefreak
Copy link
Member

Likely we need to handle DNS errors (https://nodejs.org/api/dns.html#error-codes) and possibly also SystemError (https://nodejs.org/api/errors.html#class-systemerror)

stefreak added a commit that referenced this issue Oct 31, 2023
DNS error codes are not included in the OS error constants
(os.constants) / https://nodejs.org/docs/latest-v18.x/api/os.html#error-constants

This commit adds DNS error codes to the list of handled errno errors.

Fixes #5217
stefreak added a commit that referenced this issue Oct 31, 2023
DNS error codes are not included in the OS error constants
(os.constants) / https://nodejs.org/docs/latest-v18.x/api/os.html#error-constants

This commit adds DNS error codes to the list of handled errno errors.

Fixes #5217
stefreak added a commit that referenced this issue Oct 31, 2023
DNS error codes are not included in the OS error constants
(os.constants) / https://nodejs.org/docs/latest-v18.x/api/os.html#error-constants

This commit adds DNS error codes to the list of handled errno errors.

Fixes #5217
@stefreak
Copy link
Member

I do not know how to reliably reproduce this, but #5326 should fix this issue.

@stefreak
Copy link
Member

Came up with a simple unit test in 7f785f7 to confirm that it should fix the issue.

github-merge-queue bot pushed a commit that referenced this issue Nov 1, 2023
* fix: handle DNS error codes as ErrnoException

DNS error codes are not included in the OS error constants
(os.constants) / https://nodejs.org/docs/latest-v18.x/api/os.html#error-constants

This commit adds DNS error codes to the list of handled errno errors.

Fixes #5217

* test: isErrnoException

* test: remove unnecessary assertions

* fix(lint): remove unused import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants