14 UNAVAILABLE: GOAWAY received #620

Closed
WaldoJeffers opened this issue May 22, 2019 · 12 comments · Fixed by #795
Labels
api: spanner Issues related to the googleapis/nodejs-spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@WaldoJeffers
Contributor

WaldoJeffers commented May 22, 2019

Environment details

  • OS: Linux
  • Node.js version: 10
  • yarn version: 1.15
  • @google-cloud/spanner version: 3.1.0

Steps to reproduce

  1. Run a web service using nodejs-spanner
  2. Observe random 14 UNAVAILABLE: GOAWAY errors

It happens 1-5 times a day

The full stacktrace is

14 UNAVAILABLE: GOAWAY received Error: 14 UNAVAILABLE: GOAWAY received
    at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
    at ClientReadableStream._emitStatusIfDone (/app/node_modules/grpc/src/client.js:233:26)
    at ClientReadableStream._receiveStatus (/app/node_modules/grpc/src/client.js:211:8)
    at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1272:15)
    at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
    at Object.onReceiveStatus (/app/node_modules/grpc-gcp/build/src/index.js:93:25)
    at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
    at /app/node_modules/grpc/src/client_interceptors.js:1029:24

Interestingly, we had a very similar error (13 INTERNAL: GOAWAY) when using node-spanner v2.x. We were hoping to see this error disappear by upgrading, which it did, but it apparently got replaced by this one ^^

It might be related to #234 or to any of these issues https://github.com/grpc/grpc-node/issues?utf8=%E2%9C%93&q=is%3Aissue+GOAWAY+

@bcoe bcoe added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. needs more info This issue needs more information from the customer to proceed. and removed needs more info This issue needs more information from the customer to proceed. labels Jun 4, 2019
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Jun 4, 2019
@WaldoJeffers
Contributor Author

Hello, any news on this? It's still happening every day. Is there anything I can do which would help understanding the issue?

@bcoe
Contributor

bcoe commented Jun 25, 2019

@AVaksman any thoughts on this one?

@AVaksman
Contributor

AVaksman commented Jul 3, 2019

This is a transient error: gRPC status code 14 (UNAVAILABLE).
It is retried automatically by default, unless autoRetry has been manually disabled on the client (link).
My recommendation would be to increase the maxRetries value on the client, or to implement retry logic in your application.
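
For the application-level option, a minimal sketch of retry logic with exponential backoff might look like the following. It is an illustration only and not part of nodejs-spanner: `withRetry`, `RetryOptions`, and its field names are hypothetical, and the only assumption about the failure is that the rejected error carries a numeric `code` property equal to 14, as in the stack trace above.

```typescript
// gRPC status code 14 (UNAVAILABLE), the transient error discussed in this thread.
const UNAVAILABLE = 14;

// Hypothetical options object; these names are not part of any library API.
interface RetryOptions {
  maxAttempts: number;       // total attempts, including the first one
  initialDelayMs: number;    // wait before the first retry
  backoffMultiplier: number; // factor applied to the delay after each retry
}

// True when the error looks like a gRPC UNAVAILABLE status error.
function isUnavailable(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === UNAVAILABLE
  );
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying only on UNAVAILABLE, with exponential backoff.
// Any other error, or the last UNAVAILABLE error, is rethrown to the caller.
async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  let delayMs = options.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isUnavailable(err) || attempt >= options.maxAttempts) {
        throw err;
      }
      await sleep(delayMs);
      delayMs *= options.backoffMultiplier;
    }
  }
}
```

A read-only query could then be wrapped as `withRetry(() => database.run(query), {maxAttempts: 5, initialDelayMs: 100, backoffMultiplier: 2})`. Blindly re-running a statement is only safe when it is idempotent, so writes outside a transaction need more care than this sketch provides.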

@AVaksman
Contributor

AVaksman commented Jul 9, 2019

The GOAWAY could be caused by the keepalive settings. If you add the GRPC_TRACE=all and GRPC_VERBOSITY=ERROR environment variables, you may get more information about the error, which can help determine the root cause.

@WaldoJeffers
Contributor Author

Hello @AVaksman , thanks for the suggestion, I'll add those env variables right away. I'm not sure when I'll be able to post an update on this, as we get those errors really erratically.

Here's a screenshot for the last 30 days:
image

@WaldoJeffers
Contributor Author

Hello again @AVaksman , sorry if that's a stupid question, but where would the extra logs appear?
We had the error again today, and I can't seem to find any new information :/

This is from StackDriver:
image

@Duncan00

We have a similar problem and will try increasing maxRetries to see whether that resolves it.
https://googleapis.dev/nodejs/spanner/latest/global.html#ClientConfig

@bcoe
Contributor

bcoe commented Oct 21, 2019

@WaldoJeffers 👋 I would expect the detailed logging to show up as console.info level logs in your stackdriver logging.

Have you been continuing to see these issues over the past couple months?

@rzeng95

rzeng95 commented Oct 22, 2019

Hi! If 14 UNAVAILABLE: GOAWAY is considered transient, should it be added to the list of retryable status codes here?
https://github.com/googleapis/nodejs-spanner/blob/v4.2.0/src/transaction-runner.ts#L29

And if it isn't considered retryable, how can we best handle these errors within runTransactionAsync? Thanks!

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 18, 2019
@WaldoJeffers
Contributor Author

@bcoe Sorry for the late reply. Yes, I can confirm we are still seeing this error very regularly across all our services, which use different versions of the nodejs-spanner library. Could this error be linked to specific versions of the library, or is it unrelated?

@bcoe
Contributor

bcoe commented Jan 9, 2020

Looping in @olavloite as well, who's been doing a lot of work on the library recently.

@WaldoJeffers I would definitely suggest getting on the latest version of the Node.js Spanner library, if possible. There have been a variety of gRPC fixes over the past two months, so the version can definitely have an impact.

@olavloite
Contributor

@WaldoJeffers After some digging and debugging I think I've found the reason for this. It occurs if the backend returns an UNAVAILABLE error for a streaming RPC before that streaming RPC has returned a resume token. The Node client only retries a streaming RPC if it has already received a resume token, but it is also possible to retry the streaming RPC before it has received a resume token by restarting the entire stream. I'll submit a fix in the coming days.

Note: A streaming RPC in this case basically means any call that executes a SQL statement, as the Spanner client uses the RPC executeStreamingSql for both queries and DML statements. The only exception is the execution of batched DML statements, which uses a separate RPC.
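
The retry behaviour described above can be sketched roughly as follows. This is not the library's actual PartialResultStream code: `Chunk`, `StartStream`, `resumableRows`, and `maxRestarts` are hypothetical names, and the sketch assumes the stream can be modelled as an async iterable of chunks, only some of which carry a resume token. Rows are held back until a token arrives, so restarting (from the last token, or from the very beginning if none has been seen) never delivers a row twice.

```typescript
// gRPC status code 14 (UNAVAILABLE), treated here as the only retryable code.
const UNAVAILABLE_CODE = 14;

// Hypothetical model of one piece of a streamed result set.
interface Chunk<Row> {
  rows: Row[];
  resumeToken?: string; // present only on some chunks
}

// Hypothetical factory: starts the stream, optionally from a resume token.
type StartStream<Row> = (resumeToken?: string) => AsyncIterable<Chunk<Row>>;

async function* resumableRows<Row>(
  startStream: StartStream<Row>,
  maxRestarts = 5
): AsyncGenerator<Row> {
  let lastToken: string | undefined; // undefined means "restart from scratch"
  let pending: Row[] = [];           // rows received since the last token
  let restarts = 0;

  for (;;) {
    try {
      for await (const chunk of startStream(lastToken)) {
        pending.push(...chunk.rows);
        if (chunk.resumeToken !== undefined) {
          // Everything up to this token is safe to hand to the caller.
          lastToken = chunk.resumeToken;
          yield* pending;
          pending = [];
        }
      }
      yield* pending; // stream ended normally: flush the remainder
      return;
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      if (code !== UNAVAILABLE_CODE || restarts >= maxRestarts) {
        throw err;
      }
      restarts++;
      // Unflushed rows will be sent again by the restarted stream.
      pending = [];
    }
  }
}
```

The relevant case for this issue is the one where `lastToken` is still undefined when the error arrives: the stream is simply started again from the beginning instead of failing.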

@olavloite olavloite self-assigned this Jan 10, 2020
olavloite added a commit to olavloite/nodejs-spanner that referenced this issue Jan 12, 2020
The streaming call executeStreamingSql is not automatically retried by gax, as
the gapic configuration for the call does not specify any error codes that should
automatically be retried. Instead, the PartialResultStream is responsible for
retrying these calls with the appropriate resume token. Until now, the call was
only retried when a valid resume token had been seen for the stream, meaning that
if the initial call failed with a retryable error code (e.g. UNAVAILABLE), the
stream would fail with this error. This fix ensures that the call is also retried
when the error occurs for the initial call or before the stream has returned a
valid resume token.

Fixes googleapis#620.
olavloite added a commit that referenced this issue Jan 15, 2020
fix: retry executeStreamingSql when error code is retryable

The streaming call executeStreamingSql is not automatically retried by gax, as the gapic configuration for the call does not specify any error codes that should automatically be retried. Instead, the PartialResultStream is responsible for retrying these calls with the appropriate resume token. Until now, the call was only retried when a valid resume token had been seen for the stream, meaning that if the initial call failed with a retryable error code (e.g. UNAVAILABLE), the stream would fail with this error. This fix ensures that the call is also retried when the error occurs for the initial call or before the stream has returned a valid resume token.

Fixes #620.