Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit error message when failing to create the health check stream due to failing per-RPC credentials calls #8030

Open
atollena opened this issue Jan 23, 2025 · 1 comment
Labels
Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. Type: Behavior Change Behavior changes not categorized as bugs Type: Feature New features or improvements in behavior

Comments

@atollena
Copy link
Collaborator

atollena commented Jan 23, 2025

Use case(s) - what problem will this feature solve?

When health check is enabled (via "healthCheckConfig": {"serviceName": ""} in the service config) and the client fails to create a stream, the channel remains in CONNECTING state and no error message is output. This makes the task of troubleshooting problems caused by creating the health check watch stream hard, as there are neither logs nor RPC errors, and RPC typically fail with deadline exceeded (if a deadline is set).

The code that swallows errors when creating the health check Watch stream is here:

grpc-go/health/client.go

Lines 74 to 78 in fbff2ab

rawS, err := newStream(healthCheckMethod)
if err != nil {
continue retryConnection
}
. Errors when creating a stream can come from:

  1. The transport is not ready or nil (

    grpc-go/balancer_wrapper.go

    Lines 362 to 365 in 38a8b9a

    if transport == nil {
    return nil, status.Errorf(codes.Unavailable, "SubConn state is not Ready")
    }
    and

    grpc-go/stream.go

    Lines 1233 to 1236 in e912015

    if t == nil {
    // TODO: return RPC error here?
    return nil, errors.New("transport provided is nil")
    }
    ) IIUC it doesn't apply in this case, since the health check watch is only started when the transport is ready
  2. A CallOption returns an error (I don't think this can happen here, since we don't provide the ability to customize call options for health checks)

    grpc-go/stream.go

    Lines 1253 to 1255 in e912015

    if err := o.before(c); err != nil {
    return nil, toRPCErr(err)
    }
  3. We fail to get codec or compressor for the call (again not applicable to health check since gRPC controls that part)
  4. We fail to get a transport or create a stream on the transport (

    grpc-go/stream.go

    Lines 353 to 358 in e912015

    if err := a.getTransport(); err != nil {
    return err
    }
    if err := a.newStream(); err != nil {
    return err
    }
    ). This can happen when we fail to create header fields, e.g. the call to GetRequestMetadata to get per-RPC credentials fails (that is the case I ran into).

Proposed Solution

Transition the subchannel to TRANSIENT_FAILURE if we failed to create the watch stream. Continue the retry loop.

Alternatives Considered

If changing the behavior of of health checks when per RPC credentials fail is not desirable, at least logging through channelz would be nice.

@atollena atollena added the Type: Feature New features or improvements in behavior label Jan 23, 2025
@atollena
Copy link
Collaborator Author

I provided an illustration for the problem in e10aa04.

@atollena atollena added Type: Behavior Change Behavior changes not categorized as bugs Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. labels Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Client Includes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more. Type: Behavior Change Behavior changes not categorized as bugs Type: Feature New features or improvements in behavior
Projects
None yet
Development

No branches or pull requests

1 participant