FailOnNonTempDialError doesn't fail #2266
Comments
To work around this, I used a modified version of the blocking code in DialContext and ended up using something similar to this instead of
My use case: I was connecting to a gRPC server via a custom dialer. There are certain types of errors that I know should fail immediately, so I was hoping this option would do the trick. My client is effectively a proxy with expensive connection setup, so I cache my gRPC connection objects, and if the underlying connection fails immediately I'd rather not cache it. I get that it's really difficult to determine whether an error is permanent -- but presumably anyone who sets FailOnNonTempDialError expects the dial to fail when a non-temporary failure occurs. You get what you asked for, right? And if FailOnNonTempDialError isn't set, it should retry over and over again as it does now.
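For illustration, here is a minimal stdlib-only sketch of how a custom dialer could mark selected errors as non-temporary. The names permanentError, dialFunc, failFast, and isPermanent are hypothetical, the address-plus-timeout dialer shape is an assumption, and nothing here is taken from gRPC itself.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// permanentError (hypothetical) marks a dial error as non-retryable by
// reporting Temporary() == false while preserving the wrapped error.
type permanentError struct{ err error }

func (e permanentError) Error() string   { return e.err.Error() }
func (e permanentError) Unwrap() error   { return e.err }
func (e permanentError) Temporary() bool { return false }

// dialFunc is an assumed custom-dialer shape: address and timeout in,
// connection or error out.
type dialFunc func(addr string, timeout time.Duration) (net.Conn, error)

// failFast wraps a dialer so that errors the caller considers fatal are
// returned as permanentError; all other results pass through unchanged.
func failFast(dial dialFunc, fatal func(error) bool) dialFunc {
	return func(addr string, timeout time.Duration) (net.Conn, error) {
		conn, err := dial(addr, timeout)
		if err != nil && fatal(err) {
			return nil, permanentError{err}
		}
		return conn, err
	}
}

// isPermanent reports whether err (or an error it wraps) has a Temporary
// method that returns false.
func isPermanent(err error) bool {
	var t interface{ Temporary() bool }
	return errors.As(err, &t) && !t.Temporary()
}

// demo runs two fake dialers through failFast and reports how each
// resulting error is classified.
func demo() (deniedIsPermanent, busyIsPermanent bool) {
	denied := func(string, time.Duration) (net.Conn, error) { return nil, os.ErrPermission }
	busy := func(string, time.Duration) (net.Conn, error) { return nil, errors.New("try again") }
	fatal := func(err error) bool { return errors.Is(err, os.ErrPermission) }

	_, err1 := failFast(denied, fatal)("example.sock", time.Second)
	_, err2 := failFast(busy, fatal)("example.sock", time.Second)
	return isPermanent(err1), isPermanent(err2)
}

func main() {
	a, b := demo()
	fmt.Println("permission error permanent:", a)
	fmt.Println("generic error permanent:", b)
}
```

With a wrapper like this, a caller that caches connections could skip caching whenever isPermanent reports true for the dial error.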
It seems that what you need is to know the underlying connection error when a ClientConn is not working. Does that sound right? There's #2055 to return the latest connection error in the ClientConn, but we haven't finalized how to do that. Let us know if this solution will work for you.
Yes, that's about right. However, I also want it to fail fast -- basically, I want my workaround code above. I don't see how #2055 solves that.
With #2055 or something similar, you can wait on the
I think we can fix this so the old behavior keeps working, i.e. FailOnNonTempDialError would be a fail-fast for blocking Dial calls. Blocking dials already poll connectivity status; we just need to check the last error when the state is transient failure and exit if that error is non-temporary.
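The fail-fast behavior described above can be sketched with the standard library alone. This is a simplified retry loop, not gRPC's implementation (which polls connectivity state rather than calling the dialer directly). The names dialError, isTemporary, blockingDial, and runDemo are hypothetical, and treating errors without a Temporary method as retryable is an assumption of this sketch.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dialError is a hypothetical error type for this demo; Temporary reports
// whether retrying could plausibly help.
type dialError struct {
	msg  string
	temp bool
}

func (e dialError) Error() string   { return e.msg }
func (e dialError) Temporary() bool { return e.temp }

// isTemporary returns the error's own Temporary() verdict when it has one.
// Errors without that method are treated as retryable (an assumption).
func isTemporary(err error) bool {
	var t interface{ Temporary() bool }
	if errors.As(err, &t) {
		return t.Temporary()
	}
	return true
}

// blockingDial retries dial until it succeeds or ctx expires. When
// failOnNonTemp is set, it returns as soon as a non-temporary error is seen.
// It returns the number of attempts made and the final error.
func blockingDial(ctx context.Context, dial func() error, failOnNonTemp bool) (int, error) {
	attempts := 0
	for {
		attempts++
		err := dial()
		if err == nil {
			return attempts, nil
		}
		if failOnNonTemp && !isTemporary(err) {
			return attempts, err // fail fast with the underlying error
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(5 * time.Millisecond): // fixed backoff for the demo
		}
	}
}

// runDemo dials against a dialer that always fails with a non-temporary
// error, under a 50ms deadline, and reports attempts and the error text.
func runDemo(failOnNonTemp bool) (int, string) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	dial := func() error { return dialError{msg: "permission denied", temp: false} }
	n, err := blockingDial(ctx, dial, failOnNonTemp)
	return n, err.Error()
}

func main() {
	n, msg := runDemo(true)
	fmt.Printf("fail-fast on:  %d attempt(s), error: %s\n", n, msg)
	n, msg = runDemo(false)
	fmt.Printf("fail-fast off: %d attempt(s), error: %s\n", n, msg)
}
```

With the flag set, the loop stops after the first attempt and surfaces the dialer's error; without it, the loop keeps retrying until the context deadline and the caller only sees the context error.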
Increasing priority to keep it on our radar since this was an unintentional behavior change.
I am seeing this in containerd/containerd#2576 when attempting to connect to unix sockets that either do not exist or that the user lacks permission to read/write. My expectation is that the dial attempt would fail immediately (i.e. fail fast) in these two scenarios, but instead I get "context deadline exceeded".
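As a rough check of the missing-socket case, the sketch below dials a unix socket path that does not exist using only the standard library and inspects how the resulting error classifies itself. The helper name dialMissingSocket is hypothetical, and the sketch assumes a platform with unix-socket support; it does not involve gRPC.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// dialMissingSocket dials a unix socket path inside a fresh temporary
// directory, so the path is guaranteed not to exist. It reports whether the
// dial error calls itself temporary and whether it matches os.ErrNotExist.
func dialMissingSocket() (isTemp, notExist bool, err error) {
	dir, err := os.MkdirTemp("", "sock")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	conn, err := net.Dial("unix", filepath.Join(dir, "missing.sock"))
	if err == nil {
		conn.Close()
		return false, false, errors.New("unexpectedly connected")
	}

	// Ask the error chain for a Temporary method; isTemp stays false if
	// no error in the chain provides one.
	var t interface{ Temporary() bool }
	if errors.As(err, &t) {
		isTemp = t.Temporary()
	}
	return isTemp, errors.Is(err, os.ErrNotExist), err
}

func main() {
	isTemp, notExist, err := dialMissingSocket()
	fmt.Println("error:", err)
	fmt.Println("temporary:", isTemp)
	fmt.Println("not exist:", notExist)
}
```

The dial returns right away with an error that is not marked temporary, which is the kind of error a fail-fast dial would be expected to surface instead of waiting for the deadline.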
Pushed a potential fix for this in #2276
What version of gRPC are you using?
07ef407
What version of Go are you using (go version)?
go version go1.10.3 darwin/amd64
What operating system (Linux, Windows, …) and version?
What did you do?
If possible, provide a recipe for reproducing the error.
What did you expect to see?
The documentation for FailOnNonTempDialError says:
I would expect that if a non-temporary error was returned from the dialer, I would see the dial immediately return.
What did you see instead?
Instead, it just prints "Fail" over and over again since it keeps retrying (the thing it's not supposed to do). Perhaps I'm misunderstanding the documentation?