THRIFT-5233: Handle I/O timeouts in go library #2181
	}
	// For anything else, don't retry
	break
}
This for loop is how we implement context deadline check for TBinaryProtocol, as the first read in ReadMessageBegin is ReadI32, which calls readAll.
	}
	// For anything else, don't retry
	break
}
This for loop is how we implement context deadline check in TCompactProtocol.
	}
	// For anything else, do not retry
	break
}
This for loop is how we implement context deadline check in THeaderProtocol, as this is the first read in ReadMessageBegin (if t.needReadFrame returned false above, then the actual ReadMessageBegin will call the underlying TBinaryProtocol.ReadMessageBegin or TCompactProtocol.ReadMessageBegin, which already handled it).
@@ -64,6 +64,10 @@ func (p *tTransportException) Unwrap() error {
	return p.err
}

func (p *tTransportException) Timeout() bool {
	return p.typeId == TIMED_OUT
}
This is also newly added to make the isTimeoutError implementation easier (so it does not need to try to check for TTransportException and unwrap it).
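To illustrate why a Timeout() method simplifies the check, here is a minimal sketch of how a helper like isTimeoutError could be written. This is an assumption about its shape, not the code from the PR: the `timeoutable` interface and `fakeTransportError` type are hypothetical stand-ins (the latter for tTransportException), and only the standard library's `errors.As` is relied on.

```go
package main

import (
	"errors"
	"fmt"
)

// timeoutable is a hypothetical interface matched by any error that
// exposes Timeout() bool (net.Error has the same method).
type timeoutable interface {
	Timeout() bool
}

// isTimeoutError is a sketch of the helper mentioned in the comment.
// errors.As walks the wrap chain, so no type-specific unwrapping is needed.
func isTimeoutError(err error) bool {
	var te timeoutable
	if errors.As(err, &te) {
		return te.Timeout()
	}
	return false
}

// fakeTransportError is a stand-in for tTransportException, used only
// so this example is self-contained.
type fakeTransportError struct {
	timedOut bool
}

func (e fakeTransportError) Error() string { return "fake transport error" }
func (e fakeTransportError) Timeout() bool { return e.timedOut }

func main() {
	wrapped := fmt.Errorf("read failed: %w", fakeTransportError{timedOut: true})
	fmt.Println(isTimeoutError(wrapped))
	fmt.Println(isTimeoutError(fakeTransportError{timedOut: false}))
	fmt.Println(isTimeoutError(errors.New("unrelated error")))
}
```

Because the check is against a method set rather than a concrete type, the same helper would also recognize timeout errors coming from the net package.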
@dcelasun OK the CI is "passing" now so it's ready for review. I marked the actual logical changes above; everything else is basically just for propagating context.
I think you missed the client side :)
Fixed. Sorry I missed that one. We totally ignored those
Just realized that I also need to fix
Client: go

As discussed in the JIRA ticket, this commit changes how we handle I/O timeouts in the go library.

This is a breaking change that adds context to all Read*, Write*, and Skip functions to TProtocol, along with the compiler change to support that, and also adds context to TStandardClient.Recv, TDeserializer, TStruct, and a few others.

Along with the function signature changes, this commit also implements context cancellation check in the following TProtocol's ReadMessageBegin implementations:

- TBinaryProtocol
- TCompactProtocol
- THeaderProtocol

In those ReadMessageBegin implementations, if the passed in context object has a deadline attached, it will keep retrying the I/O timeout errors, until the deadline on the context object passed. They won't retry I/O timeout errors if the passed in context does not have a deadline attached (still return on the first error).
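The retry behavior described in the commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `errIOTimeout`, `flakyReader`, `readWithRetry`, and the two `run...` helpers are hypothetical names, and a sentinel error stands in for a real transport timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errIOTimeout is a hypothetical stand-in for a transport I/O timeout.
var errIOTimeout = errors.New("i/o timeout")

// flakyReader simulates a read that times out `failures` times and
// then succeeds, counting how often it was called.
type flakyReader struct {
	failures int
	calls    int
}

func (r *flakyReader) read() error {
	r.calls++
	if r.calls <= r.failures {
		return errIOTimeout
	}
	return nil
}

// readWithRetry retries timeout errors only when ctx carries a deadline
// and that deadline has not passed yet; otherwise it returns on the
// first result, whatever it is.
func readWithRetry(ctx context.Context, read func() error) error {
	_, deadlineSet := ctx.Deadline()
	var err error
	for {
		err = read()
		if deadlineSet && errors.Is(err, errIOTimeout) && ctx.Err() == nil {
			// I/O timeout and we still have time left, keep retrying.
			continue
		}
		// For anything else, don't retry.
		break
	}
	return err
}

// runWithDeadline reads with a one-second deadline on the context.
func runWithDeadline(failures int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	r := &flakyReader{failures: failures}
	err := readWithRetry(ctx, r.read)
	return r.calls, err
}

// runWithoutDeadline reads with a context that has no deadline.
func runWithoutDeadline(failures int) (int, error) {
	r := &flakyReader{failures: failures}
	err := readWithRetry(context.Background(), r.read)
	return r.calls, err
}

func main() {
	calls, err := runWithDeadline(3)
	fmt.Println(calls, err)
	calls, err = runWithoutDeadline(3)
	fmt.Println(calls, err)
}
```

With a deadline attached, the three simulated timeouts are retried and the fourth read succeeds; without one, the first timeout is returned to the caller.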
Fixed. I think that fixed all the issues, but please do let me know if I missed anything else.
In my POV, this context argument should not be added, at least not the way it currently is.
@zerosnake0 I replied to some of your points below. I would also suggest that you read the associated JIRA ticket, which provides slightly more background about what problem this change solves.
Unlike Go1, there's no backward compatibility guarantee in thrift library. If you take a look at https://github.com/apache/thrift/blob/master/CHANGES.md, breaking changes are added in every release.
Resolving a real issue sounds like a fitting case of "necessary", wouldn't you think? In the JIRA ticket, Can and I discussed how to implement it, including the alternative of doing it on transport level ( In hindsight, yes, the loop retry fits more naturally in TTransport, but some problems with that are:
I suggest you take a look at the official golang net/http implementation before any further discussion
If I may, "go read net/http" is not a productive attitude. You are arguing against a change that was already discussed, agreed to and merged. That's fine, it's even welcome. But the onus is on you to convince us there is a better way and that we should revert this. You didn't even address @fishy's first and second point.
I did read the first point, and I agree with the "rarer implementation point", so I said nothing. Same for the 3rd point. I mentioned net/http because it has something similar for this kind of issue, and I'm not sure if you know it. You reduced my comment to just three words, "go read net/http", when I typed such a long sentence. I apologize if I misused any words. What's more, this commit is related to many PRs done by @fishy (for example, the connectivity check), which is why I want to post here and discuss with him before doing anything
@zerosnake0 so my guess is by "net/http" you mean client.go and transport.go. Yes there are things we can learn from there, but there are also a lot of key differences between thrift's transport and http transport:
Both options are breaking changes to TTransport, both options were already discussed in the JIRA ticket, and the reason we picked the currently implemented option over them can be found there as well. If we completely rewrote the whole thrift go library now, we would probably choose an implementation similar to what the http library does over what we have right now. But we are not doing that complete rewrite, and the current implementation is the least breaking one, as far as we could tell. If you have a better approach, I would love to see something more concrete from it (e.g. code/PR, or some more detailed design on how to do it). The connectivity check change happened before this change, and does not depend on this change, btw.
Sorry I didn't specify which file to look at yesterday. I meant server.go: in the readRequest method the approach is more like the 2.b that you pointed out. Personally I think 2.b is more reasonable, because we need to set the timeout specifically every time the I/O reads/writes the request/response. Another reason is that currently the only thing that cancels the context is disconnection, which will surely affect all your I/O operations afterwards. I will try to work something out and let you know
This is not true. The feature this change implemented is mainly for clients to use (servers don't care about read timeouts unless they don't intend to support long connections/client pools). It's very common for clients to set a timeout in the context object.
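As a rough illustration of the client-side pattern mentioned here, the sketch below attaches a timeout to the context before making a call. It does not use the thrift library: `slowCall` and `callTimesOut` are hypothetical stand-ins for a generated client method and its caller, and only the standard `context` package behavior is relied on.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall is a hypothetical stand-in for a client call that takes
// serverDelay to complete unless the context ends first.
func slowCall(ctx context.Context, serverDelay time.Duration) error {
	select {
	case <-time.After(serverDelay):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// callTimesOut attaches a timeout to the context, makes the call, and
// reports whether it failed because the deadline was exceeded.
func callTimesOut(timeoutMs, serverDelayMs int) bool {
	ctx, cancel := context.WithTimeout(
		context.Background(),
		time.Duration(timeoutMs)*time.Millisecond,
	)
	defer cancel()
	err := slowCall(ctx, time.Duration(serverDelayMs)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// Server slower than the client timeout: the call is abandoned.
	fmt.Println(callTimesOut(20, 2000))
	// Server faster than the client timeout: the call completes.
	fmt.Println(callTimesOut(2000, 1))
}
```

A context built this way carries a deadline, which is the condition under which the ReadMessageBegin implementations in this PR retry I/O timeouts.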
@fishy I have a couple of clarification questions about the expected behavior of the timeouts. Sorry if I misunderstood something.
Thanks!