Fix "deadline exceeded" issues discovered in conformance tests #643
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -141,12 +141,17 @@ func (w *envelopeWriter) write(env *envelope) *Error { | |
prefix[0] = env.Flags | ||
binary.BigEndian.PutUint32(prefix[1:5], uint32(env.Data.Len())) | ||
if _, err := w.writer.Write(prefix[:]); err != nil { | ||
err = wrapIfContextError(err) | ||
if connectErr, ok := asError(err); ok { | ||
return connectErr | ||
} | ||
return errorf(CodeUnknown, "write envelope: %w", err) | ||
} | ||
if _, err := io.Copy(w.writer, env.Data); err != nil { | ||
err = wrapIfContextError(err) | ||
if connectErr, ok := asError(err); ok { | ||
return connectErr | ||
} | ||
return errorf(CodeUnknown, "write message: %w", err) | ||
} | ||
return nil | ||
|
@@ -235,6 +240,10 @@ func (r *envelopeReader) Read(env *envelope) *Error { | |
// add any alarming text about protocol errors, though. | ||
return NewError(CodeUnknown, err) | ||
} | ||
err = wrapIfContextError(err) | ||
if connectErr, ok := asError(err); ok { | ||
return connectErr | ||
} | ||
// Something else has gone wrong - the stream didn't end cleanly. | ||
if connectErr, ok := asError(err); ok { | ||
return connectErr | ||
|
@@ -252,7 +261,7 @@ func (r *envelopeReader) Read(env *envelope) *Error { | |
if r.readMaxBytes > 0 && size > int64(r.readMaxBytes) { | ||
_, err := io.CopyN(io.Discard, r.reader, size) | ||
if err != nil && !errors.Is(err, io.EOF) { | ||
return errorf(CodeUnknown, "read enveloped message: %w", err) | ||
return errorf(CodeResourceExhausted, "message is larger than configured max %d - unable to determine message size: %w", r.readMaxBytes, err) | ||
Review comment: This was made consistent with the exact same logic that lives in the decompression code. Its handling (always returning "resource exhausted", since we know the message is too big) seemed better, and its error message more helpful, so I copied it here. |
||
} | ||
return errorf(CodeResourceExhausted, "message size %d is larger than configured max %d", size, r.readMaxBytes) | ||
} | ||
|
@@ -274,6 +283,10 @@ func (r *envelopeReader) Read(env *envelope) *Error { | |
readN, | ||
) | ||
} | ||
err = wrapIfContextError(err) | ||
if connectErr, ok := asError(err); ok { | ||
return connectErr | ||
} | ||
return errorf(CodeUnknown, "read enveloped message: %w", err) | ||
} | ||
env.Flags = prefixes[0] | ||
|
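To make the envelope handling in the hunks above easier to follow, here is a minimal, self-contained sketch of the framing they touch: a 5-byte prefix (one flags byte plus a 4-byte big-endian length) followed by the payload, with oversized messages discarded and always reported as "resource exhausted". The names `writeEnvelope`, `readEnvelope`, `roundTrip`, and `errTooLarge` are hypothetical and stand in for the library's own types; this is an illustration under those assumptions, not the code in this PR.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// errTooLarge is a hypothetical stand-in for a "resource exhausted" error.
var errTooLarge = errors.New("resource exhausted: message is larger than configured max")

// writeEnvelope writes a 5-byte prefix (1 flags byte + 4-byte big-endian
// length) followed by the payload.
func writeEnvelope(w io.Writer, flags uint8, payload []byte) error {
	var prefix [5]byte
	prefix[0] = flags
	binary.BigEndian.PutUint32(prefix[1:5], uint32(len(payload)))
	if _, err := w.Write(prefix[:]); err != nil {
		return fmt.Errorf("write envelope: %w", err)
	}
	if _, err := w.Write(payload); err != nil {
		return fmt.Errorf("write message: %w", err)
	}
	return nil
}

// readEnvelope reads one envelope. When maxBytes > 0 and the declared size
// exceeds it, the payload is discarded and errTooLarge is reported whether
// or not the discard itself succeeded.
func readEnvelope(r io.Reader, maxBytes int) (uint8, []byte, error) {
	var prefix [5]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		return 0, nil, fmt.Errorf("read envelope prefix: %w", err)
	}
	size := int64(binary.BigEndian.Uint32(prefix[1:5]))
	if maxBytes > 0 && size > int64(maxBytes) {
		if _, err := io.CopyN(io.Discard, r, size); err != nil && !errors.Is(err, io.EOF) {
			return 0, nil, fmt.Errorf("%w (max %d), unable to determine message size: %v", errTooLarge, maxBytes, err)
		}
		return 0, nil, fmt.Errorf("%w: size %d, max %d", errTooLarge, size, maxBytes)
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return 0, nil, fmt.Errorf("read enveloped message: %w", err)
	}
	return prefix[0], payload, nil
}

// roundTrip writes one envelope to a buffer and reads it back. The bool
// reports whether the read was rejected as too large.
func roundTrip(payload string, maxBytes int) (string, bool) {
	var buf bytes.Buffer
	if err := writeEnvelope(&buf, 0, []byte(payload)); err != nil {
		return "", false
	}
	_, data, err := readEnvelope(&buf, maxBytes)
	if err != nil {
		return "", errors.Is(err, errTooLarge)
	}
	return string(data), false
}

func main() {
	got, tooLarge := roundTrip("hello", 0)
	fmt.Printf("no limit: payload=%q tooLarge=%v\n", got, tooLarge)
	_, tooLarge = roundTrip("hello", 3)
	fmt.Printf("limit 3: tooLarge=%v\n", tooLarge)
}
```

Reporting the size violation even when the discard fails mirrors the reasoning in the review comment above: once the prefix says the message is too big, that fact is already known regardless of what happens to the rest of the stream.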
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ import ( | |
"fmt" | ||
"net/http" | ||
"net/url" | ||
"os" | ||
"strings" | ||
|
||
"google.golang.org/protobuf/proto" | ||
|
@@ -292,6 +293,12 @@ func wrapIfContextError(err error) error { | |
if errors.Is(err, context.DeadlineExceeded) { | ||
return NewError(CodeDeadlineExceeded, err) | ||
} | ||
// Ick, some dial errors can be returned as os.ErrDeadlineExceeded | ||
// instead of context.DeadlineExceeded :( | ||
// https://github.com/golang/go/issues/64449 | ||
if errors.Is(err, os.ErrDeadlineExceeded) { | ||
return NewError(CodeDeadlineExceeded, err) | ||
} | ||
Review comment: Confirming: whenever we hit this case, the actual context has been cancelled - correct?

Reply: It cannot be guaranteed without also checking the context. However, the same is true of the check above: […]
So this check does not make the situation any worse IMO. And, if the referenced Go bug is fixed, it would behave exactly this way, even with only the […]. The only way to guarantee that the context has been cancelled is to also pass in the context and check […].

Reply: While trying to create a repro test for the Go bug, I think I may have found a test formulation that can reveal this issue more consistently. It isn't pretty, since it's timing related, and different execution environments will have different timing that is likely to tickle it (10-core MBP vs. VM/container in CI infra). Let me put that together and verify it can repro this issue on main and then see what you think of it.

Reply: I just pushed a commit with a repro test. And... sadly... it fails on this branch. So there is some other place a deadline exceeded error is miscategorized as "unknown" 😢

Reply: Okay, I think I've now squashed all of the bugs. The upside is that the test seemed to fail pretty consistently without all of these fixes, so I think it will be a reasonably good way to identify issues in CI. But there are a couple of downsides: […]
See the last three commits for details: […]

Reply: 😬 Yikes! |
||
return err | ||
} | ||
|
||
|
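The thread above notes that matching on the error value alone cannot prove that the caller's context actually expired, and that the only way to guarantee it is to pass the context in as well. The sketch below illustrates that stricter variant. It is not what this PR does: `codedError`, `classify`, `codeOf`, and `demo` are hypothetical names, and the string codes are placeholders for the library's `Code` values.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"time"
)

// codedError is a hypothetical stand-in for the library's *Error type.
type codedError struct {
	code string
	err  error
}

func (e *codedError) Error() string { return e.code + ": " + e.err.Error() }
func (e *codedError) Unwrap() error { return e.err }

// classify attaches a code to context-related errors. Unlike a check on the
// error alone, it treats os.ErrDeadlineExceeded as a deadline error only
// when ctx itself reports that its deadline has passed.
func classify(ctx context.Context, err error) error {
	if err == nil {
		return nil
	}
	var coded *codedError
	if errors.As(err, &coded) {
		return err // already carries a code
	}
	switch {
	case errors.Is(err, context.Canceled):
		return &codedError{code: "canceled", err: err}
	case errors.Is(err, context.DeadlineExceeded):
		return &codedError{code: "deadline_exceeded", err: err}
	case errors.Is(err, os.ErrDeadlineExceeded) &&
		errors.Is(ctx.Err(), context.DeadlineExceeded):
		return &codedError{code: "deadline_exceeded", err: err}
	}
	return err
}

// codeOf extracts the code, or "unknown" when none was attached.
func codeOf(err error) string {
	var coded *codedError
	if errors.As(err, &coded) {
		return coded.code
	}
	return "unknown"
}

// demo classifies the same wrapped os.ErrDeadlineExceeded twice: once with
// a context whose deadline is already in the past, once with a live one.
func demo() (expired, live string) {
	dialErr := fmt.Errorf("dial tcp: %w", os.ErrDeadlineExceeded)
	expiredCtx, cancel := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
	defer cancel()
	expired = codeOf(classify(expiredCtx, dialErr))
	live = codeOf(classify(context.Background(), dialErr))
	return expired, live
}

func main() {
	expired, live := demo()
	fmt.Println("expired context:", expired)
	fmt.Println("live context:", live)
}
```

The trade-off is that every call site needs the context in hand, whereas the check in the diff only needs the error.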
Review comment: This was just for consistency/symmetry with the decompress code above, which directly calls `dst.ReadFrom` instead of using `io.Copy`.
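As a small illustration of why the two forms are interchangeable here, the sketch below assumes `dst` is a `*bytes.Buffer` (the comment does not state its type). When the source does not implement `io.WriterTo` and the destination implements `io.ReaderFrom`, `io.Copy` delegates to `dst.ReadFrom`, so calling `ReadFrom` directly skips only the dispatch. The helper names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyViaReadFrom drains src into a fresh buffer by calling ReadFrom directly.
func copyViaReadFrom(src io.Reader) (string, int64, error) {
	var dst bytes.Buffer
	n, err := dst.ReadFrom(src)
	return dst.String(), n, err
}

// copyViaIOCopy drains src into a fresh buffer through io.Copy.
func copyViaIOCopy(src io.Reader) (string, int64, error) {
	var dst bytes.Buffer
	n, err := io.Copy(&dst, src)
	return dst.String(), n, err
}

// sameResult reports whether both approaches yield identical data and byte
// counts. io.LimitReader hides the WriterTo method of strings.Reader, so
// io.Copy falls through to the destination's ReadFrom.
func sameResult(s string) bool {
	a, an, aerr := copyViaReadFrom(io.LimitReader(strings.NewReader(s), int64(len(s))))
	b, bn, berr := copyViaIOCopy(io.LimitReader(strings.NewReader(s), int64(len(s))))
	return aerr == nil && berr == nil && a == b && an == bn && a == s
}

func main() {
	fmt.Println("identical:", sameResult("enveloped message"))
}
```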