-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
alts: Fix TsiSocket doWrite on short writes #15962
Conversation
Signed-off-by: yihuaz <[email protected]>
21ac793
to
1647c44
Compare
@antoniovicente F.Y.I. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for taking on improvements to how alts TsiSocket::doWrite interacts with buffer watermarks. First round of comments to improve my understanding of what the code is doing and some potential edge cases that I noticed.
tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_); | ||
ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(), | ||
buffer.length(), tsi_result_to_string(status)); | ||
return repeatProtectAndWrite(buffer, end_stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the raw_write_buffer_.length() > 0 code below this early return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need it to send handshake data to its peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
while (true) { | ||
uint64_t bytes_to_drain_this_iteration = | ||
prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_ | ||
: std::min(buffer.length(), actual_frame_size_to_use_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does actual_frame_size_to_use_ include framing overhead or not?
We need bytes excluding overhead in order to emit a single protected frame instead of a large frame followed by a tiny frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It includes the framing overhead. You are right. When computing bytes_to_drain_this_iteration, we should exclude the overhead from actual_frame_size_to_use_.
ENVOY_CONN_LOG(debug, "TSI: protecting buffer size: {}", callbacks_->connection(), | ||
bytes_to_protect_this_iteration.length()); | ||
tsi_result status = | ||
frame_protector_->protect(bytes_to_protect_this_iteration, raw_write_buffer_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing the protect method to accept a string_view with the input bytes to avoid copying into bytes_to_protect_this_iteration just to copy again inside protect as it populates the grpc slice.
I think that ExternallyManagedSlice could be used instead of UnmanagedMemorySlice to save another copy during protect, but we can sort that out later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Yeah, it does not make sense to make bytes copy twice.
// Short write. Exit. | ||
if (raw_write_buffer_.length() > 0) { | ||
prev_bytes_to_drain_ = | ||
prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_ : bytes_to_drain_this_iteration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that when prev_bytes_to_drain_ > 0 the following is true:
prev_bytes_to_drain_ == bytes_to_drain_this_iteration
this could be: prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
A comment would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice catch. We can simply do prev_bytes_to_drain_ = bytes_to_drain_this_iteration;
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(), | ||
raw_write_buffer_.length(), end_stream); | ||
result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); | ||
total_bytes_written += result.bytes_processed_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
total_bytes_written should be tracking bytes_to_drain_this_iteration, not the encrypted bytes.
Updates to total_bytes_written should happen around the buffer.drain below and look something like:
total_bytes_written += bytes_to_drain_this_iteration;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false}; | ||
|
||
while (true) { | ||
uint64_t bytes_to_drain_this_iteration = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an odd case that merits special consideration:
prev_bytes_to_drain_ == 0 && raw_write_buffer_.length() > 0
I think this can happen in the case where doHandshake adds bytes to raw_write_buffer_ but also completes the handshake. When this happens, the protect call below is skipped, but bytes_to_drain_this_iteration ends up being > 0, which could result in bytes in the input buffer being discarded without being sent.
Ways to detect:
ASSERT((prev_bytes_to_drain_ == 0) == (raw_write_buffer_.length() == 0);
ASSERT(prev_bytes_to_drain_ >= buffer.length());
ASSERT(buffer.length() >= bytes_to_drain_this_iteration)
before the call to buffer.drain()
further down
Possibly adding an ASSERT
to OwnedImpl::drainImpl
to detect attempts to drain more bytes than are in the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this code, I think we have a guarantee that raw_write_buffer_.length() is 0 when entering doWrite() for the first time after handshake completes. Also, it does not make sense that a peer wants to send non-handshake data without first confirming if the handshake completes successfully. In other words, during handshake, peer A will send whatever it receives from peer B to the handshake service in order to get the bytes to send to peer B. Here, peer A will not concatenate any non-handshake data to the data received from peer B, and send them to the handshake service because peer A has not received any confirmation from the handshake service that handshake completes successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw_write_buffer_ will be non-empty after the doWrite just after handshake completes if that doWrite results in a partial write. I know this may be really unlikely but it is possible for raw_write_buffer_ to be non-empty after handshake completes. I think it also true that the peer that completes the handshake first will need to do a write after handshake completes locally to provide the remote peer the information it needs to complete its own handshake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible solution: branch on raw_write_buffer_.length() > 0
instead of prev_bytes_to_drain_
.
When raw_write_buffer_.length() > 0
then bytes_to_drain_this_iteration = prev_bytes_to_drain_
and we should attempt a write even if bytes_to_drain_this_iteration
is 0 which would happen if the bytes in the buffer are handshake bytes.
When raw_write_buffer_.length() > 0
then bytes_to_drain_this_iteration = std::min(buffer.length(), max_unprotected_frame_size_)
. If >0, attempt to protect and write those bytes, else break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a slightly different approach by introducing a new field - prev_handshake_bytes_to_drain_
that indicates if we need to drain handshake data before doing regular protect+write operations. PTAL.
Signed-off-by: yihuaz <[email protected]>
cc09a39
to
b9d5778
Compare
Signed-off-by: yihuaz <[email protected]>
f2a42d0
to
a104741
Compare
bytes_to_drain_this_iteration, tsi_result_to_string(status)); | ||
} | ||
|
||
// Write raw_write_buffer_ to nework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling: nework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false}; | ||
|
||
while (true) { | ||
uint64_t bytes_to_drain_this_iteration = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw_write_buffer_ will be non-empty after the doWrite just after handshake completes if that doWrite results in a partial write. I know this may be really unlikely but it is possible for raw_write_buffer_ to be non-empty after handshake completes. I think it also true that the peer that completes the handshake first will need to do a write after handshake completes locally to provide the remote peer the information it needs to complete its own handshake.
tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_); | ||
ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(), | ||
buffer.length(), tsi_result_to_string(status)); | ||
return repeatProtectAndWrite(buffer, end_stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.
@@ -12,7 +12,7 @@ namespace Alts { | |||
namespace { | |||
// Include 4 bytes frame message type and 16 bytes tag length. | |||
// It is consistent with gRPC ALTS zero copy frame protector implementation. | |||
static constexpr int FrameOverheadSize = 0; | |||
static constexpr int FrameOverheadSize = 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of max_unprotected_frame_size_ and setMaxUnprotectedFrameSize seems to make testing for this overhead constant difficult. Or at least that's the impression I get from no tests needing update when changing this overhead value. max_unprotected_frame_size_ is read in a single location in repeatProtectAndWrite, consider changing that to "actual_frame_size_to_use_ - FrameOverheadSize"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not use a hard-coded FrameOverheadSize
within the code because in the test we used fake frame protector which has a different frame overhead size (i.e., 4 bytes). The code is updated. PTAL.
Signed-off-by: yihuaz <[email protected]>
4581f5a
to
db79a59
Compare
? prev_handshake_bytes_to_drain_ | ||
: (prev_bytes_to_drain_ > 0 | ||
? prev_bytes_to_drain_ | ||
: std::min(buffer.length(), actual_frame_size_to_use_ - frame_overhead_size_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression seems very complex. Consider the following instead:
- ASSERT(prev_handshake_bytes_to_drain_ == 0) when entering repeatProtectAndWrite
- in TsiSocket::doWrite, check if prev_handshake_bytes_to_drain_ > 0 and if so attempt to flush the buffer and only call repeatProtectAndWrite if the flush was successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion. Done.
if (raw_write_buffer_.length() > 0) { | ||
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(), | ||
raw_write_buffer_.length(), end_stream); | ||
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No bytes from buffer were consumed so bytes written should be reported as being 0 regardless of what doWrite on the raw socket says.
Also, in cases where handshake completes successfully, it would be ideal to fall through and execute repeatProtectAndWrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
Regarding fall through, I do not think it is going to happen because ALTS TSI is an async implementation which means in doHandshake(), it calls next() and immediately returns, which leaves handshake_complete_ being still false. When the response from handshake service comes back, handshake_complete_ will be set to true if handshake completes successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: yihuaz <[email protected]>
41e78c1
to
a4c3230
Compare
@@ -327,18 +313,32 @@ Network::IoResult TsiSocket::repeatProtectAndWrite(Buffer::Instance& buffer, boo | |||
} | |||
|
|||
Network::IoResult TsiSocket::doWrite(Buffer::Instance& buffer, bool end_stream) { | |||
Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: result is never used without it being written by a raw_buffer_socket_->doWrite. Consider removing this local variable and instead declaring it when doWrite is called.
Network::IoResult result = raw_buffer_socket_->doWrite(...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (raw_write_buffer_.length() > 0) { | ||
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(), | ||
raw_write_buffer_.length(), end_stream); | ||
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?
raw_write_buffer_.length(), end_stream); | ||
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); | ||
// Check if we need to flush outstanding handshake bytes. | ||
if (prev_handshake_bytes_to_drain_ > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There are no reads for the exact number of bytes in prev_handshake_bytes_to_drain_, just wherever or not it is > 0.
Consider using a boolean instead to indicate that raw_write_buffer_ contains handshake data that needs to be flushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
raw_write_buffer_.length(), end_stream); | ||
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); | ||
// Check if we need to flush outstanding handshake bytes. | ||
if (prev_handshake_bytes_to_drain_ > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ASSERT(raw_write_buffer_.length() > 0) since prev_handshake_bytes_to_drain_ > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if (raw_write_buffer_.length() > 0) { | ||
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(), | ||
raw_write_buffer_.length(), end_stream); | ||
result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coverage report shows that this branch is not covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice finding! It turns out this code path should not be triggered at all. Because in an async TSI, the buffer to be sent to peer (i.e., raw_write_buffer_) should be filled by the handshake service and sent to the peer in the callback function (i.e., doHandshakeNextDone) upon finishing the process of handshake request. After scheduling a handshake request, we do not know if raw_write_buffer_ is ready to be sent to its peer so should not attempt the network write.
EXPECT_CALL(*server_.raw_socket_, doWrite(_, false)) | ||
.WillOnce(Invoke([&](Buffer::Instance& buffer, bool) { | ||
Network::IoResult result = {Network::PostIoAction::KeepOpen, 4, false}; | ||
server_to_client_.add(buffer.linearize(0), 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think 0 is a valid argument to linearize. I think you need linearize(4) to match the number of bytes you're adding.
Alternatively, consider:
server_to_client_.move(buffer, 4);
which is a nicer version of add(..., 4) followed by drain(4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion. The code is much simpler now.
Signed-off-by: yihuaz <[email protected]>
Sorry for the late response. All comments are addressed. PTAL. |
// Envoy ALTS implements asynchronous tsi_handshaker_next() interface | ||
// which returns immediately after scheduling a handshake request to | ||
// the handshake service. The handshake response will be handled by a | ||
// dedicated thread in a seperate API within which handshake_complete_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing format checks due to spell error.
seperate -> separate
Signed-off-by: yihuaz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks @antoniovicente for the detailed review! I am wondering if there is anything else remaining to be addressed before this PR gets merged? |
Signed-off-by: yihuaz <[email protected]> Signed-off-by: Gokul Nair <[email protected]>
Signed-off-by: yihuaz <[email protected]> Signed-off-by: Gokul Nair <[email protected]>
This PR resolves the third issue described in #15296.