Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alts: Fix TsiSocket doWrite on short writes #15962

Merged
merged 7 commits into from
May 4, 2021

Conversation

yihuazhang
Copy link
Contributor

This PR resolves the third issue described in #15296.

@yihuazhang
Copy link
Contributor Author

@antoniovicente F.Y.I.

@antoniovicente antoniovicente self-assigned this Apr 13, 2021
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for taking on improvements to how alts TsiSocket::doWrite interacts with buffer watermarks. First round of comments to improve my understanding of what the code is doing and some potential edge cases that I noticed.

tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_);
ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(),
buffer.length(), tsi_result_to_string(status));
return repeatProtectAndWrite(buffer, end_stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the raw_write_buffer_.length() > 0 code below this early return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need it to send handshake data to its peer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

while (true) {
uint64_t bytes_to_drain_this_iteration =
prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_
: std::min(buffer.length(), actual_frame_size_to_use_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does actual_frame_size_to_use_ include framing overhead or not?

We need bytes excluding overhead in order to emit a single protected frame instead of a large frame followed by a tiny frame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It includes the framing overhead. You are right. When computing bytes_to_drain_this_iteration, we should exclude the overhead from actual_frame_size_to_use_.

ENVOY_CONN_LOG(debug, "TSI: protecting buffer size: {}", callbacks_->connection(),
bytes_to_protect_this_iteration.length());
tsi_result status =
frame_protector_->protect(bytes_to_protect_this_iteration, raw_write_buffer_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing the protect method to accept a string_view with the input bytes to avoid copying into bytes_to_protect_this_iteration just to copy again inside protect as it populates the grpc slice.

I think that ExternallyManagedSlice could be used instead of UnmanagedMemorySlice to save another copy during protect, but we can sort that out later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yeah, it does not make sense to make bytes copy twice.

// Short write. Exit.
if (raw_write_buffer_.length() > 0) {
prev_bytes_to_drain_ =
prev_bytes_to_drain_ > 0 ? prev_bytes_to_drain_ : bytes_to_drain_this_iteration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that when prev_bytes_to_drain_ > 0 the following is true:
prev_bytes_to_drain_ == bytes_to_drain_this_iteration

this could be: prev_bytes_to_drain_ = bytes_to_drain_this_iteration;

A comment would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice catch. We can simply do prev_bytes_to_drain_ = bytes_to_drain_this_iteration;

ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
raw_write_buffer_.length(), end_stream);
result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
total_bytes_written += result.bytes_processed_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_bytes_written should be tracking bytes_to_drain_this_iteration, not the encrypted bytes.

Updates to total_bytes_written should happen around the buffer.drain below and look something like:
total_bytes_written += bytes_to_drain_this_iteration;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged.

Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};

while (true) {
uint64_t bytes_to_drain_this_iteration =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an odd case that merits special consideration:

prev_bytes_to_drain_ == 0 && raw_write_buffer_.length() > 0

I think this can happen in the case where doHandshake adds bytes to raw_write_buffer_ but also completes the handshake. When this happens, the protect call below is skipped, but bytes_to_drain_this_iteration ends up being > 0, which could result in bytes in the input buffer being discarded without being sent.

Ways to detect:
ASSERT((prev_bytes_to_drain_ == 0) == (raw_write_buffer_.length() == 0);
ASSERT(prev_bytes_to_drain_ >= buffer.length());
ASSERT(buffer.length() >= bytes_to_drain_this_iteration) before the call to buffer.drain() further down
Possibly adding an ASSERT to OwnedImpl::drainImpl to detect attempts to drain more bytes than are in the buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per this code, I think we have a guarantee that raw_write_buffer_.length() is 0 when entering doWrite() for the first time after handshake completes. Also, it does not make sense that a peer wants to send non-handshake data without first confirming if the handshake completes successfully. In other words, during handshake, peer A will send whatever it receives from peer B to the handshake service in order to get the bytes to send to peer B. Here, peer A will not concatenate any non-handshake data to the data received from peer B, and send them to the handshake service because peer A has not received any confirmation from the handshake service that handshake completes successfully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_write_buffer_ will be non-empty after the doWrite just after handshake completes if that doWrite results in a partial write. I know this may be really unlikely but it is possible for raw_write_buffer_ to be non-empty after handshake completes. I think it also true that the peer that completes the handshake first will need to do a write after handshake completes locally to provide the remote peer the information it needs to complete its own handshake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible solution: branch on raw_write_buffer_.length() > 0 instead of prev_bytes_to_drain_.

When raw_write_buffer_.length() > 0 then bytes_to_drain_this_iteration = prev_bytes_to_drain_ and we should attempt a write even if bytes_to_drain_this_iteration is 0 which would happen if the bytes in the buffer are handshake bytes.

When raw_write_buffer_.length() > 0 then bytes_to_drain_this_iteration = std::min(buffer.length(), max_unprotected_frame_size_). If >0, attempt to protect and write those bytes, else break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a slightly different approach by introducing a new field - prev_handshake_bytes_to_drain_ that indicates if we need to drain handshake data before doing regular protect+write operations. PTAL.

bytes_to_drain_this_iteration, tsi_result_to_string(status));
}

// Write raw_write_buffer_ to nework.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: nework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};

while (true) {
uint64_t bytes_to_drain_this_iteration =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_write_buffer_ will be non-empty after the doWrite just after handshake completes if that doWrite results in a partial write. I know this may be really unlikely but it is possible for raw_write_buffer_ to be non-empty after handshake completes. I think it also true that the peer that completes the handshake first will need to do a write after handshake completes locally to provide the remote peer the information it needs to complete its own handshake.

tsi_result status = frame_protector_->protect(buffer, raw_write_buffer_);
ENVOY_CONN_LOG(debug, "TSI: protected buffer left: {} result: {}", callbacks_->connection(),
buffer.length(), tsi_result_to_string(status));
return repeatProtectAndWrite(buffer, end_stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but think that it could be moved to the "if (!handshake_complete_) {" branch.

@@ -12,7 +12,7 @@ namespace Alts {
namespace {
// Include 4 bytes frame message type and 16 bytes tag length.
// It is consistent with gRPC ALTS zero copy frame protector implementation.
static constexpr int FrameOverheadSize = 0;
static constexpr int FrameOverheadSize = 20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of max_unprotected_frame_size_ and setMaxUnprotectedFrameSize seems to make testing for this overhead constant difficult. Or at least that's the impression I get from no tests needing update when changing this overhead value. max_unprotected_frame_size_ is read in a single location in repeatProtectAndWrite, consider changing that to "actual_frame_size_to_use_ - FrameOverheadSize"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not use a hard-coded FrameOverheadSize within the code because in the test we used fake frame protector which has a different frame overhead size (i.e., 4 bytes). The code is updated. PTAL.

? prev_handshake_bytes_to_drain_
: (prev_bytes_to_drain_ > 0
? prev_bytes_to_drain_
: std::min(buffer.length(), actual_frame_size_to_use_ - frame_overhead_size_));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression seems very complex. Consider the following instead:

  • ASSERT(prev_handshake_bytes_to_drain_ == 0) when entering repeatProtectAndWrite
  • in TsiSocket::doWrite, check if prev_handshake_bytes_to_drain_ > 0 and if so attempt to flush the buffer and only call repeatProtectAndWrite if the flush was successful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion. Done.

if (raw_write_buffer_.length() > 0) {
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
raw_write_buffer_.length(), end_stream);
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No bytes from buffer were consumed so bytes written should be reported as being 0 regardless of what doWrite on the raw socket says.

Also, in cases where handshake completes successfully, it would be ideal to fall through and execute repeatProtectAndWrite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

Regarding fall through, I do not think it is going to happen because ALTS TSI is an async implementation which means in doHandshake(), it calls next() and immediately returns, which leaves handshake_complete_ being still false. When the response from handshake service comes back, handshake_complete_ will be set to true if handshake completes successfully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -327,18 +313,32 @@ Network::IoResult TsiSocket::repeatProtectAndWrite(Buffer::Instance& buffer, boo
}

Network::IoResult TsiSocket::doWrite(Buffer::Instance& buffer, bool end_stream) {
Network::IoResult result = {Network::PostIoAction::KeepOpen, 0, false};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: result is never used without it being written by a raw_buffer_socket_->doWrite. Consider removing this local variable and instead declaring it when doWrite is called.

Network::IoResult result = raw_buffer_socket_->doWrite(...);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (raw_write_buffer_.length() > 0) {
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
raw_write_buffer_.length(), end_stream);
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add ASSERT(!handshake_complete_) after the doHandshake() call with an explanation?

raw_write_buffer_.length(), end_stream);
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
// Check if we need to flush outstanding handshake bytes.
if (prev_handshake_bytes_to_drain_ > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There are no reads for the exact number of bytes in prev_handshake_bytes_to_drain_, just wherever or not it is > 0.

Consider using a boolean instead to indicate that raw_write_buffer_ contains handshake data that needs to be flushed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

raw_write_buffer_.length(), end_stream);
return raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
// Check if we need to flush outstanding handshake bytes.
if (prev_handshake_bytes_to_drain_ > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ASSERT(raw_write_buffer_.length() > 0) since prev_handshake_bytes_to_drain_ > 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (raw_write_buffer_.length() > 0) {
ENVOY_CONN_LOG(debug, "TSI: raw_write length {} end_stream {}", callbacks_->connection(),
raw_write_buffer_.length(), end_stream);
result = raw_buffer_socket_->doWrite(raw_write_buffer_, end_stream && (buffer.length() == 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coverage report shows that this branch is not covered by tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice finding! It turns out this code path should not be triggered at all. Because in an async TSI, the buffer to be sent to peer (i.e., raw_write_buffer_) should be filled by the handshake service and sent to the peer in the callback function (i.e., doHandshakeNextDone) upon finishing the process of handshake request. After scheduling a handshake request, we do not know if raw_write_buffer_ is ready to be sent to its peer so should not attempt the network write.

EXPECT_CALL(*server_.raw_socket_, doWrite(_, false))
.WillOnce(Invoke([&](Buffer::Instance& buffer, bool) {
Network::IoResult result = {Network::PostIoAction::KeepOpen, 4, false};
server_to_client_.add(buffer.linearize(0), 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think 0 is a valid argument to linearize. I think you need linearize(4) to match the number of bytes you're adding.

Alternatively, consider:
server_to_client_.move(buffer, 4);

which is a nicer version of add(..., 4) followed by drain(4)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion. The code is much simpler now.

@yihuazhang
Copy link
Contributor Author

Sorry for the late response. All comments are addressed. PTAL.

// Envoy ALTS implements asynchronous tsi_handshaker_next() interface
// which returns immediately after scheduling a handshake request to
// the handshake service. The handshake response will be handled by a
// dedicated thread in a seperate API within which handshake_complete_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing format checks due to spell error.

seperate -> separate

Signed-off-by: yihuaz <[email protected]>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yihuazhang
Copy link
Contributor Author

Thanks @antoniovicente for the detailed review! I am wondering if there is anything else remaining to be addressed before this PR gets merged?

@antoniovicente antoniovicente merged commit 595bfcf into envoyproxy:main May 4, 2021
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 5, 2021
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants