Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[network] handle onWriteReady socket failure #15137

Closed
wants to merge 3 commits into from

Conversation

asraa
Copy link
Contributor

@asraa asraa commented Feb 22, 2021

Signed-off-by: Asra Ali [email protected]

Commit Message: Remove RELEASE_ASSERT and handle socket connection failure
Additional Description:

RELEASE_ASSERT(socket_->getSocketOption(SOL_SOCKET, SO_ERROR, &error, &error_size).rc_ == 0,
was changed from ASSERT to RELEASE_ASSERT in #10107 and so the else branch following never executes: . Improve log message to understand crash (maybe an OOM condition)
Risk Level: Arguably negative risk. Removes risk of crash in favor of error handling (that was already existing)

Signed-off-by: Asra Ali <[email protected]>
sunjayBhatia
sunjayBhatia previously approved these changes Feb 22, 2021
source/common/network/connection_impl.cc Outdated Show resolved Hide resolved
"");
Api::SysCallIntResult result =
socket_->getSocketOption(SOL_SOCKET, SO_ERROR, &error, &error_size);
RELEASE_ASSERT(result.rc_ == 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this RELEASE_ASSERT to ENVOY_BUG + error handling?

Also, are there other examples where getSocketOption is followed by an ASSERT or RELEASE_ASSERT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other cases of getSocketOption all use error handling.

@asraa
Copy link
Contributor Author

asraa commented Feb 22, 2021

@sunjayBhatia I decided to change this to error handling as it was in release mode before the refactor when it was an ASSERT. It seems like this bug has a reasonable chance of triggering

@asraa asraa changed the title [network] improve onWriteReady abort message [network] handle onWriteReady socket failure Feb 22, 2021
Signed-off-by: Asra Ali <[email protected]>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing this to error handling instead of release_assert.

@@ -656,8 +656,6 @@ void ConnectionImpl::onWriteReady() {
socklen_t error_size = sizeof(error);
Api::SysCallIntResult result =
socket_->getSocketOption(SOL_SOCKET, SO_ERROR, &error, &error_size);
RELEASE_ASSERT(result.rc_ == 0,
fmt::format("Failed to connect: {}", errorDetails(result.errno_)));

if (error == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undefined behavior: error is not inialized if result.rc_ != 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I was under the impression error would be set, but it's actually two separate things. I'll handle getSocketOption return error with an ENVOY_BUG (since this is unexpected) and close socket, and error != 0 additionally with close socket.

Checking my understanding. getSocketOption can fail for e.g. if the fd is invalid, or we don't have enough resources. If we succeed, the socket may still have a pending error returned with error that happened between socket calls. Maybe like connection refused?

@@ -669,7 +667,8 @@ void ConnectionImpl::onWriteReady() {
return;
}
} else {
ENVOY_CONN_LOG(debug, "delayed connection error: {}", *this, error);
ENVOY_BUG(result.rc_ == 0, fmt::format("failed to connect: {}", errorDetails(result.errno_)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 24, 2021
@antoniovicente antoniovicente self-assigned this Mar 24, 2021
@asraa asraa removed the stale stalebot believes this issue/PR has not been touched recently label Mar 24, 2021
@asraa
Copy link
Contributor Author

asraa commented Mar 24, 2021

Removing the stale, fell off my radar with some WIP tests. Will be revisiting next week.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 24, 2021
@github-actions
Copy link

github-actions bot commented May 1, 2021

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this May 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants