Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grid: Track HTTP/3 broken-ness in ConnectivityGrid. #15933

Merged
merged 14 commits into from
Apr 19, 2021

Conversation

RyanTheOptimist
Copy link
Contributor

Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the
fallback attempt succeeds.
When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections.
Add an assert that the Grid's host and the callback's host are the same.

Risk Level: Low
Testing: Unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

@RyanTheOptimist
Copy link
Contributor Author

/assign @alyssawilk

This doesn't yet handle marking QUIC as un-broken and I'm not entire sure I'm in love with this approach for tracking state, so perhaps we should chat about that.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Excited to see progress on brokenness. Here's some preliminary thoughts, and happy to sync up when you're in :-)

source/common/http/conn_pool_grid.cc Outdated Show resolved Hide resolved
ENVOY_LOG(trace, "{} pool successfully connected to host '{}'.", describePool(attempt->pool()),
grid_.host_->hostname());
if (http3_attempt_failed_) {
// If the HTTP/3 attempt failed but the other attempt succeeded, mark HTTP/3 as broken.
grid_.set_is_http3_broken(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to handle the case where H2 succeeds before H3 fails as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's the TODO here:

// TODO: Ensure that if HTTP/2 succeeds, we can allow the HTTP/3 connection to run to completion.

How 'bout I address this in a followup since I suspect that might result in a bit of refactoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned out to be less work than I expected, so I've resolved this TODO and handle this case now. (Added a test for it)

@@ -91,6 +91,8 @@ class ConnectivityGrid : public ConnectionPool::Instance,
const StreamInfo::StreamInfo& info,
absl::optional<Http::Protocol> protocol);

bool http3_attempt_failed() const { return http3_attempt_failed_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You and your google habits. no snake case for you. :-P

http3AttemptFailed()
ditto for brokenness below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I renamed is_http3_broken() and set_is_http3_broken(). I think that's all?

source/common/http/conn_pool_grid.cc Outdated Show resolved Hide resolved
Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the
fallback attempt succeeds.
When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections.
Add an assert that the Grid's host and the callback's host are the same.

Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
@RyanTheOptimist
Copy link
Contributor Author

Blergh, somehow two of my commits did not have signoff on them, do I had to follow the instructions to fix which required a force push. I'd love to understand how those commits did not get signoff.

Signed-off-by: Ryan Hamilton <[email protected]>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! This is looking solid :-)
I think you need a main merge, probably to deal with the newStream returns. Also adding David since it'd be good to have another person familiar with the code, though he's welcome to only do a cursory pass.

@@ -70,11 +78,14 @@ void ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed(

// If this point is reached, all pools have been tried. Pass the pool failure up to the
// original caller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment, as pool failure may not be passed up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

auto delete_this_on_return = attempt->removeFromList(connection_attempts_);
ConnectionPool::Callbacks* callbacks = inner_callbacks_;
inner_callbacks_ = nullptr;
// If there are other connection attempts in progress, let them complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimization: If H3 connected but H2 was created due to timeout, do we want H2 to complete? Should we only continue this path if if it's an H3 attempt outstanding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure works for me. I have some feeling in the back of my mind that there might be some corner case where we might wish TCP had run to completion ('cause HTTP/3 is going to end up getting marked as broken) but that doesn't actually seem plausible. Done.

source/common/http/conn_pool_grid.cc Show resolved Hide resolved
host_->hostname());
createNextPool();
pool = ++pool;
}
// TODO(#15649) track pools with successful connections: don't always start at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you doing this TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as part of this PR, but I'm very happy to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are - by skipping pool ahead above :-P Functioanlly if QUIC is broken we start at H2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL! I totally misread what the TODO was for. You're right. Removed.

source/common/http/conn_pool_grid.cc Show resolved Hide resolved
// Removes this from the owning list, deleting it.
void deleteThis();

// Marks HTTP/3 broken If the HTTP/3 attempt failed but the other attempt succeeded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the other -> a TCP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// The timer which tracks when new connections should be attempted.
Event::TimerPtr next_attempt_timer_;
// The iterator to the last pool which had a connection attempt.
PoolIterator current_;
// True if the HTTP/3 attempt failed.
bool http3_attempt_failed_{};
// True if the ALPN attempt succeeded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this TCP? I thought H3 has ALPN for all we hard code it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, based on previous conversations I thought that was the name of the other pool but looking at the code, I see it's called "Mixed". Sure, we TCP works for me. Done.

@@ -131,6 +139,12 @@ class ConnectivityGrid : public ConnectionPool::Instance,
// Returns the next pool in the ordered priority list.
absl::optional<PoolIterator> nextPool(PoolIterator pool_it);

// Returns true if |pool| is the Grid's HTTP/3 connection pool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Envoy rarely does the || style
Also optional Grid's -> grid's?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Ryan Hamilton <[email protected]>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Just a few more nits and I think we can land this :-)

source/common/http/conn_pool_grid.cc Show resolved Hide resolved
host_->hostname());
createNextPool();
pool = ++pool;
}
// TODO(#15649) track pools with successful connections: don't always start at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are - by skipping pool ahead above :-P Functioanlly if QUIC is broken we start at H2

test/common/http/conn_pool_grid_test.cc Show resolved Hide resolved
Signed-off-by: Ryan Hamilton <[email protected]>
Copy link
Contributor Author

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

source/common/http/conn_pool_grid.cc Show resolved Hide resolved
host_->hostname());
createNextPool();
pool = ++pool;
}
// TODO(#15649) track pools with successful connections: don't always start at
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL! I totally misread what the TODO was for. You're right. Removed.

test/common/http/conn_pool_grid_test.cc Show resolved Hide resolved
@RyanTheOptimist
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15933 (comment) was created by @RyanTheOptimist.

see: more, trace.

alyssawilk
alyssawilk previously approved these changes Apr 15, 2021
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG! Merge pending master being unlocked :-)

@alyssawilk
Copy link
Contributor

Argh, I was hoping to merge this when the branch opened but we're still slogging through release work and it's not going to happen.
@asraa or @antoniovicente would you mind merging it for me tomorrow since I'll be out? it's code not used in prod and I've cleared not needing cross-company sign off, just need to merge :-)

@antoniovicente antoniovicente self-assigned this Apr 16, 2021
@antoniovicente
Copy link
Contributor

Argh, I was hoping to merge this when the branch opened but we're still slogging through release work and it's not going to happen.
@asraa or @antoniovicente would you mind merging it for me tomorrow since I'll be out? it's code not used in prod and I've cleared not needing cross-company sign off, just need to merge :-)

Sure, I'll look into it tomorrow

@RyanTheOptimist
Copy link
Contributor Author

Argh, I was hoping to merge this when the branch opened but we're still slogging through release work and it's not going to happen.
@asraa or @antoniovicente would you mind merging it for me tomorrow since I'll be out? it's code not used in prod and I've cleared not needing cross-company sign off, just need to merge :-)

Sure, I'll look into it tomorrow

Thanks, @antoniovicente !

Copy link
Contributor

@DavidSchinazi DavidSchinazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

source/common/http/conn_pool_grid.h Show resolved Hide resolved
source/common/http/conn_pool_grid.h Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Outdated Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Outdated Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Outdated Show resolved Hide resolved
source/common/http/conn_pool_grid.cc Show resolved Hide resolved
test/common/http/conn_pool_grid_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Ryan Hamilton <[email protected]>
Copy link
Contributor

@DavidSchinazi DavidSchinazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

source/common/http/conn_pool_grid.cc Show resolved Hide resolved
source/common/http/conn_pool_grid.h Show resolved Hide resolved
source/common/http/conn_pool_grid.h Show resolved Hide resolved
Signed-off-by: Ryan Hamilton <[email protected]>
DavidSchinazi
DavidSchinazi previously approved these changes Apr 16, 2021
source/common/http/conn_pool_grid.h Show resolved Hide resolved
asraa
asraa previously approved these changes Apr 16, 2021
Signed-off-by: Ryan Hamilton <[email protected]>
@RyanTheOptimist RyanTheOptimist dismissed stale reviews from asraa and DavidSchinazi via 66695ab April 17, 2021 15:15
@RyanTheOptimist
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15933 (comment) was created by @RyanTheOptimist.

see: more, trace.

…ancels

the request to the underlying pool, if it is still registered with the pool
when it is destroyed.

(It would be really nice if the Cancellable object itself could take care
of this cleanup, but one step at a time).

Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
Copy link
Contributor

@DavidSchinazi DavidSchinazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix on the lifetimes!

Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Merging with @alyssawilk out

@asraa asraa merged commit f5665d0 into envoyproxy:main Apr 19, 2021
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
* grid: Track HTTP/3 broken-ness in ConnectivityGrid.
Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the
fallback attempt succeeds. When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections.
Add an assert that the Grid's host and the callback's host are the same.

Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Gokul Nair <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants