-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grid: Track HTTP/3 broken-ness in ConnectivityGrid. #15933
Conversation
/assign @alyssawilk This doesn't yet handle marking QUIC as un-broken and I'm not entire sure I'm in love with this approach for tracking state, so perhaps we should chat about that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet! Excited to see progress on brokenness. Here's some preliminary thoughts, and happy to sync up when you're in :-)
source/common/http/conn_pool_grid.cc
Outdated
ENVOY_LOG(trace, "{} pool successfully connected to host '{}'.", describePool(attempt->pool()), | ||
grid_.host_->hostname()); | ||
if (http3_attempt_failed_) { | ||
// If the HTTP/3 attempt failed but the other attempt succeeded, mark HTTP/3 as broken. | ||
grid_.set_is_http3_broken(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to handle the case where H2 succeeds before H3 fails as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's the TODO here:
// TODO: Ensure that if HTTP/2 succeeds, we can allow the HTTP/3 connection to run to completion.
How 'bout I address this in a followup since I suspect that might result in a bit of refactoring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned out to be less work than I expected, so I've resolved this TODO and handle this case now. (Added a test for it)
source/common/http/conn_pool_grid.h
Outdated
@@ -91,6 +91,8 @@ class ConnectivityGrid : public ConnectionPool::Instance, | |||
const StreamInfo::StreamInfo& info, | |||
absl::optional<Http::Protocol> protocol); | |||
|
|||
bool http3_attempt_failed() const { return http3_attempt_failed_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You and your google habits. no snake case for you. :-P
http3AttemptFailed()
ditto for brokenness below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I renamed is_http3_broken() and set_is_http3_broken(). I think that's all?
Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the fallback attempt succeeds. When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections. Add an assert that the Grid's host and the callback's host are the same. Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
…nection fails. Signed-off-by: Ryan Hamilton <[email protected]>
08c7571
to
d9859ce
Compare
Blergh, somehow two of my commits did not have signoff on them, do I had to follow the instructions to fix which required a force push. I'd love to understand how those commits did not get signoff. |
Signed-off-by: Ryan Hamilton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! This is looking solid :-)
I think you need a main merge, probably to deal with the newStream returns. Also adding David since it'd be good to have another person familiar with the code, though he's welcome to only do a cursory pass.
source/common/http/conn_pool_grid.cc
Outdated
@@ -70,11 +78,14 @@ void ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed( | |||
|
|||
// If this point is reached, all pools have been tried. Pass the pool failure up to the | |||
// original caller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment, as pool failure may not be passed up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
source/common/http/conn_pool_grid.cc
Outdated
auto delete_this_on_return = attempt->removeFromList(connection_attempts_); | ||
ConnectionPool::Callbacks* callbacks = inner_callbacks_; | ||
inner_callbacks_ = nullptr; | ||
// If there are other connection attempts in progress, let them complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimization: If H3 connected but H2 was created due to timeout, do we want H2 to complete? Should we only continue this path if if it's an H3 attempt outstanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure works for me. I have some feeling in the back of my mind that there might be some corner case where we might wish TCP had run to completion ('cause HTTP/3 is going to end up getting marked as broken) but that doesn't actually seem plausible. Done.
source/common/http/conn_pool_grid.cc
Outdated
host_->hostname()); | ||
createNextPool(); | ||
pool = ++pool; | ||
} | ||
// TODO(#15649) track pools with successful connections: don't always start at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you doing this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as part of this PR, but I'm very happy to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are - by skipping pool ahead above :-P Functioanlly if QUIC is broken we start at H2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL! I totally misread what the TODO was for. You're right. Removed.
source/common/http/conn_pool_grid.h
Outdated
// Removes this from the owning list, deleting it. | ||
void deleteThis(); | ||
|
||
// Marks HTTP/3 broken If the HTTP/3 attempt failed but the other attempt succeeded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the other -> a TCP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
source/common/http/conn_pool_grid.h
Outdated
// The timer which tracks when new connections should be attempted. | ||
Event::TimerPtr next_attempt_timer_; | ||
// The iterator to the last pool which had a connection attempt. | ||
PoolIterator current_; | ||
// True if the HTTP/3 attempt failed. | ||
bool http3_attempt_failed_{}; | ||
// True if the ALPN attempt succeeded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call this TCP? I thought H3 has ALPN for all we hard code it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, based on previous conversations I thought that was the name of the other pool but looking at the code, I see it's called "Mixed". Sure, we TCP works for me. Done.
source/common/http/conn_pool_grid.h
Outdated
@@ -131,6 +139,12 @@ class ConnectivityGrid : public ConnectionPool::Instance, | |||
// Returns the next pool in the ordered priority list. | |||
absl::optional<PoolIterator> nextPool(PoolIterator pool_it); | |||
|
|||
// Returns true if |pool| is the Grid's HTTP/3 connection pool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Envoy rarely does the || style
Also optional Grid's -> grid's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
1f8b70d
to
7ee4c93
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Just a few more nits and I think we can land this :-)
source/common/http/conn_pool_grid.cc
Outdated
host_->hostname()); | ||
createNextPool(); | ||
pool = ++pool; | ||
} | ||
// TODO(#15649) track pools with successful connections: don't always start at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are - by skipping pool ahead above :-P Functioanlly if QUIC is broken we start at H2
Signed-off-by: Ryan Hamilton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
source/common/http/conn_pool_grid.cc
Outdated
host_->hostname()); | ||
createNextPool(); | ||
pool = ++pool; | ||
} | ||
// TODO(#15649) track pools with successful connections: don't always start at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL! I totally misread what the TODO was for. You're right. Removed.
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG! Merge pending master being unlocked :-)
Argh, I was hoping to merge this when the branch opened but we're still slogging through release work and it's not going to happen. |
Sure, I'll look into it tomorrow |
Thanks, @antoniovicente ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Signed-off-by: Ryan Hamilton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
66695ab
/retest |
Retrying Azure Pipelines: |
…ancels the request to the underlying pool, if it is still registered with the pool when it is destroyed. (It would be really nice if the Cancellable object itself could take care of this cleanup, but one step at a time). Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
Signed-off-by: Ryan Hamilton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix on the lifetimes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Merging with @alyssawilk out
* grid: Track HTTP/3 broken-ness in ConnectivityGrid. Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the fallback attempt succeeds. When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections. Add an assert that the Grid's host and the callback's host are the same. Signed-off-by: Ryan Hamilton <[email protected]> Signed-off-by: Gokul Nair <[email protected]>
Mark HTTP/3 as broken when the HTTP/3 connect attempt fails but the
fallback attempt succeeds.
When HTTP/3 is broken, do not attempt to initiate new HTTP/3 connections.
Add an assert that the Grid's host and the callback's host are the same.
Risk Level: Low
Testing: Unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A