quic: fixing the disconnect between quiche stream count and Envoy #18694
```diff
@@ -175,18 +175,23 @@ void ConnPoolImplBase::attachStreamToClient(Envoy::ConnectionPool::ActiveClient&
   }
   ENVOY_CONN_LOG(debug, "creating stream", client);
 
+  // Latch capacity before updating remaining streams.
+  uint64_t capacity = client.currentUnusedCapacity();
   client.remaining_streams_--;
   if (client.remaining_streams_ == 0) {
     ENVOY_CONN_LOG(debug, "maximum streams per connection, DRAINING", client);
     host_->cluster().stats().upstream_cx_max_requests_.inc();
     transitionActiveClientState(client, Envoy::ConnectionPool::ActiveClient::State::DRAINING);
-  } else if (client.numActiveStreams() + 1 >= client.concurrent_stream_limit_) {
+  } else if (capacity <= 1) {
     // As soon as the new stream is created, the client will be maxed out.
     transitionActiveClientState(client, Envoy::ConnectionPool::ActiveClient::State::BUSY);
   }
 
   // Decrement the capacity, as there's one less stream available for serving.
-  state_.decrConnectingAndConnectedStreamCapacity(1);
+  // For HTTP/3, the capacity is updated in newStreamEncoder.
+  if (trackStreamCapacity()) {
+    state_.decrConnectingAndConnectedStreamCapacity(1);
+  }
   // Track the new active stream.
   state_.incrActiveStreams(1);
   num_active_streams_++;
```

**Review comment** (on `} else if (capacity <= 1) {`):

> I'm confused by this check. How do we get here if `capacity == 0`? Wouldn't we not be attaching to this client if we don't have any capacity? Should this be `capacity == 1` with an ASSERT that `capacity != 0`? I'm probably missing something.

**Reply:**

> Yeah, it can't be 0; I was just trying to be as consistent as possible with the prior `>=`. I'll update to `== 1` since it's confusing.
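A minimal sketch of the shape the reviewer is suggesting (hypothetical; the follow-up commit is not shown in this diff):

```cpp
// Hypothetical sketch of the suggested check. `capacity` was latched before
// remaining_streams_ was decremented, and a client with no unused capacity
// should never have been selected for attachment, so zero is impossible here.
ASSERT(capacity != 0);
if (client.remaining_streams_ == 0) {
  // Out of total streams for this connection: drain, as above.
} else if (capacity == 1) {
  // The stream being attached will consume the last unit of capacity.
  transitionActiveClientState(client, Envoy::ConnectionPool::ActiveClient::State::BUSY);
}
```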
```diff
@@ -213,14 +218,17 @@ void ConnPoolImplBase::onStreamClosed(Envoy::ConnectionPool::ActiveClient& clien
   // If the effective client capacity was limited by concurrency, increase connecting capacity.
   // If the effective client capacity was limited by max total streams, this will not result in an
   // increment as no capacity is freed up.
-  if (client.remaining_streams_ > client.concurrent_stream_limit_ - client.numActiveStreams() - 1 ||
-      had_negative_capacity) {
+  // We don't update the capacity for HTTP/3 as the stream count should only
+  // increase when a MAX_STREAMS frame is received.
+  if (trackStreamCapacity() && (client.remaining_streams_ > client.concurrent_stream_limit_ -
+                                                                client.numActiveStreams() - 1 ||
+                                had_negative_capacity)) {
     state_.incrConnectingAndConnectedStreamCapacity(1);
   }
   if (client.state() == ActiveClient::State::DRAINING && client.numActiveStreams() == 0) {
     // Close out the draining client if we no longer have active streams.
     client.close();
-  } else if (client.state() == ActiveClient::State::BUSY) {
+  } else if (client.state() == ActiveClient::State::BUSY && client.currentUnusedCapacity() != 0) {
     transitionActiveClientState(client, ActiveClient::State::READY);
     if (!delay_attaching_stream) {
       onUpstreamReady();
```
```diff
@@ -56,7 +56,7 @@ class ActiveClient : public LinkedObject<ActiveClient>,
   // Returns the application protocol, or absl::nullopt for TCP.
   virtual absl::optional<Http::Protocol> protocol() const PURE;
 
-  int64_t currentUnusedCapacity() const {
+  virtual int64_t currentUnusedCapacity() const {
     int64_t remaining_concurrent_streams =
         static_cast<int64_t>(concurrent_stream_limit_) - numActiveStreams();
```
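The hunk ends before the method's return statement. For context, the base implementation plausibly returns whichever limit binds first; a self-contained sketch of that calculation (an assumption, not part of the diff):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the base-class capacity rule as described in this PR: unused
// capacity is the smaller of the remaining per-connection stream budget and
// the remaining concurrency headroom. Names here are illustrative.
int64_t currentUnusedCapacitySketch(uint32_t remaining_streams, uint32_t concurrent_stream_limit,
                                    uint32_t num_active_streams) {
  const int64_t remaining_concurrent_streams =
      static_cast<int64_t>(concurrent_stream_limit) - num_active_streams;
  return std::min<int64_t>(remaining_streams, remaining_concurrent_streams);
}
```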
```diff
@@ -102,6 +102,10 @@ class ActiveClient : public LinkedObject<ActiveClient>,
   virtual void drain();
 
   ConnPoolImplBase& parent_;
+  // The count of remaining streams allowed for this connection.
+  // This will start out as the total number of streams per connection if capped
+  // by configuration, or it will be set to std::numeric_limits<uint32_t>::max() to be
+  // (functionally) unlimited.
   uint32_t remaining_streams_;
   uint32_t concurrent_stream_limit_;
   Upstream::HostDescriptionConstSharedPtr real_host_description_;
```

**Review comment** (on lines +107 to +108):

> Unrelated to your change, but we should probably make this `absl::optional` to avoid the extremely rare but possible case where this is not actually unlimited. :) If you end up pushing more changes, maybe TODO this?
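A rough sketch of the `absl::optional` idea from the comment above (hypothetical; not part of this PR):

```cpp
#include <cstdint>
#include "absl/types/optional.h"

// Hypothetical shape of the reviewer's suggestion: an empty optional means
// "no per-connection cap configured" instead of using
// std::numeric_limits<uint32_t>::max() as a sentinel.
struct StreamBudget {
  absl::optional<uint32_t> remaining_streams; // nullopt => unlimited

  bool exhausted() const { return remaining_streams.has_value() && *remaining_streams == 0; }

  void onStreamCreated() {
    if (remaining_streams.has_value()) {
      --*remaining_streams; // only a configured cap is consumed
    }
  }
};
```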
```diff
@@ -148,6 +152,10 @@ class ConnPoolImplBase : protected Logger::Loggable<Logger::Id::pool> {
   virtual ~ConnPoolImplBase();
 
   void deleteIsPendingImpl();
+  // By default, the connection pool will track connected and connecting stream
+  // capacity as streams are created and destroyed. QUIC does custom stream
+  // accounting so will override this to false.
+  virtual bool trackStreamCapacity() { return true; }
 
   // A helper function to get the specific context type from the base class context.
   template <class T> T& typedContext(AttachContext& context) {
```
```diff
@@ -234,6 +242,9 @@
   void decrClusterStreamCapacity(uint32_t delta) {
     state_.decrConnectingAndConnectedStreamCapacity(delta);
   }
+  void incrClusterStreamCapacity(uint32_t delta) {
+    state_.incrConnectingAndConnectedStreamCapacity(delta);
+  }
   void dumpState(std::ostream& os, int indent_level = 0) const {
     const char* spaces = spacesForLevel(indent_level);
     os << spaces << "ConnPoolImplBase " << this << DUMP_MEMBER(ready_clients_.size())
```
```diff
@@ -23,6 +23,56 @@ class ActiveClient : public MultiplexedActiveClientBase {
 public:
   ActiveClient(Envoy::Http::HttpConnPoolImplBase& parent,
                Upstream::Host::CreateConnectionData& data);
 
+  // Http::ConnectionCallbacks
+  void onMaxStreamsChanged(uint32_t num_streams) override;
+
+  RequestEncoder& newStreamEncoder(ResponseDecoder& response_decoder) override {
+    ASSERT(quiche_capacity_ != 0);
+    // Each time a quic stream is allocated the quic capacity needs to get
+    // decremented. See comments by quiche_capacity_.
+    updateCapacity(quiche_capacity_ - 1);
+    return MultiplexedActiveClientBase::newStreamEncoder(response_decoder);
+  }
+
+  // Overload the default capacity calculations to return the quic capacity
+  // (modified by any stream limits in Envoy config).
+  int64_t currentUnusedCapacity() const override {
+    return std::min<int64_t>(quiche_capacity_, effectiveConcurrentStreamLimit());
+  }
+
+  void updateCapacity(uint64_t new_quiche_capacity) {
+    // Each time we update the capacity make sure to reflect the update in the
+    // connection pool.
+    //
+    // Due to interplay between the max number of concurrent streams Envoy will
+    // allow and the max number of streams per connection this is not as simple
+    // as just updating based on the delta between quiche_capacity_ and
+    // new_quiche_capacity, so we use the delta between the actual calculated
+    // capacity before and after the update.
+    uint64_t old_capacity = currentUnusedCapacity();
+    quiche_capacity_ = new_quiche_capacity;
+    uint64_t new_capacity = currentUnusedCapacity();
+
+    if (new_capacity < old_capacity) {
+      parent_.decrClusterStreamCapacity(old_capacity - new_capacity);
+    } else if (old_capacity < new_capacity) {
+      parent_.incrClusterStreamCapacity(new_capacity - old_capacity);
+    }
+  }
```
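A worked example of the comment above, with illustrative numbers (not from the PR): when the Envoy-side limit is the binding constraint, the raw quiche delta overstates the capacity the pool actually lost.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  // Illustrative numbers: Envoy's effective concurrent stream limit is 64,
  // while quiche would still allow 100 more streams.
  const int64_t envoy_limit = 64;
  int64_t quiche_capacity = 100;

  auto current_unused = [&] { return std::min<int64_t>(quiche_capacity, envoy_limit); };

  const int64_t old_capacity = current_unused(); // min(100, 64) == 64
  quiche_capacity = 90;                          // e.g. ten streams were opened
  const int64_t new_capacity = current_unused(); // min(90, 64) == 64

  // The raw quiche delta is 10, but the pool lost no capacity at all, because
  // the Envoy limit was the binding constraint both before and after.
  std::cout << "quiche delta: " << (100 - 90)
            << ", pool delta: " << (old_capacity - new_capacity) << "\n";
  return 0;
}
```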
```diff
+
+  // Unlike HTTP/2 and HTTP/1, rather than having a cap on the number of active
+  // streams, QUIC has a fixed number of streams available which is updated via
+  // the MAX_STREAMS frame.
+  //
+  // As such each time we create a new stream for QUIC, the capacity goes down
+  // by one, but unlike the other two codecs it is _not_ restored on stream
+  // closure.
+  //
+  // We track the QUIC capacity here, and overload currentUnusedCapacity so the
+  // connection pool can accurately keep track of when it is safe to create new
+  // streams.
+  uint64_t quiche_capacity_ = 100;
```

**Review comment** (on `uint64_t quiche_capacity_ = 100;`):

> I'm not sure that this initialization to 100 is quite right. At the QUIC protocol layer, this starts at 0 until the peer's transport parameters are received.

**Reply:**

> We need to be at 100 or we'll prefetch a connection for each incoming stream; commented.

**Reply:**

> Hm. I definitely appreciate the point about the connection prefetch issue. That makes sense to me. We have to be careful that we don't actually send streams on the wire that are above the peer's MAX_STREAMS limit or we will commit a protocol violation and the connection will be closed. In quiche, we do this via `ShouldCreateOutgoingBidirectionalStream()` returning false until we have stream capacity, but with #18614 that will return true. With 0-RTT in particular, we could send a bunch of requests in the first flight and trigger this. But maybe we already have logic which is queueing requests somewhere to avoid this?

**Reply:**

> Added a test to future-proof that we update this before we raise the connected event.

**Reply:**

> But do we send 0-RTT requests before updating this? `initial_max_streams_bidi` in the transport parameters will update this initially, and it seems reasonable to me that 0-RTT can be sent before receiving the transport params.

**Reply:**

> We don't support 0-RTT right now. When we do, we'll want to set this based on the cached params.
```diff
 };
 
 // Http3 subclass of FixedHttpConnPoolImpl which exists to store quic data.
```
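The body of `onMaxStreamsChanged` is not shown in this diff. Given the declaration above and the session-side code below, it plausibly just forwards the new stream budget into `updateCapacity`; a sketch under that assumption:

```cpp
// Hypothetical sketch (this method's body is not part of the shown hunks):
// the QUIC session reports how many more bidirectional streams may currently
// be opened, and the client folds that into its capacity bookkeeping, which
// in turn adjusts the cluster-wide connecting/connected stream capacity.
void ActiveClient::onMaxStreamsChanged(uint32_t num_streams) { updateCapacity(num_streams); }
```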
```diff
@@ -45,6 +95,9 @@ class Http3ConnPoolImpl : public FixedHttpConnPoolImpl {
                    quic::QuicConfig& quic_config);
 
   Quic::PersistentQuicInfoImpl& quicInfo() { return *quic_info_; }
+  // For HTTP/3 the base connection pool does not track stream capacity, rather
+  // the HTTP3 active client does.
+  bool trackStreamCapacity() override { return false; }
 
 private:
   // Store quic helpers which can be shared between connections and must live
```
```diff
@@ -4,6 +4,21 @@
 
 #include "quic_filter_manager_connection_impl.h"
 
+namespace quic {
+namespace test {
+
+// TODO(alyssawilk) add the necessary accessors to quiche and remove this.
+class QuicSessionPeer {
+public:
+  static quic::QuicStreamIdManager&
+  getStreamIdManager(Envoy::Quic::EnvoyQuicClientSession* session) {
+    return session->ietf_streamid_manager_.bidirectional_stream_id_manager_;
+  }
+};
+
+} // namespace test
+} // namespace quic
+
 namespace Envoy {
 namespace Quic {
```
```diff
@@ -87,6 +102,15 @@ void EnvoyQuicClientSession::SetDefaultEncryptionLevel(quic::EncryptionLevel lev
   }
 }
 
+void EnvoyQuicClientSession::OnCanCreateNewOutgoingStream(bool unidirectional) {
+  if (http_connection_callbacks_ && !unidirectional) {
+    quic::QuicStreamIdManager& manager = quic::test::QuicSessionPeer::getStreamIdManager(this);
+    ASSERT(manager.outgoing_max_streams() >= manager.outgoing_stream_count());
+    uint32_t streams_available = manager.outgoing_max_streams() - manager.outgoing_stream_count();
+    http_connection_callbacks_->onMaxStreamsChanged(streams_available);
+  }
+}
+
 std::unique_ptr<quic::QuicSpdyClientStream> EnvoyQuicClientSession::CreateClientStream() {
   ASSERT(codec_stats_.has_value() && http3_options_.has_value());
   return std::make_unique<EnvoyQuicClientStream>(GetNextOutgoingBidirectionalStreamId(), this,
```

**Review comment** (on the `if`):

> nit: …

**Review comment** (on the `ASSERT`):

> Might be nice to …

**Reply:**

> ack

**Review comment:**

> I think we want `ASSERT(manager.outgoing_max_streams() > manager.outgoing_stream_count())`.

**Reply:**

> Yeah, I started with that, but unfortunately it doesn't pan out:
>
> ```
> [2021-10-20 20:45:07.866][402][critical][assert] [source/common/quic/envoy_quic_client_session.cc:109] assert failure: manager.outgoing_max_streams() > manager.outgoing_stream_count().
> ```

**Reply:**

> Yes, I think you're right that this is a quiche bug. In particular, I think `QuicSession::StreamDraining()` is doing the wrong thing and needs to wrap the call to `OnCanCreateNewOutgoingStream(unidirectional)` inside of … So the assert you have sounds fine for now and I'll look into fixing that in quiche, if that SGTY?

**Reply:**

> SGTM :-)
**Review comment** (on `onMaxStreamsChanged`):

> Can't this technically happen for H/2 also? I thought the peer could send a new SETTINGS frame?

**Reply:**

> MAX_STREAMS is HTTP/3 specific, but the same does occur for `onSettings` above. Apart from not wanting to manufacture a fake settings struct, I don't think we should combine the two, since SETTINGS tells us the number of streams we're allowed to have open at a given time, while `onMaxStreamsChanged` allows a fixed number of streams to be opened.

**Reply:**

> OK, I see. Do you mind just adding a comment about how this is different from H2? For the uninitiated, for H3 this is kind of confusing.
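To make the distinction above concrete, a self-contained sketch (hypothetical types, not Envoy code) of the two accounting models:

```cpp
#include <cstdint>

// HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS: a cap on how many streams may be
// open at the same time. Closing a stream frees a slot immediately.
struct H2StreamAccounting {
  uint32_t max_concurrent = 0;
  uint32_t active = 0;
  bool canOpen() const { return active < max_concurrent; }
  void onOpen() { ++active; }
  void onClose() { --active; } // capacity is restored on close
};

// HTTP/3 MAX_STREAMS: a cumulative budget of streams ever opened. Closing a
// stream does not restore capacity; only a new MAX_STREAMS frame raises it.
struct H3StreamAccounting {
  uint64_t max_streams = 0; // cumulative limit from MAX_STREAMS frames
  uint64_t opened = 0;      // cumulative count of streams opened
  bool canOpen() const { return opened < max_streams; }
  void onOpen() { ++opened; }
  void onClose() {}                                  // no effect on capacity
  void onMaxStreams(uint64_t m) { max_streams = m; } // budget only grows
};
```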