Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

honour routes timeout if max stream duration is set with out stream duration #15585

Merged
merged 15 commits into from
Apr 8, 2021

Conversation

ramaraochavali
Copy link
Contributor

Commit Message: honour routes timeout if maxstream duration is set with out specifying max stream duration inside it.
Additional Description:
Risk Level: Low
Testing: Added
Docs Changes: N/A
Release Notes: N/A (There is already a bug fix mentioned for MaxStreamDuration in current.rst)
Platform Specific Features: N/A
Fixes #15530

Signed-off-by: Rama Chavali <[email protected]>
@ramaraochavali
Copy link
Contributor Author

@mattklein123 @markdroth PTAL

@jmarantz
Copy link
Contributor

@adamsjob FYI

@mattklein123
Copy link
Member

Assigning over to @alyssawilk who I think reviewed the other timeout PRs.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching and fixing this!

@@ -156,6 +156,13 @@ FilterUtility::finalTimeout(const RouteEntry& route, Http::RequestHeaderMap& req
} else {
timeout.global_timeout_ = route.timeout();
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we switched timeouts, we moved the instantiation of the timeout to
ConnectionManagerImpl::ActiveStream::refreshDurationTimeout()

Mind moving this logic there?

Signed-off-by: Rama Chavali <[email protected]>
Signed-off-by: Rama Chavali <[email protected]>
Signed-off-by: Rama Chavali <[email protected]>
// Test timeout when route timeout needs to be used if max_stream_duration is not specified.
// Regression test for https://github.com/envoyproxy/envoy/issues/15530.
// TODO(ramaraochavali): move this test to DurationTimeout test.
TEST_F(HttpConnectionManagerImplTest, DurationTimeoutFromRouteTimeout) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to create a separate test for this because clusterInfo is repeatedly called and it keeps triggering the clearRouteCache so the timer gets fired multiple times. The same problem exists for https://github.com/envoyproxy/envoy/blob/main/test/common/http/conn_manager_impl_test.cc#L2173 where timer is repeatedly enabled. Having two such tests in the same test is not working. So I made it as separate test and TODO to look at the issue later.

@ramaraochavali
Copy link
Contributor Author

@alyssawilk moved to refreshDurationTimeout. PTAL.

@markdroth markdroth removed their assignment Mar 24, 2021
@markdroth
Copy link
Contributor

Alyssa knows this code a lot better than I do, so I'll let her review.

@ramaraochavali
Copy link
Contributor Author

ramaraochavali commented Mar 25, 2021

@alyssawilk can you PTAL when you get chance? Also the failure seems not related to this change

// If we are using new timeouts and MaxStreamDuration is not set, use the route's timeout
// value. This is the case when route's MaxStreamDuration is specified without inner
// MaxStreamDuration i.e. only grpc_timeout_header_max is set.
if (!route->maxStreamDuration()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if grpc_timeout_header_max is set when do we use it? usingNewTimeouts would be true and we'd ignore all the code below, no?

Can you please fix and regression test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, Yeah.. Missed that. Will add a test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssawilk Can you PTAL at the fix to see if this is correct? One thing I found was this needs changes to most of the test cases because now we will be initiating request timer most of the times. So I just want to ensure this is the right direction before I make changes the tests(some of which I have already done - please take a look at the changes)

Signed-off-by: Rama Chavali <[email protected]>
std::chrono::milliseconds timeout;
bool disable_timer = false;
auto grpc_timeout = Grpc::Common::getGrpcTimeout(*request_headers_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offhand I recall debate around the original PR about how the new fields should be applied and it was the consensus from the gRPC folks that the difference wasn't "is this a gRPC request or not" but "is the gRPC timeout header present or not" so I think switching to this is against the intent of the folks who added this to the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. This is good to know. I will change it accordingly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. It's super complicated and confusing, so thanks for continuing to plug away at it!

Signed-off-by: Rama Chavali <[email protected]>
@ramaraochavali
Copy link
Contributor Author

ramaraochavali commented Apr 2, 2021

@alyssawilk Sorry for the delay. Fixed. PTAL. I have added a regression test for this and left a TODO to merge to DurationTimeout test later as I am having some issues with mock timer (even for existing tests also this problem exists).

Signed-off-by: Rama Chavali <[email protected]>
Signed-off-by: Rama Chavali <[email protected]>
Signed-off-by: Rama Chavali <[email protected]>
Signed-off-by: Rama Chavali <[email protected]>
@ramaraochavali ramaraochavali requested a review from dio as a code owner April 2, 2021 13:18
Signed-off-by: Rama Chavali <[email protected]>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix to timeout config handling.

@ramaraochavali
Copy link
Contributor Author

@alyssawilk Can you PTAL when you get chance?

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I thought you meant to address the test TODO in this PR. I'm fine with it being a follow-up!

@ramaraochavali
Copy link
Contributor Author

@alyssawilk @antoniovicente one thing I realized is when timeout happens with new timeouts we call https://github.com/envoyproxy/envoy/blob/main/source/common/http/conn_manager_impl.cc#L762 and that sets http status to 408 (RequestTimeout) instead of gateway timeout (504) that router sets. Is this expected ? or do we need to change it to return 504 for timeouts? if it is expected - should we document it some where?

snowp added a commit to snowp/envoy that referenced this pull request Apr 22, 2021
Adds a runtime flag for disabling the behavior introduced in envoyproxy#15585

Signed-off-by: Snow Pettersen <[email protected]>
snowp added a commit to snowp/envoy that referenced this pull request Apr 22, 2021
…stream duration (envoyproxy#15585)"

This reverts commit ffa1680.

Signed-off-by: Snow Pettersen <[email protected]>
snowp added a commit that referenced this pull request Apr 23, 2021
…stream duration (#15585)" (#16133)

This reverts commit ffa1680.

Signed-off-by: Snow Pettersen <[email protected]>
@rgs1
Copy link
Member

rgs1 commented Apr 27, 2021

FYI this broke web sockets for us (by forcing them to terminate earlier), wondering if we can contribute more integration tests here to catch these sort of regressions.

htuch pushed a commit that referenced this pull request May 4, 2021
…stream duration (#15585)" (#16289)

This reverts commit ffa1680.

Signed-off-by: Dmitri Dolguikh <[email protected]>
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
…stream duration (envoyproxy#15585)" (envoyproxy#16133)

This reverts commit ffa1680.

Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Gokul Nair <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

routes timeout is ignored if we set max stream duration
7 participants