http: connection pool - Allow a cancel callback of a request cancel other requests #7345
Signed-off-by: Yuval Kohavi <[email protected]>
Force-pushed from a6c35d3 to b76a968.
This seems reasonable to me. @mattklein123, WDYT?
From a very quick look I'm not convinced this is the fix we want, but at a high level I think I understand what is going on and I think this is on the right track. Thanks a ton for digging in so deeply. I think the next step is to write a test that crashes. That will help us better understand the scenario, and then we can walk through what the right solution is. Thank you! /wait
Signed-off-by: Yuval Kohavi <[email protected]>
@mattklein123, I added a test that crashes.
@yuval-k sorry, why can't you add an HTTP/1 test? I understand the problem, and it seems like you should be able to set up a test that reproduces it without going directly into the base class. /wait-any
Hi @mattklein123, I will be out of the office (big family event) this coming week, so I'll have to get back to this the following week (week of June 30th).
Signed-off-by: Yuval Kohavi <[email protected]>
Signed-off-by: Yuval Kohavi <[email protected]>
Hi @mattklein123, I added a failing test via the HTTP/1 conn pool. Would love your insight on how to fix this bug.
@yuval-k I think my original comment still stands: can you get a test for HTTP/1.1 that fails? I don't think we should be testing the base class. Also, once that works, feel free to put your proposed fix back, and then we can look at it all together. Thank you! /wait
Hi @mattklein123, see my last commit: I added a failing HTTP/1 unit test here: 458c4fb. Did you mean adding an integration test?
Signed-off-by: Yuval Kohavi <[email protected]>
Ah OK, sorry, I missed that. Can you add your fix now? Then I can take a look. Thank you! /wait
Signed-off-by: Yuval Kohavi <[email protected]>
Thanks, at a high level this makes sense to me but some more comments and cleanups will help me solidify my understanding. Thanks for working on this!
/wait
ConnPoolCallbacks callbacks1;
uint32_t pool_failure_calls{};
EXPECT_CALL(callbacks1.pool_failure_, ready())
    .Times(AtMost(1))
You should be able to use WillOnce here and below? This should be deterministic?
Essentially, exactly one of those callbacks should be called (as one of them cancels the other).
Which one comes first depends on the conn pool implementation; while I could check the implementation to see which one it will be, I thought it better for the test to cover the accepted behaviour, so the implementation can change without breaking the test.
Point taken, but my preference would be to simplify the test for now. Feel free to leave a comment about how it's tied to the implementation.
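To make the trade-off in this thread concrete, here is a minimal C++ sketch under stated assumptions: a toy pool (the hypothetical ToyPool, not Envoy's ConnPoolImplBase or the test's ConnPoolCallbacks) that fails pending requests in FIFO order and lets a failure callback cancel another queued request. The helper firedFailureCallbacks is also hypothetical; it records which of two mutually cancelling failure callbacks actually ran.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy model of a pool's pending-request queue; it is not Envoy's
// ConnPoolImplBase and only mimics the purge/cancel interaction in this thread.
struct PendingRequest {
  std::function<void()> on_pool_failure;  // run when the pool fails this request
};

class ToyPool {
public:
  PendingRequest* newPendingRequest(std::function<void()> on_failure) {
    pending_.push_back(std::make_unique<PendingRequest>());
    pending_.back()->on_pool_failure = std::move(on_failure);
    return pending_.back().get();
  }

  // Fails queued requests in FIFO order. Requests still waiting in to_purge_
  // can be cancelled from inside an earlier request's failure callback.
  void purgePendingRequests() {
    to_purge_ = std::move(pending_);
    pending_.clear();  // make the moved-from list's state explicit
    while (!to_purge_.empty()) {
      std::unique_ptr<PendingRequest> request = std::move(to_purge_.front());
      to_purge_.pop_front();
      request->on_pool_failure();  // may re-enter cancel()
    }
  }

  // Drops a queued request so its failure callback never runs.
  void cancel(PendingRequest* request) {
    auto& list = to_purge_.empty() ? pending_ : to_purge_;
    list.remove_if([request](const std::unique_ptr<PendingRequest>& r) { return r.get() == request; });
  }

private:
  std::list<std::unique_ptr<PendingRequest>> pending_;
  std::list<std::unique_ptr<PendingRequest>> to_purge_;
};

// Purges two requests whose failure callbacks cancel each other and returns the
// ids (1 = first, 2 = second) of the callbacks that actually fired.
std::vector<int> firedFailureCallbacks() {
  ToyPool pool;
  std::vector<int> fired;
  PendingRequest* first = nullptr;
  PendingRequest* second = nullptr;
  first = pool.newPendingRequest([&] { fired.push_back(1); pool.cancel(second); });
  second = pool.newPendingRequest([&] { fired.push_back(2); pool.cancel(first); });
  pool.purgePendingRequests();
  return fired;
}
```

Checking that the result has exactly one element is the order-independent assertion the author was after; checking that it equals {1} is the simpler, deterministic assertion the reviewer prefers, and it holds only because this toy pool purges in FIFO order. That coupling is the kind of thing worth a short comment in the real test.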
// Simulate connection failure.
EXPECT_CALL(conn_pool_, onClientDestroy());
conn_pool_.test_clients_[0].connection_->raiseEvent(Network::ConnectionEvent::RemoteClose);
EXPECT_EQ(1, pool_failure_calls);
I think I asked this above. Why is this not deterministic?
Yes, see my answer above: essentially, we need to check that only one of the callbacks was called.
Signed-off-by: Yuval Kohavi <[email protected]>
Sorry, I'm traveling; will get to this next week.
Thanks, this LGTM. Please update the PR title (and description if needed) to be more descriptive of the actual problem.
/wait
Signed-off-by: Yuval Kohavi <[email protected]>
Thanks, one follow up question.
/wait-any
Signed-off-by: Yuval Kohavi <[email protected]>
Thanks!
Fixes an ASSERT crash with the connection pool in the following scenario:
After a gdb session I found out the following:
1. Pending requests are queued via ConnPoolImplBase::newPendingRequest.
2. ConnPoolImplBase::purgePendingRequests is called.
3. request->callbacks_.onPoolFailure is called for the first request.
4. sendLocalReply is called (in Filter::onComplete).
5. sendLocalReply ends the stream, which calls the JWT filter's onDestroy.
6. onDestroy (eventually) cancels the pending requests via ConnPoolImplBase::onPendingRequestCancel (all of this from the middle of ConnPoolImplBase::purgePendingRequests's call to UpstreamRequest::onPoolFailure).
7. ConnPoolImplBase::onPendingRequestCancel tries to remove the pending request from the list, but it is not there anymore, as purgePendingRequests moved the list to a stack variable.
8. LinkedObject::removeFromList then hits its ASSERT, as it cannot find the element in the list.

The fix is to move the requests being purged into their own member list (rather than a stack variable), and to have onPendingRequestCancel remove pending requests from that new list as long as it is not empty.
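The call chain above boils down to a cancel that re-enters the pool in the middle of its purge loop. Below is a toy C++ model of that, not Envoy's code: ToyPool, its use_member_list switch, and cancelFromFailureCallbackSucceeds are hypothetical; the method names only echo purgePendingRequests and onPendingRequestCancel; and a false return stands in for the ASSERT in LinkedObject::removeFromList.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <memory>
#include <utility>

// Hypothetical toy model (not Envoy's classes) of the purge/cancel re-entrancy.
// use_member_list == false mimics the buggy shape, true mimics the fix.
struct PendingRequest {
  std::function<void()> on_pool_failure;
};

class ToyPool {
public:
  explicit ToyPool(bool use_member_list) : use_member_list_(use_member_list) {}

  PendingRequest* newPendingRequest(std::function<void()> on_failure) {
    pending_.push_back(std::make_unique<PendingRequest>());
    pending_.back()->on_pool_failure = std::move(on_failure);
    return pending_.back().get();
  }

  void purgePendingRequests() {
    if (!use_member_list_) {
      // Buggy shape: the list is moved to a local, so a re-entrant cancel
      // searches pending_ and cannot find the request.
      std::list<std::unique_ptr<PendingRequest>> local = std::move(pending_);
      pending_.clear();
      while (!local.empty()) {
        std::unique_ptr<PendingRequest> request = std::move(local.front());
        local.pop_front();
        request->on_pool_failure();
      }
      return;
    }
    // Fixed shape: a member list that the cancel path consults while non-empty.
    to_purge_ = std::move(pending_);
    pending_.clear();
    while (!to_purge_.empty()) {
      std::unique_ptr<PendingRequest> request = std::move(to_purge_.front());
      to_purge_.pop_front();
      request->on_pool_failure();
    }
  }

  // Returns false where the real code would hit the ASSERT in removeFromList.
  bool onPendingRequestCancel(PendingRequest* request) {
    auto& list = to_purge_.empty() ? pending_ : to_purge_;
    for (auto it = list.begin(); it != list.end(); ++it) {
      if (it->get() == request) {
        list.erase(it);
        return true;
      }
    }
    return false;
  }

private:
  bool use_member_list_;
  std::list<std::unique_ptr<PendingRequest>> pending_;
  std::list<std::unique_ptr<PendingRequest>> to_purge_;
};

// The first request's failure handler cancels the second (standing in for the
// filter's onDestroy). Returns whether that cancel found its request.
bool cancelFromFailureCallbackSucceeds(bool use_member_list) {
  ToyPool pool(use_member_list);
  PendingRequest* second = nullptr;
  bool found = false;
  pool.newPendingRequest([&] { found = pool.onPendingRequestCancel(second); });
  second = pool.newPendingRequest([] {});
  pool.purgePendingRequests();
  return found;
}
```

With the stack-variable purge, the cancel searches the already-emptied pending list and fails; in this toy model the loop then goes on to fail the request that was supposedly cancelled. With the member list, the cancel finds and removes the request, so only the first failure callback runs.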
Risk Level: Mid/High
Testing: Added unit test
Docs Changes: N/A
Release Notes: N/A
stacktrace.txt