Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: add timeout for writing GOAWAY on http2Client.Close() #7371

Merged
merged 19 commits into from
Aug 16, 2024

Conversation

aranjans
Copy link
Contributor

@aranjans aranjans commented Jul 1, 2024

fixes #7314

Problem

In gRPC 1.64, we released a feature "Send GOAWAY to server on Client Transport Shutdown", post which in case of network hang when we are not able to write GOAWAY frame from client, closing connections will take a very long time.

What does this PR fix

This PR fixes this issue by having a timeout for loopyWriter to write GOAWAY so that http2Client.Close() doesn't wait for too long.

RELEASE NOTES:

  • transport/http2_client: fix waiting on http2Client.Close() in case of network hang for writing GOAWAY.

@aranjans aranjans requested a review from purnesh42H July 1, 2024 08:04
@aranjans aranjans added this to the 1.66 Release milestone Jul 1, 2024
Copy link

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.85%. Comparing base (ced812e) to head (fad99ba).
Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7371      +/-   ##
==========================================
+ Coverage   81.58%   81.85%   +0.27%     
==========================================
  Files         358      360       +2     
  Lines       27388    27539     +151     
==========================================
+ Hits        22345    22543     +198     
+ Misses       3828     3796      -32     
+ Partials     1215     1200      -15     
Files Coverage Δ
internal/transport/http2_client.go 91.90% <100.00%> (-0.16%) ⬇️

... and 38 files with indirect coverage changes

@purnesh42H purnesh42H requested a review from arjan-bal July 2, 2024 03:58
@purnesh42H purnesh42H self-assigned this Jul 2, 2024
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
@aranjans aranjans removed the request for review from arjan-bal July 2, 2024 06:24
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/http2_client.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
@aranjans aranjans force-pushed the aranjans_7314 branch 2 times, most recently from 8e61c11 to 3ea4a82 Compare July 2, 2024 10:21
@dfawley dfawley assigned aranjans and unassigned purnesh42H Jul 2, 2024
@aranjans aranjans force-pushed the aranjans_7314 branch 2 times, most recently from c81d450 to 3684229 Compare July 3, 2024 10:21
@aranjans aranjans assigned purnesh42H and unassigned aranjans Jul 3, 2024
@purnesh42H purnesh42H assigned aranjans and unassigned purnesh42H Jul 4, 2024
@aranjans aranjans force-pushed the aranjans_7314 branch 2 times, most recently from 3363377 to 87da73c Compare July 5, 2024 06:42
}

func (hc *hangingConn) Write(b []byte) (n int, err error) {
n, err = hc.Conn.Write(b)
if isGreetingDone.Load() == true {
if hc.isGreetingDone.Load() == true {
// Hang the Write for more than goAwayLoopyWriterTimeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this timer? Otherwise it will release on its own and pass on master@HEAD, won't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping, this doesn't seem done.

Copy link
Contributor Author

@aranjans aranjans Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm... I think we would need to have timeout for hc.hangConn, otherwise this go-routine will be leaked via this call. So hc.Conn.Write will be hang after we made this write to the conn, which will eventually make the above go-routine to run forever and end up in leaked go-routine.(btw we already have deadline for net.Conn.Write defined here at the top of ct.Close, but then the current test will test only for this loopyWriter to timeout).
WDYT?

Copy link
Contributor

@purnesh42H purnesh42H Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely familiar with the goroutine you are referring to but one suggestion is that you probably don't have to do the actual write n, err = hc.Conn.Write(b) in case of goaway? You can just compare the passed bytes b if its goaway and then just hang? or is that not possible because different structs are being written?

Copy link
Contributor Author

@aranjans aranjans Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually we don't even need to compare it, we can just make sure hangingConn hangs from the time it receives the goAway request.

@dfawley I have removed the timer logic for hc.Write, and updated the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably good to release the <-hc.hangConn at the end of test?

internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
@dfawley dfawley assigned aranjans and unassigned dfawley Aug 13, 2024
@dfawley
Copy link
Member

dfawley commented Aug 15, 2024

I'd really like to wrap this up before we cut the release branch. Will that be possible?

@arjan-bal arjan-bal modified the milestones: 1.66 Release, 1.67 Release Aug 16, 2024
@aranjans aranjans assigned dfawley and unassigned aranjans Aug 16, 2024
case <-hc.hangConn:
case <-timer.C:
}
if n == goAwayFrameSize { // hang the conn after the goAway is received
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the atomic?

This is worse, since you're just assuming that: 1. Nothing else written will be the size of the GOAWAY frame, and 2. the GOAWAY will be written to the connection, by itself, in a single operation.

}

func (hc *hangingConn) Write(b []byte) (n int, err error) {
n, err = hc.Conn.Write(b)
if n == goAwayFrameSize { // hang the conn after the goAway is received
if hc.startHanging.Load() == true {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if x == true --> if x, please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
internal/transport/transport_test.go Outdated Show resolved Hide resolved
@dfawley dfawley changed the title transport/http2_client: add timeout for writing GOAWAY on client.Close() transport: add timeout for writing GOAWAY on http2Client.Close() Aug 16, 2024
@dfawley dfawley merged commit 4e29cc6 into grpc:master Aug 16, 2024
13 checks passed
infovivek2020 pushed a commit to infovivek2020/grpc-go that referenced this pull request Aug 18, 2024
aranjans added a commit to aranjans/grpc-go that referenced this pull request Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Closing connection takes up to 15 minutes.
6 participants