Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close HTTP2 connections on timeout in influxdb outputs #7517

Merged
merged 4 commits into from
May 19, 2020

Conversation

danielnelson
Copy link
Contributor

closes #5905

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@danielnelson danielnelson added area/influxdb fix pr to fix corresponding bug labels May 15, 2020
@danielnelson danielnelson added this to the 1.14.3 milestone May 15, 2020
@danielnelson danielnelson requested a review from ssoroka May 15, 2020 06:17
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to hit request changes just to hold up the production line here (sorry!).

I think this pattern of reconnect on timeout might be detrimental to a server that has timed out due to load and problems keeping up. Re-establishing the connection means renegotiating the tcp/ip and TLS connections, incurring significant CPU overhead. A limping server with a few clients like this would perform much worse with this change.

[edit]
I don't have all the context loaded for this.. if it's constrained to the db create call, it's probably fine, especially since I believe we don't keep retrying it.
[/edit]

Let me know what you think.

@danielnelson
Copy link
Contributor Author

This is for all HTTP exchanges, not just the create database call. Anytime we timeout a request it is going to be more expensive on the server as work that has been done will be retried and must be processed again on top of the new connection and TLS handshake.

In this situation, its not clear if the server will complete the request or not. It's also not clear if the connection is still valid for reuse. With HTTP/1 the connection is discarded by Go, but with HTTP/2 the connection is reused indefinitely.

The timeout option gives the user control over how fast we give up, so I think if someone is concerned about the connection setup costs they would just want to increase the timeout.

@@ -209,6 +209,12 @@ func (c *httpClient) CreateDatabase(ctx context.Context, database string) error

resp, err := c.client.Do(req.WithContext(ctx))
if err != nil {
// Close connection after a timeout error. If this is a HTTP2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy pasted in three places. That's enough to make a new function worth the trouble.

func (c *httpClient) handleTimeout(err error)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/influxdb fix pr to fix corresponding bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

telegraf does not reconnect to influxdb
3 participants