
[outputs.influxdb_v2] add exponential backoff, and respect client error responses #8662

Merged
merged 4 commits into master from influxdbv2-retry
Jan 27, 2021

Conversation

@ssoroka ssoroka commented Jan 8, 2021

  • change max wait on error to 30s
  • handle more http response codes, both for future-proofing and for playing nice with gateways and proxies
  • 4xx response codes now result in metrics being dropped, because the server will never accept them. This prevents endless redelivery of failing metrics, which would otherwise block transmission at the head of the line.
  • add exponential backoff as well as respecting the Retry-After header; pick whichever is greater, up to the 30s max (see the sketch after this summary)
  • include the server response code in the logs, and clarify whether the metrics were dropped.

resolves #8571
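
For reference, here is a minimal sketch of the retry behavior the summary describes, using hypothetical type and function names; only backoff, retryAfterHeader, and defaultMaxWait come from the diff quoted in the review below, so treat the rest as an illustration rather than the plugin's actual code:

package sketch

import (
	"fmt"
	"log"
	"math"
	"net/http"
	"strconv"
	"time"
)

// defaultMaxWait caps the delay between retries (30s at the time of this
// summary; raised to 60s later in the PR discussion).
const defaultMaxWait = 30 * time.Second

// retrier holds the minimal state the sketch needs; the real plugin keeps
// equivalent fields on its HTTP client.
type retrier struct {
	retryCount int
	retryTime  time.Time
}

func (r *retrier) handleResponse(resp *http.Response) error {
	switch {
	case resp.StatusCode >= 200 && resp.StatusCode < 300:
		r.retryCount = 0
		return nil
	case resp.StatusCode >= 400 && resp.StatusCode < 500:
		// The server will never accept this batch, so drop it rather than
		// block later batches at the head of the line.
		log.Printf("W! server responded %d; metrics dropped", resp.StatusCode)
		return nil
	default:
		// 5xx (and anything unexpected, e.g. from gateways or proxies): retry.
		r.retryCount++
		wait := r.retryDuration(resp.Header.Get("Retry-After"))
		r.retryTime = time.Now().Add(wait)
		return fmt.Errorf("server responded %d; retrying in %s", resp.StatusCode, wait)
	}
}

func (r *retrier) retryDuration(retryAfter string) time.Duration {
	// Exponential backoff in seconds: 1, 2, 4, 8, ...
	backoff := math.Pow(2, float64(r.retryCount-1))

	// Respect a numeric Retry-After header when the server sends one.
	retryAfterHeader := 0.0
	if s, err := strconv.ParseFloat(retryAfter, 64); err == nil {
		retryAfterHeader = s
	}

	// take the highest value from both, but not over the max wait.
	retry := math.Max(backoff, retryAfterHeader)
	retry = math.Min(retry, defaultMaxWait.Seconds())
	return time.Duration(retry * float64(time.Second))
}

Taking the maximum of the computed backoff and the server's Retry-After, then clamping to defaultMaxWait, is what keeps a misbehaving header from stalling the output indefinitely.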

@ssoroka ssoroka requested a review from reimda January 8, 2021 19:14
}
// take the highest value from both, but not over the max wait.
retry := math.Max(backoff, retryAfterHeader)
retry = math.Min(retry, defaultMaxWait)
Contributor (review comment on the lines above):
I would expect telegraf to obey the Retry-After header even if it's longer than defaultMaxWait. If the backend is in trouble and ops needs to quiet down retries, they will increase Retry-After and telegraf should obey it.

Contributor Author (@ssoroka):
yeah, agreed. I had checked with the db team, and all of their use cases either don't use it (OSS 2.x doesn't use it at all), or it's only used as rate limiting. I suspect it would be prudent to still have a maximum as I wouldn't really want it to sit idle for 2 hours due to an expiry bug. I think we should set defaultMaxWait to whatever we're comfortable with as a max wait time and leave it at that. I can't really see anything above 60s being hugely valuable, but it seems a lot more likely to cause problems.

I'm proposing upping the defaultMaxWait to 60s. let me know what you think.

Contributor:
I prefer trusting the header unconditionally but if you don't want to, it's not a big deal for me because the situation we're talking about is unlikely anyway. Increasing absolute max is the next best thing, whether it's 1 or 5 or 10 minutes.
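
For context, Retry-After can carry either delta-seconds or an HTTP-date (RFC 7231). A hedged sketch of accepting both forms follows (assuming the same net/http, strconv, and time imports as the sketch in the description; this is illustrative, not necessarily what the plugin implements, and the result would still be clamped to defaultMaxWait per this thread):

// parseRetryAfter accepts both forms of the header ("120" or an HTTP-date)
// and reports whether a usable value was found. The caller would still clamp
// the result to defaultMaxWait.
func parseRetryAfter(value string, now time.Time) (time.Duration, bool) {
	if value == "" {
		return 0, false
	}
	// Delta-seconds form, e.g. "Retry-After: 120".
	if secs, err := strconv.ParseFloat(value, 64); err == nil && secs >= 0 {
		return time.Duration(secs * float64(time.Second)), true
	}
	// HTTP-date form, e.g. "Retry-After: Fri, 08 Jan 2021 19:14:00 GMT".
	if t, err := http.ParseTime(value); err == nil {
		if d := t.Sub(now); d > 0 {
			return d, true
		}
	}
	return 0, false
}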

sspaink commented Jan 11, 2021

!signed-cla

@telegraf-tiger telegraf-tiger bot left a comment
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

ssoroka commented Jan 11, 2021

!signed-cla

@telegraf-tiger telegraf-tiger bot left a comment
🤝 ✅ CLA has been signed. Thank you!

@srebhan srebhan (Member) left a comment
LGTM!

@srebhan srebhan added the area/influxdb, feat, plugin/output, and ready for final review labels Jan 26, 2021
@srebhan srebhan self-assigned this Jan 26, 2021
@ssoroka ssoroka merged commit 9c7cf99 into master Jan 27, 2021
@ssoroka ssoroka deleted the influxdbv2-retry branch January 27, 2021 21:07
ssoroka added a commit that referenced this pull request Jan 27, 2021
…or responses (#8662)

* [outputs.influxdb_v2] add exponential backoff, and respect client error responses

* add test

* Update to 60 seconds

* fix test

(cherry picked from commit 9c7cf99)
arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023
…or responses (influxdata#8662)

* [outputs.influxdb_v2] add exponential backoff, and respect client error responses

* add test

* Update to 60 seconds

* fix test
Labels
area/influxdb, feat, plugin/output, ready for final review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

outputs.influxdb_v2 keeps looping on 500 response.
5 participants