Adding a threaded parsing option for statsD input plugin #6922
Conversation
plugins/inputs/statsd/statsd.go (Outdated)

    MaxParserThreads int `toml:"max_parser_threads"`
Let's not expose this option, since it is so tied to an implementation detail. Let's just pick a good default number and always start this many. How about 3?
I'm not sure I agree with that. It's definitely tied to an implementation detail, but I think there is value in allowing the level of concurrency to be configurable. For instance, if I choose to run telegraf on the same machine as my app, I may want to limit its footprint. But if I set up a dedicated instance to act as a metrics aggregator, being able to spend more CPU to increase throughput (and reduce the number of servers I need to use) could be very useful.
Looking at the existing telegraf.conf, it seems as if there are a few places where this is configurable (but maybe we should rename it?)
https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf#L6124
https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf#L373
https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf#L4961
Maybe there's some way we could expose a level of concurrency as an option without tying it directly to the number of goroutines?
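(For concreteness, here is roughly how the option under discussion would surface in the config. This is only an illustration based on the `max_parser_threads` tag in the diff above; it is not a setting that exists in the released plugin.)

```toml
[[inputs.statsd]]
  service_address = ":8125"

  ## Hypothetical option from this PR: number of goroutines used to parse
  ## incoming packets. Commented out here because it was never merged.
  # max_parser_threads = 5
```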
I'm not confident enough yet that this is the right solution; at the very least we would need to do better performance testing before I'd be willing to add it. I'd also like to discuss this with @reimda before we commit to any new options.
    time.Sleep(time.Millisecond)
    for len(listener.in) > 0 {
        fmt.Printf("Left in buffer: %v \n", len(listener.in))
        time.Sleep(time.Millisecond)
Sorry about the current state of the tests for this plugin, which encourages this, but let's not add any new tests or benchmarks that require sleep.
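(As an aside on the sleep-free alternative: the usual pattern in Go tests is to block on an explicit completion signal with a timeout rather than polling. The sketch below is generic and self-contained; `in` is just a stand-in for the listener's channel, and none of this is the plugin's actual test code.)

```go
package statsd_test

import (
	"testing"
	"time"
)

// TestDrainWithoutSleep shows one sleep-free pattern: the consumer goroutine
// closes `done` once it has drained the channel, and the test blocks on that
// signal (with a timeout) instead of polling with time.Sleep.
func TestDrainWithoutSleep(t *testing.T) {
	in := make(chan []byte, 100) // stand-in for listener.in
	for i := 0; i < 100; i++ {
		in <- []byte("cpu:1|c")
	}
	close(in) // producer signals that no more input is coming

	done := make(chan struct{})
	go func() {
		defer close(done)
		for range in { // consume every buffered line
		}
	}()

	select {
	case <-done:
		// all buffered lines were consumed without any polling
	case <-time.After(5 * time.Second):
		t.Fatal("timed out waiting for the channel to drain")
	}
}
```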
Here is a bit of testing I did. Until timestamp 07:10 it is running with this branch and 3 goroutines; after that it is running with the current code (master with #6921). The extra goroutines are reducing the dropped packets, but for some reason there is a slight increase in UDP errors at the system level and a moderate decrease in the total number of packets captured. The number of processed packets is about the same. Could we be slowing the read goroutine by having more readers of the channel? If so, why is the system udp_inerrors only slightly decreased?
I'm not sure I'm seeing the decrease in packets captured. From your image, it looks like the green line is the total number of packets received, which remains relatively consistent across both sides of the 07:10 timestamp. It's just the rate that drops, which makes sense given how that rate is calculated.

I agree that the udp_inerrors seem to be coming in at a higher rate with the goroutine changes, based on your graph, but I'm unsure why. I'll do some looking around as well, but do you have any insight into what kinds of things cause these errors? (It looks like they come in regularly anyway. Is the error profile the same or different between the two runs?) I'll also try to get something spun up to check this out myself.
@danielnelson I'd love to revisit this and see if we can get to a resolution. (Sorry for the long delay; I've been on and off vacation for the last month or so.) Did you get a chance to discuss with @reimda? We are running a forked version with these changes in a pre-production environment and seeing good results (up to ~120k-130k TPM for a single telegraf instance).
I don't want to add any new options to the plugin interface, but if you update the PR to use a fixed number of reader goroutines, probably 3 or so, then we can merge.
Fair enough. I updated the PR to use 5 (I tend to think the slightly higher number gains us a bit more benefit; we tested up to 1000 without hitting CPU issues), but I can bump it down to 3 if you feel strongly.
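(For readers skimming the thread, the change being discussed boils down to a small, fixed-size pool of goroutines all draining the plugin's input channel. The sketch below is illustrative only; the struct fields and method names are assumptions, not the merged diff.)

```go
package statsd

import "sync"

// Minimal stand-in for the plugin struct; only the fields this sketch needs.
type Statsd struct {
	in chan []byte
	wg sync.WaitGroup
}

// parserGoroutines is the fixed pool size being debated above (3 vs. 5).
const parserGoroutines = 5

// startParsers launches a fixed-size pool of goroutines that all drain the
// shared input channel until it is closed.
func (s *Statsd) startParsers() {
	for i := 0; i < parserGoroutines; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for packet := range s.in {
				s.parsePacket(packet)
			}
		}()
	}
}

// parsePacket is a placeholder for the plugin's real line parsing.
func (s *Statsd) parsePacket(packet []byte) {
	_ = packet // parsing logic omitted in this sketch
}
```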