Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execd produces only one metric if produced in a batch #7617

Closed
gregschrock opened this issue Jun 2, 2020 · 1 comment · Fixed by #7631
Closed

execd produces only one metric if produced in a batch #7617

gregschrock opened this issue Jun 2, 2020 · 1 comment · Fixed by #7631
Assignees
Labels
bug unexpected problem or unintended behavior
Milestone

Comments

@gregschrock
Copy link

Relevant telegraf.conf:

[agent]
omit_hostname = true

[[inputs.execd]]
command = ["tail", "-F", "/tmp/metrics.txt",]
signal = "none"
data_format = "influx"

[[outputs.file]]
files = [ "stdout",]
data_format = "influx"
flush_interval = "1s"
metric_batch_size = 1
json_timestamp_units = "1ns"

System info:

This was build from master as that's where the latest execd code exists.

$ telegraf --version
Telegraf unknown (git: master 1e7f714b)

My reproduction is using ci-1.13.docker from the scripts directory.

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Steps to reproduce:

  1. Create the telegraf config
$ echo '[agent]
omit_hostname = true

[[inputs.execd]]
command = ["tail", "-F", "/tmp/metrics.txt",]
signal = "none"
data_format = "influx"

[[outputs.file]]
files = [ "stdout",]
data_format = "influx"
flush_interval = "1s"
metric_batch_size = 1
json_timestamp_units = "1ns"' > /tmp/telegraf.conf
  1. Create a file containing metrics to be produced all at once
$ echo 'metric1 value=1
metric2 value=2
metric3 value=3
' > /tmp/batch.txt
  1. Run telegraf
telegraf --config /tmp/telegraf.conf
  1. Append the batch to the tailed file (separate terminal)
cat /tmp/batch.txt >> /tmp/metrics.txt

Expected behavior:

I expect all three metrics to be produced as tail is outputting them to stdout.

metric1 value=1 1591110405176257000
metric2 value=2 1591110405176257000
metric3 value=3 1591110405176257000

Actual behavior:

Instead, only the first is produced.

metric1 value=1 1591108106648312300

It is possible to force out the remaining metrics by adding blank lines (or other metrics) to the file:

echo "" >> /tmp/metrics.txt

That will result in the second metric being produced. Likewise with the third.

Additional info:

I was able to track the issue down to the influx.StreamParser and demonstrate the issue. It seems to only be a problem with a continuous io.Reader and doesn't present itself in the current tests which use bytes.NewBuffer.

Here's what I believe is a valid demonstration of the issue. This can be added to plugins/parsers/influx/parser_test.go. I included the second for loop to demonstrated that closing the reader does produce additional metrics. I wouldn't expect that to be part of a delivered test.

func TestStreamParserProducesAllAvailableMetrics(t *testing.T) {
	expectedMetrics := 2
	r, w := io.Pipe()

	parser := NewStreamParser(r)
	parser.SetTimeFunc(DefaultTime)

	metrics := make(chan telegraf.Metric)
	defer close(metrics)

	go func() {
		w.Write([]byte("metric1 value=1\nmetric2 value=1"))
	}()

	go func() {
		for {
			fmt.Println("Waiting for a metric")
			m, err := parser.Next()
			fmt.Println("Got a metric")
			if err != nil {
				fmt.Println(err)
				return
			} else {
				metrics <- m
			}
		}
	}()

	i := 0
	for ; i < expectedMetrics && !t.Failed(); i++ {
		select {
		case m := <-metrics:
			fmt.Println(m)
			continue
		case <-time.After(1 * time.Second):
			t.Errorf("Timed out waiting for metrics. Expected %v, but received %v", expectedMetrics, i)
		}
	}

	r.Close()
	w.Close()

	for ; i < expectedMetrics; i++ {
		select {
		case m := <-metrics:
			fmt.Println(m)
			continue
		case <-time.After(1 * time.Second):
			t.Errorf("Also didn't get the metrics after close. Expected %v, but received %v", expectedMetrics, i)
		}
	}
}
@ssoroka ssoroka self-assigned this Jun 3, 2020
@danielnelson danielnelson added this to the 1.14.4 milestone Jun 5, 2020
@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Jun 5, 2020
@ssoroka
Copy link
Contributor

ssoroka commented Jun 5, 2020

Fixed! thanks for the detailed report!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants