Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Enum Processor #3772

Merged
merged 5 commits into from
Jul 3, 2018
Merged

Conversation

KarstenSchnitter
Copy link
Contributor

This processor allows substitution of field values of string or bool type with
numeric values. It can be used, e.g. to map status codes such as "red", "amber"
and "green" to values such as 0, 1, 2. Multiple fields can be configured, each
with its own mapping table.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, I'm very interested in this one and I know quite a few other are as well. Here are a few ideas and observations I have:

[processors.enum.fields.value_mappings]
green = 0
yellow = 1
red = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will want a way to specify a default case, or perhaps we should remove the field if it does not match? If we emit the original string the data type will change, which will cause problems on insertion to the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using this plugin together with #3763, which drops any string values and thus removing unmatched values. For a more generic use-case I see the benefit in having a default behaviour to ensure a constant data type. I proposed to make dropping the field if it contains an unspecified value the default behaviour and optionally use a configurable default value:

[[processors.enum]]
  [[processors.enum.fields]]
    source = "status"
    default = -1
    [processors.enum.fields.value_mappings]
      green = 0
      yellow = 1

Would that be a reasonable solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

[processors.enum.fields.value_mappings]
green = 0
yellow = 1
red = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice if you could optionally modify the output fieldkey, so that you could transform like this:

-foo status="red"
+foo status="red",status_code=2i

Maybe something like this?:

[[processors.enum]]
  [[processors.enum.fields]]
    source = "status"
    destination = "status_code"
    [processors.enum.fields.value_mappings]
      green = 0
      yellow = 1
      red = 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea. I will implement it.

}

func (mapper *EnumMapper) Apply(in ...telegraf.Metric) []telegraf.Metric {
out := make([]telegraf.Metric, 0, len(in))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to make this as efficient as possible since it will be ran in over all metrics, perhaps we can modify in directly?

}

func (mapper *EnumMapper) applyMappings(source telegraf.Metric) (telegraf.Metric, error) {
if fields, changed := mapper.applyFieldMappings(source.Fields()); changed == true {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of grabbing all the fields, it might be faster to use HasField to check for the fields we are interested in and modify the metric instead of creating a new one.

Copy link
Contributor Author

@KarstenSchnitter KarstenSchnitter Feb 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking with HasFieldfirst is a great suggestion. I will implement that.

As discussed above I have trouble when modifying the metric directly in the case where it has just the one metric we are interested in. Note, that the implementation will only create a new metric when a substitution was made and pass on the original object otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I flipped through the other comments and couldn't find the discussion of this, can you tell me more about the problem modifying the metric directly or point me in the right direction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can provide a simple test to demonstrate my observations:

import (
	"fmt"
	"testing"
	"time"

	"github.com/influxdata/telegraf/metric"
)

func TestDirectModification(t *testing.T) {
	metric, _ := metric.New("m",
		map[string]string{},
		map[string]interface{}{"test": "value"},
		time.Now(),
	)
	fmt.Println(metric.String()) // original state
	metric.RemoveField("test")
	fmt.Println(metric.String()) // field is not removed
	metric.AddField("test", 13)
	fmt.Println(metric.String()) // added new value
	metric.RemoveField("test")
	fmt.Println(metric.String()) // remove old value, but colon error
	metric.Fields()              // panic because of colon from above
}

This creates the following output:

$ go test
m test="value" 1519041597145903000

m test="value" 1519041597145903000

m test="value",test=13i 1519041597145903000

m ,test=13i 1519041597145903000

--- FAIL: TestDirectModification (0.00s)
panic: runtime error: index out of range [recovered]
	panic: runtime error: index out of range

goroutine 5 [running]:
testing.tRunner.func1(0xc4201080f0)
	/usr/local/Cellar/go/1.9.3/libexec/src/testing/testing.go:711 +0x2d2
panic(0x1292140, 0x1458a40)
	/usr/local/Cellar/go/1.9.3/libexec/src/runtime/panic.go:491 +0x283
github.com/influxdata/telegraf/metric.(*metric).Fields(0xc42009a200, 0x1)
	/Workspaces/go/src/github.com/influxdata/telegraf/metric/metric.go:345 +0xa22
github.com/influxdata/telegraf/plugins/processors/enum.TestDirectModification(0xc4201080f0)
	/Workspaces/go/src/github.com/influxdata/telegraf/plugins/processors/enum/demo_test.go:24 +0x4f9
testing.tRunner(0xc4201080f0, 0x12f5940)
	/usr/local/Cellar/go/1.9.3/libexec/src/testing/testing.go:746 +0xd0
created by testing.(*T).Run
	/usr/local/Cellar/go/1.9.3/libexec/src/testing/testing.go:789 +0x2de
exit status 2
FAIL	github.com/influxdata/telegraf/plugins/processors/enum	0.020s

So the following behaviour can be observed:

  • RemoveField(test) on the last field does not change the metric, the field remains.
  • AddField(test,13) creates a new field with a different value
  • RemoveField(test) removes the old value but leads to a leading ',' which indicates a problem
  • the problem from above manifests in a panic when the Fields() is called

This problem is always present, if the leading field in the string representation is removed. Even when more fields are present. Therefore, I decided, that modifying a field value directly by this approach is not possible. The issue lies in the unreliability of RemoveField(...).

Is there another approach to a direct modification? I really want to keep the feature of replacing the original value not just adding the mapped value as a new field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm rewriting the implementation of telegraf.Metric for 1.6 and I'll fix this, I'll let you know when its ready for testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is included in #3924

@danielnelson
Copy link
Contributor

closes #2791

@ikovalyov
Copy link

What if we need to map few strings to the same numeric value?

@KarstenSchnitter
Copy link
Contributor Author

Using this processor you specify a map what string to map to what value. You can easily specify multiple strings mapping to the same value, e.g.:

[[processors.enum]]
  [processors.enum.fields.value_mappings]
    green = 0
    yellow = 1
    red = 1

@KarstenSchnitter
Copy link
Contributor Author

I committed changes addressing the questions by @danielnelson quite some time ago. Can we have another review on the current state of this PR?

@glinton glinton added this to the 1.8.0 milestone Jun 27, 2018
This processor allows substitution of field values of string or bool type with
numeric values. It can be used, e.g. to map status codes such as "red", "amber"
and "green" to values such as 0, 1, 2. Multiple fields can be configured, each
with its own mapping table.

Signed-off-by: Karsten Schnitter <[email protected]>
@KarstenSchnitter
Copy link
Contributor Author

KarstenSchnitter commented Jul 3, 2018

I re-pushed to fix the merge conflict.

@danielnelson danielnelson merged commit 515ff03 into influxdata:master Jul 3, 2018
rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
@gutschein
Copy link

gutschein commented Aug 23, 2019

I tried to map MAC-Addresses to hostname, but it doesn't work for me. Could it be possible that handling of more complex strings (containing : or _ ) and hwaddr is not yet covered? Or is therer a limitation for tag-fields? With this setting hostname is not filled with default value nor hostnames to MAC-Addresses.

  [[processors.enum]]
    [[processors.enum.mapping]]
      field = "caps_man_mac_address"
      dest =  "caps_man_hostname"
      default = "na"
      [processors.enum.mapping.value_mappings]
         "AA:11:BB:22:CC:33" = "hostname"
...
    [[inputs.snmp.table.field]]
      name = "caps_man_mac_address"
      oid = ".1.3.6.1.4.1.14988.1.1.1.5.1.1"
      conversion = "hwaddr"
      is_tag = true
    [[inputs.snmp.table.field]]     # doubled to get a taged field , to be replaced by value mapping
      name = "caps_man_hostname"
      oid = ".1.3.6.1.4.1.14988.1.1.1.5.1.1"
      is_tag = true

@KarstenSchnitter
Copy link
Contributor Author

Hi @gutschein,

thanks for trying out the enum processor plugin. Currently, the plugins supports only mapping for fields, but not for tags. If I understand your example correctly, the configuration is_tag = true adds caps_man_mac_address as a tag to the measurement. Therefore, the enum plugin is not able to process the mapping.

You could try to first emit the mac address as a field using is_tag = false, then map the field value with the enum processor plugin using your configuration. If necessary you can use the converter processor plugin to convert the mapped field into a tag. Note, that you may need to specify the order property of the processor plugins to ensure, that the measurements are first processed by the enum plugin and then by the converter plugin.

If your problem persists, please provide an example measurement generated by your input configuration and the desired output from the enum processor plugin. This makes it easier to understand the issue and provide a solution.

Best Regards,
Karsten

@gutschein
Copy link

thanks for writing enum... :) and thanks for your response.

Good idea to change field to tag later usinge the converter. I give it a try after the basic is running. You mentioned an option to order of the processing, too...interessting I wasn't aware of this option.

Back to the basics: I tried it with the destination field as a simple field (not tag) but it stilld doesn't work. Even the default = "na" doesn't work. It seems to me that delimiting at all doesn't work in my tests: I tried other strings (containing e.g. space) instead of the little bit complicated MAC-address as source field but my destination field was unchanged at all.
And beside the target field: Is it allowed that the source field is a tag (in your example e.g status = "green") or do we have the same limitation there like for the destination field?

My target:
getting an MAC-address (e.g. "11:AA:22:BB:33:CC") from my access points in a tag i like to derive the device name (e.g. "Foos iphone") as a additional tag for using in grafana (known devices in WLAN). By the way: This may be a use case for many people if the source (here: my access point) can't resolve the names:
e.g. "11:AA:22:BB:33:CC" = "Foos iphone"

simplified target
I have a tag (pitfall 1?) with values containing space (pitfall 2?) as source to derive a string field (not a tag) (pitfall 3?)
e.g. "VLAN 10" = "media"

It seems to me that enum just accepts strings w/o space (an w/o "-Delimiter) as source and a (int) value as target by now. right?
e.g. green = 1

@KarstenSchnitter
Copy link
Contributor Author

I do not think, that a whitespace character is an issue. I think, that you suffer from pitfall 1: The enum processor plugin can only read fields, but no tags. In your example the plugin would look for a field caps_man_mac_address and replace any value of AA:11:BB:22:CC:33 by hostname and any other value by na. However, your input plugin only provides a tag with that name but no field. Therefore the value remains unchanged.

If you want support for tags in the enum processor plugin, I suggest to create an issue with the corresponding feature request.

Regarding the other pitfalls (2 and 3) it would be nice to have a short example that proves the misbehaviour. Can you provide that? So far I am not convinced, that there is an issue. But I will look into this in the next few days.

@gutschein
Copy link

Ah, ok, thanks for explanation. So...I will try next weekend: Test with field instead of a tag containing a whitespace.

@glinton
Copy link
Contributor

glinton commented Aug 26, 2019

If you want support for tags in the enum processor plugin, I suggest to create an issue with the corresponding feature request.

Tag support for enum was released in 1.11(#5855)

@KarstenSchnitter
Copy link
Contributor Author

@glinton, I missed out on that one. This is a great feature. I guess @gutschein just needs to change field = "caps_man_mac_address" to tag = "caps_man_mac_address" then.

@gutschein
Copy link

gutschein commented Aug 31, 2019

I just tested it...indeed...my example from 7 days ago works perfect with tag = "caps_man_mac_address". Even with destination as tag. Great work! Thank you. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants