Feature Request - "Transform" Processor plugin #2667
Comments
I like it. First version seems best, I think we need captures. I suggest we change a few names:
Some things to think about:
Hi Danielsan,
We could select which element (field, tag, or?) we want to "transform" with:
I focused on this particular requirement of transforming strings, but you're right, there are many other transformation requirements one could think of. If the only thing this plugin does is transform strings, it should be named something more specific, like a "string transform" plugin. Regarding enums, what type of "enum matching" are you thinking of? Could we do one string transform per enum option? Each transform would try to grab a specific string and, if it matches, add/replace/insert the enum string. Other requirements might be mathematical transformations, like "/60", though that doesn't feel really necessary (we can, and should be able to, do transformation math in the queries to the database). Regarding the terminology (mind you, I'm not a native English speaker):
If we call the element "transformation" it seems to me unclear that the element is referring to how it's going to match/grab. Erm, I don't know whether we're actually thinking the same thing? My reasoning was:
An example is mapping strings to ints "green" -> 0, "yellow" -> 1, "red" -> 2. |
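To make the enum idea concrete, here is a minimal Go sketch of such a lookup. The table name `statusLevels` and the helper `mapEnum` are hypothetical and only illustrate the "green" -> 0 mapping mentioned above; this is not an existing Telegraf API.

```go
package main

import "fmt"

// statusLevels is a hypothetical lookup table mirroring the example above.
var statusLevels = map[string]int64{
	"green":  0,
	"yellow": 1,
	"red":    2,
}

// mapEnum returns the integer for a known status string.
// The second result is false when the value has no mapping.
func mapEnum(value string) (int64, bool) {
	n, ok := statusLevels[value]
	return n, ok
}

func main() {
	for _, s := range []string{"green", "yellow", "red", "blue"} {
		if n, ok := mapEnum(s); ok {
			fmt.Printf("%s -> %d\n", s, n)
		} else {
			fmt.Printf("%s -> (unmapped, left as-is)\n", s)
		}
	}
}
```

Returning an integer directly avoids the follow-up type conversion that a pure string replacement would need.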
I think we should probably scope this to regular expressions; we can create separate processors for enums, type conversions, math, etc. I'm also flip-flopping on backreferences: it seems like it's not too bad to not have them, or at least I'm not able to come up with a good example to justify the extra config complexity. We could select tags/fields using subtables. Literal replacements could still be done with regex, so we wouldn't need a type option. If we go with that, an example config could be:
[[processors.regex]]
  namepass = ["apache"]
  [[processors.regex.tags]]
    key = "path"
    pattern = '/products/cars/\d+/view/'
    replacement = "/products/cars/{id}/view/"
  [[processors.regex.fields]]
    key = "path"
    pattern = '/products/cars/\d+/view/'
    replacement = "/products/cars/{id}/view/"
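As a rough illustration of what one pattern/replacement pair from this config would do to a single value, here is a sketch using Go's standard `regexp` package. The function name `normalizePath` is an assumption made for the example; it does not reflect how the processor would actually be implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// carPath is the pattern from the example config, compiled once.
var carPath = regexp.MustCompile(`/products/cars/\d+/view/`)

// normalizePath replaces every match of the pattern with the fixed
// replacement string; values that do not match are returned unchanged.
func normalizePath(path string) string {
	return carPath.ReplaceAllString(path, "/products/cars/{id}/view/")
}

func main() {
	fmt.Println(normalizePath("/products/cars/12345/view/"))
	fmt.Println(normalizePath("/shoppingBasket/1234567890/view"))
}
```

Note that the pattern as written requires a trailing slash, so a value such as `/products/cars/12345/view` would pass through unmodified.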
That could be trivial, it seems:
^ For simplicity that would probably be the best, I agree. There's nothing we can do with "literal" match-strings that we can't do with Regexes.
^ Not sure what "backreferences" are...
^ Why can't we just search for a «column» regardless of whether it is an Influx "Tag" or "Field"? If we can be abstracted from that, that would be ideal...
^ Could we still have the possibility of specifying the match-group? That would allow us to replace only parts of the original string. Use case example: |
On the enum/case example, this might be somewhat slow and somewhat verbose but perhaps it would meet the requirements. If we stick to string replacements you might need to follow it up with a type conversion, so that you get 0i instead of "0".
I was referring to captures groups and the
It turns out you can have a tag and field with the same key:
Yeah I guess we should keep them. Perhaps backreferences in the replacement string could do the job:
I honestly don't know this but: wouldn't we need to specify the type anyway, regardless? Meh - just thinking out loud - don't even bother answering me, you know better than me and I'm just raising the question.
Ho, wow, and I thought I knew the gist of everything there was to know about Regexes... I had never heard of backreferences. Live and learn! That sounds really neat and it would solve the trick, indeed - you're right. The only considerations I have about it are that the Also, I'm wondering whether you can access the backreferences caught in the matching from the replacement in whatever Regex libraries (Go STD?) Telegraf is using. Apart from those considerations I like the idea of backreferences - really cool feature of Regexes that I was unaware of. |
I've just been thinking about this operating on strings so far. However, it would be possible to add a type option such as The replacement string wouldn't be a regex, but would use the https://golang.org/pkg/regexp/#Regexp.ReplaceAll function to expand the replacement. I pasted the wrong syntax above, it looks like go format would be |
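For reference, Go's `regexp` package expands capture groups in the replacement template as `$1` or `${1}`. The sketch below shows that behaviour with `ReplaceAllString`; the pattern and the helper name `maskCarID` are assumptions chosen to match the URL example in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// carPath captures the text around the numeric ID so the replacement can reuse it.
var carPath = regexp.MustCompile(`^(/products/cars/)\d+(/view/?)$`)

// maskCarID keeps both captured parts and swaps only the digits for "{id}".
func maskCarID(path string) string {
	// ${1} and ${2} expand to the first and second capture groups.
	// The braces matter: "$1x" would be read as the group named "1x".
	return carPath.ReplaceAllString(path, "${1}{id}${2}")
}

func main() {
	fmt.Println(maskCarID("/products/cars/12345/view"))
	fmt.Println(maskCarID("/products/cars/987/view/"))
}
```

With captures available in the replacement, a separate "match group index" option becomes unnecessary for this use case.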
This improvement could greatly simplify the tracking of IIS / ASP.NET apps, since they combine usage of the IIS Site Name (text) and the IIS Site Id (numeric) as tags, and manual mapping is necessary. With such a feature, we could replace the IIS Site Id in tags with the IIS Site Name (per server) to ease the correlation of measurements.
@tbolon What plugin are you using to capture these stats? Can you give an example of the current and desired schema? |
Currently win_perf_counters. Some counters are returned with an internal id. Example:
And the corresponding output:
The "instance" tag value can vary from server to server based on the order in which the websites were created, so, before sending the values to InfluxDB, I would prefer to have a way to transform them into a better name. I only need a handful of hardcoded replacements in my Telegraf config: "_LM_W3SVC_9_ROOT" => "SomeWebsite", etc. I can't do such a thing on my dashboard, since the "_LM_W3SVC_9_ROOT" id can map to different sites depending on the host. Other performance counters already use the IIS Site name as the instance name:
It will give the following output (mostly redacted):
I hope this helps. |
This would also be useful for the |
I am also interested in this feature. |
Will be included in 1.7, thanks to @44px! I encourage everyone to give the regex processor a shot before the release. |
Feature Request
Requesting a "Transform" processor plugin.
I am trying to import Web access logs into InfluxDB with Telegraf. However, some of the URL PATHs include identifiers (session IDs, product IDs, etc). Ex:
/products/cars/12345/view
/shoppingBasket/1234567890/view
The URL PATH is being shipped as a Tag Value (obviously). I need to be able to replace those identifiers in the PATH Tag Value before shipping the data to Influx (or whatever other DB), so that they become easily recognizable as the «same» URL PATH for searches and aggregations, and to prevent an explosion of "series" in InfluxDB or Graphite.
Proposal:
[[processors.transformer]]
tagpass = "ApacheLog"
tagname = "path"
matcher = "/products/cars/(\\d+)/view/"
matchertype = "regex" # "literal"
replaceMatchedIndex = 1 # 0 being the whole match. To replace *only* the ID
replacement = "{CarID}"
tagexclude = "ApacheLog"
[[processors.transformer]]
tagpass = "ApacheLog"
tagname = "path"
matcher = "/shoppingBasket/(\\d+)/view"
matchertype = "regex" # literal
replaceMatchedIndex = 1
replacement = "{SessionID}"
tagexclude = "ApacheLog"
Simpler Proposal:
[[processors.transformer]]
tagpass = "ApacheLog"
tagname = "path"
matcher = "/products/cars/\\d+/view/"
matchertype = "regex" # "literal"
# replaceMatchedIndex = 1
replacement = "/products/cars/{CarID}/view/"
tagexclude = "ApacheLog"
SimplerSimpler Proposal:
[[processors.transformer]]
tagpass = "ApacheLog"
tagname = "path"
replaceDigits = 3 # replace all sequences of X+ digits
replaceGuids = true
replaceTrimmedGuids = true # guids stripped of dashes
tagexclude = "ApacheLog"
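A rough Go sketch of how the three options above might behave is shown below. The function name `scrub` and the placeholders `{guid}` and `{id}` are assumptions, since the proposal does not say what the matched sequences would be replaced with.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// guidRe matches dashed GUIDs (replaceGuids).
	guidRe = regexp.MustCompile(`(?i)[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`)
	// trimmedGuidRe matches GUIDs stripped of dashes (replaceTrimmedGuids).
	trimmedGuidRe = regexp.MustCompile(`(?i)[0-9a-f]{32}`)
)

// scrub replaces GUIDs first, then any run of at least minDigits digits
// (replaceDigits). GUIDs go first so their digits are not rewritten early.
func scrub(path string, minDigits int) string {
	if minDigits < 1 {
		minDigits = 1
	}
	path = guidRe.ReplaceAllString(path, "{guid}")
	path = trimmedGuidRe.ReplaceAllString(path, "{guid}")
	// Compiled per call for brevity; a real processor would compile once.
	digits := regexp.MustCompile(fmt.Sprintf(`\d{%d,}`, minDigits))
	return digits.ReplaceAllString(path, "{id}")
}

func main() {
	fmt.Println(scrub("/products/cars/12345/view", 3))
	fmt.Println(scrub("/session/123e4567-e89b-12d3-a456-426614174000/view", 3))
}
```

This variant needs no regex knowledge from the user, at the cost of being unable to give each identifier a distinct name such as `{CarID}` or `{SessionID}`.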