Json Parser wildcard tagkey support #7531

Closed
seanlok opened this issue May 17, 2020 · 4 comments · Fixed by #8579

seanlok commented May 17, 2020

Feature Request

This feature request opens a discussion on adding wildcard tag key support to the JSON parser. The discussion initially started here.

Proposal:

Allow the parser to create tag keys automatically.

Current behavior:

The JSON input parser doesn't properly handle output produced by the JSON serializer.

Telegraf configuration:

[[inputs.kafka_consumer]]
   brokers = ["xxx:9094"]
   topics = ["dev-test"]
   consumer_group = "host"
   insecure_skip_verify = true
   data_format = "json"
   json_name_key = "name"
   json_time_key = "timestamp"
   json_time_format = "unix"
   json_strict = true
   json_string_fields = []
   tag_keys = ["tags"]
   offset = "oldest"

Input JSON:

{
    "fields": {
        "value": 2
    },
    "name": "metrics_name.test",
    "tags": {
        "InstanceId": "i-123",
        "app": "test",
        "availability-zone": "ap-southeast-1z",
        "cluster": "dev",
        "env": "development",
        "host": "ip-10.compute.internal",
        "metric_type": "counter"
    },
    "timestamp": 1589376103
}

The resulting metric drops every tag unless each one is listed explicitly, as in tag_keys = ["tags_InstanceId", "tags_app", "tags_availability-zone", "tags_cluster", "tags_env", "tags_host", "tags_metric_type"]

This is inconvenient because the configuration has to be updated whenever a new tag is added. Since the serializer emits output in this format, the parser should be compatible with it.
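
To illustrate the behavior described above, here is a minimal Python sketch, not Telegraf's actual Go code. It assumes the parser flattens nested objects by joining key segments with an underscore and treats a key as a tag only when it appears verbatim in tag_keys. The helper names flatten and select_tags are hypothetical.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts, joining key segments with '_' (assumed behavior)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def select_tags(flat, tag_keys):
    """Keep only flattened keys that are listed verbatim in tag_keys."""
    return {k: str(v) for k, v in flat.items() if k in tag_keys}


doc = {
    "fields": {"value": 2},
    "name": "metrics_name.test",
    "tags": {"app": "test", "env": "development"},
    "timestamp": 1589376103,
}

flat = flatten(doc)
# "tags" itself is an object, so no flattened key is named exactly "tags".
no_tags = select_tags(flat, ["tags"])
# Every nested tag has to be spelled out with its flattened name.
some_tags = select_tags(flat, ["tags_app", "tags_env"])
```

Under these assumptions, tag_keys = ["tags"] selects nothing, which matches the dropped tags reported here.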

Desired behavior:

Either add a new config option, or allow * as a wildcard in tag_keys.

[[inputs.kafka_consumer]]
   ...
   data_format = "json"
   json_name_key = "name"
   json_time_key = "timestamp"
   json_time_format = "unix"
   json_strict = true
   json_string_fields = []
   tag_keys = ["tags_*"]
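
A rough Python sketch of what a wildcard entry such as "tags_*" could mean. It uses the standard library's fnmatch as a stand-in for whatever glob matching Telegraf would use, and it assumes the keys have already been flattened with underscores. select_tags_glob is a hypothetical name, not a Telegraf function.

```python
from fnmatch import fnmatchcase


def select_tags_glob(flat, patterns):
    """Keep flattened keys that match at least one glob pattern."""
    return {
        key: str(value)
        for key, value in flat.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }


# Flattened form of part of the input JSON (assumed underscore joining).
flat = {
    "fields_value": 2,
    "name": "metrics_name.test",
    "tags_InstanceId": "i-123",
    "tags_app": "test",
    "tags_availability-zone": "ap-southeast-1z",
    "timestamp": 1589376103,
}

tags = select_tags_glob(flat, ["tags_*"])
```

With this approach a newly added tag is picked up without touching the configuration.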

Use case:

This is inconvenient because the configuration has to be updated whenever a new tag is added. Since the serializer emits output in this format, the parser should be compatible with it.

danielnelson added the feature request and good first issue labels May 26, 2020
danielnelson added this to the Planned milestone May 26, 2020

hackery commented May 29, 2020

I was just about to make the same request. This seems like another facet of the extended JSON parsing support requested in the long-running #1363, but perhaps some of these features are worth implementing independently. I'm tempted to work on this myself, although we have other blockers to using this parser.

I'd suggest that the tag_keys selector should be a GJSON path, rather than a wildcard with underscores representing nesting. I'd prefer that the output were just the deepest field names (e.g. InstanceId rather than tags_InstanceId), but that risks clashes between fields in different substructures unless there is some mapping syntax. It seems they can't be renamed with the rename processor, as it doesn't handle wildcards ("tags_(*)" => "\1").

The underscore referencing seems to be ambiguous in any case - how does a config of tag_keys = ["env_id"] treat this input?

{
  "name": "test",
  "env": {
    "id": "i-123"
  },
  "env_id": "qa"
}

(The same applies to json_string_fields, and maybe to json_name_key and json_time_key.)
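
The ambiguity can be made concrete with a small Python sketch. It assumes underscore-joined flattening and only reports which source paths collapse onto the same flattened name. It says nothing about which value Telegraf would actually keep. flatten_paths is a hypothetical helper.

```python
def flatten_paths(obj, path=()):
    """Yield (path_tuple, value) for every leaf of a nested dict."""
    for key, value in obj.items():
        if isinstance(value, dict):
            yield from flatten_paths(value, path + (key,))
        else:
            yield path + (key,), value


doc = {"name": "test", "env": {"id": "i-123"}, "env_id": "qa"}

# Group leaves by the name they would get after joining segments with "_".
by_flat_name = {}
for path, value in flatten_paths(doc):
    by_flat_name.setdefault("_".join(path), []).append((path, value))

# Names produced by more than one source path are ambiguous.
ambiguous = {name: srcs for name, srcs in by_flat_name.items() if len(srcs) > 1}
```

Here both the nested env.id and the top-level env_id map to the flattened name env_id, so tag_keys = ["env_id"] cannot tell them apart.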


seanlok commented Jun 1, 2020

I'm fine with either approach as long as it achieves the objective. We should have native compatibility between the parser and the serializers.

danielnelson commented

Let's do the simple glob update on tag_keys first so we don't break compatibility. We could potentially add new options for additional GJSON queries, but let's discuss that in a new issue.

Regarding serializer -> parser roundtrip, I recommend against using JSON for this since it does not differentiate between float and integer types.
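
A small Python sketch of the type loss mentioned above. JSON has a single number type, so the text 2 carries no integer-versus-float marker. The sketch assumes a decoder that reads every JSON number as a float, as Go's encoding/json does when decoding into an untyped value. Passing parse_int=float to json.loads imitates that here.

```python
import json

original = {"value": 2}        # integer field before serialization
wire = json.dumps(original)    # the JSON text has no type annotation

# Imitate a decoder that maps all JSON numbers to floating point.
decoded = json.loads(wire, parse_int=float)

type_before = type(original["value"]).__name__
type_after = type(decoded["value"]).__name__
```

The value survives the round trip, but the field type changes from integer to float.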

For cleaning up the prefixes, I recommend using the strings processor for renaming the tags:

[[processors.strings]]
  [[processors.strings.trim_prefix]]
    tag_key = "*"
    prefix = "tags_"
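
For reference, the effect of trimming the tags_ prefix from every tag key can be sketched in Python as follows. This only illustrates the intended result and is not the strings processor's implementation. trim_tag_key_prefix is a hypothetical name.

```python
def trim_tag_key_prefix(tags, prefix):
    """Return a copy of tags with prefix removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in tags.items()
    }


tags = {"tags_InstanceId": "i-123", "tags_app": "test", "host": "ip-10.compute.internal"}
cleaned = trim_tag_key_prefix(tags, "tags_")
```

Keys without the prefix are left alone. If a trimmed key equals an existing key, one value overwrites the other in this sketch.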


hackery commented Jun 4, 2020

> For cleaning up the prefixes, I recommend using the strings processor for renaming the tags:

Handy, hadn't spotted that. Useful to me elsewhere too.
