Add dynamic generation topic capabilities to Kafka output plugin #3177

trueneu · 2017-08-27T16:54:20Z

Feature Request

Proposal:

Kafka output plugin to be able to choose topics dynamically based on metric name.

Current behavior:

Kafka output plugin must be provided with a topic name to produce to explicitly. If user wants to produce different metrics to different topics, he or she must create a number of output Kafka plugin configurations to produce certain metrics filtered by name.

Desired behavior:

It would be nice to have an option which would dynamically choose topic name based on metric name, possibly with a static prefix.

Use case:

It's useful in a highly dynamic environment where an unknown beforehand number of different messages (say from completely different services and/or APIs) may be sent to a telegraf instance's input. To push all of them into a single topic would mean to clutter it, and make the future processing difficult in case that different processing pipelines must be used for different data. To filter them with built-in filter.go capabilities would work, but this option lacks any dynamic, as one would have to add more and more filters for all the metrics one receives. This would also clutter telegraf configuration.

The proposed option would solve this problem.

danielnelson · 2017-08-28T23:22:10Z

Have you considered using the routing_tag option and if so what problems did you run into with it?

trueneu · 2017-08-29T09:38:28Z

@danielnelson the thing with routing_tag is that it's used as a message key produced to kafka topic in kafka output plugin. So essentially, in this case, a hash is computed out of routing_tag and it is used to determine which partition in the given topic the message will end up in.

Kafka semantics suppose that you scale with partitions, not separate data using them. So in the end, all the data will end up in a single topic anyway. It is okay if actual data meaning is the same, but if it's coming from totally different services, I'd prefer it to be isolated.

In this case, it's not isolated even across partitions, because there will be a number of keys where hash(key1) % number_of_partitions == hash(key2) % number_of_partitions.

Moreover, if data volumes differ dramatically from one routing_tag to another tag, this will skew data distribution across the partitions.

UPDATE:
Another thing with shoving everything under the same topic, as I've mentioned, is that if you have to process different data streams different ways, you have to implement stream separation logic somewhere on the Kafka consumer side, doing dispatching based on contents. That's over-engineering compared to 'to each logically separated stream its own topic' approach.

danielnelson · 2017-08-30T00:15:12Z

What we have done on other queue outputs, namely amqp, is provide routing based on tags. The code is basically the same as is used for routing_tag only with amqp you can actually route using them. I think generally users are more interested in routing based on tag than on measurement name.

We made a similar change recently in #3170, adding a more extensible format for setting the partition key in Kinesis. In Kinesis it is similar to the routing key in Kafka in that it determines the partition. We could do something like that here, but for topic. It would look someething like:

[[outputs.kafka]]
  ... other stuff ...

  [outputs.kafka.topic_method]
    method = "measurement"
    prefix = "my_beautiful_metrics"

  # can be specified using inline style instead
  topic_method = {method = "measurement", prefix = "foo"}

We would need to keep backwards compatibility if we do this. Let me know what you think.

trueneu · 2017-08-30T15:56:29Z

@danielnelson Sounds good!
I looked into Kinesis plugin sample config. Imagine that in the end we'd have something like:

[[outputs.kafka]]
# ...
[outputs.kafka.topic_method]
method = "static"
key = "howdy"

[outputs.kafka.topic_method]
method = "measurement"
prefix = "foo_"

[outputs.kafka.topic_method]
method = "tag"
key = "host"
prefix = "foo_"

# if a certain tag is not present inside a measurement, an empty string will be used instead
# this should be an equivalent of method == tag if there's only one key
[outputs.kafka.topic_method]
method = "tags"
keys = ["rack", "host", "service"]
key_separator = "_"
prefix = "foo_"

So the backward compatibility will be there, if nothing of above is used, we use plain topic. Any topic_method would override that.

I'm not sure about corner cases though; with Kinesis, it seems to be logical to use empty partitioning key if user specifies "tag" topic_method and the tag is not there. But that's not the case with Kafka, it doesn't make sense to write into a topic with empty name. Do we fallback to plugin-wide topic option? Or simply discard the measurement? And yell in the logs, I suppose?

What do you think?

danielnelson · 2017-08-30T20:39:57Z

That's a really good observation. I think it would be a bad thing to drop the metric, since one of the main reasons you would use kafka is for greater durability.

I think the fallback idea could work and makes sense from a backwards compatibility point of view. Perhaps we rename the new option topic_override to indicate that if it matches then it overrides the normal topic if it matches. One weakness of this is that it really only applies at this time to the tags method.

Another possible option would be to either require a prefix, or call the new option topic_suffix and have it be appending to the existing topic field.

trueneu · 2017-08-30T21:25:43Z

@danielnelson I personally like the topic_suffix option more, it seems straightforward to me, and eliminates the need of explicit falling back logic.
I'll implement it and redo the PR then. Please tell if you have anything else in mind.

danielnelson · 2017-08-30T21:57:10Z

Sounds good

trueneu · 2017-09-04T18:19:03Z

@danielnelson Pushed new PR, as you can see; tests sometimes fail on graphite boolean serialisation, though. (After a few tries I made them pass :) ) So I filed another PR addressing that problem: #3198

danielnelson · 2017-09-19T18:43:22Z

Added in #3196

trueneu mentioned this issue Aug 27, 2017

Add Kafka output plugin dynamic topic name generation option #3178

Closed

3 tasks

trueneu mentioned this issue Sep 4, 2017

Add Kafka output plugin topic_suffix option #3196

Merged

3 tasks

danielnelson added this to the 1.5.0 milestone Sep 6, 2017

danielnelson added the area/kafka label Sep 6, 2017

danielnelson closed this as completed Sep 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add dynamic generation topic capabilities to Kafka output plugin #3177

Add dynamic generation topic capabilities to Kafka output plugin #3177

trueneu commented Aug 27, 2017

danielnelson commented Aug 28, 2017

trueneu commented Aug 29, 2017 •

edited

Loading

danielnelson commented Aug 30, 2017

trueneu commented Aug 30, 2017

danielnelson commented Aug 30, 2017

trueneu commented Aug 30, 2017

danielnelson commented Aug 30, 2017

trueneu commented Sep 4, 2017

danielnelson commented Sep 19, 2017

Add dynamic generation topic capabilities to Kafka output plugin #3177

Add dynamic generation topic capabilities to Kafka output plugin #3177

Comments

trueneu commented Aug 27, 2017

Feature Request

Proposal:

Current behavior:

Desired behavior:

Use case:

danielnelson commented Aug 28, 2017

trueneu commented Aug 29, 2017 • edited Loading

danielnelson commented Aug 30, 2017

trueneu commented Aug 30, 2017

danielnelson commented Aug 30, 2017

trueneu commented Aug 30, 2017

danielnelson commented Aug 30, 2017

trueneu commented Sep 4, 2017

danielnelson commented Sep 19, 2017

trueneu commented Aug 29, 2017 •

edited

Loading