Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dynamic generation topic capabilities to Kafka output plugin #3177

Closed
trueneu opened this issue Aug 27, 2017 · 9 comments
Closed

Add dynamic generation topic capabilities to Kafka output plugin #3177

trueneu opened this issue Aug 27, 2017 · 9 comments
Milestone

Comments

@trueneu
Copy link
Contributor

trueneu commented Aug 27, 2017

Feature Request

Proposal:

Kafka output plugin to be able to choose topics dynamically based on metric name.

Current behavior:

Kafka output plugin must be provided with a topic name to produce to explicitly. If user wants to produce different metrics to different topics, he or she must create a number of output Kafka plugin configurations to produce certain metrics filtered by name.

Desired behavior:

It would be nice to have an option which would dynamically choose topic name based on metric name, possibly with a static prefix.

Use case:

It's useful in a highly dynamic environment where an unknown beforehand number of different messages (say from completely different services and/or APIs) may be sent to a telegraf instance's input. To push all of them into a single topic would mean to clutter it, and make the future processing difficult in case that different processing pipelines must be used for different data. To filter them with built-in filter.go capabilities would work, but this option lacks any dynamic, as one would have to add more and more filters for all the metrics one receives. This would also clutter telegraf configuration.

The proposed option would solve this problem.

@danielnelson
Copy link
Contributor

Have you considered using the routing_tag option and if so what problems did you run into with it?

@trueneu
Copy link
Contributor Author

trueneu commented Aug 29, 2017

@danielnelson the thing with routing_tag is that it's used as a message key produced to kafka topic in kafka output plugin. So essentially, in this case, a hash is computed out of routing_tag and it is used to determine which partition in the given topic the message will end up in.

Kafka semantics suppose that you scale with partitions, not separate data using them. So in the end, all the data will end up in a single topic anyway. It is okay if actual data meaning is the same, but if it's coming from totally different services, I'd prefer it to be isolated.

In this case, it's not isolated even across partitions, because there will be a number of keys where hash(key1) % number_of_partitions == hash(key2) % number_of_partitions.

Moreover, if data volumes differ dramatically from one routing_tag to another tag, this will skew data distribution across the partitions.

UPDATE:
Another thing with shoving everything under the same topic, as I've mentioned, is that if you have to process different data streams different ways, you have to implement stream separation logic somewhere on the Kafka consumer side, doing dispatching based on contents. That's over-engineering compared to 'to each logically separated stream its own topic' approach.

@danielnelson
Copy link
Contributor

What we have done on other queue outputs, namely amqp, is provide routing based on tags. The code is basically the same as is used for routing_tag only with amqp you can actually route using them. I think generally users are more interested in routing based on tag than on measurement name.

We made a similar change recently in #3170, adding a more extensible format for setting the partition key in Kinesis. In Kinesis it is similar to the routing key in Kafka in that it determines the partition. We could do something like that here, but for topic. It would look someething like:

[[outputs.kafka]]
  ... other stuff ...

  [outputs.kafka.topic_method]
    method = "measurement"
    prefix = "my_beautiful_metrics"

  # can be specified using inline style instead
  topic_method = {method = "measurement", prefix = "foo"}

We would need to keep backwards compatibility if we do this. Let me know what you think.

@trueneu
Copy link
Contributor Author

trueneu commented Aug 30, 2017

@danielnelson Sounds good!
I looked into Kinesis plugin sample config. Imagine that in the end we'd have something like:

[[outputs.kafka]]
# ...
[outputs.kafka.topic_method]
method = "static"
key = "howdy"

[outputs.kafka.topic_method]
method = "measurement"
prefix = "foo_"

[outputs.kafka.topic_method]
method = "tag"
key = "host"
prefix = "foo_"

# if a certain tag is not present inside a measurement, an empty string will be used instead
# this should be an equivalent of method == tag if there's only one key
[outputs.kafka.topic_method]
method = "tags"
keys = ["rack", "host", "service"]
key_separator = "_"
prefix = "foo_"

So the backward compatibility will be there, if nothing of above is used, we use plain topic. Any topic_method would override that.

I'm not sure about corner cases though; with Kinesis, it seems to be logical to use empty partitioning key if user specifies "tag" topic_method and the tag is not there. But that's not the case with Kafka, it doesn't make sense to write into a topic with empty name. Do we fallback to plugin-wide topic option? Or simply discard the measurement? And yell in the logs, I suppose?

What do you think?

@danielnelson
Copy link
Contributor

That's a really good observation. I think it would be a bad thing to drop the metric, since one of the main reasons you would use kafka is for greater durability.

I think the fallback idea could work and makes sense from a backwards compatibility point of view. Perhaps we rename the new option topic_override to indicate that if it matches then it overrides the normal topic if it matches. One weakness of this is that it really only applies at this time to the tags method.

Another possible option would be to either require a prefix, or call the new option topic_suffix and have it be appending to the existing topic field.

@trueneu
Copy link
Contributor Author

trueneu commented Aug 30, 2017

@danielnelson I personally like the topic_suffix option more, it seems straightforward to me, and eliminates the need of explicit falling back logic.
I'll implement it and redo the PR then. Please tell if you have anything else in mind.

@danielnelson
Copy link
Contributor

Sounds good

@trueneu
Copy link
Contributor Author

trueneu commented Sep 4, 2017

@danielnelson Pushed new PR, as you can see; tests sometimes fail on graphite boolean serialisation, though. (After a few tries I made them pass :) ) So I filed another PR addressing that problem: #3198

@danielnelson danielnelson added this to the 1.5.0 milestone Sep 6, 2017
@danielnelson
Copy link
Contributor

Added in #3196

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants