Accept varying topic lengths in MQTT topic parsing configs #10716

Closed
samhld opened this issue Feb 23, 2022 · 10 comments · Fixed by #15528
Labels: area/mqtt, feature request

Comments

@samhld
Contributor

samhld commented Feb 23, 2022

Telegraf will soon support the multi-segment wildcard (#) in MQTT topics in the topic parsing feature. This is distinct from the + wildcard, which maps to exactly one topic segment; # can match zero or more segments.
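
To make the difference between the two wildcards concrete, here is a minimal Python sketch of MQTT topic filter matching. The function name `topic_matches` is hypothetical and is not part of Telegraf; the sketch also ignores the special rules for topics starting with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT `topic_filter` (illustrative only)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches zero or more remaining levels,
            # but is only valid as the last level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter expects another level that the topic does not have.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" matches any single level.
            return False
    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


old_topic = "sensors/CLE/device5/temp"
new_topic = "sensors/CLE/west/v2/device5/temp"
print(topic_matches("sensors/+/+/+", old_topic))  # True
print(topic_matches("sensors/+/+/+", new_topic))  # False
print(topic_matches("sensors/#", new_topic))      # True
```

The `+` filter only accepts topics with exactly four levels, while `sensors/#` accepts both the old and the new topic layout.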

Given this support, it would be nice to use it to our advantage and allow for parsing topics of varying lengths. Below is an example use case.

Imagine I have this topic: sensors/CLE/device5/temp following this schema: sensors/<site>/<device_name>/<field>
Then I update firmware of my devices to a version that changes the target topics for metrics (this is built in and unchangeable by me). The new topics follow this schema: sensors/<site>/<sub-site>/<version>/<device_name>/<field>. If the same device is updated, it now publishes temp data to topic: sensors/CLE/west/v2/device5/temp

If I have devices running multiple versions, I may have multiple topics of varying lengths that ultimately have the same data in them and could be dealt with the same way. Say I don't care about the sub-site and version data. In other words, the newly introduced topic segments aren't meaningful to me.

It would be convenient to apply the same topic parsing configuration to both cases where possible, like this:

[[inputs.mqtt_consumer]]
    ....
    topics = ["sensors/#"]
    [[inputs.mqtt_consumer.topic_parsing]]
        measurement = "measurement/_/#/_/_"
        tags = "_/site/#/device_name/_"
        fields = "_/_/#/_/field"

The above configuration will accept a topic of any length that matches the pattern of the first two segments and last two segments. Any segments -- whether there are 0 or more -- in between those first two and last two would be ignored.

samhld added the feature request label on Feb 23, 2022
MyaLongmire self-assigned this on Feb 23, 2022
@srebhan
Member

srebhan commented Feb 23, 2022

Hey @samhld, topic parsing supports a topic property

  # [[inputs.mqtt_consumer.topic_parsing]]
  #   topic = ""
  #   measurement = ""
  #   tags = ""
  #   fields = ""

which can be used to filter the topics handled by this topic-parser. In your case, you want to define such a topic_parsing section (there can be multiple per plugin) for each version... So like

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/+/+/+"
      measurement = "measurement/_/_/_"
      tags = "_/site/device_name/_"
      fields = "_/_/_/field"

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/+/+/+/+"
      measurement = "measurement/_/_/_/_"
      tags = "_/site/sub_site/device_name/_"
      fields = "_/_/_/_/field"

Does that make sense?

@samhld
Contributor Author

samhld commented Feb 23, 2022

@srebhan It does and I'm aware of that. I'm just looking to add some sugar to this so that users don't need to repeat themselves. In your suggested configuration, you're actually changing the parsing being done and therefore the line protocol output. In my example, the output is the same. In cases like that, it would be cleaner to keep the configuration less repetitive in my opinion.

This is of course assuming this can be done. I'm thinking it can be done by indexing into the beginning and the end of the topic slice generated by the parser. I believe this would only work if the parts to be ignored are contiguous, but I think that would be the most common form of this case.
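
The idea of indexing into the beginning and the end of the topic slice could look roughly like the following Python sketch. It is an illustration under stated assumptions, not Telegraf's implementation: `parse_segments` is a hypothetical helper, `_` marks an ignored segment, and at most one `#` is allowed in the pattern.

```python
def parse_segments(pattern: str, topic: str):
    """Map named pattern segments to topic segments; return None if no match."""
    pat = pattern.split("/")
    seg = topic.split("/")
    if pat.count("#") > 1:
        # Two variable-length parts cannot be aligned unambiguously.
        raise ValueError("at most one '#' is supported")
    if "#" in pat:
        k = pat.index("#")
        head, tail = pat[:k], pat[k + 1:]
        if len(seg) < len(head) + len(tail):
            return None  # topic is too short for the fixed parts
        # Align the head with the start of the topic ...
        pairs = list(zip(head, seg[:len(head)]))
        # ... and the tail with the end; whatever lies between is ignored.
        if tail:
            pairs += list(zip(tail, seg[-len(tail):]))
    else:
        if len(pat) != len(seg):
            return None
        pairs = list(zip(pat, seg))
    return {name: value for name, value in pairs if name != "_"}


for topic in ("sensors/CLE/device5/temp", "sensors/CLE/west/v2/device5/temp"):
    print(parse_segments("_/site/#/device_name/_", topic))
# Both topics print {'site': 'CLE', 'device_name': 'device5'}
```

Because the ignored part is contiguous, the fixed head and tail can each be anchored to one end of the topic, which is why both topic lengths yield the same result.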

@srebhan
Member

srebhan commented Mar 2, 2022

I see. So IMO having

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/#/+/+"
      measurement = "measurement/__"
      tags = "_/site/__/device_name/_"
      fields = "__/field"

with __ (double underscore) meaning "one or more elements" would be what you want. Is this correct?
The biggest issue is that either the beginning or the end needs to be absolute, i.e. __/A/__/B would be forbidden as we don't know where exactly A would be... That complicates the parsing quite a bit. @MyaLongmire correct me if I'm wrong.

Of course we could also use other placeholders...

@samhld
Contributor Author

samhld commented Mar 4, 2022

@srebhan I don't have a strong opinion on syntax. A double underscore works for me. I do think it runs the risk of looking very similar to a single underscore, however. That's why I used "#". The "#" symbol is a concept MQTT users are already familiar with. It would be a slightly different use of it but I think the point would get across. But, just to reiterate, I'm not too passionate about that part.

I think the most common use case would be to simply account for an unknown number of segments at the beginning, middle, or end. I don't know if we need to support two separate unknown lengths as in __/A/__/B. Maybe we could rely on the documentation to state that this case is not supported?

@srebhan
Member

srebhan commented Mar 11, 2022

@samhld I agree with your view on the syntax, maybe # would be a better choice as this is already familiar to most MQTT people.

Regarding the unknown/dynamic length, we can never support something like /#/A/#/B! Imagine a topic /we/just/got/that/fancy/topic: we can safely say B equals "topic", but what do you assign to A? It could be any of "just", "got" or "that", and we do not want ambiguous assignments. ;-)
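
The ambiguity can be shown by enumerating every position A could take. This is a small Python sketch; `candidates_for_a` is a hypothetical helper, and it assumes each `#` must cover at least one segment, which is the reading that reproduces the three candidates named above.

```python
def candidates_for_a(segments, min_wildcard_len=1):
    """List every value A could take for the pattern #/A/#/B."""
    # B is pinned to the last segment. A can sit at any index that leaves
    # at least `min_wildcard_len` segments before it (first '#') and at
    # least `min_wildcard_len` segments between it and B (second '#').
    last = len(segments) - 1  # index of B
    return [segments[i] for i in range(min_wildcard_len, last - min_wildcard_len)]


segments = "/we/just/got/that/fancy/topic".strip("/").split("/")
print(candidates_for_a(segments))  # ['just', 'got', 'that']
```

With more than one candidate there is no principled way to pick a value for A, whereas a pattern with a single `#` always has exactly one alignment.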

@samhld
Contributor Author

samhld commented Mar 11, 2022

@srebhan yep, agreed!

@juha-ylikoski

Any update on this? I have a use case where we have a base topic:

<root>/<manufacturer>/<model>/<serial>/data/<format>/#

And would like to be able to match this with topic parsing. A message could have a topic structure:

root/me/this/ffffff/data/json/temperature

or

root/me/this/ffffff/data/json/temperature/sensor1

But because the tail of the topic depends on other parts of the topic, I would need to define every possible structure instead of writing a single topic_parsing section such as:

[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+/#"
tags = "_/manufacturer/model/serial/_/data_format/#"

@juha-ylikoski

For anyone reading this: the same result can also be achieved with a processor, e.g. a Starlark processor like this one:

[[processors.starlark]]
## The Starlark source can be set as a string in this configuration file, or
## by referencing a file containing the script.  Only one source or script
## should be set at once.
name_override = "topic-extractor"
namepass = ["mqtt_consumer"]
## Source of the Starlark script.
source = '''
# Input topic should be
#   root/+/+/+/data/+/#
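def apply(metric):
    # Split the topic tag into its segments.
    parts = metric.tags["topic"].split("/")
    # Check the length first so shorter topics do not cause an index error.
    if len(parts) > 4 and parts[4] == "data":
        metric.tags["manufacturer"] = parts[1]
        metric.tags["model"] = parts[2]
        metric.tags["serial"] = parts[3]
    return metric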
'''

@srebhan
Member

srebhan commented Jun 17, 2024

@samhld and @juha-ylikoski please test the binary in PR #15528, available once CI has finished the tests, and let me know if this fixes the issue!

srebhan added the waiting for response label on Jun 17, 2024
@juha-ylikoski

@srebhan this seems to have enabled the functionality I needed. I tested with this config:


[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+"
tags = "_/manufacturer/model/serial/_/data_format"
[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+/#"
tags = "_/manufacturer/model/serial/_/data_format"

And was able to match and extract the tags for both topics:

root/1/2/3/data/format

  • manufacturer = 1
  • model = 2
  • serial = 3
  • data_format = format

And
root/1/2/3/data/format/foo

  • manufacturer = 1
  • model = 2
  • serial = 3
  • data_format = format

telegraf-tiger bot removed the waiting for response label on Jun 18, 2024