Accept varying topic lengths in MQTT topic parsing configs #10716

samhld · 2022-02-23T00:38:10Z

Telegraf will soon support the multi-segment wildcard (#) in MQTT topics in the topic parsing feature. This is distinct from the + wildcard, which maps to exactly one topic segment. The # wildcard can match zero or more segments.
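
For readers less familiar with the two wildcards, here is a minimal Python sketch of how a subscription filter is matched against a topic under the usual MQTT rules. The function name `topic_matches` is made up for illustration, and the sketch leaves out details such as `$`-prefixed system topics. It is not Telegraf's code.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last filter level,
            # where it matches the remaining zero or more topic levels.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; "+" matches any one level.
            return False

    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

With this sketch, `sensors/+/+/+` matches `sensors/CLE/device5/temp` but not the six-segment `sensors/CLE/west/v2/device5/temp`, while `sensors/#` matches both.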

Given this support, it would be nice to use it to our advantage and allow for parsing topics of varying lengths. Below is an example use case.

Imagine I have this topic: sensors/CLE/device5/temp, following this schema: sensors/<site>/<device_name>/<field>
Then I update the firmware of my devices to a version that changes the target topics for metrics (this is built in and I cannot change it). The new topics follow this schema: sensors/<site>/<sub-site>/<version>/<device_name>/<field>. If the same device is updated, it now publishes temp data to the topic sensors/CLE/west/v2/device5/temp

If I have devices running multiple versions, I may have multiple topics of varying lengths that ultimately have the same data in them and could be dealt with the same way. Say I don't care about the sub-site and version data. In other words, the newly introduced topic segments aren't meaningful to me.

It would be convenient to apply the same topic parsing configuration to both cases where possible, like this:

[[inputs.mqtt_consumer]]
    ....
    topics = ["sensors/#"]
    [[inputs.mqtt_consumer.topic_parsing]]
        measurement = "measurement/_/#/_/_"
        tags = "_/site/#/device_name/_"
        fields = "_/_/#/_/field"

The above configuration would accept a topic of any length whose first two and last two segments match the pattern. Any segments between those first two and last two (zero or more) would be ignored.


srebhan · 2022-02-23T08:58:05Z

Hey @samhld, topic parsing supports a topic property

  # [[inputs.mqtt_consumer.topic_parsing]]
  #   topic = ""
  #   measurement = ""
  #   tags = ""
  #   fields = ""

which can be used to filter the topics handled by this topic-parser. In your case, you want to define such a topic_parsing section (there can be multiple per plugin) for each version... So like

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/+/+/+"
      measurement = "measurement/_/_/_"
      tags = "_/site/device_name/_"
      fields = "_/_/_/field"

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/+/+/+/+/+"
      measurement = "measurement/_/_/_/_/_"
      tags = "_/site/sub_site/version/device_name/_"
      fields = "_/_/_/_/_/field"

Does that make sense?

samhld · 2022-02-23T16:28:52Z

@srebhan It does and I'm aware of that. I'm just looking to add some sugar to this so that users don't need to repeat themselves. In your suggested configuration, you're actually changing the parsing being done and therefore the line protocol output. In my example, the output is the same. In cases like that, it would be cleaner to keep the configuration less repetitive in my opinion.

This is of course assuming this can be done. I'm thinking it can be done by indexing into the beginning and the end of the topic slice generated by the parser. I believe this would only work if the parts to be ignored are contiguous, but I think that would be the most common form of this case.
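
To make the head-and-tail indexing idea concrete, here is a rough Python sketch under my own assumptions: `#` is the placeholder for the ignored run, `_` marks a segment to skip, and at most one `#` is allowed per pattern. The function name `extract` is hypothetical and this is not how Telegraf implements topic parsing.

```python
from typing import Dict, Optional

WILDCARD = "#"  # assumed placeholder for a run of ignored segments
IGNORE = "_"    # a single segment that is matched but not extracted


def extract(pattern: str, topic: str) -> Optional[Dict[str, str]]:
    """Map named pattern segments to topic segments; None if the topic is too short or long."""
    names = pattern.split("/")
    parts = topic.split("/")

    if names.count(WILDCARD) > 1:
        raise ValueError("at most one '#' is supported in a pattern")

    if WILDCARD not in names:
        # Fixed-length pattern: lengths must agree exactly.
        if len(names) != len(parts):
            return None
        pairs = list(zip(names, parts))
    else:
        idx = names.index(WILDCARD)
        head, tail = names[:idx], names[idx + 1:]
        if len(parts) < len(head) + len(tail):
            return None
        # Head names are indexed from the front, tail names from the back;
        # the contiguous run in between is swallowed by '#'.
        tail_parts = parts[len(parts) - len(tail):]
        pairs = list(zip(head, parts)) + list(zip(tail, tail_parts))

    return {name: value for name, value in pairs if name != IGNORE}
```

Applied to both `sensors/CLE/device5/temp` and `sensors/CLE/west/v2/device5/temp`, the pattern `_/site/#/device_name/_` gives the same two tags, `site` and `device_name`, which is the point of the request.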

srebhan · 2022-03-02T20:24:57Z

I see. So IMO having

    [[inputs.mqtt_consumer.topic_parsing]]
      topic = "sensors/#/+/+"
      measurement = "measurement/__"
      tags = "_/site/__/device_name/_"
      fields = "__/field"

with __ (double underscore) meaning "one or more elements" would be what you want. Is this correct?
The biggest issue is that either the beginning or the end needs to be absolute, i.e. __/A/__/B would be forbidden as we don't know where exactly A would be... That complicates the parsing quite a bit. @MyaLongmire correct me if I'm wrong.

Of course we could also use other placeholders...

samhld · 2022-03-04T01:04:29Z

@srebhan I don't have a strong opinion on syntax. A double underscore works for me. I do think it runs the risk of looking very similar to a single underscore, however. That's why I used "#". The "#" symbol is a concept MQTT users are already familiar with. It would be a slightly different use of it but I think the point would get across. But, just to reiterate, I'm not too passionate about that part.

I think the most common use case would be to simply account for an unknown number of segments at the beginning, middle, or end. I don't know if we need to support two separate unknown lengths as in __/A/__/B. Maybe we could state in the documentation that this case is not supported?

srebhan · 2022-03-11T09:40:53Z

@samhld I agree with your view on the syntax, maybe # would be a better choice as this is already familiar to most MQTT people.

Regarding the unknown/dynamic length, we can never support something like /#/A/#/B! Imagine a topic /we/just/got/that/fancy/topic: we can safely say B equals "topic", but what do you assign to A? It could be any of "just", "got" or "that", and we do not want ambiguous assignments. ;-)
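
The ambiguity can be shown by simply enumerating the positions A could take. The Python sketch below is only an illustration: `candidates_for_a` is a made-up helper, the leading slash is stripped rather than treated as an empty first segment, and `min_wildcard_len` is an assumed knob for whether each wildcard must cover at least one segment or may cover zero.

```python
from typing import List


def candidates_for_a(topic: str, min_wildcard_len: int = 1) -> List[str]:
    """List every segment that A could bind to for the pattern #/A/#/B."""
    parts = topic.strip("/").split("/")
    last = len(parts) - 1  # B is always the final segment

    # A sits at index i: the first wildcard covers parts[:i] and the
    # second covers parts[i + 1:last]. Both must be long enough.
    return [parts[i] for i in range(min_wildcard_len, last - min_wildcard_len)]
```

With each wildcard covering at least one segment, the topic above yields three candidates for A. If a wildcard may also be empty, the list grows to five, so the assignment is ambiguous either way.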

samhld · 2022-03-11T21:54:56Z

@srebhan yep, agreed!

juha-ylikoski · 2024-05-30T11:08:39Z

Any update on this? I have a use case where we have a base topic:

<root>/<manufacturer>/<model>/<serial>/data/<format>/#

And would like to be able to match this with topic parsing. A message could have a topic structure:

root/me/this/ffffff/data/json/temperature

or

root/me/this/ffffff/data/json/temperature/sensor1

But because the tail of the topic depends on other parts of the topic, I would need to define all possible structures instead of writing a single topic_parsing section, e.g.:

[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+/#"
tags = "_/manufacturer/model/serial/_/data_format/#"

juha-ylikoski · 2024-05-30T11:33:45Z

For anyone reading this, the same result can also be achieved with a processor, e.g. a Starlark one like this:

[[processors.starlark]]
## The Starlark source can be set as a string in this configuration file, or
## by referencing a file containing the script.  Only one source or script
## should be set at once.
name_override = "topic-extractor"
namepass = ["mqtt_consumer"]
## Source of the Starlark script.
source = '''
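# Input topic should be
#   root/+/+/+/data/+/#
def apply(metric):
    # Split the topic tag set by mqtt_consumer into its segments.
    parts = metric.tags["topic"].split("/")
    # Check the length first so short topics do not cause an index error.
    if len(parts) > 4 and parts[4] == "data":
        metric.tags["manufacturer"] = parts[1]
        metric.tags["model"] = parts[2]
        metric.tags["serial"] = parts[3]
    return metric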
'''

srebhan · 2024-06-17T22:03:48Z

@samhld and @juha-ylikoski please test the binary in PR #15528, available once CI has finished the tests, and let me know if this fixes the issue!

juha-ylikoski · 2024-06-18T08:45:01Z

@srebhan this seems to have enabled the functionality I needed. I tested with config:


[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+"
tags = "_/manufacturer/model/serial/_/data_format"
[[inputs.mqtt_consumer.topic_parsing]]
topic = "root/+/+/+/data/+/#"
tags = "_/manufacturer/model/serial/_/data_format"

And was able to match and extract the tags for both topics:

root/1/2/3/data/format

manufacturer = 1
model = 2
serial = 3
data_format = format

And
root/1/2/3/data/format/foo

manufacturer = 1
model = 2
serial = 3
data_format = format
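
The result above is consistent with names being assigned by position from the front of the topic, with any extra trailing segments left out. Below is a small Python sketch of that reading. It is an assumption drawn only from the two outputs shown here, not a description of what PR #15528 does internally, and `extract_prefix_tags` is a hypothetical name.

```python
from typing import Dict, Optional


def extract_prefix_tags(tags_pattern: str, topic: str) -> Optional[Dict[str, str]]:
    """Assign tag names to leading topic segments, ignoring any extra tail."""
    names = tags_pattern.split("/")
    parts = topic.split("/")

    if len(parts) < len(names):
        # The topic is too short to fill every named position.
        return None

    # zip() stops at the shorter sequence, so trailing segments are dropped.
    return {name: part for name, part in zip(names, parts) if name != "_"}
```

Both `root/1/2/3/data/format` and `root/1/2/3/data/format/foo` produce the same four tags with the pattern `_/manufacturer/model/serial/_/data_format`.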

samhld added the "feature request" label Feb 23, 2022

telegraf-tiger bot added the area/mqtt label Feb 23, 2022

MyaLongmire self-assigned this Feb 23, 2022

powersj unassigned MyaLongmire Jan 20, 2023

srebhan self-assigned this Jun 6, 2024

srebhan mentioned this issue Jun 17, 2024 in "feat(inputs.mqtt_consumer): Add variable length topic parsing" #15528 (merged)

srebhan added the "waiting for response" label Jun 17, 2024

telegraf-tiger bot removed the "waiting for response" label Jun 18, 2024

powersj closed this as completed in #15528 Jun 18, 2024
