Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability in telegraf to use ID field from the metric, if available #7891

Closed
vipin-agarwal opened this issue Jul 24, 2020 · 5 comments
Closed

Comments

@vipin-agarwal
Copy link

Feature Request

Currently, when we are saving metrics to output plugin (for e.g. Elasticsearch), there is no ability for telegraf to use the ID field from the payload (metrics) as ID of the metric being pushed.
As a result, any metrics pushed to ES will always be taken as "New" document in the absence of ID field.

Proposal:

We provide a JSON parser configuration, similar to json_time_key, named json_id_key.
This will be an optional configuration just like time key, if present, then the ID will be available to output plugins for use and assignment.
If not present, then the current behaviour can continue to work in the absence of ID field.

Use case:

One of the scenario where this is immensely useful is when saving metrics data to Elasticsearch.
In a scenario where the metrics data from an input plugin is duplicated or same metrics is returned multiple times, then in the absence of this ID functionality, ES output plugin sends the metrics without an identifier. This will cause ES to treat it as a new document and create a new ID and insert it into its datastore, hence resulting in duplicates of metrics.
When the metrics is sent with an ID, then the ES will perform the "UPSERT" correctly, modifying any existing documents with an ID or create one if it does not ecxists.

Note - We would also need to make a change in desired output plugins to use the available ID property or not. That could depend on individual plugin choices.

I have made changes to my local telegraf code, and have run the suggested steps to ensure its working fine. Let me know if this is something beneficial to all, then I can raise a PR accordingly.
Thanks.

@vipin-agarwal vipin-agarwal changed the title Ability to telegraf to use ID field from the metric, if available Ability in telegraf to use ID field from the metric, if available Jul 24, 2020
@ssoroka
Copy link
Contributor

ssoroka commented Jul 24, 2020

Thanks for the issue. I have some questions to clarify what you mean:

  1. Does the original data have IDs?
  2. How do you decide what to give a new ID, and what to give an existing ID?

@vipin-agarwal
Copy link
Author

vipin-agarwal commented Jul 24, 2020

Thanks Steven.

  1. Yes. The original data that is ingested into the telegraf will have an identifier for each metric.
    So we can specify in the input-plugin configuration as to which field from the payload is the identifier and should be parsed as an ID field. for e.g. articleID, documentID, metricID....

This is on similar lines of json_time_key

  1. We are not creating any ID as part of the telegraf changes. This ID will be available as part of input payload.
    Telegraf will just simply pick up the specified ID field for each metric from the payload (as explained above) and when calling the elasticsearch it will pass that ID along with the rest of the document data, like the following
    elastic.NewBulkIndexRequest().Id(metric.ID()).Index(indexName).Doc(m)

Now, everything is on the ES side. ES when it receives documents which already has an ID, will internally check to see if any document with that ID exists or not. If it does, then it performs UPDATE otherwise it performs INSERT using that ID as its internal identifier.

Hope this clarifies.

@ssoroka
Copy link
Contributor

ssoroka commented Jul 27, 2020

I think we'd be in support of an additional feature for the elasticsearch output that optionally allowed you to set a field or tag to get the ID from. Does this work properly with the NewBulkIndexRequest() object?

@vipin-agarwal
Copy link
Author

Yes, it does work as expected. I have tested it by running a local copy of telegraf with modifications for ID field.

elastic.NewBulkIndexRequest().Id(metric.ID()).Index(indexName).Doc(m)

the ID() method of the NewBulkIndexRequest() accepts a string ID- if present it will pass that as _id field in JSON to ES, if not then it will omit in the JSON. This is all handled internally by olivere elastic go package.

This availability and fetching of ID from input payload is there at the config level, so if any other plugins wish to leverage it - they can do that easily using metric.ID() method.

Let me know if this sounds beneficial and please guide me to process I need to follow for PR . :-)

@ssoroka
Copy link
Contributor

ssoroka commented Aug 6, 2020

As far as process, fork the repo, make your changes, add a test to make sure it works and stays working, and then create a pull request.

as far as implementation:
I'd probably add something like this to the ElasticSearch struct:

	// field or tag to get ID from
	IDField string `toml:"id_field"`
	IDTag   string `toml:"id_tag"`

then you can modify Write to check for these and include the ID.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants