Ability in telegraf to use ID field from the metric, if available #7891
Comments
Thanks for the issue. I have some questions to clarify what you mean:
Thanks Steven.
This is on similar lines of
Now, everything is on the ES side. When ES receives a document that already has an ID, it internally checks whether a document with that ID exists. If it does, ES performs an UPDATE; otherwise it performs an INSERT using that ID as its internal identifier. Hope this clarifies.
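To make the insert-versus-update behaviour described above concrete, here is a minimal sketch in Go. It is a toy in-memory model, not Elasticsearch and not telegraf code; the `docStore` type and its `index` method are hypothetical names used only for illustration.

```go
package main

import "fmt"

// docStore is a toy in-memory stand-in for an index. It only models the
// "same ID overwrites, missing ID creates" behaviour discussed in this issue.
type docStore struct {
	docs   map[string]map[string]interface{}
	nextID int
}

func newDocStore() *docStore {
	return &docStore{docs: make(map[string]map[string]interface{})}
}

// index stores doc under id. An empty id means the sender supplied no ID,
// so a fresh one is generated and the document is always treated as new.
// It returns the ID that was used and whether a new document was created.
func (s *docStore) index(id string, doc map[string]interface{}) (string, bool) {
	if id == "" {
		s.nextID++
		id = fmt.Sprintf("auto-%d", s.nextID)
	}
	_, exists := s.docs[id]
	s.docs[id] = doc
	return id, !exists
}

func main() {
	metric := map[string]interface{}{"cpu": 0.5}

	// Without an ID, sending the same metric twice yields two documents.
	noID := newDocStore()
	noID.index("", metric)
	noID.index("", metric)
	fmt.Println(len(noID.docs)) // 2

	// With a stable ID, the second send overwrites the first.
	withID := newDocStore()
	withID.index("host1-1600000000", metric)
	withID.index("host1-1600000000", metric)
	fmt.Println(len(withID.docs)) // 1
}
```

The point of the sketch is only that a stable, sender-supplied ID makes repeated sends idempotent, which is the property this feature request relies on.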
I think we'd be in support of an additional feature for the elasticsearch output that optionally lets you set a field or tag to get the ID from. Does this work properly with the NewBulkIndexRequest() object?
Yes, it does work as expected. I have tested it by running a local copy of telegraf with modifications for
The ID() method of NewBulkIndexRequest() accepts a string ID; if present, it will pass that as
The availability and fetching of the ID from the input payload is at the config level, so if any other plugins wish to leverage it, they can do that easily using
Let me know if this sounds beneficial, and please guide me through the process I need to follow for a PR. :-)
As far as process: fork the repo, make your changes, add a test to make sure it works and stays working, and then create a pull request. As far as implementation:
then you can modify
…oiding duplicated ES documents, fix influxdata#7891 (influxdata#8019)
Feature Request
Currently, when metrics are saved through an output plugin (e.g. Elasticsearch), telegraf cannot use an ID field from the payload (metrics) as the ID of the metric being pushed.
As a result, any metric pushed to ES is always treated as a "new" document, because no ID is sent with it.
Proposal:
We provide a JSON parser configuration option named json_id_key, similar to json_time_key.
It is optional, just like the time key. If present, the ID is made available to output plugins for use and assignment.
If not present, the current behaviour continues unchanged.
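As a rough illustration of the proposal, the sketch below pulls an ID out of a flat JSON payload using a configured key. It is an assumption-laden example, not telegraf's parser: `extractID` is a hypothetical helper, the `idKey` argument stands in for the proposed json_id_key setting, and only string and numeric ID values are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// extractID returns the value stored under idKey in a flat JSON object.
// idKey stands in for the proposed json_id_key option (hypothetical here).
// ok is false when idKey is empty, the key is absent, or the value is
// neither a non-empty string nor a number, so callers can fall back to the
// current "no ID" behaviour.
func extractID(payload []byte, idKey string) (id string, ok bool, err error) {
	if idKey == "" {
		return "", false, nil // option not configured: behave as today
	}
	var obj map[string]interface{}
	if err := json.Unmarshal(payload, &obj); err != nil {
		return "", false, err
	}
	switch v := obj[idKey].(type) {
	case string:
		return v, v != "", nil
	case float64: // encoding/json decodes all JSON numbers as float64
		return strconv.FormatFloat(v, 'f', -1, 64), true, nil
	}
	return "", false, nil
}

func main() {
	payload := []byte(`{"event_id": "abc-123", "cpu": 0.5}`)

	id, ok, err := extractID(payload, "event_id")
	fmt.Println(id, ok, err) // abc-123 true <nil>

	// With no key configured, no ID is reported.
	_, ok, _ = extractID(payload, "")
	fmt.Println(ok) // false
}
```

Returning `ok` separately from the ID keeps the optional nature of the setting explicit: an output plugin can simply skip ID assignment when `ok` is false.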
Use case:
One scenario where this is immensely useful is saving metrics data to Elasticsearch.
When the metrics data from an input plugin is duplicated, or the same metric is returned multiple times, the ES output plugin currently sends each metric without an identifier. ES then treats each one as a new document, generates a new ID, and inserts it into its datastore, which results in duplicate metrics.
When the metric is sent with an ID, ES performs the "UPSERT" correctly: it modifies the existing document with that ID, or creates one if it does not exist.
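For reference, this is roughly what the difference looks like on the wire in the Elasticsearch bulk format, where each document is preceded by an action line and the `_id` key is optional. The sketch builds those two lines with the standard library only; `bulkLines` and the struct names are hypothetical, and this is not how the telegraf output plugin is implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexMeta is the metadata of a bulk "index" action. The _id key is
// omitted when ID is empty, in which case ES generates an ID itself.
type indexMeta struct {
	Index string `json:"_index"`
	ID    string `json:"_id,omitempty"`
}

type bulkAction struct {
	Index indexMeta `json:"index"`
}

// bulkLines returns the newline-delimited action line and source line for
// one document. Pass an empty id to reproduce today's behaviour.
func bulkLines(index, id string, doc map[string]interface{}) (string, error) {
	action, err := json.Marshal(bulkAction{Index: indexMeta{Index: index, ID: id}})
	if err != nil {
		return "", err
	}
	source, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(action) + "\n" + string(source) + "\n", nil
}

func main() {
	doc := map[string]interface{}{"cpu": 0.5}

	withoutID, _ := bulkLines("telegraf", "", doc)
	fmt.Print(withoutID)
	// {"index":{"_index":"telegraf"}}
	// {"cpu":0.5}

	withID, _ := bulkLines("telegraf", "abc-123", doc)
	fmt.Print(withID)
	// {"index":{"_index":"telegraf","_id":"abc-123"}}
	// {"cpu":0.5}
}
```

Sending the second form twice addresses the same document both times, whereas the first form creates a new document on every send.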
Note - Output plugins would also need a change in order to use the available ID property; whether to do so could be left to each individual plugin.
I have made these changes in my local telegraf code and have run the suggested steps to ensure it works. Let me know if this would be beneficial to all, and I can raise a PR accordingly.
Thanks.