Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converter Processor - Tag to string Field - chars escapes leads to invalid points #7406

Closed
Trovalo opened this issue Apr 24, 2020 · 5 comments · Fixed by #7427
Closed

Converter Processor - Tag to string Field - chars escapes leads to invalid points #7406

Trovalo opened this issue Apr 24, 2020 · 5 comments · Fixed by #7427
Assignees
Labels
bug unexpected problem or unintended behavior
Milestone

Comments

@Trovalo
Copy link
Collaborator

Trovalo commented Apr 24, 2020

Relevant telegraf.conf:

## INPUT ##
[[inputs.sqlserver]]

  servers = [
   "Server=192.168.1.10;Port=1433;User Id=<user>;Password=<pw>;app name=telegraf;log=1;",
  ]

  query_version = 2
  include_query = [ 'SqlRequests' ]

## PROCESSOR ##
[[processors.converter]]
  order = 1
  [processors.converter.tags]
    string = ["statement_text"]

[[processors.strings]]
  order = 2
  [[processors.strings.left]]
    field = "statement_text"
    dest = "statement_text_preview"
    width = 150

## OUTPUT##
[[outputs.influxdb]]
  {etc...}

System info:

  • Windows Server 2016
  • Telegraf 1.14.0
  • influxDB 1.7.10

Steps to reproduce:

Probably it's not easy to reproduce this, in fact, I get 2-3 repèetitive errors every read, out of a total of hundreds of points, I think it's related to the string that gets converted from tag to field (the strings are the "statement_text" which might be very long.

  1. use telegraf with the SQL Server input plugin (only "Sqlrequests" needs to be enabled)
  2. convert the tag "statement_text" from tag to field (using converter processor)
  3. get a preview of the very long "statement_text" using the sting processor (left function) and store it in another field
  4. enjoy...

Expected behavior:

The conversion from tag to field does not cause errors

Actual behavior:

Some of the converted points cause a parsing error

Additional info:

here I have a bunch of errors (all from the same SQL Server Instance), once the "statement_text" tag is converted to a field, it can become a multiline string.
As a tag, it's text is filled with "/t /s /n" (see issue #6976), but once converted to field those chars are decoded in their actual meaning.
Also note that converting this tag into a fieldis the proposed solution for issue #7037.

Case 1
The text which causes the error is part of the "statement_text", but the section reported in the error alone is valid (as a string field).

2020-04-24T10:34:15Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: metric parse error: expected field at 1002:41: "        WHERE (o.schema_id = @SchemaId)" (and 1 other parse error)
2020-04-24T10:34:15Z E! [agent] Error writing to outputs.influxdb: could not write any address

2020-04-24T10:06:50Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: metric parse error: expected field at 855:66: " INNER JOIN @allUpdatesTable AS U ON U.RevisionID=MI.RevisionID "
2020-04-24T10:06:50Z E! [agent] Error writing to outputs.influxdb: could not write any address

Case 2
This reguards the calculated field "statement_text_preview", which just takes the first 150 chars of the filed "statement_text".
As you can see below, there is something wrong with the escape in fact the line protocol should be structured in the following way

sqlserver_requests,{TagSet} {otherFields},statement_text="my text",statement_text_preview="truncated my text"

But instead I get the following (note that the whole "statement_text_preview" field is part of a bigger string

2020-04-24T10:24:17Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: metric parse error: expected field at 2071:180: "   \",statement_text_preview=\"IF (EXISTS (SELECT * FROM CS.SynchronizationBatch WHERE ((SynchronizationBatchStateId = 1) AND (SynchronizationBatchId <> @SynchronizationBatchId))))"
2020-04-24T10:24:17Z E! [agent] Error writing to outputs.influxdb: could not write any address

Case 3
this has been the easier to debug so far but is also the one that makes less sense to me.
here the log reports almost the whole point and not just a section of the string.

As you can see in the field set you get

{otherFields},statement_text=\"COMMIT TRAN"

Which just does not make sense, why is the double-quote that delimits the field value itself escaped?

2020-04-24T10:21:48Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: metric parse error: expected field at 6090:694: "sqlserver_requests,command=COMMIT\\ TRANSACTION,database_name=master,host_name=SRV-MGMFGTFER04,nt_user_name=SRV-MGMFGTFER04$,program_name=WSUS:ClientWS:3436,query_hash=0x0000000000000000,query_plan_hash=0x0000000000000000,session_db_name=SUSDB,sql_instance=SRV-MGMFGTFER01:SQLCM11PRD,status=suspended,stmt_db_name=SUSDB,stmt_object_name=[dbo].[spGetChangeTrackingInformation],transaction_isolation_level=3-Repeatable\\ Read,wait_type=WRITELOG writes=0i,session_id=92i,objectid=568389094i,open_transaction=1i,granted_query_memory_pages=0i,percent_complete=0,total_elasped_time_ms=1i,wait_time_ms=0i,logical_reads=7i,cpu_time_ms=1i,blocking_session_id=0i,request_id=0i,statement_text=\"COMMIT TRAN"
2020-04-24T10:21:48Z E! [agent] Error writing to outputs.influxdb: could not write any address

Help Wanted:

I can provide more data and run additional tests, but I need some help.
So far I've tried to catch the broken points by:

  • using debug logging - no useful info reported, just what you already see above
  • using file output - the broken points are not written to the file, but there are no errors in the log

how can I keep the broken points and save them somewhere else to have the full point structure?

Update
Since I've not found a way to intercept the broken points using telegraf, I've set up a parallel data collection in SQL Server (which replicates Telegraf's one), therefore I should be able to provide the source data soon.

@danielnelson danielnelson self-assigned this Apr 24, 2020
@danielnelson
Copy link
Contributor

We don't have a good way to dump the metrics that are rejected right now, but we could try sticking in some debug logging directly to the influxdb output. Maybe add this in as a starting point?

diff --git a/plugins/outputs/influxdb/influxdb.go b/plugins/outputs/influxdb/influxdb.go
index 1c4af5bc..ac7713ea 100644
--- a/plugins/outputs/influxdb/influxdb.go
+++ b/plugins/outputs/influxdb/influxdb.go
@@ -6,6 +6,7 @@ import (
        "fmt"
        "math/rand"
        "net/url"
+       "strings"
        "time"
 
        "github.com/influxdata/telegraf"
@@ -231,6 +232,22 @@ func (i *InfluxDB) Write(metrics []telegraf.Metric) error {
                }
 
                i.Log.Errorf("When writing to [%s]: %v", client.URL(), err)
+
+               i.Log.Errorf("Begin Batch\n")
+               for _, m := range metrics {
+                       var str strings.Builder
+
+                       str.WriteString(fmt.Sprintf("Name(%q)", m.Name()))
+                       for _, tag := range m.TagList() {
+                               str.WriteString(fmt.Sprintf(" Tag(%q, %q)", tag.Key, tag.Value))
+                       }
+                       for _, field := range m.FieldList() {
+                               str.WriteString(fmt.Sprintf(" Field(%q, %q)", field.Key, field.Value))
+                       }
+                       str.WriteString("\n")
+                       i.Log.Error(str.String())
+               }
+               i.Log.Errorf("End Batch\n")
        }
 
        return errors.New("could not write any address")

@Trovalo
Copy link
Collaborator Author

Trovalo commented Apr 27, 2020

Today I've used the edited version of telegraf, here is an error I got.
I've cut out the most of the rows of the batch (in fact the whole batch is printed out if there is an error in it.

2020-04-27T16:03:00Z E! [outputs.influxdb] When writing to [http://127.0.0.1:8186]: 400 Bad Request: metric parse error: expected field at 537:41: "        WHERE (o.schema_id = @SchemaId)"
2020-04-27T16:03:00Z E! [outputs.influxdb] Begin Batch
{... other rows}
2020-04-27T16:03:00Z E! [outputs.influxdb] Name("sqlserver_requests") Tag("command", "SELECT") Tag("database_name", "master") Tag("host_name", "SRV-MESMGMFGT01") Tag("nt_user_name", "") Tag("program_name", "Management Configuration Service") Tag("query_hash", "0x5FF63C7DEE8859E8") Tag("query_plan_hash", "0x851F0D7AED99C25C") Tag("session_db_name", "OperationsManager") Tag("sql_instance", "SRV-MESMGMFGT01:SQLOM11PRD") Tag("status", "running") Tag("stmt_db_name", "OperationsManager") Tag("stmt_object_name", "[CS].[IndexOptimize]") Tag("transaction_isolation_level", "2-Read Committed") Tag("wait_resource", "") Field("percent_complete", %!q(float64=0)) Field("request_id", '\x02') Field("cpu_time_ms", '\x1a') Field("session_id", '{') Field("objectid", %!q(int64=934450553)) Field("total_elasped_time_ms", '\x1a') Field("logical_reads", '\u087b') Field("open_transaction", '\x00') Field("writes", '\x01') Field("wait_time_ms", '\x00') Field("granted_query_memory_pages", '\x00') Field("blocking_session_id", '\x00') Field("statement_text", "SELECT TOP 1\r\n           @ObjectId = o.object_id\r\n          ,@TableName = o.name\r\n        FROM sys.objects o\r\n        WHERE (o.schema_id = @SchemaId)\r\n          AND (o.type = 'U')\r\n          AND (o.object_id > @ObjectId)\r\n        ORDER BY o.object_id\r\n        \r\n        -- go through all indexes of the table\r\n       ") Field("statement_text_preview", "SELECT TOP 1\r\n           @ObjectId = o.object_id\r\n          ,@TableName = o.name\r\n        FROM sys.objects o\r\n        WHERE (o.schema_id = @SchemaId)\r")
{... other rows}
2020-04-27T16:03:00Z E! [outputs.influxdb] End Batch
2020-04-27T16:03:00Z D! [outputs.influxdb] Buffer fullness: 525 / 25000 metrics
2020-04-27T16:03:00Z E! [agent] Error writing to outputs.influxdb: could not write any address

For what I see, the only difference between that row of the text and the others is the last special char "\r" (carriage return), in all the other rows a new row is "\r\n" (carriage return+newline)

I will keep debugging it tomorrow by manually writing data like those to influxdb (or in genre stings that end with escaped special chars).

Does it make sense? some time ago I've found out that a string tag value cannot end with "" (backslash) even if it tests correctly escaped

If you need something else to check this issue let me know.

@danielnelson
Copy link
Contributor

It looks like you are writing to the influxdb_listener, not to InfluxDB? I can reproduce an error with a simple string ending with \r in influxdb_listener, and will work on a fix. Sending the same line to InfluxDB seems to work correctly.

@Trovalo
Copy link
Collaborator Author

Trovalo commented Apr 27, 2020

Yes, I'm using influxdb_listener as a gateway. (but I forgot to mention that).
there are other breaking chars? like "" maybe?

@danielnelson danielnelson added this to the 1.14.2 milestone Apr 28, 2020
@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Apr 28, 2020
@danielnelson
Copy link
Contributor

It looks like this is a mismatch between what values we allowed in the parser vs the serializer. Using \r was not allowed unless it was followed by \n.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants