
[kubernetes filter] a json log object containing a string value that is the serialization of a json object should be preserved in that form #1130

Closed
naseemkullah opened this issue Feb 22, 2019 · 10 comments

@naseemkullah
Contributor

Bug Report

Describe the bug

We serialize log messages as JSON, and some messages contain fields that are themselves the serialization of a JSON object. In this case we are seeing that the escaped quotes of the serialized JSON are unescaped before being output to Elasticsearch, resulting in a string that is no longer valid JSON. We believe the problem is in the Elasticsearch output, because we do not see the issue when testing fluent-bit with a local output.

To Reproduce

  • Example log message if applicable:
{"log":"{\"payload\":\"{\\\"foo\\\":1333,\\\"bar\\\":1,\\\"baz\\\":\\\"cdd32offh-scmddddx\\\",\\\"colorrr\\\":\\\"ffffff\\\",\\\"other_color\\\":\\\"c22829\\\"}\"}","stream":"stdout","time":"2018-06-11T14:37:30.681701731Z"}
  • Steps to reproduce the problem:
    Use the json parser on this log and send it to Elasticsearch using the es output

Expected behavior

A JSON log object containing a string value that is the serialization of a JSON object should be preserved in that form, as a string that includes escaped quote characters.

Your Environment

  • Version used: v1.0.4
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes
  • Filters and plugins: docker parser (tail input), kubernetes filter, json parser, es output

Additional context

@naseemkullah
Contributor Author

naseemkullah commented Feb 22, 2019

Also, as per @Flydiverny on the Slack channel:

Log from container:
{"msg":"String with \"quotes\""}

Docker log file:
{"log":"{\"msg\":\"String with \\\"quotes\\\"\"}\n","stream":"stdout","time":"2019-02-22T14:55:38.191687326Z"}

Input tail with Docker parser
Kubernetes filter using json parser

What shows up in kibana:
{\"msg\":\"String with \"quotes\"\"}\n

Which is invalid JSON:
{ "msg": "String with "quotes"" }

@naseemkullah
Contributor Author

naseemkullah commented Feb 22, 2019

As per @sjentzsch in the Slack channel:

{"log":"{\"@timestamp\": 1550769485892, \"level\": \"DEBUG\", \"process\": 1, \"thread\": \"139787420716360\", \"logger\": \"cachecontrol.controller\", \"message\": \"Response header has \\\"no-store\\\"\", \"traceId\": null, \"spanId\": null, \"tenant\": \"skill\"}\n","stream":"stderr","attrs":{"io.kubernetes.container.name":"skill-alexa","io.kubernetes.pod.name":"alexa-6f6997b89d-jfn5c","io.kubernetes.pod.namespace":"skill-edge","io.kubernetes.pod.uid":"b9be16c5-3466-11e9-a14d-000d3a289f71"},"time":"2019-02-21T17:18:05.892467619Z"} <--- fails to be parsed, due to \"message\": \"Response header has \\\"no-store\\\"\"

@sjentzsch

Indeed, I can relate. If our applications log JSON that contains escaped strings, or JSON itself (properly escaped!), the parsing of the log field fails.
However, I cannot confirm this being an issue with the elasticsearch output: we are using the azure output, so I suspect it's related to the kubernetes filter, namely the Merge_Log functionality.
Would very much love to see this fixed! Unfortunately, as it stands this is a show-stopper for our team adopting fluent-bit.

@naseemkullah naseemkullah changed the title [es output?] a json log object containing a string value that is the serialization of a json object should be preserved in that form [kubernetes filter] a json log object containing a string value that is the serialization of a json object should be preserved in that form Feb 25, 2019
@naseemkullah
Contributor Author

Thanks @sjentzsch, I was not sure whether it was actually the es output or the kubernetes filter. Title adjusted!

@donbowman
Contributor

Seems the error is in flb_unescape_string(), which is called from merge_log_handler when Merge_Log is on.

@naseemkullah
Contributor Author

So somewhere here essentially?

    /* Unescape application string */
    size = o.via.str.size;
    unesc_len = flb_unescape_string((char *) o.via.str.ptr,
                                    size, &ctx->unesc_buf);
    ctx->unesc_buf_len = unesc_len;

    ret = -1;
    if (parser) {
        ret = flb_parser_do(parser, ctx->unesc_buf, unesc_len,
                            out_buf, out_size, log_time);
        if (ret >= 0) {
            if (flb_time_to_double(log_time) == 0) {
                flb_time_get(log_time);
            }
            return MERGE_PARSED;
        }
    }
    else {
        ret = flb_pack_json(ctx->unesc_buf, unesc_len,
                            (char **) out_buf, out_size, &root_type);
        if (ret == 0 && root_type != FLB_PACK_JSON_OBJECT) {
            flb_debug("[filter_kube] could not merge JSON, root_type=%i",
                      root_type);
            flb_free(*out_buf);
            return MERGE_UNESCAPED;
        }
    }

    if (ret == -1) {
        flb_debug("[filter_kube] could not merge JSON log as requested");
        return MERGE_UNESCAPED;
    }

    return MERGE_PARSED;
}

@donbowman
Contributor

It flows down through flb_pack_json() and then to flb_json_tokenise().

After making a 'fix' to flb_unescape, it still ends up with a tokenise issue, since jsmn doesn't understand the nested escaping.

@edsiper
Member

edsiper commented Jun 6, 2019

Please check the following comment on #1278 :

#1278 (comment)

@naseemkullah
Contributor Author

Thanks @edsiper, I will switch back to fluent-bit from fluentd soon to test this out.

@edsiper
Member

edsiper commented Jun 14, 2019

Issue already fixed, ref: #1278 (comment)

@edsiper edsiper closed this as completed Jun 14, 2019
@edsiper edsiper self-assigned this Jun 14, 2019