
Kibana breaks JSON format of GROK patterns #175753

Closed
ulab opened this issue Jan 27, 2024 · 10 comments · Fixed by #200692
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Ingest Node Pipelines Ingest node pipelines management good first issue low hanging fruit Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more

Comments

@ulab

ulab commented Jan 27, 2024

Kibana version: 8.11.4
Elasticsearch version: 8.11.4
Server OS version: Debian 11.8
Browser version: Firefox 121.0.1
Browser OS version: MacOS 14.2.1
Original install method (e.g. download page, yum, from source, etc.): Debian package

Describe the bug:
Kibana converts and breaks JSON pattern definitions for the GROK processor.

Steps to reproduce:

  1. Add a Grok processor to a pipeline
  2. Edit the processor and set the Patterns to ^%{ISSUE}$
  3. Set Pattern definitions to the following:
{
    "ISSUE": "aaa\"bbb",
    "ISSUE2": "aaa\\(bbb"
}
  4. Update the processor and edit it again
  5. The Pattern definitions now show invalid JSON:
{
  "ISSUE": """aaa"bbb""",
  "ISSUE2": """aaa\(bbb"""
}

Expected behavior:
The patterns should not have been modified, especially not into something invalid.
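For reference, the triple-quoted form shown back by the editor is rejected by any standard JSON parser, while the input as originally typed parses fine. A quick check with Python's `json` module (illustrative only):

```python
import json

# Pattern definitions exactly as entered in step 3 (valid JSON).
original = '{"ISSUE": "aaa\\"bbb", "ISSUE2": "aaa\\\\(bbb"}'
# What the editor shows after re-opening the processor (triple quotes).
shown = '{"ISSUE": """aaa"bbb""", "ISSUE2": """aaa\\(bbb"""}'

# The original parses successfully.
print(json.loads(original))

# The triple-quoted form raises a JSONDecodeError.
try:
    json.loads(shown)
except json.JSONDecodeError as err:
    print("not valid JSON:", err)
```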

@ulab ulab added the bug Fixes for quality problems that affect the customer experience label Jan 27, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 27, 2024
@jsanz jsanz added the Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more label Jan 30, 2024
@elasticmachine
Contributor

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 30, 2024
@alisonelizabeth alisonelizabeth added Feature:Ingest Node Pipelines Ingest node pipelines management good first issue low hanging fruit labels Oct 2, 2024
@SoniaSanzV
Contributor

SoniaSanzV commented Nov 8, 2024

I've been working on this and talking to @ElenaStoeva about it, and we both came to the conclusion that this is not a bug but the same behavior that Elasticsearch has. If you make the same request in the DevTools console, Elasticsearch returns the same pattern format that Kibana shows right now, so the processors UI is consistent with the service.

PUT _ingest/pipeline/grok-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "test-field",
        "patterns": [
          "^%{ISSUE}$"
        ],
        "pattern_definitions": {
          "ISSUE": "aaa\"bbb",
          "ISSUE2": "aaa\\(bbb"
        }
      }
    }
  ]
}

// When you run GET _ingest/pipeline/grok-pipeline
{
  "grok-pipeline": {
    "processors": [
      {
        "grok": {
          "field": "test-field",
          "patterns": [
            "^%{ISSUE}$"
          ],
          "pattern_definitions": {
            "ISSUE": """aaa"bbb""",
            "ISSUE2": """aaa\(bbb"""
          }
        }
      }
    ]
  }
}

@ulab
Author

ulab commented Nov 8, 2024

Then the DevTools console is formatting it wrong too, because this does not happen when talking to Elasticsearch nodes directly.

curl -k -X PUT "https://localhost:9200/_ingest/pipeline/grok-pipeline" -H 'Content-Type: application/json' -d'{
  "processors": [
    {
      "grok": {
        "field": "test-field",
        "patterns": [
          "^%{ISSUE}$"
        ],
        "pattern_definitions": {
          "ISSUE": "aaa\"bbb",
          "ISSUE2": "aaa\\(bbb"
        }
      }
    }
  ]
}'
{"acknowledged":true}

curl -k -X GET "https://localhost:9200/_ingest/pipeline/grok-pipeline?pretty=true"
{
  "grok-pipeline" : {
    "processors" : [
      {
        "grok" : {
          "field" : "test-field",
          "patterns" : [
            "^%{ISSUE}$"
          ],
          "pattern_definitions" : {
            "ISSUE" : "aaa\"bbb",
            "ISSUE2" : "aaa\\(bbb"
          }
        }
      }
    ]
  }
}

[Edit: added ?pretty=true]

@SoniaSanzV
Contributor

Sorry @ulab, you are right. I'll reopen the issue and discuss with the team how we should address this. For what it's worth, I wasn't aware of this, but I've been told that you can disable the triple-quotes setting in the Console settings.

@SoniaSanzV SoniaSanzV reopened this Nov 8, 2024
@ulab
Author

ulab commented Nov 8, 2024

The one thing that confuses me is that there are no triple quotes in the JSON definition, are there? So I don't understand why this is even an option :).

https://www.json.org/json-en.html

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

Triple-quoted strings come from Python, but they just shouldn't be used in what is supposed to be JSON output, should they?

@SoniaSanzV
Contributor

We use XJSON as the language instead of JSON; that's why triple quotes are allowed.

@ElenaStoeva
Contributor

Hi @ulab, many Kibana editors (including Console) use XJSON language, which is similar to JSON with one addition - it supports triple quotes. In single-quote strings, we should escape illegal characters, such as \ and ". Triple quotes allow all these illegal characters, without having to escape them. Here is how single quotes would translate to triple quotes:

| Single-quote string | Equivalent in triple-quote string |
| --- | --- |
| `"aaa\"bbb"` | `"""aaa"bbb"""` |
| `"aaa\\bbb"` | `"""aaa\bbb"""` |
| `"aaa\nbbb"` | `"""aaa` + literal newline + `bbb"""` |

Since Elasticsearch only accepts JSON format, the UI translates all triple quotes into single quotes before sending the request to Elasticsearch. You can verify this by checking the "Preview request":

Screen.Recording.2024-11-11.at.09.39.16.mov

With that being said, Kibana does not break your input - it sends the payload in JSON, like you typed it in. It is only the editor in the UI that shows all strings containing illegal characters in triple quotes. I'm not sure what the context behind this functionality is but I imagine it was made like this to help the user see how the string would be decoded. I understand though that this behavior is confusing for the user as they may not be aware that the editor supports XJSON language and that it displays strings in triple quotes if they contain illegal chars.

@SoniaSanzV I think what we should do is make sure that the user's input in the editor is not changed (i.e. it should look exactly as it was typed in). We can still support the XJSON language in the editor, allowing triple quotes, but they should be correctly translated to single quotes in the payload to ES. Also, since the related editor component is used in multiple processor forms (e.g. Inference, Foreach, etc.), I think we should fix the component itself so that the behavior is the same for all processors.
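The translation described above can be sketched roughly as follows. This is a simplified, hypothetical Python sketch, not Kibana's actual (TypeScript) implementation, and a naive regex like this would misbehave if a literal `"""` ever appeared inside an ordinary JSON string:

```python
import json
import re

def collapse_triple_quotes(xjson: str) -> str:
    """Hypothetical sketch: rewrite XJSON triple-quoted strings as
    standard JSON strings by re-escaping their raw contents."""
    return re.sub(
        r'"""(.*?)"""',                    # non-greedy: shortest raw span
        lambda m: json.dumps(m.group(1)),  # json.dumps re-adds the escapes
        xjson,
        flags=re.DOTALL,                   # triple-quoted strings may span lines
    )

xjson = '{"ISSUE": """aaa"bbb""", "ISSUE2": """aaa\\(bbb"""}'
collapsed = collapse_triple_quotes(xjson)
print(collapsed)              # valid JSON again
print(json.loads(collapsed))  # round-trips to the raw pattern values
```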

@SoniaSanzV
Contributor


Thank you for explaining it so comprehensively @ElenaStoeva! I agree with your solution. The only thing that concerns me, and that we have to keep in mind, is that once the pipeline is saved and the user wants to edit the processor, we cannot know the format in which the pattern was originally written, so it will always be displayed without the triple quotes (in case it originally had them).

@ulab
Author

ulab commented Nov 11, 2024

Do you have any links to the XJSON variant of the language? I can only find all kinds of libraries with the name, but not really any definition of it.

One issue with showing it in XJSON is that you can't use the output for anything else. Perhaps I need to modify it in a script, or use it with curl directly, etc.; I don't think many other tools support it.

Now that I know about it, I might just copy the code from the request preview, but as you said it will be confusing for users.

Maybe you can add a switch to toggle between the two variants in the UI?

Especially since, as Sonia noted, the dev console can also switch between both, it might otherwise result in a mix of different formats.

@ElenaStoeva
Contributor

The only thing that concerns me and we have to keep in mind is that, once the pipeline is saved and the user wants to edit the processor, we cannot know the format in which the pattern was originally written, so it will always be displayed without the triple quotes (in case it originally had them).

I think that's alright.

Do you have any links to the XJSON variant of the language? I can only find all kinds of libraries with the name, but not really any definition of it.

I'm not sure if it's documented somewhere, it might be internal for Kibana. It is defined here.

One issue with showing it in XJSON is that you can't use the output for anything else. Perhaps I need to modify it in a script or use it with curl directly, etc. I don't think a lot of other tools support it?

If we implement the solution that I proposed in my previous comment, you would not have this issue since single quotes would not be transformed into triple quotes.

Maybe you can add a switch to toggle between the two variants in the UI?

I'm not sure why we would need the toggle as you can simply utilize the format that you prefer, and it won't be modified (after we implement the fix).

Especially since as Sonia noted the dev console also can switch between both, it might result in a mix of different formats otherwise.

I think there was some misunderstanding here - Console has the "Triple quotes in output" setting that can only turn on/off triple quotes in the output editor (the response from Es) but not in the input editor.

SoniaSanzV added a commit that referenced this issue Nov 27, 2024
Closes [#175753](#175753)

## Summary
When a user creates a pipeline with a processor that has a JSON editor
field (Foreach, Grok, Inference, Redact, and Script), our editor converts
the input to XJSON. This can confuse users, who see their input modified
without understanding why. To address that, this PR creates a new editor
that handles both the XJSON and JSON formats and does not change the
content that the user entered while editing. Once the pipeline is saved,
it will always retrieve the parsed data.

I've created a whole new editor instead of modifying `XJsonEditor`
because the original one is still in use in the `Custom processor`. This
is because I've created [a
list](https://github.com/SoniaSanzV/kibana/blob/d7d5ecafa7dbae96fe52c2e37394520b6353bd92/x-pack/plugins/ingest_pipelines/public/application/components/pipeline_editor/utils.ts#L151-L157)
of the processor fields that need to be serialized in
`convertProcessorInternalToProcessor`. Since we don't know the name of
the custom processor, we can't apply this workflow. I'm not super happy
with the idea of manually adding the names of the processors to a list;
I would like something more agnostic, but I haven't come up with a
solution that allows that without breaking other fields. So, any
suggestions are super welcome!

### How to test
Create one of the following processors: Foreach, Grok, Inference, Redact,
or Script. In the JSON field, add JSON containing both escaped and
unescaped strings. While you are editing the processor, before saving the
pipeline, you should see the content exactly as you typed it. In the
`Show Request` flyout, you should see the properly parsed JSON.

The pipeline should save with the correct parsed values and, once saved,
if you click update, you will only get parsed values.

### Demo


https://github.com/user-attachments/assets/1f9681df-2fb4-4ed5-ac30-03f2937abfe9
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 27, 2024
(cherry picked from commit 8839894)
kibanamachine added a commit that referenced this issue Nov 27, 2024
# Backport

This will backport the following commits from `main` to `8.x`:
- [Handle both JSON and XJSON in processor editor
(#200692)](#200692)




Co-authored-by: Sonia Sanz Vivas <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024