
Kibana breaks JSON format of GROK patterns #175753

Closed
ulab opened this issue Jan 27, 2024 · 10 comments · Fixed by #200692
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Ingest Node Pipelines Ingest node pipelines management good first issue low hanging fruit Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more

Comments

@ulab

ulab commented Jan 27, 2024

Kibana version: 8.11.4
Elasticsearch version: 8.11.4
Server OS version: Debian 11.8
Browser version: Firefox 121.0.1
Browser OS version: MacOS 14.2.1
Original install method (e.g. download page, yum, from source, etc.): Debian package

Describe the bug:
Kibana converts and breaks JSON pattern definitions for the GROK processor.

Steps to reproduce:

  1. Add a Grok processor to a pipeline
  2. Edit the processor and set the Patterns to ^%{ISSUE}$
  3. Set Pattern definitions to the following:
{
    "ISSUE": "aaa\"bbb",
    "ISSUE2": "aaa\\(bbb"
}
  4. Update the processor and edit it again
  5. The Pattern definitions now show invalid JSON:
{
  "ISSUE": """aaa"bbb""",
  "ISSUE2": """aaa\(bbb"""
}

Expected behavior:
The patterns should not have been modified, especially not into something invalid.
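For reference, the triple-quoted form shown back by the editor is rejected by any standard JSON parser, while the input as originally typed parses fine. A quick check with Python's `json` module (illustrative only):

```python
import json

# Pattern definitions exactly as entered in step 3 (valid JSON).
original = '{"ISSUE": "aaa\\"bbb", "ISSUE2": "aaa\\\\(bbb"}'
# What the editor shows after re-opening the processor (triple quotes).
shown = '{"ISSUE": """aaa"bbb""", "ISSUE2": """aaa\\(bbb"""}'

# The original parses successfully.
print(json.loads(original))

# The triple-quoted form raises a JSONDecodeError.
try:
    json.loads(shown)
except json.JSONDecodeError as err:
    print("not valid JSON:", err)
```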

@ulab ulab added the bug Fixes for quality problems that affect the customer experience label Jan 27, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 27, 2024
@jsanz jsanz added the Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more label Jan 30, 2024
@elasticmachine
Contributor

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 30, 2024
@alisonelizabeth alisonelizabeth added Feature:Ingest Node Pipelines Ingest node pipelines management good first issue low hanging fruit labels Oct 2, 2024
@SoniaSanzV
Contributor

SoniaSanzV commented Nov 8, 2024

I've been working on this and talking to @ElenaStoeva about it, and we both came to the conclusion that this is not a bug but the same behavior that Elasticsearch has. If you make the same request in the DevTools console, Elasticsearch returns the same pattern format that Kibana shows right now, so the processors UI is consistent with the service.

PUT _ingest/pipeline/grok-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "test-field",
        "patterns": [
          "^%{ISSUE}$"
        ],
        "pattern_definitions": {
          "ISSUE": "aaa\"bbb",
          "ISSUE2": "aaa\\(bbb"
        }
      }
    }
  ]
}

// When you run GET _ingest/pipeline/grok-pipeline
{
  "grok-pipeline": {
    "processors": [
      {
        "grok": {
          "field": "test-field",
          "patterns": [
            "^%{ISSUE}$"
          ],
          "pattern_definitions": {
            "ISSUE": """aaa"bbb""",
            "ISSUE2": """aaa\(bbb"""
          }
        }
      }
    ]
  }
}

@ulab
Author

ulab commented Nov 8, 2024

Then the DevTools console is formatting it wrong too, because this does not happen when talking to Elasticsearch nodes directly.

curl -k -X PUT "https://localhost:9200/_ingest/pipeline/grok-pipeline" -H 'Content-Type: application/json' -d'{
  "processors": [
    {
      "grok": {
        "field": "test-field",
        "patterns": [
          "^%{ISSUE}$"
        ],
        "pattern_definitions": {
          "ISSUE": "aaa\"bbb",
          "ISSUE2": "aaa\\(bbb"
        }
      }
    }
  ]
}'
{"acknowledged":true}

curl -k -X GET "https://localhost:9200/_ingest/pipeline/grok-pipeline?pretty=true"
{
  "grok-pipeline" : {
    "processors" : [
      {
        "grok" : {
          "field" : "test-field",
          "patterns" : [
            "^%{ISSUE}$"
          ],
          "pattern_definitions" : {
            "ISSUE" : "aaa\"bbb",
            "ISSUE2" : "aaa\\(bbb"
          }
        }
      }
    ]
  }
}

[Edit: added ?pretty=true]

@SoniaSanzV
Contributor

Sorry @ulab, you are right. I'll reopen the issue and discuss with the team how we should address this. For what it's worth, I wasn't aware of this, but I've been told that you can disable the triple-quotes setting in the Console settings.

@SoniaSanzV SoniaSanzV reopened this Nov 8, 2024
@ulab
Author

ulab commented Nov 8, 2024

The one thing that confuses me is that there are no triple quotes in the JSON definition, are there? So I don't understand why this is even an option :).

https://www.json.org/json-en.html

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

Triple-quoted strings come from Python, but they just shouldn't be used in what is supposed to be JSON output, should they?

@SoniaSanzV
Contributor

We use XJSON as the language instead of JSON; that's why triple quotes are allowed.

@ElenaStoeva
Contributor

Hi @ulab, many Kibana editors (including Console) use XJSON language, which is similar to JSON with one addition - it supports triple quotes. In single-quote strings, we should escape illegal characters, such as \ and ". Triple quotes allow all these illegal characters, without having to escape them. Here is how single quotes would translate to triple quotes:

| Single-quote string | Equivalent in triple-quote string |
| --- | --- |
| `"aaa\"bbb"` | `"""aaa"bbb"""` |
| `"aaa\\bbb"` | `"""aaa\bbb"""` |
| `"aaa\nbbb"` | `"""aaa` + literal newline + `bbb"""` |

Since Elasticsearch only accepts JSON format, the UI translates all triple quotes into single quotes before sending the request to Elasticsearch. You can verify this by checking the "Preview request":

Screen.Recording.2024-11-11.at.09.39.16.mov

With that being said, Kibana does not break your input - it sends the payload in JSON, like you typed it in. It is only the editor in the UI that shows all strings containing illegal characters in triple quotes. I'm not sure what the context behind this functionality is but I imagine it was made like this to help the user see how the string would be decoded. I understand though that this behavior is confusing for the user as they may not be aware that the editor supports XJSON language and that it displays strings in triple quotes if they contain illegal chars.

@SoniaSanzV I think what we should do is make sure that the user's input in the editor is not changed (i.e. it should look exactly as it was typed in). We can still support the XJSON language in the editor, allowing triple quotes, but they should be correctly translated to single quotes in the payload to ES. Also, since the related editor component is used in multiple processor forms (e.g. Inference, Foreach, etc.), I think we should fix the component itself so that the behavior is the same for all processors.
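The translation described above can be sketched roughly as follows. This is a simplified, hypothetical Python sketch, not Kibana's actual (TypeScript) implementation, and a naive regex like this would misbehave if a literal `"""` ever appeared inside an ordinary JSON string:

```python
import json
import re

def collapse_triple_quotes(xjson: str) -> str:
    """Hypothetical sketch: rewrite XJSON triple-quoted strings as
    standard JSON strings by re-escaping their raw contents."""
    return re.sub(
        r'"""(.*?)"""',                    # non-greedy: shortest raw span
        lambda m: json.dumps(m.group(1)),  # json.dumps re-adds the escapes
        xjson,
        flags=re.DOTALL,                   # triple-quoted strings may span lines
    )

xjson = '{"ISSUE": """aaa"bbb""", "ISSUE2": """aaa\\(bbb"""}'
collapsed = collapse_triple_quotes(xjson)
print(collapsed)              # valid JSON again
print(json.loads(collapsed))  # round-trips to the raw pattern values
```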

@SoniaSanzV
Contributor


Thank you for explaining it so comprehensively @ElenaStoeva! I agree with your solution. The only thing that concerns me, and that we have to keep in mind, is that once the pipeline is saved and the user wants to edit the processor, we cannot know the format in which the pattern was originally written, so it will always be displayed without the triple quotes (in case it originally had them).

@ulab
Author

ulab commented Nov 11, 2024

Do you have any links to the XJSON variant of the language? I can only find all kinds of libraries with the name, but not really any definition of it.

One issue with showing it in XJSON is that you can't use the output for anything else. Perhaps I need to modify it in a script, or use it with curl directly, etc.; I don't think many other tools support it.

Now that I know about it, I might just copy the code from the request preview, but as you said it will be confusing for users.

Maybe you can add a switch to toggle between the two variants in the UI?

Especially since, as Sonia noted, the dev console can also switch between both, it might otherwise result in a mix of different formats.

@ElenaStoeva
Contributor

The only thing that concerns me and we have to keep in mind is that, once the pipeline is saved and the user wants to edit the processor, we cannot know the format in which the pattern was originally written, so it will always be displayed without the triple quotes (in case it originally had them).

I think that's alright.

Do you have any links to the XJSON variant of the language? I can only find all kinds of libraries with the name, but not really any definition of it.

I'm not sure if it's documented somewhere, it might be internal for Kibana. It is defined here.

One issue with showing it in XJSON is that you can't use the output for anything else. Perhaps I need to modify it in a script or use it with curl directly, etc. I don't think a lot of other tools support it?

If we implement the solution that I proposed in my previous comment, you would not have this issue since single quotes would not be transformed into triple quotes.

Maybe you can add a switch to toggle between the two variants in the UI?

I'm not sure why we would need the toggle as you can simply utilize the format that you prefer, and it won't be modified (after we implement the fix).

Especially since as Sonia noted the dev console also can switch between both, it might result in a mix of different formats otherwise.

I think there was some misunderstanding here - Console has the "Triple quotes in output" setting that can only turn on/off triple quotes in the output editor (the response from Es) but not in the input editor.

SoniaSanzV added a commit that referenced this issue Nov 27, 2024
Closes [#175753](#175753)

## Summary
When a user creates a pipeline with a processor that has a JSON editor
field (Foreach, Grok, Inference, Redact, and Script), our editor converts
the input to XJSON. This can confuse users, who see their input modified
without understanding why. To address that, this PR creates a new editor
that handles both the XJSON and JSON formats and does not change the
content that the user entered while editing. Once the pipeline is saved,
it will always retrieve the parsed data.

I've created a whole new editor instead of modifying `XJsonEditor`
because the original one is still in use in the `Custom processor`. This
is because I've created [a
list](https://github.com/SoniaSanzV/kibana/blob/d7d5ecafa7dbae96fe52c2e37394520b6353bd92/x-pack/plugins/ingest_pipelines/public/application/components/pipeline_editor/utils.ts#L151-L157)
of the processor fields that need to be serialized in
`convertProcessorInternalToProcessor`. Since we don't know the name of
the custom processor, we can't apply this workflow. I'm not super happy
with the idea of manually adding the names of the processors to a list;
I would like something more agnostic, but I haven't come up with a
solution that allows that without breaking other fields. So, any
suggestions are super welcome!

### How to test
Create one of the following processors: Foreach, Grok, Inference, Redact,
or Script. In the JSON field, add JSON containing both escaped and
unescaped strings. While you are editing the processor, before saving the
pipeline, you should see the content exactly as you typed it. In the
`Show Request` flyout, you should see the properly parsed JSON.

The pipeline should save with the correct parsed values and, once saved,
if you click update, you will only get parsed values.

### Demo


https://github.com/user-attachments/assets/1f9681df-2fb4-4ed5-ac30-03f2937abfe9
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 27, 2024
(cherry picked from commit 8839894)
kibanamachine added a commit that referenced this issue Nov 27, 2024
# Backport

This will backport the following commits from `main` to `8.x`:
- [Handle both JSON and XJSON in processor editor
(#200692)](#200692)




Co-authored-by: Sonia Sanz Vivas <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024