Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not expand Json Literal Strings in Grok processor #197639

Conversation

SoniaSanzV
Copy link
Contributor

Closes #175753

Summary

When a new pipeline was created and a Grok processor was selected, the pattern definitions were being parsed. This was only happening in the visualization and not in the data being saved. The reason was that it was using XJsonEditor as a component, which in turn was using useXJsonMode. This hooks contains a method that parses JSON-like strings which caused some characters not to be rendered because they were considered to be escaped. Something that this is not wanted in this specific editor.

So for this case I've created a component that uses the same editor option as XJsonEditor but uses the value without parsing so the processor pattern does not change.

Now, if we introduce this pattern, update the processor and edit again, the pattern displayed will look the same

{
    "ISSUE": "aaa\"bbb",
    "ISSUE2": "aaa\\(bbb"
}
Grok.mov

@SoniaSanzV SoniaSanzV self-assigned this Oct 24, 2024
@SoniaSanzV SoniaSanzV added Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more release_note:skip Skip the PR/issue when compiling release notes v9.0.0 backport:prev-minor Backport to (9.0) the previous minor version (i.e. one version back from main) v8.17.0 Feature:Ingest Node Pipelines Ingest node pipelines management labels Oct 24, 2024
@SoniaSanzV SoniaSanzV marked this pull request as ready for review October 24, 2024 14:14
@SoniaSanzV SoniaSanzV requested a review from a team as a code owner October 24, 2024 14:14
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-management (Team:Kibana Management)

@ElenaStoeva ElenaStoeva self-requested a review October 25, 2024 13:42
Copy link
Contributor

@ElenaStoeva ElenaStoeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @SoniaSanzV! I tested locally and it works as expected now. I have a suggestion about reusing the XJsonEditor component instead of creating a new one.

lineNumbers: 'off',
};

export const UnexpandedJsonEditor: FunctionComponent<Props> = ({ field, editorProps }) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we really need to introduce this new component given that it reuses most of the logic in the XJsonEditor component - also, technically it also uses the xjson language so the new naming might be confusing. Wouldn't it be better to add some prop to the XJsonEditor component that tweaks the logic and determines whether we want to expand the strings or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is also a good idea. And you are so right about the name, so I will implement what you say.

@SoniaSanzV SoniaSanzV force-pushed the ingestPipelines/grokProcessor_#175753 branch from 501df34 to 22dbf1f Compare October 25, 2024 15:01
@SoniaSanzV SoniaSanzV enabled auto-merge (squash) October 25, 2024 15:41
@SoniaSanzV SoniaSanzV force-pushed the ingestPipelines/grokProcessor_#175753 branch from 22dbf1f to 622553a Compare October 28, 2024 07:01
Copy link
Contributor

@ElenaStoeva ElenaStoeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring the changes @SoniaSanzV! I tested again and realised that the current behavior in this PR may not be entirely correct. With the current logic, we don't allow triple-quote strings, but they are allowed by Es. For example, the following request works in Console:

PUT _ingest/pipeline/test-pipeline
{
  "processors": [
    {
      "grok": {
        "field": "test",
        "patterns": [
          "^%{ISSUE}$"
        ],
        "pattern_definitions": {
          "ISSUE": """aaa\"bbb""",
          "ISSUE2": "aaa\\(bbb"
        }
      }
    }
  ]
}

But if you type in this pattern definition in the UI, it says it's invalid:
Screenshot 2024-10-28 at 20 18 06

Also, I found something else - Es seems to automatically expand the strings if they contain quotes inside. For example, if you run GET _ingest/pipeline/test-pipeline, the response is:

{
  "test-pipeline": {
    "processors": [
      {
        "grok": {
          "field": "test",
          "patterns": [
            "^%{ISSUE}$"
          ],
          "pattern_definitions": {
            "ISSUE": """aaa\"bbb""",
            "ISSUE2": """aaa\(bbb"""
          }
        }
      }
    ]
  }
}

(the second pattern definition now uses triple-quote even if we set it to use a single-quote).

I think we should still allow triple-quotes in the UI but make sure the value that is in the editor is not expanded/collapsed unnecessarily and is shown in the way the user typed it in/or Es returned it.

@SoniaSanzV
Copy link
Contributor Author

SoniaSanzV commented Nov 5, 2024

I've been working around with that and I'm quite lost with which is the correct solution. I'm started to think that the original approach is the right one, since we are displaying the same pattern format that ES returns.
What I don't think it makes sense is to mix escaped values with unescaped values. I mean, if the user introduces the following pattern:

{
    "ISSUE": "aaa\"bbb",
    "ISSUE2": "aaa\\(bbb",
    "ISSUE3": """aaa\"bbb"""
} 

we can treat the JSON as escaped or unescaped, but not both. Actually, once the pipeline is created, the user click update processor and we fetch the data from ES, we don't have a way to know if the user original input was "ISSUE2": "aaa\\(bbb", or "ISSUE2": """aaa\(bbb"""
Whether we escape the quotation marks or not, I believe that what we show has to be consistent and cohesive.

@SoniaSanzV SoniaSanzV force-pushed the ingestPipelines/grokProcessor_#175753 branch from 622553a to 564f8da Compare November 5, 2024 11:35
@SoniaSanzV
Copy link
Contributor Author

SoniaSanzV commented Nov 5, 2024

I have uploaded a new proposal (we can discuss it). In the end I have gone with the following: the user can enter any value they want as long as, once parsed, it is a valid JSON. Note that in the video, when “ISSUE3”: “”“aaa ‘bbb’”" is entered, the color of the editor is not red since the quotes have not yet been escaped and therefore it is not a proper JSON.
Once the processor is saved, it is always displayed with escaped values.

GROK.mov

The invalid JSON validator keeps working as expected

invalid

@elasticmachine
Copy link
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
ingestPipelines 405.2KB 405.3KB +66.0B

History

  • 💛 Build #246001 was flaky 22dbf1f69f5e256919e724f7bad82c3f1866ef58
  • 💚 Build #245621 succeeded 501df3466fdae1ae05c474572d87a87e5b75610a

cc @SoniaSanzV

Copy link
Contributor

@ElenaStoeva ElenaStoeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my feedback @SoniaSanzV. I tested again and I noticed that now we have the opposite problem - triple quotes get transformed into single quotes, which is also something that users might complain about.

Before these changes:

Screen.Recording.2024-11-07.at.19.06.43.mov

With these Changes:

Screen.Recording.2024-11-07.at.19.04.53.mov

Overall, I don't really think that the initial behavior was really a bug since Kibana was simply transforming the strings into another format. Also, if you send the same request in Console, you will see that even Elasticsearch transforms single quotes with escape chars into triple quotes:

Screenshot 2024-11-07 at 18 54 35

I think we should either do no transformation at all (i.e. the final request should be exactly as the user types it in) or we should leave it and let Kibana do the transformation as Elasticsearch does it.

@SoniaSanzV
Copy link
Contributor Author

Yep, that's what I mentioned in my last comments. I don't think that the initial implementation was a bug either. My last implementation was just another option to have. But, as I said, maybe we can have an input that mixes escaped and unescaped jsons before the user saves the processor; but there is no way to know with the ES response what format they were in at the beginning, so we will always have to go with one of the two formats. And with that in mind, I think the one that matches the dev tools is better.

@SoniaSanzV SoniaSanzV closed this Nov 8, 2024
auto-merge was automatically disabled November 8, 2024 06:22

Pull request was closed

@SoniaSanzV SoniaSanzV deleted the ingestPipelines/grokProcessor_#175753 branch November 8, 2024 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (9.0) the previous minor version (i.e. one version back from main) Feature:Ingest Node Pipelines Ingest node pipelines management release_note:skip Skip the PR/issue when compiling release notes Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more v8.17.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kibana breaks JSON format of GROK patterns
3 participants