Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parser): reimplement parser in langium instead of jison #4401

Open
Yokozuna59 opened this issue May 15, 2023 · 22 comments · May be fixed by #5892
Open

feat(parser): reimplement parser in langium instead of jison #4401

Yokozuna59 opened this issue May 15, 2023 · 22 comments · May be fixed by #5892

Comments

@Yokozuna59
Copy link
Member

Yokozuna59 commented May 15, 2023

From this conversion. It would be great to reimplement the parser in langium instead of jison the following reasons:

  1. The current jison package is unmaintained.
  2. Langium supports LSP.
  3. Langium has cleaner and easier syntax.
  4. Langium has typescript integration with a fully typed AST.
  5. Langium has better error reporting.

And much more!

I'm currently working to create a separate package using it here Yokozuna59/mermaid-lint. Anyone is welcome to contribute and help me! Even thoughts would be fine.

Here is an example I've built using Langium for pie charts (toggle the icon next to the link to show the AST). playground

Note: It's not finished; it's just an example.

related issues:

@github-actions github-actions bot added the Status: Triage Needs to be verified, categorized, etc label May 15, 2023
@sidharthv96
Copy link
Member

This is exciting @Yokozuna59 !
Once you have a working parser, can you create a PR like #3432, so that we can see how much the size would increase with Langium?

I'd love to see the existing parsers modernised!

@sidharthv96 sidharthv96 added Area: Development Status: In progress enhancement and removed Status: Triage Needs to be verified, categorized, etc labels May 16, 2023
@Yokozuna59
Copy link
Member Author

Once you have a working parser, can you create a PR like #3432, so that we can see how much the size would increase with Langium?

Sure! I'm currently planning to support the pie chart since it's the easiest one. But I think I should support the common syntax and configuration first: comments, directives, accessible titles and descriptions, etc.

I've got to improve the current comments and directives regexes since they aren't covering some weird cases; e.g., in #1645, if text has been wrapped inside {} it won't be skipped and shown as text instead.

classDiagram
    class Cat {
        +int age %% not skipped comment 
    }
Loading

@Yokozuna59
Copy link
Member Author

@sidharthv96 It doesn't look like Langium or Chevrotain are customizable enough in our use case.

I was trying to match the diagram title (and accessible title and description in the same way), so I wanted to match everything after title: and set the value of it as title, but it doesn't seem that there is a clean way using Langium or Chevrotain from this discussion.

@sidharthv96
Copy link
Member

@Yokozuna59
Copy link
Member Author

I think I was able to do that in chevrotain.

https://github.com/mermaid-js/mermaid/pull/3432/files#diff-d7ef899ed988dee0393145afe276d4821390b2d40f4bac039bea0f129e4338afR102-R106

Although not in the same way, I was able to do it too.

I've finished accTitle and single and multi-line accDesc with their unit tests. But I haven't committed yet; some minor changes might be needed. Then I'm planning to support directives, which might take a lot of time because of the weird style and number of valid values.

@Yokozuna59
Copy link
Member Author

I think accTitle and accDescr are done with their test cases. (Some weird test cases that are not handled have .todo).

Can some verify I haven't forgotten anything? link

@aloisklink
Copy link
Member

aloisklink commented May 24, 2023

Then I'm planning to support directives, which might take a lot of time because of the weird style and number of valid values.

Hmmmm, this is kinda a hack, so sticking it langium is probably the ideal way of doing this, but if you're feeling lazy, I wonder if it's easier to do this a pre-proccess step, e.g. something like:

const diagram = `%%{init: { **insert argument here**}}%%
  graph TD
    A-->B`;
const directiveRegex = /%%(\{[\w\W]*\})%%/gmU; // use U-flag to make `*` lazy
const directives = diagram.matchAll(directiveRegex);
const diagramWithoutDirectives = diagram.replace(directiveRegex, "");

const parsedDiagram = yourLangiumParser.parse(diagramWithoutDirectives);

It seems like directives are more like a metadata thing that applies to all diagrams, rather than something part of each diagram, so it might make sense to split it away from the diagram parser 🤷

Edit:

Just thought of something else, all mermaid diagram definitions support using a YAML front-matter before the diagram, e.g. like:

```mermaid
---
# some YAML here
title: "Hi"
---
pie
    "Rats" : 15
```

The way this is currently implemented is by doing a pre-processing step on all diagrams, so doing the same thing with directives wouldn't be too weird.

Can some verify I haven't forgotten anything? link

From a quick look, it looks great! Great work 👍! The it.todo tests are probably the most important unfortunately, otherwise it would break existing diagrams.

@sidharthv96
Copy link
Member

I agree with @aloisklink, yaml and init directives can be handled externally for now as they aren't super important for our core usecase which is formatting and parsing the actual diagrams.

@Yokozuna59
Copy link
Member Author

Hmmmm, this is kinda a hack, so sticking it langium is probably the ideal way of doing this...

I can create a general directive parser (still not easy), but it won't be strictly typed. So it won't suggest anything if the user tries to write some directives, unlike the strictly which would suggest according to what the user has typed using LSP. For example:

image

...but if you're feeling lazy, I wonder if it's easier to do this a pre-proccess step, e.g. something like:

I tried tbh, but it would hard to preform with my experience with langium. It can be removed using hidden terminal rule:

hidden terminal DIRECTIVE: "%%{" -> "}%%\r?\n";

It seems like directives are more like a metadata thing that applies to all diagrams, rather than something part of each diagram, so it might make sense to split it away from the diagram parser shrug

Not really sure tbh, but I won't think about it for now and will focus on other stuff.

Just thought of something else, all mermaid diagram definitions support using a YAML front-matter before the diagram, e.g. like:

---
# some YAML here
title: "Hi"
---
pie
    "Rats" : 15

The way this is currently implemented is by doing a pre-processing step on all diagrams, so doing the same thing with directives wouldn't be too weird.

I think it won't be that hard to implement this since it's wrapped differently from comments (%%) and only appears at the top of the file, but I'm not really sure what possible values can be there since I couldn't find them in the docs.

From a quick look, it looks great! Great work +1! The it.todo tests are probably the most important unfortunately, otherwise it would break existing diagrams.

I was trying to create a general regex that make sure that there is no accTitle or accDescr before title but appears that it won't be easy. You can try it here and please inform if you could come up with one. https://regex101.com/r/rGuOBM/1

@Yokozuna59
Copy link
Member Author

I figured out the solution for those it.todo test cases; it appears that I was using Langium in the wrong way. However, there is a limitation on the Langium side that I couldn't solve completely. If this PR merged eclipse-langium/langium#1067, then I would be able to continue working on the parser.

@sidharthv96
Copy link
Member

@Yokozuna59 will be developing this package in their repo, and we'll move it into mermaid once it's stable for production use. This will replace mermaid's current JISON parser once the team decides.

I wonder if there's any way to verify that the new parser is backwards compatible with the syntax supported by the old parser?

@Yokozuna59
Copy link
Member Author

I wonder if there's any way to verify that the new parser is backwards compatible with the syntax supported by the old parser?

I'm currently creating unit tests for common stuff, for example, title, accTitle, accDescr, etc. And will add unit tests for each diagram. Currently, only pie chart and the above-listed common stuff have test cases. And I add more test cases for each edge case and valid syntax while developing. I could take the test cases for the current parser and add them to mine to validate that they're working fine.

What do you suggest?

@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Jun 1, 2023

Should I support multiline comments #1281 with it? I came across it while a ago. Please mention me for similar stuff if you could since there isn't a parser label in mermaid issues. I would suggest adding one if you can! Maybe adding other label like renderer and lsp would be helpful for other contributors.

@sidharthv96
Copy link
Member

sidharthv96 commented Jun 2, 2023

@Yokozuna59, let's keep your implementation as close to the current parser as possible. We can add new features and fix bugs once we make the switch. Otherwise it'll be difficult to test.


Regarding testing & verification, unit tests and visual tests will definitely be required and helpful, but they won't cover all the usecases.

Two ideas I had (which seems a bit over engineered) were,

  • Fuzzing the current parsers to find many valid edge case unit tests, and then running that suite on the new parser.
  • Verifying syntax diagrams. But this requires syntax diagrams to be generated from the current jison files, which might be difficult.
image

Anything else?

@sidharthv96
Copy link
Member

If your pie implementation is ready, can you raise a PR like #3432, so we can test the size increase?

@Yokozuna59
Copy link
Member Author

@Yokozuna59, let's keep your implementation as close to the current parser as possible. We can add new features and fix bugs once we make the switch. Otherwise it'll be difficult to test.

Okay, I'll remove the multi line comment for now.

Regarding testing & verification, unit tests and visual tests will definitely be required and helpful, but they won't cover all the usecases.

Two ideas I had (which seems a bit over engineered) were,

  • Fuzzing the current parsers to find many valid edge case unit tests, and then running that suite on the new parser.
  • Verifying syntax diagrams. But this requires syntax diagrams to be generated from the current jison files, which might be difficult.
image Anything else?

I'm not really sure how I can help with that. Before implementing the parser in langium, I tried looking at those jison files, but I wasn't able to understand much or get edge cases. If you point them out I would be more than happy to verify and test them.

@Yokozuna59
Copy link
Member Author

If your pie implementation is ready, can you raise a PR like #3432, so we can test the size increase?

The Langium PR haven't been merged yet, so the title and accessibilities would give wrong result:

- awesome title
+ title awesome title

There is an alternative way to implement it; thanks to @msujew for pointing it out, I'll implement it then create a PR.

@sidharthv96
Copy link
Member

@Yokozuna59 you can focus on langium for now. Don't worry about the verification part. I was sharing that to get input from the community.

@Yokozuna59

This comment was marked as resolved.

@Yokozuna59
Copy link
Member Author

Never mind, all of the above issues were solved in [email protected].

I guess Pie Chart is now completely supported (without directives or yaml).

image

@NicolasCwy
Copy link
Contributor

NicolasCwy commented Mar 22, 2024

@sidharthv96 I would like to convert the Git Graph parser to use Langium, can I tag along this issue or should I create a new one? Perhaps we can add a checklist of the remaining parsers and make this an epic of sorts

@sidharthv96
Copy link
Member

sidharthv96 commented Mar 22, 2024

@NicolasCwy you can start with a draft PR directly, no need to create an issue. :)

The flow what we've seen till now is usually

Stage 1

  • Convert DB to TypeScript
  • Optimise the DB functions to read from AST (if required)
  • Tweak any unit tests to fit the new DB.

Stage 2

  • Add langium parser in the parser package
  • Add a parser file in mermaid that interfaces with the parser package and populates the DB
  • Do not modify any visual tests.

Stage 1 and 2 can be separate PRs, so we can get stuff in easily.

Few reference PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants