Markdown support and whitespace handling #404
Comments
This means that applying a process where we parse both the grammar and sample text with commonmark before handing the text over to the current Cicero parser will fail in some cases. This can be seen on the current branch for markdown support at: https://github.com/accordproject/cicero/tree/ds-markdown-support
Some possible workarounds:
|
This sounds like a generalised instance of issue #197. It should be straightforward to relax the Cicero parser to be tolerant of leading and trailing whitespace, right?
It may depend on the definition of straightforward... In the example above, the "trailing" whitespace in the grammar is somewhere in between |
Could we not just run |
It's possible; my concern is how those interact with the whitespace rules from commonmark. Hypothetically, the inline clause could create a new paragraph, something like:
With a corresponding text that looks as follows:
In which case, we may not want to simply trim the end of the inline clause, although of course we could argue this is not a well-formed example. In other words, it would be nice if whatever rules we choose played nicely with the markdown rules.
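To make the concern concrete, here is a minimal sketch in Python. It is not Cicero code: both helper names and the clause text are hypothetical, and it only contrasts a blanket trim with a more conservative one that leaves newlines (and so a paragraph break) alone.

```python
def trim_all(text: str) -> str:
    """Naive option: remove every trailing whitespace character."""
    return text.rstrip()


def trim_inline_only(text: str) -> str:
    """Conservative option: drop trailing spaces and tabs on each line,
    but keep the newlines so a trailing paragraph break survives."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


# Hypothetical inline clause text that ends with a space and a blank line.
clause_text = "Payment is due in 30 days. \n\n"

naive = trim_all(clause_text)
conservative = trim_inline_only(clause_text)

print(repr(naive))
print(repr(conservative))
```

The blanket trim also removes the blank line, so the paragraph break is lost; the conservative variant removes only the stray space.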
Also: we will have to apply a symmetric |
A couple of other ideas we could try to improve the situation:
Note: I believe we use code blocks in the rich text editor to identify nested clause text, so we might want to apply a whitespace normalization strategy which is consistent with that. |
I am going to experiment with some new code that generates a grammar directly from the commonmark AST. I think this will give us much more control over how we define the grammar to align whitespace handling. It will also allow us to handle commonmark lists properly, so that the parser is aware of them.
I now have code that can parse CommonMark to JSON and validate the JSON using a CTO model (the CommonMark Model). The pipeline is:
The CommonMark CTO model is also published here:
Using this CTO-based intermediate model will make the core The next step is to In an upstream project we can then create a visitor that converts an instance of the CommonMark Model to a Slate DOM. |
I've merged the changes into the master branch for Next I will start work on a new Nearley parser generator in |
The Nearley parser generation has raised some interesting questions for lists. E.g., if we have a static list in the
vs. a list that references an array type:
When In the second example the generated parser could make the list optional and allow any number of entries, binding each entry to the array. |
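The second case (an optional list with any number of entries, each bound to an array element) can be sketched as follows. This is plain Python standing in for a generated Nearley parser, which it does not attempt to reproduce; `bind_list` and the `- ` item syntax are assumptions made for illustration.

```python
import re

# One list entry per line, written as "- entry".
ITEM = re.compile(r"^- (.*)$")


def bind_list(text: str) -> list[str]:
    """Bind each '- entry' line to one array element.
    An empty text yields an empty array, so the list is optional."""
    entries = []
    for line in text.splitlines():
        match = ITEM.match(line)
        if match is None:
            raise ValueError(f"not a list item: {line!r}")
        entries.append(match.group(1).rstrip())
    return entries


print(bind_list("- apples\n- pears"))
print(bind_list(""))
```

A static list, by contrast, would be matched literally, entry by entry, with nothing bound to the model.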
This might be an instance where the kind of "whitespace control" offered by Handlebars helps: https://handlebars-draft.knappi.org/guide/expressions.html#whitespace-control I'm still unsure whether there isn't a simpler way to provide for the right kind of whitespace behaviour without adding that level of complexity.
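For readers unfamiliar with it: in Handlebars, a `~` next to the braces strips whitespace on that side of the expression. The sketch below imitates that behaviour with two regular expressions in Python. It is not the Handlebars implementation, and the function name and template strings are hypothetical.

```python
import re


def apply_whitespace_control(template: str) -> str:
    """Imitate Handlebars-style '~' whitespace control on a template string."""
    # '{{~' removes all whitespace to the left of the expression.
    template = re.sub(r"\s*\{\{~", "{{", template)
    # '~}}' removes all whitespace to the right of the expression.
    template = re.sub(r"~\}\}\s*", "}}", template)
    return template


# Hypothetical template: stray spaces before the closing tag of a nested clause.
before = "Payment by bank transfer.   {{~/paymentClause}}"
after = apply_whitespace_control(before)

print(repr(after))
```

With a marker like this, the template author decides explicitly where trailing whitespace is dropped, at the cost of extra syntax in the grammar.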
Can we close this now, @jeromesimeon?
I think we can consider this resolved in the new parser, which follows strict commonmark whitespace parsing rules.
There are a number of issues with whitespace occurring when trying to parse grammar and templates through commonmark.
The commonmark parser applies (arguably arcane) whitespace rules when parsing the source text containing markdown.
For instance, the whitespace at the end of a paragraph is stripped. This can be seen on the CommonMark "Dingus".
From the snippet (note the trailing whitespace after `here.`):
the corresponding commonmark AST will remove the whitespace:
This can be problematic when trying to apply the same rules inside a template grammar. An example of that can be found with `copyright-notice`, which has a nested clause with some trailing whitespace:
Will be parsed by commonmark as follows:
The relevant part is the whitespace just before the end of the `paymentClause`:
Which is not stripped, while the corresponding sample will have that whitespace stripped.
returns the AST:
Notice the `bank transfer.</text>`, which does not have whitespace after the period.
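The mismatch can be illustrated with a small sketch. It does not call a real CommonMark parser: `paragraph_content` is a hypothetical helper that imitates only the one rule discussed here (leading whitespace on each line and trailing whitespace at the end of a paragraph are dropped). The `{{#paymentClause}}` and `{{/paymentClause}}` markers and the clause text are likewise made up for illustration.

```python
def paragraph_content(raw: str) -> str:
    """Imitate the paragraph rule: strip leading whitespace on each line
    and trailing whitespace at the very end of the paragraph."""
    lines = [line.lstrip(" \t") for line in raw.split("\n")]
    return "\n".join(lines).rstrip(" \t\n")


OPEN, CLOSE = "{{#paymentClause}}", "{{/paymentClause}}"

# Grammar: the space sits before the closing marker, inside the paragraph,
# so it is not at the paragraph's end and survives.
grammar = OPEN + "Payment by bank transfer. " + CLOSE
# Sample: the same space now ends the paragraph, so it is stripped.
sample = "Payment by bank transfer. "

grammar_text = paragraph_content(grammar)
sample_text = paragraph_content(sample)

# Literal text the grammar expects between the two markers.
expected = grammar_text[len(OPEN):-len(CLOSE)]

print(repr(expected))
print(repr(sample_text))
print(expected == sample_text)
```

The grammar still expects a trailing space that the sample no longer contains, which is why a strict match between the two fails.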