Support MkDocs fenced codeblock attributes and dot prefixed language #153

Open
reenberg opened this issue Dec 20, 2023 · 1 comment · May be fixed by #154

Comments

@reenberg

The following three examples of fenced code blocks are valid in MkDocs, according to the docs: https://github.com/Python-Markdown/markdown/blob/master/docs/extensions/fenced_code_blocks.md

However, currently only the first one is highlighted as Python code in VS Code:

If the language is the only attribute, then the dot prefixing and curly braces may 
be omitted.  

``` python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

The rest of them are not highlighted currently:

Technically the key/value pairs should not be allowed outside of the curly
braces, as I read the docs, but they are not really explicit on this.  MkDocs
produces valid output for this example both with and without curly braces.

``` .python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

When the attributes are enclosed in curly braces, MkDocs requires the language
to be prefixed with a dot, but then an HTML `id` and multiple `class`
attributes can be added, along with key/value pairs.

``` { .python #id .class hl_lines="1-2 4" title="My title" }
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

If the space between the opening curly brace and the dot-prefixed language attribute is removed in the last example, then it is matched, thanks to #57, which added support for Codebraid-style Pandoc attributes.
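To make the three forms above concrete, here is a rough JavaScript sketch that classifies the text following the opening fence. The function name `classifyInfoString` and the labels it returns are hypothetical, made up for illustration. It only covers the three forms shown in this issue and is not how MkDocs, Python-Markdown, or the grammar actually parses the line.

```javascript
// Hypothetical helper: classify the text that follows the opening fence.
// Only the three forms discussed in this issue are handled.
function classifyInfoString(info) {
  const text = info.trim();

  // Braced form: everything is wrapped in curly braces and the
  // language carries a dot prefix.
  const braced = /^\{\s*\.([\w+-]+)(?:\s+(.*?))?\s*\}$/.exec(text);
  if (braced) {
    return { form: "braced", lang: braced[1], attrs: braced[2] ?? "" };
  }

  // Unbraced forms: optional dot, the language, then optional attributes.
  const plain = /^(\.?)([\w+-]+)(?:\s+(.*))?$/.exec(text);
  if (plain) {
    return {
      form: plain[1] ? "dot-prefixed" : "bare",
      lang: plain[2],
      attrs: plain[3] ?? "",
    };
  }

  return null; // not one of the three forms
}
```

All three examples above should come back with `lang` set to `python`; only `form` and the way the attributes are delimited differ.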

I have been playing around with an updated RegEx that will properly match the above by 1) allowing languages to be dot prefixed, and 2) generalising the Codebraid contribution by removing it as an identifier of the few supported languages and including it in the RegEx so all languages can be surrounded by curly braces:

(^|\\G)(\\s*)([\`~]{3,})\\s*(?i:(?:\\{\\s*\\.?(?<LANG>${identifiers.join('|')})(?<ATTR>(?:\\s+|:|,|\\{|\\?)[^\`\\r\\n]*?)?\\s*\\})|(?:\\.?(\\g<LANG>)(\\g<ATTR>)?))$

I decided to use named groups in the regex so that I could back-reference them in the second alternative. I don't know if this makes the RegEx slower compared to explicitly inserting the language and attribute specification twice.

Currently this updated RegEx only changes test/colorize-results/pr-57_md.json, as it no longer assigns the language scope to the entire string "{ .python .cb.nb jupyter_kernel=python3 }". Instead it discards the braces and the dot, assigns the language scope to the string "python" (as one would expect), and assigns the attribute scope to the rest: " .cb.nb jupyter_kernel=python3".

The downside currently seems to be that the attribute part includes the leading space. However, I'm not sure this is worth spending more energy on, as the attributes are not really used for anything as far as I can see. At least the attribute scope is now actually assigned in that example.
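For anyone who wants to poke at the capture behaviour outside the grammar, here is a rough JavaScript approximation of the regex above. It is a sketch under several assumptions: JavaScript's `RegExp` has no `\G` anchor and no `\g<NAME>` subexpression calls, so the language and attribute sub-patterns are spelled out twice, the `(?i:...)` group becomes the `i` flag, and the group names, the `scopes` helper and the three-entry `identifiers` list are made up for illustration. It does not reproduce the scopes the TextMate grammar assigns, only which text each group captures.

```javascript
// Assumption: a tiny stand-in for the grammar's full `identifiers` list.
const identifiers = ["python", "js", "rust"];
const lang = `(?:${identifiers.join("|")})`;
// Attribute part as in the regex above: a separator, then the rest of the line.
const attr = "(?:\\s+|:|,|\\{|\\?)[^`\\r\\n]*?";

// First alternative: language and attributes wrapped in curly braces.
// Second alternative: optional dot, language, attributes following it.
const fence = new RegExp(
  "^(\\s*)([`~]{3,})\\s*(?:" +
    `\\{\\s*\\.?(?<braceLang>${lang})(?<braceAttr>${attr})?\\s*\\}` +
    "|" +
    `\\.?(?<plainLang>${lang})(?<plainAttr>${attr})?` +
    ")$",
  "i"
);

// Hypothetical helper: report which text would receive which scope.
function scopes(line) {
  const m = fence.exec(line);
  if (!m) return null;
  const g = m.groups;
  return {
    language: g.braceLang ?? g.plainLang,
    attributes: g.braceAttr ?? g.plainAttr ?? "",
  };
}
```

With the Codebraid line from pr-57, this sketch gives `python` as the language and ` .cb.nb jupyter_kernel=python3` as the attributes, leading space included, which is the behaviour described above.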

@reenberg

This RegEx seems to do the job of getting rid of the leading whitespace on attributes, and it removes the use of named groups.
However, this requires that the new 7th group also be given the attribute scope. The new 6th group doesn't need anything added, as it references the 4th group and thus gets those scopes automatically.

(^|\\G)(\\s*)([\`~]{3,})\\s*(?i:(?:\\{\\s*\\.?(${identifiers.join('|')})(?:\\}|\\s+([^\`\\r\\n]*?)?\\s*\\}))|(?:\\.?(\\g<4>)((?:\\s+|:|,|\\{|\\?)[^\`\\r\\n]*?)?))$
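The group numbering can be checked with a similar JavaScript approximation. The same caveats apply: `\g<4>` is not available in JavaScript, so the identifier list is repeated for group 6, `(^|\G)` is reduced to `(^)`, and the names `ids`, `fenceNumbered` and `captureFence` as well as the short identifier list are assumptions for illustration only.

```javascript
// Assumption: a tiny stand-in for the grammar's full `identifiers` list.
const ids = ["python", "js", "rust"].join("|");

// Groups: 1 = start anchor, 2 = indentation, 3 = fence,
// 4 = language (braced), 5 = attributes (braced),
// 6 = language (unbraced), 7 = attributes (unbraced).
const fenceNumbered = new RegExp(
  "(^)(\\s*)([`~]{3,})\\s*(?:" +
    "(?:\\{\\s*\\.?(" + ids + ")(?:\\}|\\s+([^`\\r\\n]*?)?\\s*\\}))" +
    "|" +
    "(?:\\.?(" + ids + ")((?:\\s+|:|,|\\{|\\?)[^`\\r\\n]*?)?)" +
    ")$",
  "i"
);

// Hypothetical helper: groups 4/6 carry the language, 5/7 the attributes.
function captureFence(line) {
  const m = fenceNumbered.exec(line);
  if (!m) return null;
  return {
    language: m[4] ?? m[6],
    attributes: m[5] ?? m[7] ?? "",
  };
}
```

In this sketch the braced alternative captures `.cb.nb jupyter_kernel=python3` for the pr-57 line, without the leading space, because the `\s+` separator now sits outside group 5. The unbraced alternative still keeps its separator inside group 7.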

Running the test with this regex yields the following changed files:

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   build.js
        modified:   syntaxes/markdown.tmLanguage
        modified:   test/colorize-results/issue-153-1_md.json
        modified:   test/colorize-results/issue-153-2_md.json
        modified:   test/colorize-results/pr-57_md.json

reenberg added a commit to reenberg/vscode-markdown-tm-grammar that referenced this issue Dec 21, 2023
In microsoft#57 support for Codebraid syntax was added, which essentially is just
Pandoc attribute syntax, but with a specific class attribute added.

The support was added as an extra `identifier` in the list of languages,
for which Codebraid has support, such as for python:
`\\{\\.python.+?\\}`.

The below example would give the following scope: "text.html.markdown
markup.fenced_code.block.markdown fenced_code.block.language.markdown"
to the entire line:

```{.python .cb.nb jupyter_kernel=python3}
```

However the "language scope" should only be given to the "python" part,
and the current support doesn't allow spaces between the curly braces,
and it lacks support for all languages.

MkDocs allows a few ways to annotate fenced code blocks, but if
additional classes, id or key/value pairs are used, then the curly
braces must be used and the language must be prefixed with a dot.  In
simple cases where only the language is specified, then the curly braces
and the dot may be omitted.  The following are quick examples:

``` { .python #id .class title="My Title"}
```

or

``` python
```

This change removes the Codebraid support from the specific languages as
an `identifier` attribute, and moves it into the RegEx by defining it as
two alternative cases: surrounded by curly braces or allowing them after
the language:

1. The case where the entire line after the code fence is wrapped in
   curly braces.  In this case the curly braces are not part of the
   language and attribute scope.
2. The case where the attributes follow the language specification in
   all sorts of ways (I'm specifically thinking of you Gatsby microsoft#62).  In
   this case the curly braces are included in the attribute scope as it
   is not trivial to handle all the various ways it may be used, and
   since this is the current behavior.

@microsoft-github-policy-service agree

Closes microsoft#153
Refs: https://github.com/Python-Markdown/markdown/blob/master/docs/extensions/fenced_code_blocks.md
reenberg added a commit to reenberg/vscode-markdown-tm-grammar that referenced this issue Dec 22, 2023
@reenberg reenberg linked a pull request Dec 22, 2023 that will close this issue