Support MkDocs fenced codeblock attributes and dot prefixed language #153

reenberg · 2023-12-20T23:38:30Z

The following three examples of fenced code blocks are valid in MkDocs, according to the docs: https://github.com/Python-Markdown/markdown/blob/master/docs/extensions/fenced_code_blocks.md

However, currently only the first one is highlighted as Python code in VS Code:

If the language is the only attribute, then the dot prefixing and curly braces may 
be omitted.  

``` python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

The rest of them are not highlighted currently:

Technically the key/value pairs should not be allowed outside of the curly
braces, as I read the docs, but they are not really explicit on this. MkDocs
produces valid output for this example both without and with curly braces.

``` .python hl_lines="1-2 4" title="My title"
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

When enclosed in curly braces, MkDocs dictates that the language must be
prefixed with a dot, but then an HTML `id` and multiple `class` attributes
can be added, as well as key/value pairs.

``` { .python #id .class hl_lines="1-2 4" title="My title" }
range(1..2)
range(1..2)
range(1..2)
range(1..2)
```

If the space between the opening curly brace and the dot-prefixed language attribute is removed in the last example, then it is matched, due to #57, which added support for Codebraid-style Pandoc attributes.
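To make the three accepted forms concrete, here is a minimal sketch of how such an info string could be split into its parts. The helper name `parseInfoString` and its tokenizing rules are assumptions made for illustration only; this is not Python-Markdown's parser and not the grammar's RegEx.

```javascript
// Hypothetical helper (not from Python-Markdown or the grammar): splits the
// text that follows the opening fence into language, id, classes and
// key/value pairs. Simplified on purpose.
function parseInfoString(info) {
  let body = info.trim();
  const braced = body.startsWith('{') && body.endsWith('}');
  if (braced) {
    body = body.slice(1, -1).trim();
  }
  const result = { braced, language: null, id: null, classes: [], attrs: {} };
  // Tokens: key="quoted value", key=value, or any other run of non-space characters.
  const tokenRe = /([\w-]+)=(?:"([^"]*)"|(\S+))|(\S+)/g;
  for (const m of body.matchAll(tokenRe)) {
    if (m[1] !== undefined) {
      result.attrs[m[1]] = m[2] !== undefined ? m[2] : m[3];
    } else if (m[4].startsWith('#')) {
      result.id = m[4].slice(1);
    } else {
      // The first bare or dot-prefixed word is treated as the language,
      // any later one as an extra class.
      const name = m[4].replace(/^\./, '');
      if (result.language === null) {
        result.language = name;
      } else {
        result.classes.push(name);
      }
    }
  }
  return result;
}

console.log(parseInfoString('{ .python #id .class hl_lines="1-2 4" title="My title" }'));
```

All three examples above yield `python` as the language under this sketch; only the braced form also carries an `id` and an extra class.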

I have been playing around with an updated RegEx that will properly match the above by 1) allowing languages to be dot prefixed, and 2) generalising the Codebraid contribution by removing it as an identifier of the few supported languages and including it in the RegEx so all languages can be surrounded by curly braces:

(^|\\G)(\\s*)([\`~]{3,})\\s*(?i:(?:\\{\\s*\\.?(?<LANG>${identifiers.join('|')})(?<ATTR>(?:\\s+|:|,|\\{|\\?)[^\`\\r\\n]*?)?\\s*\\})|(?:\\.?(\\g<LANG>)(\\g<ATTR>)?))$

I decided to use named groups in the regex so that I could back-reference them in the second alternative. I don't know whether this makes the RegEx slower than explicitly inserting the language and attribute specification twice.

Currently this updated RegEx only changes test/colorize-results/pr-57_md.json, as it no longer assigns the language scope to the entire string "{ .python .cb.nb jupyter_kernel=python3 }". Instead it discards the braces and the dot, assigns the language scope to the string "python" (as one would expect), and assigns the attribute scope to the rest: " .cb.nb jupyter_kernel=python3".

The downside currently seems to be that the attribute part includes the leading space. However, I'm not sure this is worth spending more energy on, as the attributes are not really used for anything as far as I can see. At least the attribute scope is now actually assigned in that example.
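The capture behaviour described above, including the leading space in the attribute part, can be reproduced with a rough JavaScript approximation. This is a sketch under assumptions: the grammar's RegEx uses `\G`, scoped `(?i:...)` and `\g<LANG>` subexpression calls, which are not used here, so the language and attribute patterns are spelled out twice under the made-up group names `lang1`/`attr1` and `lang2`/`attr2`, and the `identifiers` list is a shortened stand-in for the real one.

```javascript
// Assumed, shortened stand-in for the grammar's real identifier list.
const identifiers = ['python', 'javascript', 'js'];
const lang = identifiers.join('|');
// Attribute part as in the proposed RegEx: a separator, then the rest of the line.
const attr = '(?:\\s+|:|,|\\{|\\?)[^`\\r\\n]*?';

const fenceLine = new RegExp(
  '^(\\s*)([`~]{3,})\\s*(?:' +
    // Alternative 1: everything after the fence is wrapped in curly braces.
    '\\{\\s*\\.?(?<lang1>' + lang + ')(?<attr1>' + attr + ')?\\s*\\}' +
    '|' +
    // Alternative 2: optional dot, language, then optional attributes.
    '\\.?(?<lang2>' + lang + ')(?<attr2>' + attr + ')?' +
  ')$',
  'i' // whole-pattern flag; the original scopes case-insensitivity with (?i:...)
);

// Returns which text would receive the language and attribute scopes.
function capture(line) {
  const m = fenceLine.exec(line);
  if (!m) return null;
  const g = m.groups;
  return { language: g.lang1 ?? g.lang2, attributes: g.attr1 ?? g.attr2 ?? null };
}

const FENCE = '`'.repeat(3); // built at runtime to keep fence characters out of the source
console.log(capture(FENCE + ' { .python .cb.nb jupyter_kernel=python3 }'));
console.log(capture(FENCE + ' .python hl_lines="1-2 4" title="My title"'));
```

In this approximation the braces and the dot fall outside both captures, while the attribute capture still begins with the separating space.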

reenberg · 2023-12-21T01:21:18Z

This RegEx seems to do the job of getting rid of the prefixed white space on attributes, and it removes the use of named groups.
This however requires that the new 7th group also be given the attribute scope. The new 6th group doesn't need anything added, as it references the 4th group and thus gets those scopes automatically.

(^|\\G)(\\s*)([\`~]{3,})\\s*(?i:(?:\\{\\s*\\.?(${identifiers.join('|')})(?:\\}|\\s+([^\`\\r\\n]*?)?\\s*\\}))|(?:\\.?(\\g<4>)((?:\\s+|:|,|\\{|\\?)[^\`\\r\\n]*?)?))$

Running the test with this regex yields the following changed files:

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   build.js
        modified:   syntaxes/markdown.tmLanguage
        modified:   test/colorize-results/issue-153-1_md.json
        modified:   test/colorize-results/issue-153-2_md.json
        modified:   test/colorize-results/pr-57_md.json
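The group numbering from this second comment can be illustrated with a JavaScript approximation of the unnamed-group RegEx. It is a sketch only: `\G` is dropped and `(\g<4>)` is replaced by a second copy of the language alternation, while the capture-group positions are kept, and the `identifiers` list is again an assumed, shortened stand-in.

```javascript
// Assumed, shortened stand-in for the grammar's real identifier list.
const identifiers = ['python', 'javascript', 'js'];
const lang = identifiers.join('|');

// Groups: 1 = start, 2 = indent, 3 = fence,
//         4 = language (braced), 5 = attributes (braced),
//         6 = language (unbraced), 7 = attributes (unbraced).
const fenceLine = new RegExp(
  '(^)(\\s*)([`~]{3,})\\s*(?:' +
    '(?:\\{\\s*\\.?(' + lang + ')(?:\\}|\\s+([^`\\r\\n]*?)?\\s*\\}))' +
    '|' +
    '(?:\\.?(' + lang + ')((?:\\s+|:|,|\\{|\\?)[^`\\r\\n]*?)?)' +
  ')$',
  'i' // whole-pattern flag; the original scopes case-insensitivity with (?i:...)
);

// Groups 4 and 6 would carry the language scope, groups 5 and 7 the attribute scope.
function capture(line) {
  const m = fenceLine.exec(line);
  if (!m) return null;
  return { language: m[4] ?? m[6], attributes: m[5] ?? m[7] ?? null };
}

const FENCE = '`'.repeat(3); // built at runtime to keep fence characters out of the source
console.log(capture(FENCE + ' { .python .cb.nb jupyter_kernel=python3 }'));
console.log(capture(FENCE + ' .python hl_lines="1-2 4"'));
```

In this sketch the braced form captures its attributes without the leading space, whereas the unbraced form keeps its separator inside group 7.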

In microsoft#57 support for Codebraid syntax was added, which is essentially just Pandoc attribute syntax with a specific class attribute added. The support was added as an extra `identifier` in the list of languages for which Codebraid has support, such as for python: `\\{\\.python.+?\\}`.

The example below would give the scope "text.html.markdown markup.fenced_code.block.markdown fenced_code.block.language.markdown" to the entire line: ```{.python .cb.nb jupyter_kernel=python3} ```

However, the "language scope" should only be given to the "python" part, the current support does not allow a space between the opening curly brace and the language, and it lacks support for all languages.

MkDocs allows a few ways to annotate fenced code blocks, but if additional classes, an id or key/value pairs are used, then the curly braces must be used and the language must be prefixed with a dot. In simple cases where only the language is specified, the curly braces and the dot may be omitted. The following are quick examples: ``` { .python #id .class title="My Title"} ``` or ``` python ```

This change removes the Codebraid support from the specific languages as an `identifier` attribute and moves it into the RegEx by defining two alternative cases, either surrounded by curly braces or with attributes following the language:

1. The case where the entire line after the code fence is wrapped in curly braces. In this case the curly braces are not part of the language and attribute scopes.
2. The case where the attributes follow the language specification in all sorts of ways (I'm specifically thinking of you Gatsby microsoft#62). In this case the curly braces are included in the attribute scope, as it is not trivial to handle all the various ways it may be used, and since this is the current behavior.

@microsoft-github-policy-service agree

Closes microsoft#153

Refs: https://github.com/Python-Markdown/markdown/blob/master/docs/extensions/fenced_code_blocks.md

reenberg linked a pull request Dec 22, 2023 that will close this issue

Generalized the Codebraid support to MkDocs #154

Open
