Non-matching backticks in a code span should produce literal backticks #18

lostenderman · 2022-12-12T20:20:49Z

See https://spec.commonmark.org/0.30/#example-347

Witiko · 2022-12-31T00:17:14Z

The corresponding unit test is testfiles/CommonMark_0.30/code_spans/020.test:

%   ---RESULT--- "example": 347,
%   
%   <p>```foo``</p>
%   
%   ---\RESULT---

<<<
```foo``
>>>
documentBegin
documentEnd

Here is the result of running git checkout commonmark; cd tests; ./test.sh "testfiles/CommonMark_0.30/code_spans/020.test":

Testfile testfiles/CommonMark_0.30/code_spans/020.test
  Format templates/plain/
    Template templates/plain/input.tex.m4
      Command pdftex   --shell-escape                  --interaction=nonstopmode  test.tex
*** test-expected.log	2022-12-22 11:58:53.043894103 +0100
--- test-actual.log	2022-12-22 11:58:59.643838308 +0100
***************
*** 1,2 ****
--- 1,3 ----
  documentBegin
+ codeSpan: foo
  documentEnd

The issue seems related to the PEG patterns for parsing code spans (1, 2), which use the rules for Gruber's Markdown rather than CommonMark.

Since the function captures_equal_length() should ensure that only backtick strings of equal length are matched, I was curious where the extra backtick went, so I investigated the TeX output:

$ cat _markdown_test/1a7d570185be4954038cebd9eb645730.md.tex
\markdownRendererDocumentBegin
`\markdownRendererCodeSpan{foo}\markdownRendererDocumentEnd

This indicates that the leading backtick is interpreted as a line character on its own and, once consumed, does not prevent the backticks that follow it from opening a code span.

The walkable_syntax.Inline rule tries the parsers.Str PEG pattern first and falls back to the parsers.Symbol PEG pattern only as a last resort. parsers.Str matches all non-special characters, whereas parsers.Symbol matches special characters, including backticks. In the above example, the leading backtick was matched by parsers.Symbol, after which the remaining ``foo`` was parsed as a code span. Therefore, it seems that we need to add an exception to parsers.Symbol and have it consume any number of consecutive backticks rather than just a single one, which should prevent an immediately following code span from being matched.
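To illustrate the difference between consuming a single backtick and consuming the whole backtick string, here is a minimal sketch in Python rather than Lua/LPeg. It is a toy scanner, not the package's parser: the names parse_inlines, find_closer, and consume_whole_run are hypothetical, and details such as space stripping inside code spans are left out.

```python
import re

BACKTICKS = re.compile(r"`+")


def find_closer(text, start, n):
    """Return the index of the first backtick string of exactly n backticks
    at or after start, or None if there is none."""
    for m in BACKTICKS.finditer(text, start):
        if len(m.group()) == n:
            return m.start()
    return None


def parse_inlines(text, consume_whole_run):
    """Toy inline scanner (hypothetical, for illustration only).

    Returns a list of ("str", value) and ("code", value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] != "`":
            # Rough analogue of parsers.Str: a run of non-special characters.
            j = i
            while j < len(text) and text[j] != "`":
                j += 1
            tokens.append(("str", text[i:j]))
            i = j
            continue
        # Measure the opening backtick string.
        j = i
        while j < len(text) and text[j] == "`":
            j += 1
        run = j - i
        close = find_closer(text, j, run)
        if close is not None:
            # Opening and closing backtick strings have equal length.
            tokens.append(("code", text[j:close]))
            i = close + run
        elif consume_whole_run:
            # Suggested fix: the fallback swallows the whole backtick string.
            tokens.append(("str", text[i:j]))
            i = j
        else:
            # Behavior seen in the issue: the fallback swallows one backtick.
            tokens.append(("str", "`"))
            i += 1
    return tokens


buggy = parse_inlines("```foo``", consume_whole_run=False)
fixed = parse_inlines("```foo``", consume_whole_run=True)
print(buggy)
print(fixed)
```

With consume_whole_run=False, the sketch reproduces the shape of the TeX output above: a literal backtick followed by a code span containing foo. With consume_whole_run=True, no code span is produced and the tokens join back to the literal input, which is what example 347 expects.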

This issue has inspired pull request Witiko#239, which improves the speed of parsing. Unless I overlooked something, the fix suggested above should not interact with this change and should work regardless of whether you merge the latest upstream main branch into the commonmark branch or not.

lostenderman added the code spans label Dec 12, 2022

lostenderman mentioned this issue Mar 2, 2023

Code spans #128

Merged

lostenderman closed this as completed May 4, 2023
