Non-matching backticks in a code span should produce literal backticks #18

Closed
lostenderman opened this issue Dec 12, 2022 · 1 comment

Comments

@lostenderman
Owner

See https://spec.commonmark.org/0.30/#example-347

@Witiko
Collaborator

Witiko commented Dec 31, 2022

The corresponding unit test is testfiles/CommonMark_0.30/code_spans/020.test:

%   ---RESULT--- "example": 347,
%   
%   <p>```foo``</p>
%   
%   ---\RESULT---

<<<
```foo``
>>>
documentBegin
documentEnd

Here is the result of running git checkout commonmark; cd tests; ./test.sh "testfiles/CommonMark_0.30/code_spans/020.test":

Testfile testfiles/CommonMark_0.30/code_spans/020.test
  Format templates/plain/
    Template templates/plain/input.tex.m4
      Command pdftex   --shell-escape                  --interaction=nonstopmode  test.tex
*** test-expected.log	2022-12-22 11:58:53.043894103 +0100
--- test-actual.log	2022-12-22 11:58:59.643838308 +0100
***************
*** 1,2 ****
--- 1,3 ----
  documentBegin
+ codeSpan: foo
  documentEnd

The issue seems related to the PEG patterns for parsing code spans (1, 2), which use the rules for Gruber's Markdown rather than CommonMark.

Since the function captures_equal_length() should ensure that only backtick strings of equal length are matched, I was curious where the extra backtick went, so I inspected the TeX output:

$ cat _markdown_test/1a7d570185be4954038cebd9eb645730.md.tex
\markdownRendererDocumentBegin
`\markdownRendererCodeSpan{foo}\markdownRendererDocumentEnd

This indicates that the leading backtick is interpreted as a literal character and does not prevent the backticks that follow it from opening a code span.

The walkable_syntax.Inline rule tries the parsers.Str PEG pattern first and only falls back to the parsers.Symbol PEG pattern as a last resort. The parsers.Str PEG pattern matches all non-special characters, whereas the parsers.Symbol PEG pattern matches special characters, including backticks. In the above example, no code span can start at the three-backtick string, because there is no closing string of the same length, so the leading backtick was matched by the parsers.Symbol PEG pattern as a single character. The remaining two backticks were then free to open a code span that closes at the final two backticks. Therefore, it seems that we need to add an exception to the parsers.Symbol PEG pattern and have it consume the whole string of consecutive backticks rather than just a single backtick, which should prevent any immediately following code span from being matched.
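To make the reasoning above concrete, here is a minimal Python sketch of the two behaviors. It is a toy model, not the actual LPeg grammar: the function parse_inline() and its symbol_consumes_run flag are hypothetical names made up for this illustration, and the model ignores everything except backticks (no space stripping inside code spans, no other special characters).

```python
import re

# A maximal run of consecutive backticks.
BACKTICKS = re.compile(r"`+")


def parse_inline(text, symbol_consumes_run):
    """Tokenize text into ("str", s) and ("code", s) tokens.

    symbol_consumes_run=False models the current fallback, which emits a
    single literal backtick; True models the suggested fix, which emits
    the whole run of backticks as literal text. Both are assumptions made
    for illustration only.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] != "`":
            # Analogue of a Str-like rule: a run of non-special characters.
            end = pos
            while end < len(text) and text[end] != "`":
                end += 1
            tokens.append(("str", text[pos:end]))
            pos = end
            continue
        opener = BACKTICKS.match(text, pos)
        length = len(opener.group())
        # Look for a closing run of exactly the same length as the opener.
        closer = None
        for candidate in BACKTICKS.finditer(text, opener.end()):
            if len(candidate.group()) == length:
                closer = candidate
                break
        if closer is not None:
            tokens.append(("code", text[opener.end():closer.start()]))
            pos = closer.end()
        elif symbol_consumes_run:
            # Suggested fix: the whole unmatched run becomes literal text.
            tokens.append(("str", opener.group()))
            pos = opener.end()
        else:
            # Current behavior: only one backtick becomes literal text.
            tokens.append(("str", "`"))
            pos += 1
    return tokens


print(parse_inline("```foo``", symbol_consumes_run=False))
print(parse_inline("```foo``", symbol_consumes_run=True))
```

With symbol_consumes_run=False, the sketch yields a literal backtick followed by a code span containing foo, which mirrors the TeX output shown earlier. With symbol_consumes_run=True, all three pieces stay literal text, which is what CommonMark example 347 expects.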

This issue has inspired pull request Witiko#239, which improves the speed of parsing. Unless I overlooked something, the fix suggested above should not interact with this change and should work regardless of whether you merge the latest upstream main branch into the commonmark branch or not.
