Any initial spaces or tabs beyond four spaces of indentation will be included in the content, even in interior blank lines #41

Closed
lostenderman opened this issue Dec 12, 2022 · 1 comment

Comments

@lostenderman
Owner

See https://spec.commonmark.org/0.30/#example-112

@Witiko
Collaborator

Witiko commented Dec 31, 2022

The corresponding unit test is testfiles/CommonMark_0.30/indented_code_blocks/006.test:

%   ---RESULT--- "example": 112,
%   
%   <pre><code>chunk1
%     
%     chunk2
%   </code></pre>
%   
%   ---\RESULT---

<<<
    chunk1
      
      chunk2
>>>
documentBegin
inputVerbatim: ./_markdown_test/d6bc1998e8030f4c5bc9338f5ed4bbf4.verbatim
documentEnd

Here is the result of running git checkout commonmark; cd tests; ./test.sh "testfiles/CommonMark_0.30/indented_code_blocks/006.test":

Testfile testfiles/CommonMark_0.30/indented_code_blocks/006.test
  Format templates/plain/
    Template templates/plain/input.tex.m4
      Command luatex                                   --interaction=nonstopmode  test.tex
*** test-expected.log	2022-12-30 18:57:24.224756791 +0100
--- test-actual.log	2022-12-30 18:57:30.844704748 +0100
***************
*** 1,3 ****
  documentBegin
! inputVerbatim: ./_markdown_test/d6bc1998e8030f4c5bc9338f5ed4bbf4.verbatim
  documentEnd
--- 1,3 ----
  documentBegin
! inputVerbatim: ./_markdown_test/baeeb37d6c9237faa1bcd55e3102e279.verbatim
  documentEnd

At first glance, this issue seems related to issue #42, where the util.cache_verbatim() function uses string:gsub('[\r\n%s]*$', '') to remove trailing newlines and spaces. However, since string is the entire code block and $ matches only at "the end of the subject string", this does not explain the removal of the spaces in the second line of the example code block.
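To illustrate why the end-anchored substitution cannot be the culprit, here is a minimal Python sketch. It is only a stand-in for the Lua call: the regular expression approximates the Lua pattern, and the names code_block and strip_trailing_whitespace are hypothetical.

```python
import re

# Content of the code block from example 112 once the four-space indent is
# removed; the interior "blank" line still holds two spaces.
code_block = "chunk1\n  \n  chunk2\n"


def strip_trailing_whitespace(text: str) -> str:
    # Rough Python stand-in (an assumption, not the Lua code) for
    # text:gsub('[\r\n%s]*$', ''): drop whitespace only at the very end.
    return re.sub(r"\s*\Z", "", text, count=1)


stripped = strip_trailing_whitespace(code_block)
print(repr(stripped))
```

Only the final newline is removed; the two spaces on the interior line survive, so the loss has to happen earlier, during parsing.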

At second glance, this issue seems related to the parsers.Verbatim PEG pattern for parsing code blocks, specifically the parsers.indentedline PEG pattern. However, that pattern only strips the indent from the beginning of lines, not any trailing spaces.
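As a rough model of what indent stripping alone should do to example 112, consider the following Python sketch. It is not the LPeg pattern from the package: strip_indent is a hypothetical helper, and it only handles spaces, not tabs.

```python
def strip_indent(line: str, width: int = 4) -> str:
    # Remove at most `width` leading spaces and leave the rest of the line,
    # including any further spaces, untouched. Tabs are ignored for brevity.
    prefix = line[:width]
    removed = len(prefix) - len(prefix.lstrip(" "))
    return line[removed:]


# Input lines of example 112: the middle line consists of six spaces.
source_lines = ["    chunk1", "      ", "      chunk2"]
content_lines = [strip_indent(line) for line in source_lines]
print(content_lines)
```

Under this model the middle line keeps its two extra spaces, which matches the output expected by the specification.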

Therefore, I looked into the file ./_markdown_test/baeeb37d6c9237faa1bcd55e3102e279.verbatim:

$ cat _markdown_test/baeeb37d6c9237faa1bcd55e3102e279.verbatim | sed 's/ /<space>/g'
chunk1

<space><space>chunk2

This indicates that the issue is with the parsers.blanklines PEG pattern used in patterns.Verbatim, which replaces any blank line, together with its spaces, with \n. At first glance, the solution seems to be to remove parsers.blanklines from patterns.Verbatim and use just parsers.indentedline. However, this would make any non-indented blank line break a code block, breaking e.g. example 111. Therefore, we may need to replace parsers.blanklines in patterns.Verbatim with the parsers.skipblanklines PEG pattern, which is equivalent to parsers.blanklines but does not do any replacements.
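The difference between collapsing blank lines and preserving their content can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Lua implementation: parse_verbatim, strip_indent, and the collapse_blank_lines flag are hypothetical names, and the model does not claim to reproduce what parsers.skipblanklines actually captures.

```python
def strip_indent(line: str, width: int = 4) -> str:
    # Remove at most `width` leading spaces (tabs are ignored for brevity).
    prefix = line[:width]
    removed = len(prefix) - len(prefix.lstrip(" "))
    return line[removed:]


def parse_verbatim(lines, collapse_blank_lines):
    # Blank lines never end the code block in this model. They are either
    # collapsed to an empty line (the behaviour observed in the .verbatim
    # file) or kept with whatever follows the four-space indent.
    out = []
    for line in lines:
        if line.strip(" ") == "" and collapse_blank_lines:
            out.append("")
        else:
            out.append(strip_indent(line))
    return "\n".join(out)


example_112 = ["    chunk1", "      ", "      chunk2"]
collapsed = parse_verbatim(example_112, collapse_blank_lines=True)
preserved = parse_verbatim(example_112, collapse_blank_lines=False)

# A block with a non-indented blank line, in the spirit of example 111.
unindented_blank = ["    chunk1", "", "    chunk2"]
kept_together = parse_verbatim(unindented_blank, collapse_blank_lines=False)
print(repr(collapsed), repr(preserved), repr(kept_together))
```

In this model, collapsing reproduces the observed file content, while preserving yields the interior line with two spaces; the non-indented blank line stays part of the block in either mode.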
