Parser does not preserve whitespaces when parsing nested code blocks. #177
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (stale[bot], 26 Jan 2021)
Still valid.
It will actually necessitate a huge refactor :( Basically, it treats tabs as spaces: at one point the program counts the number of spaces (counting tabs as spaces), and at another point it puts the spaces back, incorrectly. If I understand the flow right, when it reads the text in text/reader.go, it first detects the padding (via util/util.go IndentPosition), and then it puts the padding "back" in text/segment.go Value, where it adds Padding spaces to the left. This breaks in code blocks, however, where there is a difference between tabs and spaces. Maybe I will be able to convince IndentPosition to ignore the \t paddings? But then it will randomly break when someone uses tabs to indent list items...
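To make the mechanism described above concrete, here is a minimal, self-contained Go model of it. It is not goldmark's code: `segment`, `stripIndent` and `value` are hypothetical stand-ins for text/segment.go and util.IndentPosition, assuming that all leading whitespace is measured in columns (tab stop of 4) and only the leftover column count is kept.

```go
package main

import (
	"bytes"
	"fmt"
)

const tabStop = 4

// segment is a hypothetical, simplified stand-in for a text segment that
// remembers leading indentation only as a column count (Padding).
type segment struct {
	Start, Stop int // byte range of the line content after the indentation
	Padding     int // indentation columns left over after stripping
}

// value rebuilds the line by prepending Padding spaces; whether those
// columns originally came from spaces or from a tab is no longer known.
func (s segment) value(src []byte) []byte {
	out := bytes.Repeat([]byte{' '}, s.Padding)
	return append(out, src[s.Start:s.Stop]...)
}

// stripIndent measures all leading whitespace in columns (a tab advances to
// the next multiple of tabStop), removes `width` columns and keeps only the
// remaining column count.
func stripIndent(line []byte, width int) segment {
	col, i := 0, 0
	for i < len(line) && (line[i] == ' ' || line[i] == '\t') {
		if line[i] == '\t' {
			col += tabStop - col%tabStop
		} else {
			col++
		}
		i++
	}
	pad := col - width
	if pad < 0 {
		pad = 0
	}
	return segment{Start: i, Stop: len(line), Padding: pad}
}

func main() {
	// A code line nested in a list item whose content starts at column 2:
	// two spaces of list indentation, then a tab that belongs to the code.
	src := []byte("  \tfoo")
	seg := stripIndent(src, 2)
	fmt.Printf("%q\n", seg.value(src))
}
```

Because only a count survives, the tab that belonged to the code comes back as two spaces, which matches the "2 spaces instead of tab" symptom in this thread.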
However, I am a bit confused why that doesn't happen without the list item.
Oh, the util... call only happens in listItemParser. OK.
Yeah, I get why there are 2 spaces instead of a tab now. Goldmark treats tabs like 4 spaces, which works in normal text (not in a code block), because normally if you have
those are at the same width. So anyway, when you have... Ugh. No idea how to even approach this.
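The width rule behind this can be sketched as follows. The CommonMark spec says that where whitespace defines block structure, tabs behave as if replaced by spaces with a tab stop of 4. The `visualWidth` helper below is a hypothetical illustration of that rule, not goldmark code.

```go
package main

import "fmt"

const tabStop = 4

// visualWidth returns the column reached after laying out s from column 0.
// A tab does not count as a fixed 4 spaces: it advances to the next
// multiple of tabStop.
func visualWidth(s string) int {
	col := 0
	for _, r := range s {
		if r == '\t' {
			col += tabStop - col%tabStop
		} else {
			col++
		}
	}
	return col
}

func main() {
	// Different byte sequences that all indent to the same column.
	for _, s := range []string{"\t", "    ", "  \t", " \t", "   \t"} {
		fmt.Printf("%q -> column %d\n", s, visualWidth(s))
	}
}
```

Every input in `main` reaches column 4, so they are interchangeable as indentation in normal text, even though their bytes differ once they end up inside a code block.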
I guess there could be a way for the code block parser to "cheat" and check whether the padding spaces are actually tabs. We still have the buffer, so we can look at the "padding positions" and see whether they are really tabs.
And by the way, what would happen in the following case:
What should even be in the code block? I will look at how CommonMark handles this tab/space mess.
Thanks for debugging! So, is there a way to hack markdownfmt to... hide this? 🤔
I don't even know what the correct behavior should be :D
Oh, that's easy:
I think it will require a change in goldmark, possibly a backwards-incompatible one (requiring a new version), but I am not sure :D
Although I am not sure why it is not deterministic in markdownfmt; I guess that's a different story xD (sorry @yuin for all the spam)
Note that this is a breaking change and will require a new goldmark major version.

I have tried to fix the problem with leading tabs in fenced code blocks (and probably normal code blocks too). Important note: tabs do not behave like "just 4 spaces". They "finish" 4-space columns, so a tab can behave like anything from 1 to 4 spaces, depending on its position.

If you have Markdown like this (`.` represents a space; `[tb]`, `[t]` and `[]` represent tabs):

````
*.some.text
..```
..foo
..[]foo
..```
````

you expect the tab to be kept in the code. This did not work properly in goldmark, and I fixed that. However, if you have code like this:

````
*.some.text
..```
..foo
.[t]foo
..```
````

what should happen? I decided that it should be two spaces, as the tab is not "completely" in the code block. Similarly, what should happen in this case:

````
*.some.text
..```
..foo
.[t][tb]foo
..```
````

I decided that it should be first three spaces and then a tab. Not sure what the correct solution even is here...

The crux of the fix is that text segments don't have just padding; they also remember which characters make up the padding and print those when asked to do so in code blocks. In other cases, the paddingChars are ignored.

This should fix yuin#177.
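The column rule above can be sketched as a small stripping function, assuming a tab stop of 4 and that the list item's content starts at column 2 (so 2 columns are removed from each code line). `stripColumns` is hypothetical and is not the code from the PR, which instead stores the original padding characters (paddingChars) on text segments.

```go
package main

import (
	"fmt"
	"strings"
)

const tabStop = 4

// stripColumns removes `width` columns of indentation from the start of line.
// Whitespace lying entirely past the stripped region is kept byte for byte
// (a tab stays a tab); a tab that straddles the boundary is only partly
// consumed, and its leftover columns are emitted as spaces.
func stripColumns(line string, width int) string {
	col, i := 0, 0
	for i < len(line) && col < width {
		switch line[i] {
		case ' ':
			col++
		case '\t':
			col += tabStop - col%tabStop
		default:
			// Less indentation than width: return the content unchanged.
			return line[i:]
		}
		i++
	}
	leftover := ""
	if col > width {
		// A tab overshot the boundary; keep its remaining columns as spaces.
		leftover = strings.Repeat(" ", col-width)
	}
	return leftover + line[i:]
}

func main() {
	// "..foo", "..[]foo" and ".[t]foo" from the examples above.
	for _, l := range []string{"  foo", "  \tfoo", " \tfoo"} {
		fmt.Printf("%q -> %q\n", l, stripColumns(l, 2))
	}
}
```

This reproduces the first two decisions above (the tab is kept, then two spaces). For `.[t][tb]foo` this column model yields two spaces followed by a tab rather than the three spaces chosen above, so that case remains a judgment call.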
I'm afraid to say I'm up to my neck in work every day, so I don't have time for this project. I promise to look at this issue in the future.
I fixed it in #187. However, it is a breaking change because I needed to change... Edit: hm, it seems I fixed only one of the two bugs. Edit 2: opened a second PR for the second bug.
Fix for independent issue in yuin#177 Not sure why this is needed, but it works :)
Fix for independent issue in yuin#177
Fix for independent issue in yuin#177 The next parser should have the whitespace removed, even when the line is blank.
@karelbilek @bwplotka Could you confirm this issue is fixed?
OK then, thanks! Then we can close it (or maybe add those test cases too, as you wish).
Thanks again.
On Sun, 7 Feb 2021 at 18:33, Yusuke Inuzuka wrote: I've tested the test cases in #188 with the current head (2ffadce). It passes all test cases.
@karelbilek, thanks for your contribution!
Amazing!
@yuin, that branch is still needed. The tests were wrong, sorry. (It's hard to check because the difference is all whitespace...) I have now fixed the tests so they actually test the bug.
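Because whitespace-only differences are easy to miss in test output, a small helper that reports the first mismatching line with `%q` (which prints a tab as `\t`) can make such failures readable. `firstLineDiff` is a hypothetical helper sketched for illustration; it is not part of goldmark or its test suite.

```go
package main

import (
	"fmt"
	"strings"
)

// firstLineDiff compares want and got line by line and describes the first
// mismatch using %q, so tabs and leading/trailing spaces stay visible.
// It returns "" when the inputs are identical.
func firstLineDiff(want, got string) string {
	w := strings.Split(want, "\n")
	g := strings.Split(got, "\n")
	for i := 0; i < len(w) || i < len(g); i++ {
		switch {
		case i >= len(w):
			return fmt.Sprintf("line %d: unexpected extra line %q", i+1, g[i])
		case i >= len(g):
			return fmt.Sprintf("line %d: missing line, want %q", i+1, w[i])
		case w[i] != g[i]:
			return fmt.Sprintf("line %d: want %q, got %q", i+1, w[i], g[i])
		}
	}
	return ""
}

func main() {
	want := "foo\n  \t@$(GOIMPORTS) <args>"
	got := "foo\n  @$(GOIMPORTS) <args>"
	fmt.Println(firstLineDiff(want, got))
}
```

Quoting each line means a lost tab shows up as a missing `\t` in the message rather than as two visually identical lines.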
Thank you for the amazing project! 🤗
I think I found a small bug, which is a bit annoying in our Markdown formatting project. The particular problem is showcased in this draft PR.
goldmark: v1.1.24 and latest (6c741ae251abd461bb7b5ce28e7df7a9306bd005)
go version go1.15 linux/amd64
Parsed nested code block (valid Markdown):
(Note the strict whitespace in the Markdown above, especially the line `<space><space>\t@$(GOIMPORTS) <args>`.)
The `n ast.Node` argument of goldmark's `renderer.Renderer.Render(...)` method has the correct structure. However, the `lines` in `ast.FencedCodeBlock` have the wrong whitespace (somehow the code block being fenced affects things). In particular, the lines in the node should have exactly the same bytes as the source, so, for example, line 3 should be:
<space><space>\t@$(GOIMPORTS) <args>
Lines 1 and 3 contain semi-random spaces instead of what was provided in the parsed Markdown. In particular, line 3 is:
<space><space>@$(GOIMPORTS) <args>
See the repro test below.
YES
Repro go test:
I am pretty sure it's somehow easy to fix (: