Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

link text containing backtick code doesn't work for []::() #1209

Closed
ericseppanen opened this issue Jan 8, 2022 · 12 comments · Fixed by #1210
Closed

link text containing backtick code doesn't work for []::() #1209

ericseppanen opened this issue Jan 8, 2022 · 12 comments · Fixed by #1210
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request.

Comments

@ericseppanen
Copy link

With Markdown 3.3.6, the following markdown link doesn't render properly (no output is generated):

[Hello `[T]::world()` link](https://example.com)

If I remove the inner right-bracket ] or the following colon :, or the open parenthesis (, then it does render something.

commonmark 0.9.1 is able to render this link as expected:

<a href="https://example.com">Hello <code>[T]::world()</code> link</a>
@facelessuser
Copy link
Collaborator

@ericseppanen
Copy link
Author

I found a bunch of examples of this bug in the "This Week in Rust" repository, which has a lot of unordered lists containing only links, so the link is the only thing in the enclosing tag (an <li>).

@ericseppanen
Copy link
Author

I noticed another example, where there are no parentheses:

  • broken: [Hello `[T]::world`](https://example.com)
  • works: [Hello `[T]::world `](https://example.com) (note space before closing backtick)

@facelessuser
Copy link
Collaborator

This is a weird issue for sure, and I doubt it is something as simple as the link pattern needs improvement because it does work, depending on unrelated context. I suspect that something else is causing the content to get dropped, but it will require a deep dive to debug what is going on.

@facelessuser
Copy link
Collaborator

facelessuser commented Jan 9, 2022

Well, I know what is causing the issue now, I'm just not sure how to fix it yet. The block processor called ReferenceProcessor is responsible for swallowing this content before the link processor even has a chance.

I'll have to try and figure out how we can work around this. I probably won't have a solution this weekend, but this is a good starting point as we now have a root cause.

@facelessuser facelessuser added bug Bug report. confirmed Confirmed bug report or approved feature request. labels Jan 9, 2022
@waylan
Copy link
Member

waylan commented Jan 10, 2022

This is a very specific edge case which requires the internal square brackets, the colon and (optionally) the internal parentheses. Without any one of those specific characters in that specific order, this would not be an issue. For the record, the example ([Hello `[T]::world()` link](https://example.com)) gets parsed into these parts:

Reference: [Hello `[T]:
URL: :world
Title: ()` link](https://example.com)

And the second example ([Hello `[T]::world`](https://example.com)) into these parts:

Reference: [Hello `[T]:
URL: :world`](https://example.com)

I should note that between the square brackets of the reference we do not allow a closing square bracket (regex: \[([^\]]*)\]:). If this is correct (needs verification), then we should simply not allow either opening or closing square brackets (\[([^\[\]]*)\]:), which would be an easy fix.

However, if it is wrong to exclude nested square brackets from references, then I see a few possible workarounds:

  1. Make it backtick aware. It should ignore anything between backtick pairs within the reference.
  2. Make it nested bracket aware. It should recognize that a closing bracket is closing the child opening bracket, not the parent bracket.
  3. Only match valid URLs as URLs (we currently only match any non-whitespace characters).

Any of those solutions would drastically complicate the regex we use and/or require us to not use a single regex for the entire match.

@waylan
Copy link
Member

waylan commented Jan 10, 2022

Whether square brackets are allowed within references seems to be mixed (see Babelmark).

I find the strangest result to be the reference implementation (markdown.pl) which does not recognize the link as pointing to the reference, but it does recognize the reference and remove it from the document. It is clear that there is a bug there, but is the bug with the link or the reference? It is impossible to say for sure.

Also of interest is the large number of implementations which error out with this very simple input. It seems that for the most part it is not expected that square brackets be nested inside references. The few implementations which do support it are the ones which already have the reputation of being more forgiving that average. Therefore, I am comfortable excluding the opening square bracket. We already disallow the closing square bracket anyway, so this is consistent with our existing behavior.

@ericseppanen
Copy link
Author

I'm not sure if This Week in Rust is an extreme outlier, but we use code backticks inside link text quite a lot (which often includes square brackets and other interesting characters), and it looks like we've been using them (mostly) successfully since about 2013.

Today, it works about 99% of the time (because ]: is somewhat rare.) Removing support for [] entirely would have a pretty significant effect on our website & newsletter workflow.

If we're going to be collateral damage, then I guess that's the way it goes, but I hope you understand that this would be a significant breaking change for us.

@facelessuser
Copy link
Collaborator

Today, it works about 99% of the time (because ]: is somewhat rare.) Removing support for [] entirely would have a pretty significant effect on our website & newsletter workflow.

I think you are misunderstanding. The change would apply to the reference pattern, not the link pattern.

  • References: [id]: link
  • LInk: [content](link)

We would restrict brackets in the id within references.

@ericseppanen
Copy link
Author

I think you are misunderstanding. The change would apply to the reference pattern, not the link pattern.

Ah, thank you for the clarification! Sorry for my misunderstanding.

@waylan
Copy link
Member

waylan commented Jan 10, 2022

The change would apply to the reference pattern, not the link pattern.

Yes, that is correct. Of note is the fact that reference ids should generally be rather plain. While we do support the shortcut where the id is the link label ([label]) we also support the full syntax ([label][id]), which can be used when a label is more complex. Therefore, we don't loose any functionality with this restriction. In case it's not clear, I'm speaking of the syntax of the link in the document body which would point to a reference.

waylan added a commit to waylan/markdown that referenced this issue Jan 10, 2022
We already disallow right square brackets. This also disallows left
square brackets, which ensures link references will be less likely
to collide with standard links in some weird edge cases. Fixes Python-Markdown#1209.
waylan added a commit that referenced this issue Jan 10, 2022
We already disallow right square brackets. This also disallows left
square brackets, which ensures link references will be less likely
to collide with standard links in some weird edge cases. Fixes #1209.
@ericseppanen
Copy link
Author

I verified that the committed fix works on all of the problematic examples in the TWiR repository.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants