Check for broken full reference links in Markdown #456

Open
norswap opened this issue Jan 11, 2022 · 10 comments
Labels
enhancement New feature or request

Comments

@norswap

norswap commented Jan 11, 2022

Currently lychee does not check for bad "link references" (not sure if this is the proper name), e.g.

This is a [link text][link-ref].

[link-ref-with-typo]: https://nyan.cat

It would be great if that was caught!

@lebensterben
Member

lebensterben commented Jan 11, 2022

First, this is not the correct syntax for a link reference. See
https://github.github.com/gfm/#link-reference-definitions

It should be

[foo]: /url "title"

[foo]

Second, we rely on pulldown-cmark to parse hyperlinks in Markdown documents.
Thankfully, it can report broken links via https://docs.rs/pulldown-cmark/latest/pulldown_cmark/struct.Parser.html#method.new_with_broken_link_callback

But if you want to identify possible typos, the job would be harder, as we may need to calculate text distances between labels.
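As a rough illustration of the text-distance idea, the sketch below computes the Levenshtein distance between a broken label and each defined label, then suggests the closest one. The function names and the `max_distance` threshold are hypothetical; this is not how lychee or pulldown-cmark handle it, only one plausible approach using the standard library.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i chars turns a prefix of `a` into the empty string.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the defined label closest to `broken`, if it is within `max_distance` edits.
/// (Hypothetical helper: the threshold is an assumption, not something from the thread.)
fn suggest_label<'a>(broken: &str, defined: &[&'a str], max_distance: usize) -> Option<&'a str> {
    defined
        .iter()
        .map(|label| (levenshtein(broken, label), *label))
        .filter(|(distance, _)| *distance <= max_distance)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, label)| label)
}
```

Note that the pair from the original report, `link-ref` and `link-ref-with-typo`, is ten edits apart, so a small distance threshold alone would not connect them. A real heuristic would probably need something extra, such as prefix matching.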

@mre
Member

mre commented Jan 11, 2022

Oh, I did not know about new_with_broken_link_callback! Thanks for mentioning it.
We could start by printing a warning for broken links? That would already be a step in the right direction.

@norswap
Author

norswap commented Jan 11, 2022

@lebensterben These types of links have been in Markdown since the very start (https://daringfireball.net/projects/markdown/syntax#link) and have always been supported by GitHub. They are in fact specified in the document you linked (https://github.github.com/gfm/#example-535). What you linked is the reference for the link definition (the [ref]: link part).

@mre A warning meaning it does not cause the program to return a non-zero code?

@lebensterben
Member

@norswap
Thanks for pointing that out. It's a valid full reference link. https://github.github.com/gfm/#full-reference-link

I suggest changing the title of this issue accordingly.

@mre
Member

mre commented Jan 12, 2022

> @mre A warning meaning it does not cause the program to return a non-zero code?

Good point. We can actually treat it as an error. After all, the link is broken.

@norswap norswap changed the title Check for invalid link references Check for invalid full reference links Jan 12, 2022
@mre mre added the enhancement New feature or request label Feb 4, 2022
@mre mre changed the title Check for invalid full reference links Check for broken full reference links in Markdown Jun 22, 2022
@nuke-web3

Would love to see this supported. https://spec.commonmark.org/0.31.2/#full-reference-link has the full variety of syntax that should/must be supported.

Example of another link checker that uses new_with_broken_link_callback, for reference: https://github.com/becheran/mlc/blob/b0cb310fda856cf4a7734bfa6bca20029ffcf89b/src/link_extractors/markdown_link_extractor.rs#L12-L20 (an incomplete implementation, though).

In minimal testing so far, I can see that any [text][id] in the main text without a matching [id]: ... in the input will hit the callback, but an unreferenced [not-used-in-doc]: ... definition is skipped completely. Not sure if that is the behavior we want. As a lint, I would love to get a warning in that case, since I probably meant to use the link or deleted the place where it was used. Checking whether the link works would be good too (perhaps as a CLI/config option).
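To make the two cases concrete, here is a minimal standard-library sketch that reports both undefined labels (used as `[text][label]` without a definition) and unused definitions. It is a simplified line scanner, not a CommonMark parser: it assumes one definition per line, only recognizes the full `[text][label]` form, and does not skip code blocks. All names (`check_references`, `Report`, and so on) are hypothetical.

```rust
use std::collections::BTreeSet;

/// Normalizes a label roughly as reference matching does: collapse whitespace, ignore case.
fn normalize(label: &str) -> String {
    label.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

/// Parses a `[label]: target` line and returns the normalized label, if any.
fn definition_label(line: &str) -> Option<String> {
    let rest = line.trim_start().strip_prefix('[')?;
    let (label, _target) = rest.split_once("]:")?;
    if label.is_empty() || label.contains('[') || label.contains(']') {
        return None;
    }
    Some(normalize(label))
}

/// Collects labels used as `[text][label]` within one line (simplified matching on `][`).
fn used_labels(line: &str) -> Vec<String> {
    let mut labels = Vec::new();
    let mut rest = line;
    while let Some(pos) = rest.find("][") {
        let after = &rest[pos + 2..];
        match after.find(']') {
            Some(end) => {
                let label = &after[..end];
                if !label.is_empty() {
                    labels.push(normalize(label));
                }
                rest = &after[end + 1..];
            }
            None => break,
        }
    }
    labels
}

/// Labels that are used but never defined, and labels that are defined but never used.
struct Report {
    undefined: BTreeSet<String>,
    unused: BTreeSet<String>,
}

fn check_references(markdown: &str) -> Report {
    let mut defined = BTreeSet::new();
    let mut used = BTreeSet::new();
    for line in markdown.lines() {
        match definition_label(line) {
            Some(label) => {
                defined.insert(label);
            }
            None => used.extend(used_labels(line)),
        }
    }
    Report {
        undefined: used.difference(&defined).cloned().collect(),
        unused: defined.difference(&used).cloned().collect(),
    }
}
```

Run on the example from the original report, this would flag `link-ref` as undefined and `link-ref-with-typo` as unused. A real implementation would more likely build on the parser's own events, as suggested earlier in the thread, rather than on string matching.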

@nuke-web3

Hitting this issue again in a different way: I want to use a single source of truth for all links in a dedicated page and use that when rendering the HTML with mdBook.

Example:

<!-- 
This file contains all links - internal and external - used in the Book, and thus serves as the master reference link source for all files.

It needs to be postfixed on all pages:

{{#include reference/links.md:15:}}

For more info, see:
- Markdown Reference Links: https://markdownguide.offshoot.io/basic-syntax/#reference-style-links
- Including files in mdBook: https://rust-lang.github.io/mdBook/format/mdbook.html#including-files for more info
-->

[term-boundless]: ./glossary.md#boundless-market
[glossary]: ./glossary.md
[reference]: ./reference.md

But none of these links are checked at all, as they are not used in the page itself, only later at render time, when they are injected into a page that requires the links.

@mre
Member

mre commented Nov 11, 2024

Haven't tested it, but did you try lychee --include-verbatim -vvv?

@nuke-web3

nuke-web3 commented Nov 11, 2024

lychee --include-verbatim -vvv run on a file with the above content:

 lychee --include-verbatim -vvv l.md
     [200] https://rust-lang.github.io/mdBook/format/mdbook.html#including-files
     [200] https://markdownguide.offshoot.io/basic-syntax/#reference-style-links

🔍 2 Total (in 0s) ✅ 2 OK 🚫 0 Errors

Thus all reference link definitions are omitted: [reference]: ./reference.md and the others do not appear in the output. I also tried [reference]: www.example.com, and that is omitted too.
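For illustration, the definitions in a shared links file could be collected directly, independent of whether any page uses them, and their targets handed to a checker. The sketch below is a hypothetical standard-library-only helper, not lychee's extractor. It assumes one `[label]: target` definition per line and ignores multi-line definitions and titles.

```rust
/// One `[label]: target` reference definition (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Definition {
    label: String,
    target: String,
}

/// Parses a single-line reference definition such as `[glossary]: ./glossary.md "Title"`.
fn parse_definition(line: &str) -> Option<Definition> {
    let rest = line.trim_start().strip_prefix('[')?;
    let (label, after) = rest.split_once("]:")?;
    if label.is_empty() || label.contains('[') || label.contains(']') {
        return None;
    }
    // The target is the first whitespace-separated token; an optional title may follow.
    let target = after.split_whitespace().next()?;
    // Targets may be wrapped in angle brackets, e.g. `<./glossary.md>`.
    let target = target.trim_start_matches('<').trim_end_matches('>');
    Some(Definition {
        label: label.to_string(),
        target: target.to_string(),
    })
}

/// Collects every definition in the input, whether or not it is referenced anywhere.
fn collect_definitions(markdown: &str) -> Vec<Definition> {
    markdown.lines().filter_map(parse_definition).collect()
}

/// Rough classification: targets with a scheme are remote, everything else is a local path.
fn is_remote(target: &str) -> bool {
    target.contains("://")
}
```

With this classification, `www.example.com` counts as a local path because it has no scheme, which is also how Markdown renderers treat it.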

@mre
Member

mre commented Nov 12, 2024

Okay, I see the problem now. Maybe we need to start by setting pulldown_cmark::Options::ENABLE_FOOTNOTES. We could check what it returns then. Probably needs some more tweaking after that to detect/include those links.


4 participants