This issue was moved to a discussion.

Re-define lychee as both a link checker and a link linter #269

Closed
lebensterben opened this issue Jun 24, 2021 · 5 comments
Comments

@lebensterben
Member

Currently, lychee is just a link checker, which conceptually has the following components:

  1. It parses a config file or CLI arguments and turns them into an internal configuration.
  2. An overseer is created from that configuration. It serves several purposes:
    • Detection: It detects/scrapes HTML links and email addresses from the input file(s).
    • Filtering: It decides whether each link should be checked or skipped.
    • Dispatching: It sends URLs to workers, which verify that a given URL is valid by checking whether it's accessible.
    • Logging: It keeps a record of every URL it encounters and the status of each (Valid, Ignored, Timeout, etc.).
  3. The result is returned either as CLI output or written to a log file.
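To make the components above concrete, here is a minimal sketch in Python. It is not lychee's actual code (lychee is written in Rust); `Overseer`, `Status`, `URL_RE`, and the `check` callback that stands in for a worker are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    VALID = "valid"
    IGNORED = "ignored"
    FAILED = "failed"


# Deliberately naive link detection, for illustration only.
URL_RE = re.compile(r"""(?:https?://|mailto:)[^\s"'<>)]+""")


@dataclass
class Overseer:
    # Stand-in for a worker: returns True if the URL is reachable.
    check: Callable[[str], bool]
    # Patterns for links that should be skipped instead of checked.
    exclude: List[re.Pattern] = field(default_factory=list)
    # Record of every URL seen and its status.
    log: Dict[str, Status] = field(default_factory=dict)

    def detect(self, text: str) -> List[str]:
        """Detection: scrape links from the input text."""
        return URL_RE.findall(text)

    def should_check(self, url: str) -> bool:
        """Filtering: decide whether a link is checked or skipped."""
        return not any(p.search(url) for p in self.exclude)

    def run(self, text: str) -> Dict[str, Status]:
        """Dispatching and logging for every detected link."""
        for url in self.detect(text):
            if not self.should_check(url):
                self.log[url] = Status.IGNORED
            elif self.check(url):
                self.log[url] = Status.VALID
            else:
                self.log[url] = Status.FAILED
        return self.log
```

Passing the worker in as a plain callback keeps the sketch free of network access; a real checker would issue requests concurrently.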

We can extend lychee so that it's also a link linter.

Note that the overseer hands the job to its workers, and the workers are only responsible for validating the URLs. Linting should therefore also be the overseer's job.

Conceptually, linting could happen before and/or after the overseer dispatches the job. For example:

  • Before: If the user wants to deny a certain pattern in URLs, the overseer doesn't need to send URLs that match the pattern to the workers; instead, it logs them directly.
  • After: If the user wants to avoid the use of absolute/relative links, the overseer first verifies whether a link is valid, and if it is, it then checks whether the link is in the desired style before logging.
  • Around (before and after): If the user wants to use HTTPS links whenever possible, the overseer first sends an HTTP link to the workers, and if it's valid, the overseer resends its HTTPS counterpart to the workers and logs the result accordingly.
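The three lint positions could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the finding strings such as `"denied-pattern"`, and the `check` callback standing in for a worker are all made up here and are not part of lychee.

```python
import re
from typing import Callable, Optional

Checker = Callable[[str], bool]  # hypothetical stand-in for a worker


def lint_before(url: str, deny: re.Pattern) -> Optional[str]:
    """Before: a denied pattern is logged without dispatching to a worker."""
    return "denied-pattern" if deny.search(url) else None


def lint_after(url: str) -> Optional[str]:
    """After: style check on a link that is already known to be valid.

    Here the (assumed) policy is "no relative links", i.e. the link must
    start with a URL scheme such as "https:".
    """
    has_scheme = re.match(r"[a-zA-Z][a-zA-Z0-9+.-]*:", url) is not None
    return None if has_scheme else "relative-link"


def lint_around(url: str, check: Checker) -> Optional[str]:
    """Around: a valid HTTP link is re-dispatched as its HTTPS counterpart."""
    if not url.startswith("http://"):
        return None
    https_url = "https://" + url[len("http://"):]
    return "https-available" if check(https_url) else None


def process(url: str, check: Checker, deny: re.Pattern) -> str:
    finding = lint_before(url, deny)
    if finding:
        return finding  # never reaches a worker
    if not check(url):
        return "broken"
    return lint_after(url) or lint_around(url, check) or "ok"
```

In this sketch only `lint_around` causes extra work for the workers, since it issues a second check per HTTP link.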
@untitaker
Collaborator

This is labelled as design-feedback because it already goes into the nitty-gritty of which code unit does what, but from what I can tell, what is actually proposed here is to expand lychee's feature scope so that it can be used to enforce policies on HTML, such as the already mentioned "need to use HTTPS links everywhere" and "links can't match this pattern".

You can already enforce some basic pattern-policies today: use --dump and a wrapper shell script. What can't be done is the HTTPS enforcement in the way you imagine it.
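As a rough sketch of such a wrapper (written in Python here rather than shell), the script below reads dumped links and exits non-zero if any of them violates a policy. It assumes the `--dump` output contains one link per line; the `DENY` patterns and the function names are hypothetical examples.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical policy: no plain-HTTP links and no bit.ly short links.
DENY = [re.compile(r"^http://"), re.compile(r"bit\.ly/")]


def violations(lines: Iterable[str], deny=DENY) -> List[str]:
    """Return every non-empty line that matches one of the deny patterns."""
    bad = []
    for line in lines:
        url = line.strip()
        if url and any(p.search(url) for p in deny):
            bad.append(url)
    return bad


def main(stream: Iterable[str] = sys.stdin) -> int:
    """Report violations on stderr and return a shell-style exit code."""
    bad = violations(stream)
    for url in bad:
        print(f"policy violation: {url}", file=sys.stderr)
    return 1 if bad else 0
```

Saved as, say, `lint_links.py` with a final `sys.exit(main())` line, it could be used as `lychee --dump README.md | python lint_links.py`. Note that this only covers pattern policies on the extracted links; it cannot re-check an HTTPS counterpart.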

Is the list of policies in your OP exhaustive?

@mre
Member

mre commented Feb 4, 2022

@lebensterben any thoughts?

@untitaker
Collaborator

Prior art: https://github.com/wjdp/htmltest

@mre
Member

mre commented Feb 4, 2022

Link validation is a whole other use-case with a lot of design decisions to consider along the way. We have to be careful to keep the scope manageable. I guess we can commit to the following:

Outside of that, I'd probably defer to other tools (e.g. htmltest that @untitaker mentioned) or workarounds using --dump for now.

@untitaker
Collaborator

I checked OP again. What you currently cannot do is hook into link traversal before/after a check for linting, or define your own link extraction logic. --dump is insufficient there.

E.g., you may want to lint a link and, based on the result, decide whether to follow it.

I wonder if OP meant to build a scripting platform on top of lychee where the user could hook custom logic into any of those stages, and that's why the internals are discussed in so much detail.
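To illustrate what such hooks might look like, here is a small sketch of a traversal loop with pluggable stages. Nothing here reflects an existing lychee API; `traverse`, `fetch`, `extract`, `before_follow`, and `after_check` are hypothetical names, and `fetch` is assumed to return the page body or `None` if the link is unreachable.

```python
from typing import Callable, Dict, List, Optional


def traverse(
    start: str,
    fetch: Callable[[str], Optional[str]],
    extract: Callable[[str], List[str]],
    before_follow: Callable[[str], bool] = lambda url: True,
    after_check: Callable[[str, bool], None] = lambda url, ok: None,
) -> Dict[str, bool]:
    """Breadth-first traversal with user-supplied hooks.

    extract:        custom link extraction from a fetched body
    before_follow:  lint a link and decide whether to follow it at all
    after_check:    observe each link together with its check result
    """
    checked: Dict[str, bool] = {}
    seen = set()
    queue = [start]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        if not before_follow(url):
            continue  # rejected by the lint hook, so never fetched
        body = fetch(url)
        ok = body is not None
        checked[url] = ok
        after_check(url, ok)
        if ok:
            queue.extend(extract(body))
    return checked
```

With hooks like these, a link rejected by `before_follow` is never fetched, so nothing reachable only through it gets checked either. That is the kind of decision a --dump wrapper cannot make.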

@lycheeverse lycheeverse locked and limited conversation to collaborators Dec 19, 2022
@mre mre converted this issue into discussion #880 Dec 19, 2022
