This repository has been archived by the owner on Jul 5, 2021. It is now read-only.

Linkchecker #27

Closed
wants to merge 1 commit into from


dylanPowers
Contributor

I found a Python tool that checks for dead or invalid links. I wrote up some directions on how to use it and set up a small configuration file.

This fixes #13

## Linkchecker
The "linkchecker" verifies that no dead links are present on the website.
`linkchecker` is a python app that can be acquired through pip by running
`pip install LinkChecker`. To verify that there aren't any dead links present
Contributor


Oof. How complicated is it? Could we have a Go replacement?

also, ipfs is reliable enough to start "vendoring" deps to ipfs itself.

(having to manually run pip is frustrating and complicates usage and CI)

@harlantwood

Hm, how about a Node.js solution, since the website is already built with Node tools?

Two options from a quick google:

@jbenet
Contributor

jbenet commented Jun 8, 2015

@harlantwood those sound good to me. Thoughts, @dylanPowers?

@dylanPowers
Contributor Author

I checked out the Node implementations first, and for the most part they weren't very feature-complete and were buggy.

This Python implementation is the most mature and feature-complete option at this point. It's also a non-trivial app: https://github.com/wummel/linkchecker/tree/master/linkcheck

Using Go would still introduce another dependency, so I'm not sure that's a great solution either, but there are some options there that could be looked at.

@whyrusleeping
Contributor

Python is installed by default on pretty much every Linux machine. I don't really think having it as a dependency is all that bad.

@harlantwood

FWIW this is the patch to get broken-link-checker installing -- just posted to his repo --

For now you can put this in your package.json to get the head from github:

"broken-link-checker": "stevenvachon/broken-link-checker"

@harlantwood

But admittedly it's not mature. I agree that a Python dependency is not so bad, though it would be elegant to have a Node solution.

@jbenet
Contributor

jbenet commented Jun 10, 2015

I don't want to depend on pip. It can be a nightmare. Can we check the deps we need into ipfs and seed them? (Assume Python is present; vendor just the modules.)
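
For illustration only, a minimal sketch of how vendored modules could be used without pip. It assumes the needed modules are pure Python and have already been fetched into a local directory by some separate step that is not shown; the function name `use_vendored_modules` and the directory layout are hypothetical, not part of this PR.

```python
import os
import sys


def use_vendored_modules(vendor_dir):
    """Put vendor_dir at the front of sys.path so its modules are found first.

    vendor_dir is a hypothetical local directory holding the vendored
    modules; how it gets populated is outside the scope of this sketch.
    """
    vendor_dir = os.path.abspath(vendor_dir)
    if not os.path.isdir(vendor_dir):
        raise FileNotFoundError("vendored modules not found: " + vendor_dir)
    if vendor_dir not in sys.path:
        # Index 0 gives the vendored copy priority over site-packages.
        sys.path.insert(0, vendor_dir)
    return vendor_dir
```

A wrapper script would call `use_vendored_modules("vendor")` before importing anything from the vendored tree, so only a Python interpreter is required on the machine.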

@dylanPowers
Contributor Author

Update on the node options:

@harlantwood

Maybe this will help? From the simplecrawler docs:

crawler.filterByDomain - Specifies whether the crawler will restrict queued requests to a given domain/domains.

@dylanPowers
Contributor Author

From my reading of the docs, filterByDomain means that the crawler will crawl the entire external website. This is also mentioned in simplecrawler/simplecrawler#114. A pull request has been made to enable the functionality we want in simplecrawler/simplecrawler#74 (not quite what is wanted). I'm going to fork the repo, merge that pull request, and see if it works.
I'll also look into the bug I'm running into. At the moment, it looks like that will be easier than trying to get rid of pip in the Python app.
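
To make the wanted behavior concrete, here is a rough sketch as I understand it: every link found gets checked once, but only pages on the site's own host are crawled for more links. This is not simplecrawler's API. `plan_link_check` and the `links_on_page` dict are hypothetical stand-ins for real fetching and parsing.

```python
from collections import deque
from urllib.parse import urlsplit


def plan_link_check(start_url, links_on_page):
    """Return (urls_to_check, pages_crawled) for a site rooted at start_url.

    links_on_page is a hypothetical stand-in for fetching and parsing:
    a dict mapping a page URL to the list of URLs that page links to.
    """
    site_host = urlsplit(start_url).netloc
    to_check = {start_url}
    crawled = []
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        crawled.append(page)
        for link in links_on_page.get(page, []):
            if link in to_check:
                continue
            # Every discovered link gets one liveness check...
            to_check.add(link)
            # ...but only pages on our own host are crawled further.
            if urlsplit(link).netloc == site_host:
                queue.append(link)
    return to_check, crawled
```

With this policy an external link is requested once to see whether it is dead, and the links on that external page are never followed.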

Update:
https://www.npmjs.com/package/grunt-link-checker simply isn't going to work. node-simplecrawler isn't a good fit for a link checker because it's messy in the way it finds links. It uses regexes rather than actually parsing the HTML, so it creates a lot of false positives and doesn't understand things like base tags (simplecrawler/simplecrawler#57). So back to the Python app.
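
A small illustration of the regex problem, using only the Python standard library. The pattern `HREF_RE` is a deliberately naive example of regex-based extraction and is an assumption, not simplecrawler's actual code; `SAMPLE` and the function names are also made up for this sketch.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Naive pattern of the kind a regex-based crawler might use (illustrative only).
HREF_RE = re.compile(r'href="([^"]+)"')


def links_by_regex(html, page_url):
    """Scan the raw text for href attributes, ignoring document structure."""
    return [urljoin(page_url, href) for href in HREF_RE.findall(html)]


class LinkParser(HTMLParser):
    """Collect <a href> targets and honor the first <base href>."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url
        self.base_seen = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and not self.base_seen:
            # Later relative links resolve against the base tag.
            self.base_url = urljoin(self.base_url, href)
            self.base_seen = True
        elif tag == "a":
            self.links.append(urljoin(self.base_url, href))


def links_by_parser(html, page_url):
    parser = LinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = """<html><head><base href="http://example.com/docs/"></head>
<body>
<!-- <a href="old.html">removed</a> -->
<a href="install.html">Install</a>
</body></html>"""

if __name__ == "__main__":
    page = "http://example.com/index.html"
    print("regex: ", links_by_regex(SAMPLE, page))
    print("parser:", links_by_parser(SAMPLE, page))
```

On this sample the regex version reports the commented-out `old.html` as a link, treats the base URL itself as a link, and resolves `install.html` against the page URL instead of the base tag. The parser version returns only the one real link, resolved under `/docs/`.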

@jbenet jbenet closed this Sep 14, 2015
@jbenet jbenet removed the codereview label Sep 14, 2015
Development

Successfully merging this pull request may close these issues.

Hotfix: Add tests to check for broken links
4 participants