Fixed #7: some pages incorrectly marked not valid HTML
fallax committed Nov 20, 2024
1 parent 956e022 commit d997f86
Showing 3 changed files with 12 additions and 6 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -55,6 +55,12 @@ To check all the links within a live web page, skipping over internal links:
curl https://test.com | sed -n 's/.*href="\(h[^"]*\).*/\1/p' | coroner -s test.com
```

+ Check all links within markdown files in a folder or its subfolders recursively:
+
+ ```
+ find . -name "*.md" -not \( -name .svn -prune -o -name .git -prune \) -type f -print0 | xargs -0 sed -n 's/.*(\(http[^)]*\).*/\1/p' | coroner
+ ```
+
## Options

Options are:
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "coroner",
- "version": "1.0.5",
+ "version": "1.0.6",
"type": "commonjs",
"exports": "./index.js",
"description": "more useful dead link detection",
10 changes: 5 additions & 5 deletions tests.js
@@ -160,11 +160,11 @@ const tests = [
statusCodes: [200, 304],
name: 'validHTML', // Check the response body looks like a valid HTML document
test: (input, options, response, responseText) =>
- !responseText.slice(0, 30).trim().toLowerCase().startsWith("<!doctype html") &&
- !responseText.slice(0, 30).trim().toLowerCase().startsWith("<html") &&
- !responseText.slice(0, 30).trim().toLowerCase().startsWith("<!--") &&
- !responseText.slice(0, 60).trim().toLowerCase().startsWith("<?xml version=\"1.0\" encoding=\"utf-8\"?><!doctype html"),
- reason: (input, options, response, responseText) => "Does not look like a valid HTML file: first characters are " + responseText.slice(0, 60).trim().toLowerCase()
+ !responseText.slice(0, 100).trim().toLowerCase().startsWith("<!doctype html") &&
+ !responseText.slice(0, 100).trim().toLowerCase().startsWith("<html") &&
+ !responseText.slice(0, 100).trim().toLowerCase().startsWith("<!--") &&
+ !responseText.slice(0, 100).trim().toLowerCase().startsWith("<?xml version=\"1.0\" encoding=\"utf-8\"?><!doctype html"),
+ reason: (input, options, response, responseText) => "Does not look like a valid HTML file: first characters are '" + responseText.slice(0, 60).trim().toLowerCase() + "'"
},
{
phase: 'post',
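To see what the `tests.js` change fixes, consider a response whose markup is preceded by a run of whitespace, for instance blank lines emitted by a template before the doctype. The sketch below is a hypothetical re-implementation of the `validHTML` predicate (the names `flagsAsInvalid`, `before` and `after` are illustrative), parameterised on the slice lengths so the old windows (30 characters, 60 for the XML-prolog case) and the new 100-character window can be compared. Like the original `test` function, it returns `true` when the body does not look like HTML. The 40-newline body is a made-up example.

```javascript
// Hypothetical re-implementation of the validHTML predicate from tests.js.
// Returns true when the body does NOT look like an HTML document (i.e. it would be flagged).
const HTML_PREFIXES = ["<!doctype html", "<html", "<!--"];
const XML_PREFIX = '<?xml version="1.0" encoding="utf-8"?><!doctype html';

function flagsAsInvalid(responseText, windowSize, xmlWindowSize) {
  // Same order as tests.js: slice first, then trim and lowercase.
  const head = (n) => responseText.slice(0, n).trim().toLowerCase();
  const startsLikeHtml = HTML_PREFIXES.some((prefix) => head(windowSize).startsWith(prefix));
  const startsWithXmlProlog = head(xmlWindowSize).startsWith(XML_PREFIX);
  return !startsLikeHtml && !startsWithXmlProlog;
}

// Window sizes before and after commit d997f86.
const before = (text) => flagsAsInvalid(text, 30, 60);
const after = (text) => flagsAsInvalid(text, 100, 100);

// Made-up body: 40 blank lines before the doctype.
const body = "\n".repeat(40) + "<!DOCTYPE html><html><head></head><body></body></html>";

console.log(before(body)); // true: the first 30 characters are all newlines
console.log(after(body));  // false: the doctype falls inside the 100-character window
```

Because `trim()` runs after `slice()`, a body with more than 86 whitespace characters before `<!doctype html` would still be flagged with the 100-character window. Trimming before slicing (`responseText.trimStart().slice(0, n)`) would remove that fixed limit, at the cost of scanning all the leading whitespace.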
