Standalone (batch- and command-line) html sanity checker - detects missing images, dead links, duplicate bookmarks.

Per-Starke/htmlSanityCheck

Html Sanity Check

This project provides some basic sanity checking on html files.

It is helpful for html generated from Asciidoctor, Markdown or other formats, as converters usually don’t check for missing images or broken links.

It can be used as a Gradle plugin. A standalone Java version and a graphical UI are planned for future releases.

Installation

Use the following snippet inside a Gradle build file:

build.gradle
buildscript {
    repositories {
        jcenter()
    }

    dependencies {
        classpath 'org.aim42:HtmlSanityCheck-gradle-plugin:0.8.0-SNAPSHOT'
    }
}

apply plugin: 'org.aim42.HtmlSanityCheck-gradle-plugin'

Usage

The plugin adds a new task named htmlSanityCheck.

This task exposes a few properties as part of its configuration:

sourceDir

(mandatory) directory where the html files are located. Type: File. Default: build/docs.

sourceDocuments

(optional) an override to process several source files, which may be a subset of all files available in ${sourceDir}. Type: Set. Defaults to all files in ${sourceDir}.

checkingResultsDir

(optional) directory where the checking results are written to. Defaults to ${sourceDir}/report/htmlchecks/

checkExternalLinks

(optional, planned) if set to "true", external references are checked too. Defaults to false (currently not implemented)

build.gradle (real-world example)
apply plugin: 'org.aim42.HtmlSanityCheck-gradle-plugin'

htmlSanityCheck {
    sourceDir = new File( "$buildDir/docs" )

    // files to check - in Set-notation
    sourceDocuments = [ "one-file.html", "another-file.html", "index.html"]

    // where to put results of sanityChecks...
    checkingResultsDir = new File( "$buildDir/report/htmlchecks" )
    checkExternalLinks = false
}

Types of Sanity Checks

Broken Cross References (aka Links)

Finds all '<a href="XYZ">' links where the target XYZ is not defined.

src/broken.html
<a href="#missing">internal anchor</a>
...
<h2 id="missinG">Bookmark-Header</h2>

In this example, the bookmark is misspelled.
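
The idea behind this check can be sketched in a few lines of Java. This is only an illustration, not the plugin's implementation: the plugin builds on Jsoup, while the sketch uses plain regular expressions to stay dependency-free, and the names `BrokenAnchorSketch` and `findBrokenAnchors` are hypothetical. It assumes double-quoted attributes and compares names case-sensitively, which is why `#missing` does not match `missinG`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BrokenAnchorSketch {

    // Matches href="#target" and captures the target name (assumes double quotes).
    private static final Pattern LOCAL_HREF = Pattern.compile("href=\"#([^\"]*)\"");
    // Matches id="name" and captures the name.
    private static final Pattern ID_ATTR = Pattern.compile("\\bid=\"([^\"]*)\"");

    /** Returns every local link target that has no matching id in the same document. */
    static List<String> findBrokenAnchors(String html) {
        Set<String> definedIds = new HashSet<>();
        Matcher ids = ID_ATTR.matcher(html);
        while (ids.find()) {
            definedIds.add(ids.group(1));
        }

        List<String> broken = new ArrayList<>();
        Matcher hrefs = LOCAL_HREF.matcher(html);
        while (hrefs.find()) {
            String target = hrefs.group(1);
            if (!definedIds.contains(target)) {
                broken.add(target);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String html = "<a href=\"#missing\">internal anchor</a>"
                + "<h2 id=\"missinG\">Bookmark-Header</h2>";
        System.out.println(findBrokenAnchors(html)); // prints [missing]
    }
}
```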

Missing Image Files

Images referenced in '<img src="XYZ"…' tags refer to external files. The plugin checks that these files exist.
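
A minimal sketch of such an existence check, again in dependency-free Java rather than the Jsoup-based code the plugin uses. The names `MissingImageSketch` and `findMissingImages` are hypothetical. The sketch assumes that image paths are relative to a base directory and that remote (`http:`, `https:`) and inline (`data:`) sources are out of scope for a file check.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingImageSketch {

    // Captures the src value of each <img> tag (assumes double-quoted attributes).
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\b[^>]*?\\bsrc=\"([^\"]*)\"");

    /** Returns all local image sources that do not exist as files below baseDir. */
    static List<String> findMissingImages(String html, Path baseDir) {
        List<String> missing = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(1);
            // Assumption: remote and inline images are not checked on disk.
            if (src.startsWith("http:") || src.startsWith("https:") || src.startsWith("data:")) {
                continue;
            }
            if (!Files.isRegularFile(baseDir.resolve(src))) {
                missing.add(src);
            }
        }
        return missing;
    }
}
```

A call such as `findMissingImages(html, Path.of("build/docs"))` would then list every image source that cannot be found below that directory.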

Multiple Definitions of Bookmarks or IDs

If a bookmark or ID is defined more than once, any anchor linking to it is ambiguous :-)
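
Detecting duplicates comes down to remembering every id already seen. The following sketch shows that idea with hypothetical names (`DuplicateIdSketch`, `findDuplicateIds`); it is not taken from the plugin and assumes double-quoted `id` attributes.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateIdSketch {

    // Matches id="name" and captures the name.
    private static final Pattern ID_ATTR = Pattern.compile("\\bid=\"([^\"]*)\"");

    /** Returns every id that is defined more than once, in order of first repetition. */
    static Set<String> findDuplicateIds(String html) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Matcher m = ID_ATTR.matcher(html);
        while (m.find()) {
            String id = m.group(1);
            // Set.add returns false when the id was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }
}
```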

Missing Local Resources

Checks that all local files (e.g. downloads) referenced from html exist.

Missing Alt-attributes in Images

Image-tags should contain an alt-attribute that the browser displays when the original image file cannot be found or cannot be rendered. Having alt-attributes is good and defensive style.
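
As an illustration, the check could look roughly like the sketch below. It is not the plugin's code (the plugin relies on Jsoup); `MissingAltSketch` and `findImagesWithoutAlt` are hypothetical names, and the regular expressions assume that an `<img>` tag contains no `>` inside its attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingAltSketch {

    // Matches a complete <img ...> tag.
    private static final Pattern IMG_TAG = Pattern.compile("<img\\b[^>]*>");
    // Matches the start of an alt attribute inside a tag.
    private static final Pattern ALT_ATTR = Pattern.compile("\\balt\\s*=");

    /** Returns the source text of every <img> tag that lacks an alt attribute. */
    static List<String> findImagesWithoutAlt(String html) {
        List<String> withoutAlt = new ArrayList<>();
        Matcher tags = IMG_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            if (!ALT_ATTR.matcher(tag).find()) {
                withoutAlt.add(tag);
            }
        }
        return withoutAlt;
    }
}
```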

Broken External Links (planned)

Although external links might suffer from network issues or site maintenance, checks are still possible…​

Technical Documentation

In addition to checking HTML, this project serves as an example for arc42.

Fundamentals

This tiny piece rests on incredible groundwork:

  • Jsoup HTML parser and analysis toolkit - robust and easy-to-use.

  • IntelliJ IDEA - my (Gernot) best (programming) friend.

  • Of course, Groovy, Gradle, JUnit and the Spock Framework.

Ideas and Origin

  • The plugin heavily relies on code provided by the Gradle project.

  • Inspiration on code organization, implementation and testing of the plugin came from the Asciidoctor-Gradle-Plugin by @AAlmiray.

  • Code for string similarity calculation by Ralph Rice.

  • Initial implementation, maintenance and documentation by Gernot Starke.

Development

Several sources provided help during development:

Contributing

Please report issues or suggestions.

Want to improve the plugin? Fork our repository and send a pull request.

Licence

Currently, the code is published under the Apache-2.0 licence and the documentation under Creative-Commons-Sharealike-4.0.

Some day I’ll unify that :-)
