Building Block View

Whitebox HtmlSanityChecker

Whitebox HSC Level 1
Rationale

We used functional decomposition to separate responsibilities:

  • CheckerCore shall encapsulate the checking logic and HTML parsing/processing.

  • All kinds of output (console, HTML file, graphical) shall be handled in a separate component (Reporter).

  • Gradle-specific implementation details shall be encapsulated.

Contained Blackboxes
Table 1. HtmlSanityChecker building blocks

HSC Core

HTML parsing and sanity checking, configuration, reporting.

HSC Gradle Plugin

Integrates the Gradle build tool with HSC, enabling arbitrary Gradle builds to use HSC functionality.

HSC Maven Plugin

(planned, not yet implemented)

HSC Graphical Interface

(planned, not implemented)

Interfaces
Table 2. HtmlSanityChecker internal interfaces
Interface Description

Usage via shell

An (arc42) user calls HSC from a command-line shell.

Build system

Currently restricted to Gradle: the build system uses HSC as configured in the build script.

Local filesystem

HSC needs access to several local files, especially the HTML page to be checked and the corresponding image directories.

External websites

To check external links, HSC needs to access external sites via HTTP HEAD or GET requests.
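The HEAD-or-GET probing of external links can be sketched as follows. This is a minimal illustration, not HSC's actual implementation: the class `ExternalLinkProbe`, its method names, the choice of status codes that trigger a GET retry (405 and 501), and the rule that any 2xx or 3xx status counts as reachable are all assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of HSC: probes a URL with HEAD, then GET if needed.
final class ExternalLinkProbe {

    // Assumption: 2xx and 3xx responses mean the link target exists.
    static boolean isSuccess(int status) {
        return status >= 200 && status < 400;
    }

    // Assumption: servers that reject HEAD answer 405 or 501.
    static boolean shouldRetryWithGet(int status) {
        return status == 405 || status == 501;
    }

    // Sends one request with the given method and returns the HTTP status code.
    static int statusOf(String url, String method, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // HEAD first (cheap, no body); fall back to GET only when HEAD is refused.
    static boolean isReachable(String url, int timeoutMillis) {
        try {
            int status = statusOf(url, "HEAD", timeoutMillis);
            if (shouldRetryWithGet(status)) {
                status = statusOf(url, "GET", timeoutMillis);
            }
            return isSuccess(status);
        } catch (IOException | IllegalArgumentException e) {
            return false; // unreachable host, timeout, or malformed URL
        }
    }
}
```

Trying HEAD first keeps traffic low, since no response body is transferred; the GET fallback covers servers that do not support HEAD.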

HSC Core (Blackbox)

Intent/Responsibility

HSC Core contains the core functions that perform the various sanity checks. It parses the HTML file into a DOM-like in-memory representation, which is then used to perform the actual checks.
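The parse-once, check-against-the-tree idea can be illustrated with a small sketch. It assumes well-formed XHTML so that the JDK's built-in XML parser suffices; real-world HTML would need a lenient HTML parser. The class `DomCheckSketch` and its methods are hypothetical names for this example, not HSC classes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: build an in-memory DOM, then run a check on it.
final class DomCheckSketch {

    // Parses well-formed XHTML only (assumption made to stay within the JDK).
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    // Returns the src of every <img> whose target is not among the known files.
    static List<String> missingImages(Document page, Set<String> existingFiles) {
        List<String> missing = new ArrayList<>();
        NodeList images = page.getElementsByTagName("img");
        for (int i = 0; i < images.getLength(); i++) {
            String src = ((Element) images.item(i)).getAttribute("src");
            if (!existingFiles.contains(src)) {
                missing.add(src);
            }
        }
        return missing;
    }
}
```

Because the page is parsed only once, several checks can query the same tree without re-reading the file.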

Interfaces
Table 3. HSC Core Interfaces
Interface (From-To) Description

Command Line Interface → Checker

Uses the AllChecksRunner class.

Gradle Plugin → Checker

Exposes HSC via a standard Gradle plugin, as described in the Gradle user guide.

Files
  • org.aim42.htmlsanitycheck.AllChecksRunner

  • org.aim42.htmlsanitycheck.HtmlSanityCheckGradlePlugin

Building Blocks - Level 2

HSC Core (Whitebox)

Figure 1. HSC Core (Whitebox)
Rationale

This structure follows a strictly functional decomposition:

  • parsing and handling HTML input

  • checking

  • collecting checking results

Contained Blackboxes
Table 4. HSC Core building blocks

Checker

Abstract class, used in the form of the template method pattern. Shall be subclassed for all checking algorithms.

AllChecksRunner

Facade to the different Checker instances. Provides a (parameter-driven) command-line interface.

ResultsCollector (Whitebox)

Collects all checking results. Its interface, Results, is described in the whitebox description.

Reporter

Reports checking results to either the console or an HTML file.

HtmlParser

Encapsulates HTML parsing and provides methods to search within the parsed HTML.

Suggester

If a check finds issues, suggests alternatives by comparing the faulty element to those present in the HTML file. Currently not implemented.
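Since the Suggester is not yet implemented, the following is only one conceivable approach, not the project's design: rank the candidates that are present in the HTML file by edit distance (Levenshtein) to the faulty value and suggest the closest one. The class `SuggesterSketch` and both method names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a suggester based on edit distance.
final class SuggesterSketch {

    // Dynamic-programming Levenshtein distance (insert, delete, substitute),
    // keeping only two rows of the table in memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate closest to the faulty value, or empty if there is none.
    static Optional<String> suggest(String faulty, List<String> candidates) {
        return candidates.stream()
                .min(Comparator.comparingInt(c -> editDistance(faulty, c)));
    }
}
```

For a broken image reference such as `logo.pgn`, the candidates would be the image files that actually exist next to the page.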

Checker and xyzChecker Subclasses

The abstract Checker provides a uniform interface (public void check()) to the different checking algorithms. It is based on the extensible concept for checking algorithms.
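A minimal sketch of how the template method pattern and the facade could fit together is given below. All names ending in `Sketch`, the `findProblems()` hook, and the use of plain strings as findings are assumptions for illustration; the real Checker and AllChecksRunner classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the abstract Checker; findings are plain strings here.
abstract class CheckerSketch {
    private final List<String> findings = new ArrayList<>();

    // Template method: the skeleton is fixed, subclasses supply findProblems().
    public final void check() {
        findings.clear();
        findings.addAll(findProblems());
    }

    // The algorithm-specific step every concrete checker must implement.
    protected abstract List<String> findProblems();

    public List<String> getFindings() {
        return List.copyOf(findings);
    }
}

// Example subclass: reports each repeated occurrence of an id on a page.
final class DuplicateIdCheckerSketch extends CheckerSketch {
    private final List<String> idsOnPage;

    DuplicateIdCheckerSketch(List<String> idsOnPage) {
        this.idsOnPage = idsOnPage;
    }

    @Override
    protected List<String> findProblems() {
        Set<String> seen = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (String id : idsOnPage) {
            if (!seen.add(id)) {
                problems.add("duplicate id '" + id + "'");
            }
        }
        return problems;
    }
}

// Facade in the spirit of AllChecksRunner: runs each checker, gathers findings.
final class AllChecksRunnerSketch {
    static List<String> runAll(List<CheckerSketch> checkers) {
        List<String> all = new ArrayList<>();
        for (CheckerSketch checker : checkers) {
            checker.check();
            all.addAll(checker.getFindings());
        }
        return all;
    }
}
```

Adding a new checking algorithm then means adding one subclass; the facade and its callers remain unchanged.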

Building Blocks - Level 3

ResultsCollector (Whitebox)

Figure 2. Results Collector (Whitebox)
Rationale

This structure follows the hierarchy of checks, managing results for:

  1. a run over a number of pages/documents, containing

  2. single pages, each containing many

  3. single checks within a page

Contained Blackboxes
Table 5. ResultsCollector building blocks

Per-Run Results

Results for potentially many HTML pages/documents.

Single-Page-Results

Results for a single page.

Single-Check-Results

Results for a single type of check (e.g. missing-images check).

Finding

A single finding (e.g., "image 'logo.png' missing"). Can hold suggestions and (planned for future releases) the responsible HTML element.
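The containment hierarchy of these building blocks can be sketched as nested value classes, with counts aggregated from the bottom up. The class names ending in `Sketch` and their fields are hypothetical and only mirror the table above; they are not the project's source code.

```java
import java.util.List;

// Hypothetical value classes mirroring the run → page → check → finding hierarchy.
final class FindingSketch {
    final String message;

    FindingSketch(String message) {
        this.message = message;
    }
}

final class SingleCheckResultsSketch {
    final String description;
    final List<FindingSketch> findings;

    SingleCheckResultsSketch(String description, List<FindingSketch> findings) {
        this.description = description;
        this.findings = findings;
    }

    int nrOfFindings() {
        return findings.size();
    }
}

final class SinglePageResultsSketch {
    final String pageFileName;
    final List<SingleCheckResultsSketch> checks;

    SinglePageResultsSketch(String pageFileName, List<SingleCheckResultsSketch> checks) {
        this.pageFileName = pageFileName;
        this.checks = checks;
    }

    // A page's total is the sum over all checks that ran on it.
    int nrOfFindingsOnPage() {
        return checks.stream().mapToInt(SingleCheckResultsSketch::nrOfFindings).sum();
    }
}

final class PerRunResultsSketch {
    final List<SinglePageResultsSketch> pages;

    PerRunResultsSketch(List<SinglePageResultsSketch> pages) {
        this.pages = pages;
    }

    int nrOfPagesChecked() {
        return pages.size();
    }

    // A run's total is the sum over all pages.
    int nrOfFindingsOnAllPages() {
        return pages.stream().mapToInt(SinglePageResultsSketch::nrOfFindingsOnPage).sum();
    }
}
```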

Interface Results

The Results interface is used by all clients (especially Reporter subclasses, graphical and command-line clients) to access checking results. It consists of three distinct APIs for overall results (RunResults), single-page results (PageResults) and single-check results (CheckResults). See the interface definitions below, taken from the Groovy source code:

Interface RunResults
public interface RunResults {

    // returns results for all pages which have been checked
    List<SinglePageResults> getResultsForAllPages();

    // how many pages were checked in this run?
    int nrOfPagesChecked();

    // how many checks were performed in all?
    int nrOfChecksPerformedOnAllPages();

    // how many findings (errors and issues) were found in all?
    int nrOfFindingsOnAllPages();

    // how long did checking take (in milliseconds)?
    Long checkingTookHowManyMillis();
}
Interface PageResults
public interface PageResults {

    // what's the title of this page?
    String getPageTitle();

    // what's the filename and path?
    String getPageFileName();

    String getPageFilePath();

    // how many items have been checked?
    int nrOfItemsCheckedOnPage();

    // how many problems were found on this page?
    int nrOfFindingsOnPage();

    // how many different checks have run on this page?
    int howManyCheckersHaveRun();
}
Interface CheckResults
public interface CheckResults {

    // return a description of what is checked
    // (e.g. "Missing Images Checker" or "Broken Cross-References Checker")
    String description();

    // returns all findings/problems found during this check
    List<Finding> getFindings();
}
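To show how a client such as a Reporter might consume these interfaces, here is a minimal sketch that renders one CheckResults instance as text. The `Finding` class below is a simplified stand-in (the real type is not shown in this section), `CheckResults` is repeated from the listing above so the example compiles on its own, and `TextReportSketch` with its output format is purely hypothetical.

```java
import java.util.List;

// Simplified stand-in for the project's Finding type (assumption: message only).
final class Finding {
    private final String message;

    Finding(String message) {
        this.message = message;
    }

    @Override
    public String toString() {
        return message;
    }
}

// Repeated from the listing above so that this sketch is self-contained.
interface CheckResults {
    String description();

    List<Finding> getFindings();
}

// Hypothetical reporter: depends only on the interface, not on a concrete collector.
final class TextReportSketch {
    static String render(CheckResults results) {
        StringBuilder out = new StringBuilder();
        out.append(results.description())
                .append(": ")
                .append(results.getFindings().size())
                .append(" finding(s)\n");
        for (Finding finding : results.getFindings()) {
            out.append("  - ").append(finding).append('\n');
        }
        return out.toString();
    }
}
```

Because the reporter sees only the interface, console, HTML, and graphical clients can share the same results without knowing how they were collected.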