WorkPlan

EPUBCheck status and work items

You can find the minutes of our monthly meetings on this "Meeting Minutes" wiki page.

Join the EPUB 3 Community Group at no cost and/or subscribe to the additional EPUBCheck Mailing List ([email protected]) to be notified of meeting dates. To sign up, send an email with the subject "subscribe" to [email protected], or manage your subscription at this page.

4.1.0 maintenance release

Resolve the pending issues in the 4.1.0 milestone. This is regular bug fixing and low-hanging fruit, a work in progress by Tobias and Romain.
Resolve further issues that are not (yet) assigned to the 4.1.0 milestone. Pick from the list of open issues labeled "bug" or "missing".

Day-to-day work / Issue triaging

We always need first responders for new issues, who get in touch with the author and try to break down and reproduce the problem.
If it is a bug, we need a sample EPUB file that contains only the offending issue. The sample file should be based on our minimal EPUB test sample and can be uploaded to the issue as a ZIP file.
We have quite a backlog of old issues reaching back to EPUBCheck versions 2 and 3. Most of them have since been labeled "needs review". Anyone can pick one of these issues, try to reproduce it with the latest EPUBCheck build, and report the results.

Refactoring

Reorganize and clean up the test suite:

Status: the test suite has grown organically. It consists of both packaged and unpackaged EPUB content, follows no naming convention, and is barely organized.
Required work: major overhaul of the test suite. A new test suite should be built (based on the existing suite), with a proper organization and naming convention. Each test should focus on one specific feature only. Test content should be reduced to be as minimal as possible.
Scope of testing:
- there are very few failure cases
- limited SVG testing
- little testing of interactivity or animation (JavaScript or CSS)
- No testing of WebGL
Benefit: Far better maintainability, lower risk of regression bugs.
Downside if not done: Adding a new feature or fixing a bug often makes several tests break, and "fixing" those ends up being very time-consuming.
Dictionary of the messages needs some TLC as well
- Some are too complex
- Several different message files
- Different encodings, no less!
IP and Legal
- Many many contributors over the years
- Most undocumented
- Just forget it and start over?

API clean up:

Status: EPUBCheck's API is somewhat messy (e.g. duplicate entry points) and undocumented.
Required work: Revamp the API, notably the parts used to create an EPUBCheck instance, configure EPUBCheck, and launch the validation. The reporting API would need to be cleaned up too. The entire public API should be properly documented.
Benefit: Make EPUBCheck easier to use and to integrate into other projects. Be more welcoming to adopters. Provide a more stable public API.
Downside if not done: Not critical, but the current API can be difficult to understand and evolve. The lack of documentation doesn't make the project very welcoming.
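
To make the discussion more concrete, here is a rough sketch of what a single entry point covering "create, configure, launch, report" could look like. Every name in it (Validator, Builder, Report, CollectingReport, Message, Severity, the "HYP-001" message ID) is hypothetical and does not exist in EPUBCheck today, and the validate() body is a placeholder rather than real checking logic.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

// Hypothetical API sketch: none of these types exist in EPUBCheck.
enum Severity { INFO, WARNING, ERROR, FATAL }

/** One validation message, identified by a stable ID. */
final class Message {
    final String id;
    final Severity severity;
    final String text;

    Message(String id, Severity severity, String text) {
        this.id = id;
        this.severity = severity;
        this.text = text;
    }
}

/** Reporting callback: the only thing an integrator has to implement. */
interface Report {
    void message(Message message);
}

/** Simple Report implementation that keeps all messages in memory. */
final class CollectingReport implements Report {
    private final List<Message> messages = new ArrayList<>();

    @Override
    public void message(Message message) {
        messages.add(message);
    }

    List<Message> messages() {
        return Collections.unmodifiableList(messages);
    }

    boolean hasErrors() {
        for (Message m : messages) {
            if (m.severity == Severity.ERROR || m.severity == Severity.FATAL) {
                return true;
            }
        }
        return false;
    }
}

/** Single entry point: build an instance, then call validate(). */
final class Validator {
    private final Path epub;
    private final Report report;

    private Validator(Builder builder) {
        this.epub = builder.epub;
        this.report = builder.report;
    }

    static Builder forFile(Path epub) {
        return new Builder(Objects.requireNonNull(epub, "epub"));
    }

    void validate() {
        // Placeholder check so that the sketch does something observable;
        // the real checks would be dispatched from here.
        String name = epub.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".epub")) {
            report.message(new Message("HYP-001", Severity.WARNING,
                    "File name does not end with .epub: " + name));
        }
    }

    static final class Builder {
        private final Path epub;
        private Report report = message -> { }; // default: discard messages

        private Builder(Path epub) {
            this.epub = epub;
        }

        Builder report(Report report) {
            this.report = Objects.requireNonNull(report, "report");
            return this;
        }

        Validator build() {
            return new Validator(this);
        }
    }
}
```

With such a shape, an integrator would write something like `Validator.forFile(path).report(myReport).build().validate()` and then inspect the collected messages. A builder keeps configuration separate from execution, which is one possible way to avoid duplicate entry points; other designs would work as well.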

Code documentation

Status: it is currently difficult for newcomers to navigate EPUBCheck's code base, and doing so requires a lot of guesswork.
Required work: write an architectural overview, and add Javadoc at least at the beginning of each Java class to explain its role.

Removal / Integration of the Nook contributions

Status: The Nook team contributed a large chunk of code back in 2013. To reduce the risk of interfering with the main code base, they grouped most of their changes in a single package (com.adobe.epubcheck.ctc) and reimplemented a separate parsing/checking process. They also added a whole new set of testing mechanisms, which is quite rigid and difficult to maintain. A significant number of the bugs and maintenance issues we have had in the past years came from this integration.
Required work: remove Nook's package, discard all the tests that are useless to EPUBCheck, and integrate the others into the core checking process. Remove all the testing that is based on output comparison (for maintainability reasons).
Benefit: Have a much more understandable code base, and greatly simplify the execution (EPUB is parsed once, not twice). Improve the maintainability.
Downside if not done: Messier code base (hence more difficult to understand for new developers). Risks of bugs.

General code improvement:

Status: The code base has grown organically for many years. Some parts are very old and would benefit from being refactored with more modern data structures or programming patterns.
Required work: Hard to define. This can be done gradually.
Benefit: Better code readability. Better performance.
Downside if not done: Not critical.

New features

HTML validation:

Status: EPUBCheck currently checks HTML with custom RelaxNG and Schematron schemata. We need to maintain and update these schemata when there are changes in HTML. A better option would be to integrate W3C's HTML validator (from Henri Sivonen's nuvalidator project). This is especially relevant with the non-dated specs now being referenced in EPUB 3.1.
Required work: significant change in the HTML checking logic.
Benefit: Moves the responsibility for HTML checking to a well-maintained third-party project, which follows the changes in the HTML spec. Ultimately, this means higher quality and lower maintenance cost for EPUBCheck.
Downside if not done: We need to regularly update the schemata in EPUBCheck, and we miss the checks that are available in the HTML validator but not in EPUBCheck.

EPUB 3.1:

Status: EPUBCheck does not support EPUB 3.1.
Required work: Implement the changes in EPUB 3.1 (summarized in http://www.idpf.org/epub/31/spec/epub-changes.html). Simple schema changes are pretty straightforward. Other changes may need more substantial modifications to the checking logic.

Better checking of Media overlays:

Status: EPUBCheck only checks for schema compliance, but does not check other requirements like duration coherence, or reading order.
Required work: Add custom Java logic to implement the missing parts, in addition to the existing schema validation.
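
As an illustration of the kind of custom Java logic meant here, the sketch below checks duration coherence: it sums the durations declared for the individual overlays and compares the sum with the declared total. It is a minimal, lenient sketch and not existing EPUBCheck code. The class and method names are hypothetical, the inputs are assumed to be SMIL clock values (such as those carried by media:duration metadata) already extracted as strings, and the tolerance is an assumed parameter.

```java
import java.util.List;

// Hypothetical helper, not part of EPUBCheck.
final class OverlayDurations {
    private OverlayDurations() { }

    /**
     * Converts a SMIL clock value to seconds. Handles h:mm:ss(.f),
     * mm:ss(.f) and timecounts with an optional h, min, s or ms suffix.
     * The parsing is lenient and does not reject every malformed value.
     */
    static double parseClock(String value) {
        String v = value.trim();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty clock value");
        }
        if (v.contains(":")) {
            String[] parts = v.split(":");
            if (parts.length < 2 || parts.length > 3) {
                throw new IllegalArgumentException("bad clock value: " + value);
            }
            double seconds = 0;
            for (String part : parts) {
                // Each field to the left is worth 60 times the next one.
                seconds = seconds * 60 + Double.parseDouble(part);
            }
            return seconds;
        }
        // "ms" must be tested before "s".
        if (v.endsWith("ms")) {
            return Double.parseDouble(v.substring(0, v.length() - 2)) / 1000.0;
        }
        if (v.endsWith("min")) {
            return Double.parseDouble(v.substring(0, v.length() - 3)) * 60;
        }
        if (v.endsWith("h")) {
            return Double.parseDouble(v.substring(0, v.length() - 1)) * 3600;
        }
        if (v.endsWith("s")) {
            return Double.parseDouble(v.substring(0, v.length() - 1));
        }
        return Double.parseDouble(v); // a bare number is read as seconds
    }

    /**
     * Returns true if the per-overlay durations add up to the declared
     * total, within the given tolerance (in seconds).
     */
    static boolean isCoherent(String total, List<String> perOverlay,
                              double toleranceSeconds) {
        double sum = 0;
        for (String duration : perOverlay) {
            sum += parseClock(duration);
        }
        return Math.abs(sum - parseClock(total)) <= toleranceSeconds;
    }
}
```

For example, `OverlayDurations.isCoherent("0:01:30", Arrays.asList("45s", "0:45"), 1.0)` accepts two 45-second overlays against a 90-second total, while a declared total of "0:02:00" would be flagged. Reading-order checks would need access to the spine and are not covered by this sketch.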

WebPub Manifest

If the WebPub manifest format gains traction, EPUBCheck may need to support it.
