Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epubcheck 4 doesn't compare identifier in OPF and NCX anymore #669

Closed
tofi86 opened this issue Jan 12, 2016 · 9 comments · Fixed by #798
Closed

epubcheck 4 doesn't compare identifier in OPF and NCX anymore #669

tofi86 opened this issue Jan 12, 2016 · 9 comments · Fixed by #798
Assignees
Labels
status: has PR The issue is being processed in a pull request type: bug The issue describes a bug
Milestone

Comments

@tofi86
Copy link
Collaborator

tofi86 commented Jan 12, 2016

When checking EPUB 2 eBooks, epubcheck 4[.0.1] doesn't compare the identifier value in OPF and NCX anymore.

This is the result in epubcheck 3.0.1:

> java -jar epubcheck-3.0.1.jar "C:\Users\Tobias Fischer\Desktop\epub2.epub"
Epubcheck Version 3.0.1

Validating against EPUB version 2.0
WARNING: C:/Users/Tobias Fischer/Desktop/epub2.epub/EPUB/lorem.ncx: meta@dtb:uid content 'urn:uuid:550e8400-e29b-41d4-a716-4466674412314--edit' should conform to unique-identifier in content.opf: 'urn:uuid:550e8400-e29b-41d4-a716-4466674412314'

Check finished with warnings or errors

And this is the result of epubcheck 4[.0.1]:

> java -Duser.language=en -jar epubcheck.jar "C:\Users\Tobias Fischer\Desktop\epu
b2.epub"
Validating using EPUB version 2.0.1 rules.
No errors or warnings detected.
epubcheck completed

I understand that comparing identifiers in EPUB 3 documents isn't required anymore, but I think the WARNING should still be issued when checking EPUB 2 files...

@tofi86
Copy link
Collaborator Author

tofi86 commented Jan 12, 2016

@tofi86 tofi86 added the type: bug The issue describes a bug label May 16, 2016
@vincent-gros
Copy link
Contributor

vincent-gros commented May 31, 2016

@tofi86, @rdeltour
The NCX file should not be checked the same way in EPUB 2 and EPUB 3?
Why comparing identifiers is not required anymore in EPUB 3 files?

@tofi86
Copy link
Collaborator Author

tofi86 commented Jan 5, 2017

Okay, this is still working! It IS currently reported, but only when you run EpubCheck with ReportLevel USAGE (commandline argument --usage):

USAGE(NCX-001): /Users/tofi/Downloads/epubcheck-669_epub2-invalid.epub/EPUB/lorem.ncx(-1,-1): NCX identifier ('urn:uuid:550e8400-e29b-41d4-a716-4466674412314--2') does not match OPF identifier ('urn:uuid:550e8400-e29b-41d4-a716-4466674412314').

This behaviour was changed by @rdeltour with cc349b0 to fix #329 and he set the severity to USAGE intentionally:

I had a fresh look. It doesn't seem the spec actually requires the NCX metadata to match the OPF's. The ID equality check was already removed (or not working) in the 4.0 branch.

In any case, I re-implemented the ID equality check, now also detecting trailing and leading whitespace, and report them as USAGE-level messages. …

Source: #329 (comment)

However, I think @rdeltour is wrong here... Since the NCX structure is defined through the DTBook standard I had a look there and read in section 8.4.1:

dtb:uid

Content: The globally unique identifier for the DTB. The value is the same as that of the dc:Identifier element referenced by the unique-identifier attribute on the package file's package element. See section 3.1, "Package Identity."

Source: http://www.niso.org/workrooms/daisy/Z39-86-2005.html#NavMeta

Therefore I would suggest the following:

  • raise the severity to WARNING or ERROR
  • also check this in EPUB 3 books when an NCX file is present

Romain, what do you think?

@tofi86 tofi86 self-assigned this Jan 5, 2017
@rdeltour rdeltour self-assigned this Feb 11, 2017
@tofi86 tofi86 removed their assignment Feb 11, 2017
@tofi86 tofi86 added the status: waiting for feedback The development team needs feedback from the issue’s creator label Sep 26, 2017
@tofi86
Copy link
Collaborator Author

tofi86 commented Sep 26, 2017

@rdeltour Romain, can you please give a short feedback on this issue? I already reviewed the code and wrote down two suggestions. Based on your feedback someone else can take over the issue and work on it. Thanks! 👍

@rdeltour
Copy link
Member

NCX structure is defined through the DTBook standard I had a look there and read in section 8.4.1

This spec you quote is for DAISY’s Digital Talking Book, which do borrow OPF from IDPF (or rather, the Open eBook Forum as it was known back in the time) and define NCX and DTBook grammars. However, this spec itself, and its statements on OPF, can't be considered normative for EPUB which just imports the DTD grammars.

In other words, the only normative statements are those defined in the EPUB specs, plus any requirements stemming from the DTDs.

As far as I understand, EPUB 2.0.1 doesn't strictly require the NCX’s identifier to be the same as the OPF’s, so we're on the a blurry line between requirements and interpretation.

I think an ERROR might be too strong however, I would suggest a WARNING when checking EPUB 2.0.1.
As for checking EPUB 3.x, since NCX is superseded I still think it's OK to not warn but simply raise a USAGE message. It's not strictly a SHOULD in the spec, so it shouldn't be a WARNING in EpubCheck. That said, if this ID mismatch create issues with legacy reading systems, I wouldn't object raising it as a WARNING (after all, if you include an NCX, you should be ready to pay the price with extra warnings 😈 ).

@mattgarrish @TzviyaSiegman thoughts?

@mattgarrish
Copy link
Member

mattgarrish commented Sep 27, 2017

It's a requirement for the NCX dtb:uid to match the package identifier per the second last bullet of 2.4.2 in OPF: (emphasis mine)

The list of required metadata provided in http://www.niso.org/workrooms/daisy/Z39-86-2005.html#NavMeta does not apply to EPUB; the only required meta is that which contains a content reference to the OPF unique ID. For backwards compatibility reasons, the value of the name of that meta remains dtb:id.

(Note that dtb:id is a longstanding error in the spec.)

This would be true for both EPUB 2 and 3, since 3 inherits the NCX requirement from 2. If you're including an NCX.

@rdeltour
Copy link
Member

Oh, @mattgarrish is right of course. Then it's definitely an ERROR!

@tofi86
Copy link
Collaborator Author

tofi86 commented Sep 27, 2017

Okay, thanks tp both of you for clarifying this!

I'm raising the severity level to ERROR for both EPUB 2 and EPUB 3 validation.

The current behaviour was introduced by @rdeltour in cc349b0 and looking at the commit, Romain also introduced USAGE message NCX_004=NCX identifier ('dtb:uid' metadata) should not contain leading or trailing whitespace.

Since the identifier now has to match the book id, leading or trailing whitespace may be an error anyway, shall we also raise severity level of NCX004 to ERROR or at least WARNING?

@tofi86 tofi86 added status: has PR The issue is being processed in a pull request and removed status: waiting for feedback The development team needs feedback from the issue’s creator labels Sep 27, 2017
@tofi86 tofi86 self-assigned this Sep 27, 2017
@tofi86
Copy link
Collaborator Author

tofi86 commented Sep 27, 2017

Alright, I got this working. Although most of the work was fixing thos goddamn JSON tests 😡

Please have a look at PR #798.

And give me feedback whether we should also raise the severity level of NCX_004. See comment above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: has PR The issue is being processed in a pull request type: bug The issue describes a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants