saxes fails to correctly capture some DTDs #19

Closed
lddubeau opened this issue Jun 25, 2019 · 0 comments

lddubeau commented Jun 25, 2019

Versions affected

All versions up to and including 3.1.9

Steps to reproduce

Save this file:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE root [
<!-- I'm a test. -->
]>
<root/>

Try to parse it with the null parser:

node ./examples/null-parser.js [path to file]

Expected results

No error.

Actual results

Error: undefined:6:0: document must contain a root element.
    at SaxesParser.fail (/home/ldd/src/git-repos/saxes/lib/saxes.js:492:18)
    at SaxesParser.end (/home/ldd/src/git-repos/saxes/lib/saxes.js:1692:12)
    at SaxesParser.write (/home/ldd/src/git-repos/saxes/lib/saxes.js:547:23)
    at SaxesParser.close (/home/ldd/src/git-repos/saxes/lib/saxes.js:557:17)
    at Object.<anonymous> (/home/ldd/src/git-repos/saxes/examples/null-parser.js:30:10)
    at Module._compile (internal/modules/cjs/loader.js:774:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
    at Module.load (internal/modules/cjs/loader.js:641:32)
    at Function.Module._load (internal/modules/cjs/loader.js:556:12)
    at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
Error: undefined:6:0: unexpected end.
    at SaxesParser.fail (/home/ldd/src/git-repos/saxes/lib/saxes.js:492:18)
    at SaxesParser.end (/home/ldd/src/git-repos/saxes/lib/saxes.js:1701:12)
    at SaxesParser.write (/home/ldd/src/git-repos/saxes/lib/saxes.js:547:23)
    at SaxesParser.close (/home/ldd/src/git-repos/saxes/lib/saxes.js:557:17)
    at Object.<anonymous> (/home/ldd/src/git-repos/saxes/examples/null-parser.js:30:10)
    at Module._compile (internal/modules/cjs/loader.js:774:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
    at Module.load (internal/modules/cjs/loader.js:641:32)
    at Function.Module._load (internal/modules/cjs/loader.js:556:12)
    at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
Parsing time: 8

Notes

The issue here is inherited from sax and carried over into saxes. I tried with the latest sax: it raises no errors because it does not check whether the document is actually well-formed XML. However, it does not generate events for the root element either.

saxes currently aims to just capture the DTD, without well-formedness checks, but this is not easy to do:

  • You cannot just skip ahead to the string ]> because this string can legally appear in an entity declaration, for example <!ENTITY e "]>">.

  • sax addressed that issue by keeping track of quotes in the DTD: upon encountering a quote, the state changes and sax looks for the matching closing quote. As a result, a ]> appearing between quotes (as in an entity declaration) is ignored.

  • The problem, though, is that sax does not track processing instructions and comments in the DTD. In the problematic file above, the single quote in the comment (the apostrophe in "I'm") puts the parser into the quote state, and that quote never terminates. The parser needs to keep track of comments and processing instructions so that quote characters inside them are not interpreted as quote delimiters. (They are just unstructured comment or processing instruction content.)
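
To make the three points above concrete, here is a minimal sketch of a scanner that finds the end of the internal subset while treating quoted literals, comments, and processing instructions as opaque. It is not the code used in saxes or sax: the function name `findInternalSubsetEnd` and its signature are hypothetical, it works on a complete string rather than on streamed chunks, and it performs no well-formedness checks.

```javascript
"use strict";

// Hypothetical helper, not part of saxes or sax. Scans the internal subset of
// a DOCTYPE and returns the index of the "]" that closes it, or -1 if the
// subset is never closed. `start` is the index just after the opening "[".
function findInternalSubsetEnd(text, start) {
  let i = start;
  while (i < text.length) {
    if (text.startsWith("<!--", i)) {
      // Comment: content is unstructured, so quotes inside it are ignored.
      const end = text.indexOf("-->", i + 4);
      if (end === -1) return -1;
      i = end + 3;
    } else if (text.startsWith("<?", i)) {
      // Processing instruction: treated the same way as a comment.
      const end = text.indexOf("?>", i + 2);
      if (end === -1) return -1;
      i = end + 2;
    } else if (text[i] === '"' || text[i] === "'") {
      // Quoted literal: skip to the matching quote so "]>" inside is ignored.
      const end = text.indexOf(text[i], i + 1);
      if (end === -1) return -1;
      i = end + 1;
    } else if (text[i] === "]") {
      // First "]" outside quotes, comments, and PIs closes the subset.
      return i;
    } else {
      i++;
    }
  }
  return -1;
}

// Usage with the file from "Steps to reproduce" (XML declaration omitted).
const doc = "<!DOCTYPE root [\n<!-- I'm a test. -->\n]>\n<root/>";
const subsetEnd = findInternalSubsetEnd(doc, doc.indexOf("[") + 1);
console.log(JSON.stringify(doc.slice(subsetEnd)));
```

Checking for comments and processing instructions before checking for quotes is what keeps the apostrophe in "I'm" from opening a quote that never closes. A streaming parser would need equivalent states rather than `indexOf` calls, since a delimiter can be split across chunks.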

@lddubeau lddubeau added the bug label Jun 25, 2019
@lddubeau lddubeau self-assigned this Jun 25, 2019
@lddubeau lddubeau changed the title from "saxes fails to correctly skip some DTDs" to "saxes fails to correctly capture some DTDs" Jun 25, 2019