Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DomException when simplifying nested elements caused by invalid attribute names #918

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dbowring
Copy link

@dbowring dbowring commented Oct 19, 2024

Copies the same fix for copying invalid property names (such as ") used in _setNodeTag to _simplifyNestedElements.

You can reproduce this error running the following code in node (with jsdom installed):

const { JSDOM } = require("jsdom");
const Readability = require("./Readability.js");

const sampleHTML = '<div "=""><div><div>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div></div></div>';
const dom = new JSDOM(sampleHTML, { contentType: "text/html" });
const readability = new Readability(dom.window.document);
const parsed = readability.parse();
console.log(parsed.textContent);

Without the fix:

$ node demo.js  

./readability/node_modules/jsdom/lib/jsdom/living/helpers/validate-names.js:10
    throw DOMException.create(globalObject, [`"${name}" did not match the Name production`, "InvalidCharacterError"]);
    ^
[DOMException [InvalidCharacterError]: """ did not match the Name production]

Node.js v20.18.0

With the fix:

$ node demo.js
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

dbowring added a commit to spaceduck-com/readability-cli that referenced this pull request Oct 20, 2024
dbowring added a commit to spaceduck-com/readability-cli that referenced this pull request Oct 20, 2024
@dbowring
Copy link
Author

dbowring commented Oct 21, 2024

Added the demo from the original comment as a test (70cedd6), but it also requires changes the way attributes are compared for the tests (to ignore invalid attributes, if they are there they are fine, but if we need to remove them that should also be fine), so you may wish to drop that commit if the test changes are not acceptable.

@dbowring dbowring force-pushed the fix-simplify-invalid-attributes branch from 402e5de to f48f5e3 Compare October 21, 2024 00:55
@dbowring dbowring force-pushed the fix-simplify-invalid-attributes branch from f48f5e3 to 70cedd6 Compare October 21, 2024 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant