-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DOMParser and XMLSerializer classes #1368
Comments
Sure, it would be great to have help implementing this! Let's do this one at a time: one PR for DOMParser, one for XMLSerializer. For DOMParser, the spec is at https://w3c.github.io/DOM-Parsing/#the-domparser-interface. You'd need to introduce new IDL (probably a new folder, domparsing, next to window/ and nodes/ and friends). Then write the impl class. You can peruse recent commits that introduced Location and History in the same fashion to see how that's done. The actual implementation does indeed look not too complicated. We'd ideally want to do things a bit more in depth than your version though. I think the method will end up looking similar to jsdom.jsdom, actually! in lib/jsdom.js. |
Thanks for the pointers! For XMLSerializer the specs are also there: https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface For the DOMParser, I am not sure I understand your last comment. With jsdom.jsdom, did you mean |
Yep! The behavior is actually quite similar: take a string, and depending on another passed argument, create a document parsed as XML or a document parsed as HTML. |
Alright, sounds good! BTW, I just realized that this is more according to specs: DOMParser.prototype.parseFromString = function(string, contenType) {
// Create a new document, since we're supposed to always return one.
var doc = document.implementation.createHTMLDocument(''),
body = doc.body,
last;
// Set the body's HTML, then change the DOM according the specs.
body.innerHTML = string;
// Remove all top-level children (<html><head/><body/></html>)
while (last = doc.lastChild)
doc.removeChild(last);
// Insert the first child of the body at the top.
doc.appendChild(body.firstChild);
return doc;
}; |
This may be helpful: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser It's under the section "DOMParser HTML extension for other browsers". Tell me how I can help on this. I'm trying to write some test cases that involve DOMParser. |
@kahwee I tried to outline how to do a pull request for DOMParser in #1368 (comment). You might want to coordinate with @lehni to make sure you two don't duplicate work. |
@domenic I haven't started work on this. I find the harder piece is XMLSerializer which It is better to divide the task into 2 pull request. @lehni How's progress on this? Thank you both. |
So DOMParser is now implemented and will be in the next release. I've looked at a bunch of XML serializers on npm and nobody seems to have put together one that matches the spec. In the meantime https://github.com/cburgmer/xmlserializer seems closest, so we could use that I guess. I opened cburgmer/xmlserializer#8 to get more spec compliance. |
@domenic many thanks for implementing this! And apologies for dropping the ball on this one. Been swamped with work here... |
@domenic Does DOMParser in jsdom support parsing XML yet? |
It supports it as well as jsdom does, which is pretty OKish |
I attemped to parse a SOAP response with it and it seemed to choke a bit, giving back undefined as a few of the properties on the Document object. I assume this probably has to do with namespaces? |
Hmm, possibly. Could you give a small-ish testcase? |
@domenic I'm not sure how to write a full test case here, but this example highlights the var data =
`<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header>
<ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
<requestId>1234554321</requestId>
<serviceName>ManagedCustomerService</serviceName>
<methodName>get</methodName>
<operations>1</operations>
<responseTime>115</responseTime>
</ns2:ResponseHeader>
</soap:Header>
<soap:Body>
<ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
<ns2:rval>
<totalNumEntries>2</totalNumEntries>
<Page.Type>ManagedCustomerPage</Page.Type>
<ns2:entries>
<ns2:name>Test1</ns2:name>
<ns2:customerId>1234566789</ns2:customerId>
</ns2:entries>
<ns2:entries>
<ns2:name>Test2</ns2:name>
<ns2:customerId>987654321</ns2:customerId>
</ns2:entries>
</ns2:rval>
</ns2:getResponse>
</soap:Body>
</soap:Envelope>`;
var document = (new window.DOMParser()).parseFromString(data, "text/xml") |
And what document object properties are undefined when you do that? |
The two types which are have undefined keys are: It should be noted that the way I required the files looks like this: // is there a better way?
global.DOMParser = window.DOMParser = require('jsdom/lib/jsdom/living/domparsing/DOMParser-impl').implementation; |
Oh, yeah, that won't work. You need to use an actual jsdom, e.g. const DOMParser = jsdom.jsdom().defaultView.DOMParser; Anyway, I still don't understand what properties you're talking about, exactly. |
Weird. But I'd again really appreciate some code. Are you saying that |
Yes exactly! |
Great. So then it can't be true that |
Here's a test case showing that this is indeed happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master I'll open a separate issue. |
Oh, no, that's not what's happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master The issue is that both |
Yes, I think you are right. Here is a list of all the properties I have when looping through I think my original diagnoses was just wrong 😢 Here is my super simple test case: var data =
`<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header>
<ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
<requestId>1234554321</requestId>
<serviceName>ManagedCustomerService</serviceName>
<methodName>get</methodName>
<operations>1</operations>
<responseTime>115</responseTime>
</ns2:ResponseHeader>
</soap:Header>
<soap:Body>
<ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
<ns2:rval>
<totalNumEntries>2</totalNumEntries>
<Page.Type>ManagedCustomerPage</Page.Type>
<ns2:entries>
<ns2:name>Test1</ns2:name>
<ns2:customerId>1234566789</ns2:customerId>
</ns2:entries>
<ns2:entries>
<ns2:name>Test2</ns2:name>
<ns2:customerId>987654321</ns2:customerId>
</ns2:entries>
</ns2:rval>
</ns2:getResponse>
</soap:Body>
</soap:Envelope>`;
var doc = (new window.DOMParser()).parseFromString(data, "text/xml")
var entries = doc.querySelectorAll('rval > entries');
if (entries.length !== 2) {
console.log('Invalid number of entries, should be 2, got ' + entries.length);
} else {
console.log('Passed!');
} |
That test case "fails" in Chrome and Firefox for me: http://jsbin.com/timizagego/edit?html,console |
Yeah, Monday got the best of me. Should be 2 entries, not one. I updated the example: |
Ah great! I'll file a separate issue. |
@domenic Thanks for holding my hand through that one. You are an excellent OSS contributor! |
I am guessing that your tool is showing symbol keys as "undefined": const mySymbol = Symbol();
// try inspecting this object in your tool:
const obj = {};
obj[mySymbol] = 123; // your tool would probably display this as {undefined: 123}
// you can also play around with:
Object.getOwnPropertySymbols(obj); |
@DylanPiercey please file a separate issue with a test case. We already attempt to produce parsererror elements when possible, but I think there are some edge cases where it's not working. |
@domenic I was mostly just messing arround with stuff like <div>
<a></>
</div> Which didn't throw a parseerror. I can open a new issue, but if it's already implemented then thats good enough for me. |
@domenic is there any way to avoid the lower casing of tag names when using the workaround for XMLSerializer.serializeToString proposed by the original poster? |
I'm not aware of any, no. We need to use an actual XML serializer, not a HTML one like parse5, to get XML-correct casing. |
I managed to get mostly want using jsdom's DOMParser and https://github.com/cburgmer/xmlserializer One thorny one was that xmlserializer lower cases tags with the namespaceURI set to 'http://www.w3.org/1999/xhtml', which it seems that jsdom sets on all tags created via Document.createElement. Here's some rough code illustrating this:
Outputs:
Which is mostly want I want - case preserved - except for the xmlns="null" which I think is a recent bug in xmlserialiser |
It's missing: - <parsererror> elements when parsing XML strings fails - copying the active document's URL Also adds the empty-per-spec XMLDocument interface. Closes jsdom#1341. Part of jsdom#1368.
I tried to polyfill XMLParser as suggested:
Unfortunately My solution was to use
|
My current version looks like this: function XMLSerializer() {
};
XMLSerializer.prototype.serializeToString = function(node) {
if (!node) {
return '';
}
// Fix a jsdom issue where all SVG tagNames are lowercased:
// https://github.com/tmpvar/jsdom/issues/620
var text = node.outerHTML;
var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
for (var i = 0, l = tagNames.length; i < l; i++) {
var tagName = tagNames[i];
text = text.replace(
new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
function(match, start) {
return start + tagName;
}
);
}
return text;
}; |
@lehni that doesn't return doctypes if you pass in document.documentElement as node, even though browsers do |
I'm not saying it's perfect, but it works for my use-case |
@domenic, @tommedema, @ianks, @lehni. There is a complete javascript xml-serializer that i created and is avaliable as an npm package. It follows the w3c spec and even added some improvements. its serialization is neat and accurate with 99% test coverage. |
Oh dear, I wish we'd known about that a month ago! @Sebmaster created our own package, w3c-xmlserializer, and #2282 has the code to integrate it all ready to go... I just need to find some time to work on jsdom -_-. |
Alright @domenic. sorry it came late. |
I think this is it! Closing due to the release of v13. |
All modern browsers expose these two useful classes with both a very simple API:
https://developer.mozilla.org/en/docs/Web/API/DOMParser
https://developer.mozilla.org/en/docs/XMLSerializer
Without knowing the details, I believe it shouldn't be too hard to implement these as thin shim to what jsdom already has available.
I am currently using quickly hacked together shims in paper.js that look like the code below.
I am happy to work on a better version that can find its way into jsdom, if you think this would be useful and could give me some pointers as to where to start looking.
The text was updated successfully, but these errors were encountered: