-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Namespace behavior during reparenting should match XSLT's copy-of
semantics
#1200
Comments
Hello Gioele, Have you tried cloning the object before you called Does this fix it for you? https://gist.github.com/kbrock/468f9f5e987e3b60155b/revisions --Keenan |
Hi Keenan, thanks for the tip. Yes, cloning the node before adding it fixed the problem. Was the problem my fault ("nodes are supposed to be cloned before added to other documents", but I cannot find any reference to this in the Nokogiri documentation) or a bug in Nokogiri? What about the depth of the cloning? When using the default depth is not enough? |
Hi @gioele, I think libxml2 has trouble with a single node being in 2 documents. Personally, I was adding a node with children. The namespace of the parent node said default, but the child nodes were all correct. This was fixed with Best of luck, |
@kbrock , thank you very much for your reply. It resolves the same issue that I had. But issue is much worse because even adding nodes from same document can be broken if namespace is not defined in a parent element where node is being added. Sounds confusing. I modified original gist to create a reproducer: It would be very nice if somebody can file an upstream wrt nokogiri:
|
I'd like very much if @kbrock or @akostadinov would write a failing test case for the issue being described. A script isn't much use if a) I don't know what output you actually saw, nor b) what output you expected. Please write a failing test to help me understand and fix the issue. |
@flavorjones I converted my gist to use a comparison. Does this work for you? #!/usr/bin/env ruby
require 'nokogiri'
src =<<EOD
<wrapper xmlns="ns" xmlns:extra="extra">
<record xml:id="r1">
<field>aaa</field>
<field extra:type="second">bbb</field>
</record>
</wrapper>
EOD
tgt =<<EOD
<?xml version="1.0"?>
<summary xmlns="summary">
<record xmlns="ns" xmlns:extra="extra" xml:id="r1">
<field>aaa</field>
<field extra:type="second">bbb</field>
</record>
</summary>
EOD
src_doc = Nokogiri::XML(src)
record = src_doc.at('//base:record', {'base' => "ns"})
dest_doc = Nokogiri::XML("<summary xmlns='summary'/>")
# FIX: dest_doc.root.add_child(record.clone)
dest_doc.root.add_child(record)
puts "oh no" if dest_doc.to_xml != tgt |
@kbrock Great! Thank you for providing executable code. Now we can have a conversation! Your test expects that the If I made similar node moves within the same document, what would your expectation be? Specifically, this code:
What do you expect the Here's what Nokogiri (libxml2) does as of 1.7.0.1 / 2.9.4:
That is, the reparented node (originally using its parent node's default namespace) inherits the parent node's default namespace.
This seems correct to me, do you agree? I think it would be surprising behavior to define a namespace where it was only referencing one in its initial position in the document. If you disagree, I'd like to understand what you think the appropriate behavior should be. If you agree, then I'd like to understand why this behavior should be different when the node is moved to a new, separate document. Also, please keep in mind that the XML spec is pretty much silent on the topic of moving nodes around -- especially so about moving nodes between documents -- so we're all really fumbling around in the dark. I would like the behavior we implement to be consistent, so please do let me know if you feel Nokogiri (or libxml2) is behaving inconsistently. |
@flavorjones , I (possibly we) expect when node is added, to not be at the same time removed from it's original place. This is one thing that I understand might be hard to change at this state of the project without breaking too much existing code. It can be documented and would be mostly fine in my opinion. But if we add node from another doc to this one (or move node to a different place in same doc), but namespace is not defined in the new place (or defined to something else), then obviously the node changed by the move operation. A different namespace of element is actually a different element. For this I certainly can't imagine a valid use case and can't imagine anybody being happy with or expecting it. Moving or adding, I believe everybody would expect the node at it's new home to have all necessary namespaces defines such that same xpath for example can find it. |
@akostadinov We'll have to agree to disagree. |
Project owners are the masters obviously. It would be interesting to know though what valid use cases you have for moving a node while namespaces are not retained or changed in an undefined way? |
redoing this: @flavorjones lets focus on namespaces, not on whether it was removed from the source. I expect the namespaces to stay the same. expect Would you have expected the |
@kbrock Again, revisiting the difference between referencing a namespace and defining a namespace, the I'm not sure at this point whether continuing this conversation is at all constructive. We adopted some conventions around how reparenting nodes should work years ago; we tried to make it as consistent as we could; we may have failed. And, as I said, the XML spec is silent on how reparenting nodes should work, particularly when moving between documents. I have reasons for why it works the way it does; I'm not sure it's constructive to go into those reasons, given we all seem pretty far apart on this issue. |
Maybe we could follow the XSLT lead, where node always carry all their namespace information available to them (definitions, references, defaults) regardless of where they are copied to with For reference: https://www.w3.org/TR/xslt20/#element-copy-of
|
I believe this is what we're doing, to the best of the ability of the
underlying parsers.
…On Feb 14, 2017 4:08 AM, "Gioele" ***@***.***> wrote:
And, as I said, the XML spec is silent on how reparenting nodes should
work, particularly when moving between documents.
Maybe we could follow the XSLT lead, where node always carry all their
namespace information available to them (definitions, references, defaults)
regardless of where they are copied to with <copy> and <copy-of/>. XSLT
processors take care of renaming xmlns prefixes so that the resulting
document fragment is 100% equivalent to the original (even though it may
have a different serialized form).
For reference: https://www.w3.org/TR/xslt20/#element-copy-of
The new element will also have namespace nodes copied from the original
element node, unless they are excluded by specifying copy-namespaces="no".
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#1200 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAgDyPCHq_oDBYRpt6jIjyvKcQq0O3iks5rcW72gaJpZM4DAkBC>
.
|
Actually: I'd really like to understand what happens to namespaces if we
use xslt to move the node in the original example. That might be a good
precedent to look to. I'm definitely open to behavior changes here.
…On Feb 14, 2017 7:13 AM, "Mike Dalessio" ***@***.***> wrote:
I believe this is what we're doing, to the best of the ability of the
underlying parsers.
On Feb 14, 2017 4:08 AM, "Gioele" ***@***.***> wrote:
> And, as I said, the XML spec is silent on how reparenting nodes should
> work, particularly when moving between documents.
>
> Maybe we could follow the XSLT lead, where node always carry all their
> namespace information available to them (definitions, references, defaults)
> regardless of where they are copied to with <copy> and <copy-of/>. XSLT
> processors take care of renaming xmlns prefixes so that the resulting
> document fragment is 100% equivalent to the original (even though it may
> have a different serialized form).
>
> For reference: https://www.w3.org/TR/xslt20/#element-copy-of
>
> The new element will also have namespace nodes copied from the original
> element node, unless they are excluded by specifying copy-namespaces="no"
> .
>
> —
> You are receiving this because you modified the open/close state.
> Reply to this email directly, view it on GitHub
> <#1200 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AAAgDyPCHq_oDBYRpt6jIjyvKcQq0O3iks5rcW72gaJpZM4DAkBC>
> .
>
|
This is what As I user, I'd expect the I think this issue should be changed only once a change has been done: either a change in the implementation of |
Here's what XLST does, and I'll preface this by saying that I had no idea this was the behavior ... #!/usr/bin/env ruby
require 'nokogiri'
xml = <<EOX
<root xmlns:root="http://common.parent.ns">
<orig-parent xmlns="orig-parent.default.ns"
xmlns:foo="http://orig-parent.foo.ns"
xmlns:bar="http://orig-parent.bar.ns"
xmlns:unused="http://orig-parent.unused.ns"
>
<root:child>
ns is common.parent.us
<bar:grandchild/>
</root:child>
<child>
ns is orig-parent.default.ns
<bar:grandchild/>
</child>
<foo:child>
ns is orig-parent.foo.ns
<bar:grandchild/>
</foo:child>
<bar:child>
ns is orig-parent.bar.ns
<bar:grandchild/>
</bar:child>
</orig-parent>
<final-parent-with-no-default-ns>
<replace-me/>
</final-parent-with-no-default-ns>
<final-parent-with-same-default-ns xmlns="orig-parent.default.ns">
<replace-me/>
</final-parent-with-same-default-ns>
<final-parent-with-different-default-ns xmlns="http://final.parent.default.ns">
<replace-me/>
</final-parent-with-different-default-ns>
</root>
EOX
xslt1 = <<-EOX
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:root="http://common.parent.ns"
xmlns:orig-parent="orig-parent.default.ns"
xmlns:foo="http://orig-parent.foo.ns"
xmlns:bar="http://orig-parent.bar.ns"
xmlns:unused="http://orig-parent.unused.ns"
xmlns:final-parent="http://final.parent.default.ns"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//final-parent-with-no-default-ns/replace-me">
<xsl:copy-of select="//*[local-name()='child']"/>
</xsl:template>
<xsl:template match="//orig-parent:final-parent-with-same-default-ns/orig-parent:replace-me">
<xsl:copy-of select="//*[local-name()='child']"/>
</xsl:template>
<xsl:template match="//final-parent:final-parent-with-different-default-ns/final-parent:replace-me">
<xsl:copy-of select="//*[local-name()='child']"/>
</xsl:template>
</xsl:stylesheet>
EOX
orig_doc = Nokogiri::XML xml
ss = Nokogiri::XSLT xslt1
final_doc = ss.transform orig_doc
puts final_doc.to_xml output: <?xml version="1.0"?>
<root xmlns:root="http://common.parent.ns">
<orig-parent xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
<root:child>
ns is common.parent.us
<bar:grandchild/></root:child>
<child>
ns is orig-parent.default.ns
<bar:grandchild/></child>
<foo:child>
ns is orig-parent.foo.ns
<bar:grandchild/></foo:child>
<bar:child>
ns is orig-parent.bar.ns
<bar:grandchild/></bar:child>
</orig-parent>
<final-parent-with-no-default-ns>
<root:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is common.parent.us
<bar:grandchild/></root:child>
<child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.default.ns
<bar:grandchild/></child>
<foo:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.foo.ns
<bar:grandchild/></foo:child>
<bar:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.bar.ns
<bar:grandchild/></bar:child>
</final-parent-with-no-default-ns>
<final-parent-with-same-default-ns xmlns="orig-parent.default.ns">
<root:child xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is common.parent.us
<bar:grandchild/></root:child>
<child xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.default.ns
<bar:grandchild/></child>
<foo:child xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.foo.ns
<bar:grandchild/></foo:child>
<bar:child xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.bar.ns
<bar:grandchild/></bar:child>
</final-parent-with-same-default-ns>
<final-parent-with-different-default-ns xmlns="http://final.parent.default.ns">
<root:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is common.parent.us
<bar:grandchild/></root:child>
<child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.default.ns
<bar:grandchild/></child>
<foo:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.foo.ns
<bar:grandchild/></foo:child>
<bar:child xmlns="orig-parent.default.ns" xmlns:foo="http://orig-parent.foo.ns" xmlns:bar="http://orig-parent.bar.ns" xmlns:unused="http://orig-parent.unused.ns">
ns is orig-parent.bar.ns
<bar:grandchild/></bar:child>
</final-parent-with-different-default-ns>
</root> I think this behavior makes sense, and I'm open to adjusting Nokogiri's DOM node replacement behavior to match (though I worry that proper namespace-handling may be expensive, we'll have to see). |
Thank you @flavorjones for the insightful and thorough example. |
add_child
copy-of
semantics
The code in lib/metadata/saml.rb raw_entity_descriptor creates new root element and rehomes all existing elements. In this case, Nokogiri however drops namespace prefixes from element attributes, changing <ContactPerson contactType="other" remd:contactType="http://refeds.org/metadata/contactType/security"> into <ContactPerson contactType="other" contactType="http://refeds.org/metadata/contactType/security"> Which renders the XML invalid - and causes a 500 Internal Server error in the MDQ interface: ``` Metadata::SchemaInvalidError (Metadata is not schema valid [#<Nokogiri::XML::SyntaxError: 120:0: ERROR: Element '{urn:oasis:names:tc:SAML:2.0:metadata}ContactPerson', attribute 'contactType': The attribute 'contactType' is not allowed.>]) ``` This affects all MDQ `specific_entity` and `specific_entity_sha1` requests - if the `EntityDescriptor` has any elements with attributes in a different namespace. (And a security contact is such a case). This is a known issue with Nokogiri: sparklemotion/nokogiri#1200 And the recommended workaround is to clone the elements being rehomed (tested, works). All tests pass.
Scheduling this for the v2.0 milestone since it seems like this would be a breaking change in some cases. |
Not all the namespaces referenced inside a XML node are carried over by
Node#add_child
.Reproducible testcase: https://gist.github.com/gioele/2c88ac73f4f28f79fbc6#file-add_child_ns-rb
Output:
Tested with Ruby 2.1.2 and Nokogiri 1.6.3.1
BTW, I discovered this while searching for ways to create a standalone document from a XML node.
The text was updated successfully, but these errors were encountered: