-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow invalid values to be set in XML::Node #5200
Conversation
src/xml/node.cr
Outdated
raise XML::Error.new("Cannot escape string containing NUL character", 0) | ||
end | ||
|
||
string.gsub(SUBSTITUTIONS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use HTML.escape
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's not HTML?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does the same thing, there is no need to implement this again. That's why XML.escape
was removed in favour of HTML.escape
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, it's not exactly the same: HTML.escape
encodes '
as &x39;
instead of '
because the named entity is not defined for HTML4. But it's still valid XML and no big deal. We could even consider changing the behaviour for HTML.escape
because in other places (like HTML.unescape
) we support only HTML5 anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, I really don't want to use a module called HTML
in a module for XML
. It's simply wrong. They're different standards and people working on the HTML escape method won't think about XML when they're modifying it. Besides, it makes very little difference either way because both methods are unlikely to change for a long long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, XML.escape was used because you shouldn't need it, because we have facilities to build XML built in. We don't have a HTML::Builder so HTML.escape has a purpose.
I wouldn't be against adding XML.escape back again though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the code is put in anyway (as suggested in this PR), it might as well be exposed in the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that necessarily follows, personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, true. But if it's there - why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because there's no point in adding a method as a public API when it's not needed. It means that if later we find some other way to implement this we might need to remove XML.escape and introduce a breaking change. Do you need XML.escape? Does someone else requested it? No. So please let's stick to this PRs goal, without adding more and more APIs. We can't keep adding API, we need stabilization for once.
@asterite I don't understand. |
@RX14 The snippet in the issue is: # example.cr
require "xml"
xml = XML.build(indent: 2) do |xml|
xml.element("example", number: "1") do
xml.text "foo\0bar"
end
xml.element("example", number: "2") do
xml.text "foo\u{08}bar"
end
end
puts xml Does it work well with this PR? |
Yeah, looks like I didn't read my own issue properly. This does solve a valid issue though, so i'll just update it to fix them all. |
@asterite we essentially just need to raise on NUL for the builder, I think. |
src/xml/node.cr
Outdated
|
||
name = name.to_s | ||
|
||
if name.includes?('\0') || LibXML.xmlValidateNameValue(name) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can call name.check_no_null_byte
before this if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather that we raised a single exception type for the case in which the name is invalid, instead of different ones based on how it's invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. But for \0
a message like "contains a null byte" might be more helpful.
42dd787
to
c2152fb
Compare
c2152fb
to
62a1929
Compare
This pr is finished and ready to be reviewed again btw |
src/xml/builder.cr
Outdated
@@ -275,8 +275,17 @@ struct XML::Builder | |||
raise XML::Error.new("Error in #{msg}", 0) if ret < 0 | |||
end | |||
|
|||
private def string_to_unsafe(string) | |||
string ? string.to_unsafe : Pointer(UInt8).null | |||
private def string_to_unsafe(string : String?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just def string_to_unsafe(string : Nil)
since you have the String overload below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to comment exactly the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
62a1929
to
412ca48
Compare
Fixes #4089.
Previously, invalid
content
andname
values could be set for XML nodes, including values that had no valid encoding in XML (such as\0
).