Add valid_encoding checking before inserting into xml object. #318
@@ -148,7 +148,7 @@ def initialize(first, second = nil, parent = nil)
rescue => err
if err.class == ::Encoding::CompatibilityError
second_utf8 = second.to_s.force_encoding('UTF-8')
Shouldn't this just be changed to something like this:
second_utf8 = second.to_s.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => "")
The string may not be UTF-8, so you don't want to force the encoding. You want to transcode the string to UTF-8, removing the characters that can't be transcoded.
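To illustrate the distinction, here is a minimal sketch (not code from this PR) that uses the first few bytes of the test string quoted in this thread. It assumes the input arrives tagged as binary (ASCII-8BIT); the variable names are made up for the example.

```ruby
# Bytes borrowed from the test string in this thread, tagged as binary.
raw = "\xAF=\xC6\x0F".b

# force_encoding only relabels the bytes; nothing is converted or removed.
forced = raw.dup.force_encoding('UTF-8')
puts forced.valid_encoding?       # => false

# encode transcodes; bytes with no UTF-8 mapping are dropped here
# because the replacement string is empty.
transcoded = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
puts transcoded.encoding          # => UTF-8
puts transcoded.valid_encoding?   # => true
```

Note that the control character \x0F survives transcoding, because it is a legal UTF-8 code point.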
I tested the replacement above, and it didn't work for our case: the final result still contains unacceptable characters. The test string I used is the first partition type: \xAF=\xC6\x0F\x83\x84rG\x8Ey=i\xD8G}\xE4.
irb(main)> "\xAF=\xC6\x0F\x83\x84rG\x8Ey=i\xD8G}\xE4".encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => "")
=> "=\u000FrGy=iG}"
What makes you say it contains unacceptable characters? If you transcoded the string as I suggested, the result has to be a valid UTF-8 string. Did valid_encoding? on the resultant string return false?
valid_encoding? returns true, but we still get this error message in evm.log: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
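For reference, one way this class of error can arise in Ruby is sketched below. This is an assumption about the general mechanism, not a diagnosis of what evm.log recorded: a regexp fixed to UTF-8 is matched against a binary string that holds a non-ASCII byte, and a binary string reports valid_encoding? as true regardless of its contents.

```ruby
# A regexp containing a \u escape is fixed to UTF-8.
utf8_regexp = /\u00e9/

# A binary (ASCII-8BIT) string holding a non-ASCII byte.
binary = "\xAF=".b
puts binary.valid_encoding?   # => true, any byte sequence is valid binary

# Matching the two raises instead of returning nil.
error = begin
  utf8_regexp =~ binary
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class              # => Encoding::CompatibilityError
```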
# Validate attributes before inserting into xml
def self.validate_attrs(h)
return nil if h.nil?
h.inject({}) { |options, (key, value)| options[key.to_s] = value.to_s.force_encoding('UTF-8').valid_encoding? ? value : "Invalid encoding found"; options }
Same here as above.
Same reason as above.
@blomquisg What's the reasoning behind making this a blocker?
@roliveri I made changes based on your comments. Please review.
@@ -62,6 +62,30 @@ def self.findElementInt(paths, ele)
end

class XmlHelpers
# reference from REXML::TEXT
VALID_CHAR = [
What do these characters represent? Are they UTF-8 characters?
Why do we need to do this? Are there valid UTF-8 characters that are not valid in attributes?
Why are you making a copy of something that already exists? Prefer VALID_CHAR = REXML::Text::VALID_CHAR
(Or even better just don't create an alias for it at all).
@roliveri After encoding with replacement, they are UTF-8 characters. But some of the characters in the string are still flagged as invalid by the REXML::Text::check method call when we try to add them as an element's attribute.
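To make the gap concrete: the XML 1.0 Char production allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, and #x10000-#x10FFFF, so a string can be valid UTF-8 and still contain code points XML rejects. A minimal sketch of such a check follows; the constant and method names are hypothetical and are not the ones in this PR or in REXML.

```ruby
# Code point ranges allowed by the XML 1.0 Char production.
XML_VALID_RANGES = [
  0x9..0xA, 0xD..0xD, 0x20..0xD7FF, 0xE000..0xFFFD, 0x10000..0x10FFFF
].freeze

# Hypothetical helper: true when every code point is allowed in XML 1.0.
def xml_safe?(str)
  str.each_codepoint.all? { |cp| XML_VALID_RANGES.any? { |r| r.cover?(cp) } }
end

puts xml_safe?("=\u000FrGy=iG}")   # => false, U+000F is a control character
puts xml_safe?("rGy=iG}")          # => true
```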
@Fryguy I'll check whether any existing code can do the same thing for us.
I have a strong feeling that there exists code already in REXML or Nokogiri that already does this and we should not reinvent the wheel.
Since there are no tests, though, I have no idea what this is trying to solve (additional description in the PR body, aside from a BZ link, would be helpful too). Please add tests at least for the XmlHelpers changes.
Both REXML and Nokogiri will raise an exception in this case. They don't expect such control characters in XML data. I'll add a spec test for this change.
Checked commits hsong-rh/manageiq-gems-pending@e612f39~...2d76f13 with ruby 2.3.3, rubocop 0.47.1, haml-lint 0.20.0, and yamllint 1.10.0 lib/gems/pending/util/xml/xml_utils.rb
@simaishi Sorry, yes.
Add valid_encoding checking before inserting into xml object. (cherry picked from commit e04c65f) https://bugzilla.redhat.com/show_bug.cgi?id=1530726
Gaprindashvili backport details:
Some control characters, here '\u000F' and '\u001F', are valid in UTF-8 but are not accepted when we try to convert them into XML. This change adds a check that removes them before XML conversion.
https://bugzilla.redhat.com/show_bug.cgi?id=1518249
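As a rough illustration of the removal step described above (a sketch under stated assumptions, not the implementation merged in this PR), the C0 control characters that XML 1.0 forbids can be stripped with a single gsub. The helper name is hypothetical.

```ruby
# Hypothetical helper: drop the control characters XML 1.0 forbids,
# i.e. everything below U+0020 except tab, line feed, and carriage return.
def strip_xml_control_chars(str)
  str.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, '')
end

puts strip_xml_control_chars("=\u000FrGy=iG}\u001F")   # => =rGy=iG}
```

Dropping the characters silently loses data, so a caller that needs to audit the input may prefer to log or replace them instead.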