Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add valid_encoding checking before inserting into xml object. #318

Merged
merged 3 commits into from
Dec 15, 2017

Conversation

hsong-rh
Copy link
Contributor

@hsong-rh hsong-rh commented Dec 1, 2017

Some special control characters, here '\u000F' and '\u001F', are valid chars in UTF-8 encoding, but unacceptable when we try to convert them into XML. A check is added to remove them first before xml conversion.

https://bugzilla.redhat.com/show_bug.cgi?id=1518249

@hsong-rh
Copy link
Contributor Author

hsong-rh commented Dec 1, 2017

@roliveri @jerryk55 Please review.

@@ -148,7 +148,7 @@ def initialize(first, second = nil, parent = nil)
rescue => err
if err.class == ::Encoding::CompatibilityError
second_utf8 = second.to_s.force_encoding('UTF-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be changed to something like this:

second_utf8 = second.to_s.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => "")

The string may not be UTF-8, so you don't want to force the encoding. You want to transcode the string to UTF-8, removing the characters that can't be transcoded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the above replacement. It didn't work for our case. The final result still has unacceptable chars. The test string I used is the 1st partition type: \xAF=\xC6\x0F\x83\x84rG\x8Ey=i\xD8G}\xE4.

irb(main)> "\xAF=\xC6\x0F\x83\x84rG\x8Ey=i\xD8G}\xE4".encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => "")
=> "=\u000FrGy=iG}"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes you say it contains unacceptable characters? If you transcoded the string as I suggested, the result has to be a valid UTF-8 string. Did vaild_encoding? on the resultant string return false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The valid_encoding? return true. But somehow we still got error message: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) in evm.log

# Validate attributes before inserting into xml
def self.validate_attrs(h)
return nil if h.nil?
h.inject({}) { |options, (key, value)| options[key.to_s] = value.to_s.force_encoding('UTF-8').valid_encoding? ? value : "Invalid encoding found"; options }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason above.

@blomquisg
Copy link
Member

@roliveri and @hsong-rh, this is for a G-release blocker bug. Try to resolve the discussion quickly so we can get this merged and move the BZ off the blocker list.

@roliveri
Copy link
Member

roliveri commented Dec 7, 2017

@blomquisg What's the reasoning behind making this a blocker?
As it only happens on a subset of VMs.

@hsong-rh
Copy link
Contributor Author

hsong-rh commented Dec 7, 2017

@roliveri I made changes based on your comments. Please review.

@@ -62,6 +62,30 @@ def self.findElementInt(paths, ele)
end

class XmlHelpers
# reference from REXML::TEXT
VALID_CHAR = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these characters represent? Are they UTF-8 characters?
Why do we need to do this? Are there valid UTF-8 characters that are not valid in attributes?

Copy link
Member

@Fryguy Fryguy Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you making a copy of something that already exists? Prefer VALID_CHAR = REXML::Text::VALID_CHAR (Or even better just don't create an alias for it at all).

Copy link
Contributor Author

@hsong-rh hsong-rh Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roliveri After encoding with replacement, they are UTF-8 characters. But some inside chars are still marked as invalid in REXML::Text::check method call, when try to add them as element's attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy I'll check to see if any existing codes can help us to do the same thing.

Copy link
Member

@Fryguy Fryguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a strong feeling that there exists code already in REXML or Nokogiri that already does this and we should not reinvent the wheel.

Since there are no tests, though, I have no idea what this is trying to solve (Additional description in the PR body aside from a BZ link would be helpful too) . Please add tests at least for the XmlHelpers changes.

@hsong-rh
Copy link
Contributor Author

Both REXML and Nokegiri will raise exception in this case. They don't expect such control characters in XML data. I'll add a spec test for this change.

@miq-bot
Copy link
Member

miq-bot commented Dec 11, 2017

Checked commits hsong-rh/manageiq-gems-pending@e612f39~...2d76f13 with ruby 2.3.3, rubocop 0.47.1, haml-lint 0.20.0, and yamllint 1.10.0
3 files checked, 3 offenses detected

lib/gems/pending/util/xml/xml_utils.rb

@hsong-rh
Copy link
Contributor Author

@roliveri @Fryguy Please review.

@roliveri roliveri merged commit e04c65f into ManageIQ:master Dec 15, 2017
@roliveri roliveri added this to the Sprint 76 Ending Jan 1, 2018 milestone Dec 15, 2017
@simaishi
Copy link
Contributor

gaprindashvili/yes ?

@roliveri
Copy link
Member

@simaishi Sorry, Yes.

simaishi pushed a commit that referenced this pull request Jan 3, 2018
Add valid_encoding checking before inserting into xml object.
(cherry picked from commit e04c65f)

https://bugzilla.redhat.com/show_bug.cgi?id=1530726
@simaishi
Copy link
Contributor

simaishi commented Jan 3, 2018

Gaprindashvili backport details:

$ git log -1
commit d865ee63c000e355bb07ed2c4774be2ebe4cbd5b
Author: Richard Oliveri <[email protected]>
Date:   Fri Dec 15 10:21:24 2017 -0500

    Merge pull request #318 from hsong-rh/xml_encoding
    
    Add valid_encoding checking before inserting into xml object.
    (cherry picked from commit e04c65fce73dbeced56f66d724cdb79c69707573)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1530726

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants