-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Use masterminds/html5 for HTMLValue #10647
ENH Use masterminds/html5 for HTMLValue #10647
Conversation
df04994
to
a42e31e
Compare
a42e31e
to
444ec50
Compare
36b687f
to
dc42a5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Just a few minor tweaks are needed.
Do we want to include test with a DataObject with a pre-existing HTMLText field with some HTML4 content to see what the migration will looked like?
src/View/Parsers/HTMLValue.php
Outdated
|
||
// remove stray closing tags which are not required | ||
foreach (HTML::config()->get('void_elements') as $tag) { | ||
$res = str_replace("</$tag>", '', $res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the HTML5 parser to take care of this. When I comment it out locally, all the unit test still pass just fine.
If it is needed, then we should include a test case to cover it. Also, we might have a case sensitivity problem. So we probably should use str_ireplace
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
'<html><html><body><falsetag "attribute=""attribute""">' | ||
=> '<falsetag></falsetag>', | ||
'<body<body<body>/bodu>/body>' | ||
=> '/bodu>/body>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two test case still look relevant and should be copied over to HTMLValueTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
$invalid = [ | ||
'<p><div><a href="test-link"></p></div>', | ||
'<html><div><a href="test-link"></a></a></html_>', | ||
'""\'\'\'"""\'""<<<>/</<htmlbody><a href="test-link"<<>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is not in HTMLValueTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parser cannot parse it, I think it's fine to leave this last one out and not expect the parser to parse because it is wildly invalid HTML
); | ||
} | ||
|
||
public function testAttributeEscaping() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is not in HTMLValueTest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
dc42a5e
to
d7ddb00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Issue #10528
I simply used masterminds/html5 directly rather than symfony/dom-crawler (which itself uses masterminds/html5) because it seemed to work out better out of the box with our existing class and unit-tests. We don't need any of the additional API in symfony/dom-crawler in order to maintain our current functionality.
There is minimal behaviour difference between this parser and silverstripe/html5 so be no issues when upgrading. There were only small differences when parsing invalid HTML, usually it seems to remove the invalid HTML rather than trying to create an empty tag.
I feel pretty confident that we can simply upgrade the parser here and not worry about any build task to do content migration. I can't really forsee any legitimate content the first time someone saves a page in CMS 5 after upgrading from CMS 4