Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Use masterminds/html5 for HTMLValue #10647

Merged

Conversation

emteknetnz
Copy link
Member

@emteknetnz emteknetnz commented Jan 16, 2023

Issue #10528

I simply used masterminds/html5 directly rather than symfony/dom-crawler (which itself uses masterminds/html5) because it seemed to work out better out of the box with our existing class and unit-tests. We don't need any of the additional API in symfony/dom-crawler in order to maintain our current functionality.

There is minimal behaviour difference between this parser and silverstripe/html5 so be no issues when upgrading. There were only small differences when parsing invalid HTML, usually it seems to remove the invalid HTML rather than trying to create an empty tag.

I feel pretty confident that we can simply upgrade the parser here and not worry about any build task to do content migration. I can't really forsee any legitimate content the first time someone saves a page in CMS 5 after upgrading from CMS 4

@emteknetnz emteknetnz changed the title DEP Use masterminds/html5 for HTMLValue ENH Use masterminds/html5 for HTMLValue Jan 16, 2023
@emteknetnz emteknetnz force-pushed the pulls/5/dom-crawler branch 2 times, most recently from 36b687f to dc42a5e Compare January 16, 2023 04:08
Copy link
Contributor

@maxime-rainville maxime-rainville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just a few minor tweaks are needed.

Do we want to include test with a DataObject with a pre-existing HTMLText field with some HTML4 content to see what the migration will looked like?

_config/html.yml Outdated Show resolved Hide resolved
_config/html.yml Show resolved Hide resolved
src/View/Parsers/HTML4Value.php Show resolved Hide resolved

// remove stray closing tags which are not required
foreach (HTML::config()->get('void_elements') as $tag) {
$res = str_replace("</$tag>", '', $res);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the HTML5 parser to take care of this. When I comment it out locally, all the unit test still pass just fine.

If it is needed, then we should include a test case to cover it. Also, we might have a case sensitivity problem. So we probably should use str_ireplace instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment on lines -21 to -24
'<html><html><body><falsetag "attribute=""attribute""">'
=> '<falsetag></falsetag>',
'<body<body<body>/bodu>/body>'
=> '/bodu&gt;/body&gt;'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two test case still look relevant and should be copied over to HTMLValueTest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

$invalid = [
'<p><div><a href="test-link"></p></div>',
'<html><div><a href="test-link"></a></a></html_>',
'""\'\'\'"""\'""<<<>/</<htmlbody><a href="test-link"<<>'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is not in HTMLValueTest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parser cannot parse it, I think it's fine to leave this last one out and not expect the parser to parse because it is wildly invalid HTML

);
}

public function testAttributeEscaping()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is not in HTMLValueTest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor

@maxime-rainville maxime-rainville left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maxime-rainville maxime-rainville merged commit a65d470 into silverstripe:5 Jan 17, 2023
@maxime-rainville maxime-rainville deleted the pulls/5/dom-crawler branch January 17, 2023 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants