Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix block validation: wrap in exception handler for malformed HTML #8304

Merged
merged 7 commits into from
Aug 21, 2018
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions packages/blocks/src/api/test/validation.js
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,35 @@ describe( 'validation', () => {

expect( isEquivalent ).toBe( true );
} );

it( 'should return true when comparing empty strings', () => {
const isEquivalent = isEquivalentHTML(
'',
'',
);

expect( isEquivalent ).toBe( true );
} );

it( 'should return false if supplied malformed HTML', () => {
const isEquivalent = isEquivalentHTML(
'<blockquote class="wp-block-quote">fsdfsdfsd<p>fdsfsdfsdd</pfd fd fd></blockquote>',
'<blockquote class="wp-block-quote">fsdfsdfsd<p>fdsfsdfsdd</p></blockquote>',
);

expect( console ).toHaveWarned();
expect( isEquivalent ).toBe( false );
} );

it( 'should return false if supplied two sets of malformed HTML', () => {
const isEquivalent = isEquivalentHTML(
'<div>fsdfsdfsd<p>fdsfsdfsdd</pfd fd fd></div>',
'<blockquote>fsdfsdfsd<p>fdsfsdfsdd</p a></blockquote>',
);

expect( console ).toHaveWarned();
expect( isEquivalent ).toBe( false );
} );
} );

describe( 'isValidBlock()', () => {
Expand Down
30 changes: 27 additions & 3 deletions packages/blocks/src/api/validation.js
Original file line number Diff line number Diff line change
Expand Up @@ -380,8 +380,27 @@ export function getNextNonWhitespaceToken( tokens ) {
}

/**
* Returns true if there is given HTML strings are effectively equivalent, or
* false otherwise.
* Tokenize an HTML string, gracefully handling any errors thrown during
* underlying tokenization.
*
* @param {string} html HTML string to tokenize.
*
* @return {Object[]|boolean} Array of valid tokenized HTML elements, or false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should describe in what cases the consumer would expect the function to return false. Currently it's not clear that this is tied to capturing errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: Why was false chosen, vs. another more semantically meaningful value for empty like null or undefined ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was false chosen, vs. another more semantically meaningful value for empty like null or undefined

Force of habit from PHPland. Swapped to null

*/
function getHTMLTokens( html ) {
try {
return tokenize( html );
} catch ( e ) {
log.warning( 'Malformed HTML detected: %s', html );
}

return false;
}

/**
* Returns true if the given HTML strings are effectively equivalent, or
* false otherwise. Invalid HTML is not considered equivalent, even if the
* strings directly match.
*
* @param {string} actual Actual HTML string.
* @param {string} expected Expected HTML string.
Expand All @@ -390,7 +409,12 @@ export function getNextNonWhitespaceToken( tokens ) {
*/
export function isEquivalentHTML( actual, expected ) {
// Tokenize input content and reserialized save content
const [ actualTokens, expectedTokens ] = [ actual, expected ].map( tokenize );
const [ actualTokens, expectedTokens ] = [ actual, expected ].map( getHTMLTokens );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to think this change could be simplified to something like:

try {
	const [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize );
} catch ( error ) {
	return false;
}

Logging the warning about the specific string becomes a bit trickier. We could still have getHTMLTokens which catches the specific failure, logs, then throws up to the catch here.

Or we could keep as-is. My only thought was avoiding the overloaded return value, where false is a bit of a semantically ambiguous value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the const's are scoped to the try block so I don't think is possible unless they were converted to let and defined outside, and it seemed better to keep the constness.

I've cleaned it up a bit already, and happy to make a further change, but otherwise will go as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I guess the corrected form would look something closer to:

let actualTokens, expectedTokens;
try {
	( [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize ) );
} catch ( error ) {
	return false;
}

Which starts to be a bit harder to read 😬 Good as it is.


// If either is malformed then stop comparing - the strings are not equivalent
if ( actualTokens === false || expectedTokens === false ) {
return false;
}

let actualToken, expectedToken;
while ( ( actualToken = getNextNonWhitespaceToken( actualTokens ) ) ) {
Expand Down