Fix block validation: wrap in exception handler for malformed HTML #8304
@aduth, I think you worked on this bit of code so dragging you in for thoughts again!
Nice find. In 883687d I pushed a failing test case. When `actual` and `expected` are both malformed, the function will wrongly return `true` since there's nothing to iterate from the empty return array of `getHTMLTokens`.
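The failure mode described in this comment can be reproduced with a minimal sketch. Everything here is an assumption made for illustration: `fakeTokenize` is a toy stand-in for the real tokenizer, and `tokensOrEmpty` and `naiveIsEquivalent` are hypothetical names, not the code from the PR.

```javascript
// Hypothetical stand-in for the real tokenizer: splits a string into tag and
// text tokens, and throws when the "<" and ">" counts differ.
function fakeTokenize( html ) {
	const opens = ( html.match( /</g ) || [] ).length;
	const closes = ( html.match( />/g ) || [] ).length;
	if ( opens !== closes ) {
		throw new Error( 'Malformed HTML' );
	}
	return html.match( /<[^>]*>|[^<]+/g ) || [];
}

// Swallows tokenizer errors and falls back to an empty array.
function tokensOrEmpty( html ) {
	try {
		return fakeTokenize( html );
	} catch ( error ) {
		return [];
	}
}

// Naive comparison: walks the actual tokens and checks each one against the
// expected token at the same position.
function naiveIsEquivalent( actual, expected ) {
	const actualTokens = tokensOrEmpty( actual );
	const expectedTokens = tokensOrEmpty( expected );
	for ( let i = 0; i < actualTokens.length; i++ ) {
		if ( actualTokens[ i ] !== expectedTokens[ i ] ) {
			return false;
		}
	}
	// With two empty arrays the loop never runs, so this line is reached
	// even though both inputs were malformed and different.
	return true;
}
```

Calling `naiveIsEquivalent( '<p>a</p', '<div>b</div' )` in this sketch reports the two strings as equivalent, which is the vacuous-truth problem the failing test case targets.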
If you paste some malformed HTML into a block, the HTML tokenizer can break. Wrapping the tokenizer in an exception handler means we can control the error.
Sentences ending in periods. Empty newline between description and parameters. Precise return type.
Add a check if both strings are invalid HTML
Empty strings are considered equivalent
883687d to 192160e
Good catch. Changed the logic so it will fail if one or both strings are invalid.
 *
 * @param {string} html HTML string to tokenize.
 *
 * @return {Object[]|boolean} Array of valid tokenized HTML elements, or false
The documentation should describe in what cases the consumer would expect the function to return `false`. Currently it's not clear that this is tied to capturing errors.
Aside: Why was `false` chosen, vs. another more semantically meaningful value for empty like `null` or `undefined`?
> Why was `false` chosen, vs. another more semantically meaningful value for empty like `null` or `undefined`
Force of habit from PHPland. Swapped to `null`.
@@ -390,7 +409,12 @@ export function getNextNonWhitespaceToken( tokens ) {
  */
 export function isEquivalentHTML( actual, expected ) {
 	// Tokenize input content and reserialized save content
-	const [ actualTokens, expectedTokens ] = [ actual, expected ].map( tokenize );
+	const [ actualTokens, expectedTokens ] = [ actual, expected ].map( getHTMLTokens );
I'm inclined to think this change could be simplified to something like:
try {
	const [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize );
} catch ( error ) {
	return false;
}
Logging the warning about the specific string becomes a bit trickier. We could still have `getHTMLTokens` which catches the specific failure, logs, then `throw`s up to the `catch` here.
Or we could keep as-is. My only thought was avoiding the overloaded return value, where `false` is a bit of a semantically ambiguous value.
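The log-then-rethrow alternative mentioned in this comment might look roughly like the sketch below. It is not the code that was merged: `fakeTokenize` is a toy stand-in for the real tokenizer, the warning text is invented, and the whole comparison is placed inside the `try` block as one possible arrangement.

```javascript
// Hypothetical stand-in for the real tokenizer: throws when the "<" and ">"
// counts differ, otherwise returns tag and text tokens.
function fakeTokenize( html ) {
	const opens = ( html.match( /</g ) || [] ).length;
	const closes = ( html.match( />/g ) || [] ).length;
	if ( opens !== closes ) {
		throw new Error( 'Malformed HTML' );
	}
	return html.match( /<[^>]*>|[^<]+/g ) || [];
}

// Logs which string failed, then rethrows so the caller decides what to do.
function getHTMLTokens( html ) {
	try {
		return fakeTokenize( html );
	} catch ( error ) {
		console.warn( 'Malformed HTML detected: ' + html );
		throw error;
	}
}

// Any tokenizer failure on either input ends up in this single catch.
function isEquivalentHTML( actual, expected ) {
	try {
		const actualTokens = getHTMLTokens( actual );
		const expectedTokens = getHTMLTokens( expected );
		return (
			actualTokens.length === expectedTokens.length &&
			actualTokens.every( ( token, i ) => token === expectedTokens[ i ] )
		);
	} catch ( error ) {
		return false;
	}
}
```

In this arrangement `getHTMLTokens` only ever returns an array, so the ambiguous sentinel value goes away, at the cost of wrapping the comparison in the `try` block.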
AFAIK the `const`s are scoped to the `try` block, so I don't think this is possible unless they were converted to `let` and defined outside, and it seemed better to keep the constness.
I've cleaned it up a bit already, and happy to make a further change, but otherwise will go as-is.
Good call. I guess the corrected form would look something closer to:
let actualTokens, expectedTokens;
try {
	( [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize ) );
} catch ( error ) {
	return false;
}
Which starts to be a bit harder to read 😬 Good as it is.
A little less ambiguous
Looks good. I think we should wait until after the 3.6 release to merge this one. Also could use an update to the return type.
 *
 * @param {string} html HTML string to tokenize.
 *
 * @return {Object[]|boolean} Array of valid tokenized HTML elements, or null on error
This doesn't return a `boolean` type anymore.
Doh.
👍 on 3.6
In some final testing, I happened to stumble upon an interesting issue.
I'd have expected that since I already chose to "Keep as HTML", I would no longer be presented with this prompt. I assume what's happening is that the malformed markup is triggering the invalidation on the HTML block. I'm not entirely sure how we accommodate this:
Thoughts? I'm also open to merging this one as-is, since the current state is an improvement over the previous error.
Ah yes, interesting! I tried it with the previous code and the editor just white-screens. I'll merge this as an improvement over the current situation, and file a ticket for the invalid-on-load separately. A different error message sounds like a good idea regardless. I also like the idea of not validating certain blocks - if the user wants the invalid HTML then that's fine. I've been looking at tweaking the validation for other blocks, so this would be a good use case.
If you paste malformed HTML into a block the HTML tokenizer can break.
The underlying bug is in the `simple-html-tokenizer` package. Ideally it will be fixed there, but in the meantime (and also to protect against other unknown errors) this PR wraps the tokenizer in an exception handler.
The fixed behaviour is that the block will show an invalid block warning message, allowing it to be cleaned up in a controlled way.
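As a rough illustration of the wrapping described above, the sketch below returns `null` when the tokenizer throws and treats a `null` on either side as not equivalent. The names `getTokensOrNull`, `isEquivalentSketch`, and `toyTokenize` are hypothetical, the tokenizer is passed in as a parameter purely to keep the example self-contained, and the warning text is invented; none of this is the exact code from the PR.

```javascript
// Returns the token array, or null when the injected tokenizer throws.
function getTokensOrNull( html, tokenize ) {
	try {
		return tokenize( html );
	} catch ( error ) {
		console.warn( 'Malformed HTML detected: ' + html );
		return null;
	}
}

// Fails fast if one or both inputs could not be tokenized, then compares
// the two token arrays position by position.
function isEquivalentSketch( actual, expected, tokenize ) {
	const actualTokens = getTokensOrNull( actual, tokenize );
	const expectedTokens = getTokensOrNull( expected, tokenize );
	if ( actualTokens === null || expectedTokens === null ) {
		return false;
	}
	return (
		actualTokens.length === expectedTokens.length &&
		actualTokens.every( ( token, i ) => token === expectedTokens[ i ] )
	);
}

// Toy tokenizer (an assumption for this example only): rejects strings whose
// "<" and ">" counts differ, otherwise returns tag and text tokens.
const toyTokenize = ( html ) => {
	if ( html.split( '<' ).length !== html.split( '>' ).length ) {
		throw new Error( 'Malformed HTML' );
	}
	return html.match( /<[^>]*>|[^<]+/g ) || [];
};
```

With this shape a caller such as block validation gets a plain `false` for unparseable input and can show the invalid block warning instead of crashing.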
As far as I can tell this is the only place in Gutenberg where `simple-html-tokenizer` is used and so I haven't tried to generally wrap the library.
How has this been tested?
An additional unit test has been added which tests the breakage. You can manually verify the problem by:
<blockquote class="wp-block-quote">fsdfsdfsd<p>fdsfsdfsdd</pfd fd fd></blockquote>
Types of changes
Non breaking bug fix
Checklist: