Skip to content

Commit

Permalink
WP_HTML_Tag_Processor: Make get_attribute reflect attribute set via…
Browse files Browse the repository at this point in the history
… `set_attribute`, even without updating (#46680)

* WP_HTML_Tag_Processor: Ensure that reads after writes reflect most-updated values. (#46680)

Currently there exists a data race in the tag processor when reading the attribute value
for an attribute that has previously been updated. This is due to the fact that when
reading attribute values we examine the input HTML document and overlook potential
enqueued updates. If we call `get_updated_html()` before reading then those updates are
all flushed out and our values will match what we expect, but if we skip this step then
we read stale information read during the first parse of the updated tag.

In this patch we're fixing that bug by inspecting the set of enqueued changes when
reading attribute values. If an update is already enqueued for a given attribute then
we read the value from that update instead of from the original HTML. This ensures that
the value we report is always the most-recent or most-updated value.

Co-Authored-By: Adam Zielinski <[email protected]>
Co-Authored-By: Dennis Snell <[email protected]>
  • Loading branch information
3 people authored Jan 25, 2023
1 parent ac4d1db commit f272fa6
Show file tree
Hide file tree
Showing 2 changed files with 599 additions and 32 deletions.
156 changes: 142 additions & 14 deletions lib/experimental/html/class-wp-html-tag-processor.php
Original file line number Diff line number Diff line change
Expand Up @@ -1059,7 +1059,7 @@ private function parse_next_attribute() {
return true;
}

/**
/*
* > There must never be two or more attributes on
* > the same start tag whose names are an ASCII
* > case-insensitive match for each other.
Expand Down Expand Up @@ -1116,24 +1116,33 @@ private function after_tag() {
* Converts class name updates into tag attributes updates
* (they are accumulated in different data formats for performance).
*
* This method is only meant to run right before the attribute updates are applied.
* The behavior in all other cases is undefined.
*
* @return void
* @since 6.2.0
*
* @see $classname_updates
* @see $lexical_updates
*/
private function class_name_updates_to_attributes_updates() {
if ( count( $this->classname_updates ) === 0 || isset( $this->lexical_updates['class'] ) ) {
$this->classname_updates = array();
if ( count( $this->classname_updates ) === 0 ) {
return;
}

$existing_class = isset( $this->attributes['class'] )
? substr( $this->html, $this->attributes['class']->value_starts_at, $this->attributes['class']->value_length )
: '';
$existing_class = $this->get_enqueued_attribute_value( 'class' );
if ( null === $existing_class || true === $existing_class ) {
$existing_class = '';
}

if ( false === $existing_class && isset( $this->attributes['class'] ) ) {
$existing_class = substr(
$this->html,
$this->attributes['class']->value_starts_at,
$this->attributes['class']->value_length
);
}

if ( false === $existing_class ) {
$existing_class = '';
}

/**
* Updated "class" attribute value.
Expand Down Expand Up @@ -1251,7 +1260,7 @@ private function apply_attributes_updates() {
return;
}

/**
/*
* Attribute updates can be enqueued in any order but as we
* progress through the document to replace them we have to
* make our replacements in the order in which they are found
Expand All @@ -1270,7 +1279,7 @@ private function apply_attributes_updates() {
}

foreach ( $this->bookmarks as $bookmark ) {
/**
/*
* As we loop through $this->lexical_updates, we keep comparing
* $bookmark->start and $bookmark->end to $diff->start. We can't
* change it and still expect the correct result, so let's accumulate
Expand Down Expand Up @@ -1370,6 +1379,69 @@ private static function sort_start_ascending( $a, $b ) {
return $a->end - $b->end;
}

/**
* Return the enqueued value for a given attribute, if one exists.
*
* Enqueued updates can take different data types:
* - If an update is enqueued and is boolean, the return will be `true`
* - If an update is otherwise enqueued, the return will be the string value of that update.
* - If an attribute is enqueued to be removed, the return will be `null` to indicate that.
* - If no updates are enqueued, the return will be `false` to differentiate from "removed."
*
* @since 6.2.0
*
* @param string $comparable_name The attribute name in its comparable form.
* @return string|boolean|null Value of enqueued update if present, otherwise false.
*/
private function get_enqueued_attribute_value( $comparable_name ) {
if ( ! isset( $this->lexical_updates[ $comparable_name ] ) ) {
return false;
}

$enqueued_text = $this->lexical_updates[ $comparable_name ]->text;

// Removed attributes erase the entire span.
if ( '' === $enqueued_text ) {
return null;
}

/*
* Boolean attribute updates are just the attribute name without a corresponding value.
*
* This value might differ from the given comparable name in that there could be leading
* or trailing whitespace, and that the casing follows the name given in `set_attribute`.
*
* Example:
* ```
* $p->set_attribute( 'data-TEST-id', 'update' );
* 'update' === $p->get_enqueued_attribute_value( 'data-test-id' );
* ```
*
* Here we detect this based on the absence of the `=`, which _must_ exist in any
* attribute containing a value, e.g. `<input type="text" enabled />`.
* ¹ ²
* 1. Attribute with a string value.
* 2. Boolean attribute whose value is `true`.
*/
$equals_at = strpos( $enqueued_text, '=' );
if ( false === $equals_at ) {
return true;
}

/*
* Finally, a normal update's value will appear after the `=` and
* be double-quoted, as performed incidentally by `set_attribute`.
*
* e.g. `type="text"`
* ¹² ³
* 1. Equals is here.
* 2. Double-quoting starts one after the equals sign.
* 3. Double-quoting ends at the last character in the update.
*/
$enqueued_value = substr( $enqueued_text, $equals_at + 2, -1 );
return html_entity_decode( $enqueued_value );
}

/**
* Returns the value of the parsed attribute in the currently-opened tag.
*
Expand Down Expand Up @@ -1397,12 +1469,43 @@ public function get_attribute( $name ) {
}

$comparable = strtolower( $name );

/*
* For every attribute other than `class` we can perform a quick check if there's an
* enqueued lexical update whose value we should prefer over what's in the input HTML.
*
* The `class` attribute is special though because we expose the helpers `add_class`
* and `remove_class` which form a builder for the `class` attribute, so we have to
* additionally check if there are any enqueued class changes. If there are, we need
* to first flush them out so can report the full string value of the attribute.
*/
if ( 'class' === $name ) {
$this->class_name_updates_to_attributes_updates();
}

// If we have an update for this attribute, return the updated value.
$enqueued_value = $this->get_enqueued_attribute_value( $comparable );
if ( false !== $enqueued_value ) {
return $enqueued_value;
}

if ( ! isset( $this->attributes[ $comparable ] ) ) {
return null;
}

$attribute = $this->attributes[ $comparable ];

/*
* This flag distinguishes an attribute with no value
* from an attribute with an empty string value. For
* unquoted attributes this could look very similar.
* It refers to whether an `=` follows the name.
*
* e.g. <div boolean-attribute empty-attribute=></div>
* ¹ ²
* 1. Attribute `boolean-attribute` is `true`.
* 2. Attribute `empty-attribute` is `""`.
*/
if ( true === $attribute->is_true ) {
return true;
}
Expand Down Expand Up @@ -1582,7 +1685,7 @@ public function set_attribute( $name, $value ) {
$updated_attribute = "{$name}=\"{$escaped_new_value}\"";
}

/**
/*
* > There must never be two or more attributes on
* > the same start tag whose names are an ASCII
* > case-insensitive match for each other.
Expand Down Expand Up @@ -1628,6 +1731,14 @@ public function set_attribute( $name, $value ) {
' ' . $updated_attribute
);
}

/*
* Any calls to update the `class` attribute directly should wipe out any
* enqueued class changes from `add_class` and `remove_class`.
*/
if ( 'class' === $comparable_name && ! empty( $this->classname_updates ) ) {
$this->classname_updates = array();
}
}

/**
Expand All @@ -1638,7 +1749,11 @@ public function set_attribute( $name, $value ) {
* @param string $name The attribute name to remove.
*/
public function remove_attribute( $name ) {
/**
if ( $this->is_closing_tag ) {
return false;
}

/*
* > There must never be two or more attributes on
* > the same start tag whose names are an ASCII
* > case-insensitive match for each other.
Expand All @@ -1647,7 +1762,20 @@ public function remove_attribute( $name ) {
* @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
*/
$name = strtolower( $name );
if ( $this->is_closing_tag || ! isset( $this->attributes[ $name ] ) ) {

/*
* Any calls to update the `class` attribute directly should wipe out any
* enqueued class changes from `add_class` and `remove_class`.
*/
if ( 'class' === $name && count( $this->classname_updates ) !== 0 ) {
$this->classname_updates = array();
}

// If we updated an attribute we didn't originally have, remove the enqueued update and move on.
if ( ! isset( $this->attributes[ $name ] ) ) {
if ( isset( $this->lexical_updates[ $name ] ) ) {
unset( $this->lexical_updates[ $name ] );
}
return false;
}

Expand Down
Loading

0 comments on commit f272fa6

Please sign in to comment.