Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim footnote anchors from excerpts #52518

Merged
merged 2 commits into from
Jul 12, 2023
Merged

Conversation

mcsf
Copy link
Contributor

@mcsf mcsf commented Jul 11, 2023

What?

Fixes #52035

Strips footnote anchors from post content whenever excerpts are being computed by WordPress. That is, this removes the <sup><a...></sup> marks, so that a piece of content like "Hello1, world2" becomes "Hello, world".

Why?

See parent issue.

How?

Hook into the the_content filter, but only when this is part of a wider get_the_excerpt filter.

Question: Is this the best approach to fix the issue?

Testing Instructions

  1. Start a post, add some text and anchors.
  2. Find a way to grab its excerpt, e.g. by rendering it in a template which uses the Post Excerpt block.

Screenshots or screencast

On the left: a post with footnotes. On the right: the index page with post excerpts stripped of footnote anchors.

single-and-home-with-footnotes

@mcsf mcsf added [Type] Bug An existing feature does not function as intended [Block] Footnotes Affects the Footnotes Block labels Jul 11, 2023
@mcsf mcsf requested review from carolinan and priethor July 11, 2023 16:24
@mcsf mcsf requested a review from spacedmonkey as a code owner July 11, 2023 16:24
@github-actions
Copy link

github-actions bot commented Jul 11, 2023

Flaky tests detected in bb26cac.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5522773682
📝 Reported issues:

@bph
Copy link
Contributor

bph commented Jul 11, 2023

Tested

  • with Twenty-Twenty Three and Björk block themes
  • tested with multiple Footnotes in Excerpts as well.
  • Footnotes are removed from the excerpt in previews in the template editor as well as the site editor and the frontend.

All 🟢

Copy link
Contributor

@andrewserong andrewserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this one!

Hook into the the_content filter, but only when this is part of a wider get_the_excerpt filter.
Question: Is this the best approach to fix the issue?

Great question! I wasn't too familiar with how all this is hooked together, so took a look through the chain of calls and filters. I think on balance this is a pretty good approach so far, based on the following:

  • In core, the default excerpt is grabbed via a default filter on get_the_excerpt which calls wp_trim_excerpt.
  • wp_trim_excerpt is responsible for getting the post content, then strips shortcodes and strips blocks that shouldn't be present in the output (excerpt_remove_blocks), before applying the_content filter.
  • By the time the wp_trim_excerpt filter is run, wp_trim_words is already called, which strips <sup> but without removing its contents, resulting in stray 1 and 2 (etc) characters being retained in the post excerpt.

So, currently, it seems the only place to hook into removal is either within the_content or render_block callbacks, and the_content means the check will happen fewer times than if we check while rendering blocks 👍. (Technically, it might be possible to do it via a filter on wp_trim_words using the $original_text param, but that would involve re-doing the trimming behaviour, so would be a fair bit hackier IMO)

One potential alternative in core could be to still use the function introduced in this PR, but instead of applying it via the_content filter, add it as a direct call within wp_trim_excerpt. That could potentially make things slightly neater, but with the disadvantage that if anyone wanted to switch off this callback, they couldn't call remove_filter to remove the call to gutenberg_trim_footnoes.

So, all in all, on balance, I think you've found a good approach here. At first I was wondering if a CSS solution could work to set it to display: none, but the stripping of the HTML tags in wp_trim_words renders that impossible.

✅ Tests nicely, stripping the sup tags in excerpts
✅ Rendering is unaffected outside of rendering excerpts
✅ Regex looks solid to me, and is specific enough that it's not going to accidentally remove other markup that shouldn't be removed

When backporting to core, it might be worth adding a couple of unit tests to confirm the removal works as expected? Other than that, this LGTM! ✨

return $content;
}

static $footnote_pattern = '_<sup data-fn="[^"]+" class="[^"]+">\s*<a href="[^"]+" id="[^"]+">\d+</a>\s*</sup>_';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working pretty well, thanks! I'm not sure what else we could do here.

Looks like it handles most of the inline changes we can make to the footnote sup

Screenshot 2023-07-12 at 1 33 43 pm

I only think with regexes like these, especially for new functionality, it would be great to have unit tests. That way when breaking changes come down the line we can react immediately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tested in Chinese and Cyrillic scripts just to make sure no i18n funny business was messing with things.

Screenshot 2023-07-12 at 1 43 34 pm

Looks good!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the HTML tag processor here? Cc @dmsnell

Copy link
Member

@ramonjd ramonjd Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also interesting to know this.

When doing the backport I had a quick look at whether it could remove tags like it could with attributes/classes and I couldn't find anything.

The other option would be a DOM parser to strip the <sup /> and its contents but not sure if that would be a faster way to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other possible tag inside of here besides <sup> and <a>?
The Tag Processor cannot easily remove this pattern, but the HTML Processor would be able to if I added <sup> support and the ability to remove a tag.

This could all plausibly happen for the WordPress 6.4 release, no promises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there's an "ugly" way to do it with the Tag Processor by subclassing and accessing internals, but the HTML Processor is working well enough I'm not experimenting with that approach anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the HTML tag processor here?

I don't believe so because WP_HTML_Tag_Processor doesn't have a remove_tag method. If it did, you could do something like this:

$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag( array( 'tag_name' => 'SUP' ) ) ) {
	if (
		$p->get_attribute( 'data-fn' ) &&
		$p->get_attribute( 'class' )
	) {
		$p->set_bookmark( 'footer_sup' );
		if (
			$p->next_tag() &&
			'A' === $p->get_tag() &&
			$p->get_attribute( 'href' ) &&
			$p->get_attribute( 'id' )
		) {
			$p->seek( 'footer_sup' );
			$p->remove_tag(); // 👈 This method doesn't exist.
		}
		$p->release_bookmark( 'footer_sup' );
	}
}

But @dmsnell would know better 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oops race condition)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just flagging this with @dmsnell as another place we're processing HTML 😄

@ramonjd
Copy link
Member

ramonjd commented Jul 12, 2023

When backporting to core, it might be worth adding a couple of unit tests to confirm the removal works as expected?

+1

Also, testing this PR made me notice that footnotes are not restored with previous revisions.

@ramonjd
Copy link
Member

ramonjd commented Jul 12, 2023

Also, testing this PR made me notice that footnotes are not restored with previous revisions.

I'll create an issue 👍🏻

@bph bph added Backport to WP 6.7 Beta/RC Pull request that needs to be backported to the WordPress major release that's currently in beta Backport to Gutenberg RC Pull request that needs to be backported to a Gutenberg release candidate (RC) labels Jul 12, 2023
@bph
Copy link
Contributor

bph commented Jul 12, 2023

Should this be merged in time, it would be great to have in WordPress 6.3 and also in GB 16.2.

@mcsf mcsf merged commit c286d1d into trunk Jul 12, 2023
@mcsf mcsf deleted the fix/footnotes-trim-from-excerpts branch July 12, 2023 13:28
@github-actions github-actions bot added this to the Gutenberg 16.3 milestone Jul 12, 2023
@mcsf
Copy link
Contributor Author

mcsf commented Jul 12, 2023

Thanks for the thorough reviewing, everyone, and thanks for the follow-up items, @ramonjd!

@andrewserong:

One potential alternative in core could be to still use the function introduced in this PR, but instead of applying it via the_content filter, add it as a direct call within wp_trim_excerpt. That could potentially make things slightly neater, but with the disadvantage that if anyone wanted to switch off this callback, they couldn't call remove_filter to remove the call to gutenberg_trim_footnoes.

Yeah, I agree that a few such solutions are on the table for Core itself, though I'm glad you're validating the current approach!

@ramonjd:

Looks like it handles most of the inline changes we can make to the footnote sup

Testing concurrent formats is something I completely missed and it was a great call of yours. I was worried for a second about such situations leading to certain markup that the regular expression would then miss, but I believe that in practice this can't happen, because:

  • In their current implementation, footnotes can only be inserted as a special RichText object that then gets the <sup><a>...</a></sup> wrapped around it.
    • In particular, users can't apply a footnote as a format around some preexisting piece of rich text, even though from a UX standpoint it feels that way. So we don't end up with <sup><a> <strong>preexisting</strong> </a></sup>.
  • Only other formats can be applied to a footnote anchor, causing the markup to look like <strong> <sup><a>...</a></sup> </strong>, which doesn't affect the regex matching.

ockham pushed a commit that referenced this pull request Jul 12, 2023
* Trim footnote anchors from excerpts

* Add comments, fix spacing, appease linter
@ockham
Copy link
Contributor

ockham commented Jul 12, 2023

I just cherry-picked this PR to the release/16.2 branch to get it included in the next release: 0c1a9c6

@ockham ockham removed the Backport to Gutenberg RC Pull request that needs to be backported to a Gutenberg release candidate (RC) label Jul 12, 2023
@ockham ockham modified the milestones: Gutenberg 16.3, Gutenberg 16.2 Jul 12, 2023
westonruter added a commit that referenced this pull request Jul 12, 2023
…dd/defer-script-loading-strategy

* 'trunk' of https://github.com/WordPress/gutenberg:
  Update Changelog for 16.2.0
  Adding support for defined IDs in `TextControl` component (#52028)
  Bump plugin version to 16.2.0
  Revert "Bump plugin version to 16.2.0"
  Bump plugin version to 16.2.0
  Add maxLength to LinkControl search items (#52523)
  [RNMobile] Update Editor block inserter button styles and default text input placeholder/selection styles (#52269)
  Site Editor: Reset device preview type when exiting the editing mode (#52566)
  Trim footnote anchors from excerpts (#52518)
  Add back old Navigation and File blocks JavaScript implementation when Gutenberg is not installed (#52553)
  Block Editor: Ensure synced patterns are accounted for in 'getAllowedBlocks' (#52546)
  Fix md5 class messed up with new block key (#52557)
  Fix entity cache misses for single posts due to string as recordKey (#52338)
  Make "My patterns" permanently visible (#52531)
  Hide site hub when resizing frame upwards to avoid overlap (#52180)
  Fix "Manage all patterns" link appearance (#52532)
  Update navigation menu title size & weight in detail panels (#52477)
  Site Editor Patterns: Ensure sidebar does not shrink when long pattern titles are used (#52547)
  Site Editor: Restore quick inserter 'Browse all' button (#52529)
  Patterns: update the title of Pattern block in the block inspector card  (#52010)
tellthemachines pushed a commit that referenced this pull request Jul 13, 2023
* Trim footnote anchors from excerpts

* Add comments, fix spacing, appease linter
@tellthemachines
Copy link
Contributor

I just cherry-picked this PR to the update/first-batch-pre-RC1 branch to get it included in the next release: 1591f04

@tellthemachines tellthemachines removed the Backport to WP 6.7 Beta/RC Pull request that needs to be backported to the WordPress major release that's currently in beta label Jul 13, 2023
tellthemachines added a commit that referenced this pull request Jul 14, 2023
* Site Editor: Restore quick inserter 'Browse all' button (#52529)

* Site Editor: Restore quick inserter 'Browse all' button

* Remove leftover comment

* Patterns: update the title of Pattern block in the block inspector card  (#52010)

* Site Editor Pages: load the appropriate template if posts page set (#52266)

* This commit:
- links the posts page to the homepage template when a post page is set
- abstracts logic to get page item props

* The Posts Page resolves to display the Home or Index template only. Adding a check to skip the Front Page

* Showing homepage settings for posts pages that are set as the post page in reading settings

* Post pages that have been set to display posts will redirect to first the home template, then the index template. The fallback is the post id of the page.

* Reverted refactor of packages/edit-site/src/components/sidebar-navigation-screen-page/index.js
Will do it in a follow up

* Allow editing existing footnote from formats toolbar (#52506)

* Block Editor: Display variation icon in the 'BlockDraggable' component (#52502)

* Patterns: add option to set sync status when adding from wp-admin patterns list (#52352)

* Show a modal to set sync status if adding pattern from pattern list page

* Make sure the modal loads if post settings panel not open

* don't load modal component at all if not new post

* Simplify the sync status so undefined always = synced

* Update packages/editor/src/components/post-sync-status/index.js

---------

Co-authored-by: Ramon <[email protected]>

* Revise LinkControl suggestions UI to use MenuItem (#50978)

* Use "link" instead of "URL" for URL_TYPE

* Use MenuItem for search create button

* Use sentence case for "Create page"

* Use a MenuGroup for search results

* Use MenuItem for search item

* Refactoring styles (WIP)

* Preserve whitespace in results text

* Reinstate result item information including permalink

* Remove debugging CSS code

* Reinstate CSS to control size of rich previews favicon

* Remove other commented out CSS code

* Reinstate selected styles

* Remove more redundant CSS

* Add some basic results hover/focus styling.

Needs improving

* Improve icon alignment

* Update tests to handle wording changes

* Remove inconsistent hover/focus style

MenuItem already has hover/focus styles

* Reinstate is-selected visual state

* Update test to make sense in context of #51011

See #51011

* Fix locator for result text

---------

Co-authored-by: Dave Smith <[email protected]>

* Fix LinkControl mark highlight to bold (#52517)

* Add 'reusable' keyword to Pattern blocks (#52543)

* Fix missing Add Template Part button in Template Parts page (#52542)

* Tweak copy for the reusable block rename hint (#52581)

* Trim footnote anchors from excerpts (#52518)

* Trim footnote anchors from excerpts

* Add comments, fix spacing, appease linter

* Site Editor: Reset device preview type when exiting the editing mode (#52566)

* Make "My patterns" permanently visible (#52531)

* Hide site hub when resizing frame upwards to avoid overlap (#52180)

* Fix "Manage all patterns" link appearance (#52532)

* Fix "Manage all patterns" link

* Update focus style

* Update navigation menu title size & weight in detail panels (#52477)

* Update menu title size

* Adjust font weight

* Site Editor Patterns: Ensure sidebar does not shrink when long pattern titles are used (#52547)

* Edit Site: Select templateType from the store inside the useSiteEditorSettings hook (#52503)

* Add width to ensure ellipsis is applied (#52575)

* Cover Block: Fix block deprecation when fixed background is enabled (#51612)

* ResizableFrame: Make keyboard accessible (#52443)

* ResizableFrame: Make keyboard accessible

* Fix outline in Safari

* Use proper CSS modifier

* Add aria-label to button

* Keep handle enlarged when resizing (Safari)

* Add back visually hidden help text

* Don't switch to edit mode

* Make the handle a role="separator"

* Revert to `button`

* Switch description text to `div hidden`

* Prevent keydown event default when right/left arrow

* Change minimum frame width to 320px

* Mention shift key in description text

* Only render resize handle when in View mode

* Fix importing classic menus (#52573)

* use the same create hook for classic import

* Remove redundant arg to hook

---------

Co-authored-by: Dave Smith <[email protected]>

* Site Editor: Fix navigation menu sidebar actions order and label (#52592)

* Avoid errors in Dimension visualizers when switching between iframed and non-iframed editors (#52588)

* Patterns: Add client side pagination to patterns list (#52538)

Co-authored-by: Saxon Fletcher <[email protected]>

* Site Editor: Make sidebar back button go *back* instead of *up* if possible (#52456)

* Navigator: Add replace option to goTo() and goToParent()

* Site Editor: Make sidebar back button go back instead of up if possible

* Adapt template part hint copy (#52527)

* Try "panel" instead of "page"

* Update template-part-hint.js

---------

Co-authored-by: George Mamadashvili <[email protected]>
Co-authored-by: Glen Davies <[email protected]>
Co-authored-by: Ramon <[email protected]>
Co-authored-by: Miguel Fonseca <[email protected]>
Co-authored-by: Rich Tabor <[email protected]>
Co-authored-by: Dave Smith <[email protected]>
Co-authored-by: Robert Anderson <[email protected]>
Co-authored-by: James Koster <[email protected]>
Co-authored-by: Andrew Serong <[email protected]>
Co-authored-by: arthur791004 <[email protected]>
Co-authored-by: Aki Hamano <[email protected]>
Co-authored-by: Lena Morita <[email protected]>
Co-authored-by: Andrei Draganescu <[email protected]>
Co-authored-by: Saxon Fletcher <[email protected]>
@ramonjd ramonjd added the Needs PHP backport Needs PHP backport to Core label Jul 14, 2023
@ramonjd ramonjd removed the Needs PHP backport Needs PHP backport to Core label Jul 16, 2023
@ellatrix
Copy link
Member

Thanks @mcsf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Block] Footnotes Affects the Footnotes Block [Type] Bug An existing feature does not function as intended
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Footnotes: Excerpts displays the link text as a * after the text content
9 participants