Update fetchLinkSuggestions to sort results by relevancy #62397

Merged (8 commits) on Jun 14, 2024

noisysocks (Member) commented Jun 7, 2024

What?

Fixes #56478.

Updates fetchLinkSuggestions to sort its results by relevancy to the search query that the user typed. fetchLinkSuggestions is used by LinkControl, which powers the popover that appears when you insert a link.

Why?

When inserting a link, the user can search for posts, pages, tags, categories, post formats, and media. The search is done via the /wp/v2/search REST API endpoint.

Unfortunately, WordPress can only search one database table at a time. For example, it can't search wp_posts and wp_terms in a single SQL query. The REST API does not hide this limitation and forces you to call /wp/v2/search with just one type param.
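To illustrate the one-type-per-request constraint, here is a minimal sketch that builds one /wp/v2/search path per `type` value. The helper name `buildSearchPaths` is hypothetical, and the sketch only covers the `post`, `term`, and `post-format` types; how the media request is made is left out.

```typescript
// The /wp/v2/search `type` values covering posts/pages, terms, and post formats.
const SEARCH_TYPES = ["post", "term", "post-format"] as const;
type SearchType = (typeof SEARCH_TYPES)[number];

// Hypothetical helper: one path per type, because a single request can only
// target one type. `perPage` mirrors the per-request limit of 20.
function buildSearchPaths(query: string, perPage = 20): string[] {
  return SEARCH_TYPES.map(
    (type: SearchType) =>
      `/wp/v2/search?search=${encodeURIComponent(query)}&type=${type}&per_page=${perPage}`,
  );
}

console.log(buildSearchPaths("travel tips"));
```

Each path would then be fetched separately and the results merged on the client.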

To get around this limitation, fetchLinkSuggestions currently makes 4 requests (posts, taxonomies, post formats, media) in parallel using Promise.all and then concatenates the results. Each request returns at most 20 results, for a combined maximum of 80, and only the first 20 of the combined results are shown to the user.

There's a problem with this approach. Say the user is searching for a category named "Travel Tips" and there are 20 pages or posts containing the words "travel" or "tips". Because the post results come before any category in the combined list, those 20 posts fill every visible slot and the "Travel Tips" category is never shown.
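The crowding-out effect can be reproduced without any network calls. This is a minimal sketch, assuming a hypothetical `SearchResult` shape and a hypothetical `combineThenTruncate` helper that mimics the concatenate-then-take-20 behaviour described above; it is not the actual fetchLinkSuggestions code.

```typescript
// Hypothetical result shape; real /wp/v2/search responses have more fields.
interface SearchResult {
  id: number;
  title: string;
  kind: string; // illustrative label: "post", "taxonomy", "post-format", "media"
}

const MAX_SHOWN = 20;

// Current behaviour as described: per-type lists are concatenated in request
// order and only the first MAX_SHOWN results are kept. Nothing is sorted.
function combineThenTruncate(perType: SearchResult[][]): SearchResult[] {
  const combined: SearchResult[] = [];
  for (const results of perType) {
    combined.push(...results);
  }
  return combined.slice(0, MAX_SHOWN);
}

// Scenario from the text: 20 posts matching "travel" or "tips",
// plus a "Travel Tips" category returned by the taxonomy request.
const posts: SearchResult[] = Array.from({ length: 20 }, (_, i) => ({
  id: i + 1,
  title: `Travel post ${i + 1}`,
  kind: "post",
}));
const terms: SearchResult[] = [{ id: 100, title: "Travel Tips", kind: "taxonomy" }];

const shown = combineThenTruncate([posts, terms, [], []]);
console.log(shown.length); // 20
console.log(shown.some((r) => r.title === "Travel Tips")); // false
```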

How?

My fix for this issue is to sort the combined results before we take the first 20 and show them to the user.

The first version of this PR sorted the results by how similar each title is to the search query, using cosine similarity. We treat the search query as one document and the result title as another document, build a term frequency vector for each document, and take the cosine similarity of the two vectors as the relevancy score.
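A minimal sketch of that cosine similarity score is below. The tokenizer (lowercase, split on anything that isn't an ASCII letter or digit) and all function names are assumptions for illustration, not the code that was in the PR.

```typescript
// Hypothetical ASCII-only tokenizer: lowercase, split on non-alphanumerics.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Sparse term-frequency vector: token -> number of occurrences.
function termFrequencies(tokens: string[]): Map<string, number> {
  const tf = new Map<string, number>();
  tokens.forEach((token) => {
    tf.set(token, (tf.get(token) ?? 0) + 1);
  });
  return tf;
}

// Squared Euclidean norm: sum of squared counts.
function squaredNorm(vector: Map<string, number>): number {
  let sum = 0;
  vector.forEach((count) => {
    sum += count * count;
  });
  return sum;
}

// dot(a, b) / (|a| * |b|), defined as 0 when either vector is empty.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((countA, token) => {
    dot += countA * (b.get(token) ?? 0);
  });
  const denominator = Math.sqrt(squaredNorm(a) * squaredNorm(b));
  return denominator === 0 ? 0 : dot / denominator;
}

function cosineRelevancy(query: string, title: string): number {
  return cosineSimilarity(
    termFrequencies(tokenize(query)),
    termFrequencies(tokenize(title)),
  );
}

console.log(cosineRelevancy("travel tips", "Travel Tips")); // 1
console.log(cosineRelevancy("travel tips", "Travel Safety Tips for Solo Travelers")); // about 0.577
```

Because the title's norm is in the denominator, extra words in a long title lower its score even when it contains every query term.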

This PR now scores each result as the number of tokens in the title that also appear in the search query, divided by the total number of tokens in the title, and sorts by that score. This gives a score between 0 and 1, where 1 means every token in the title appears in the query. The results are good enough, and this is much simpler to understand than the cosine similarity approach described above.
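Here is a minimal sketch of that scoring and sorting step. The `Suggestion` shape, the ASCII-only tokenizer, and the names `overlapScore` and `sortByRelevancy` are hypothetical; this is an illustration of the rule described above, not the merged implementation.

```typescript
// Hypothetical result shape; only the title is used for scoring.
interface Suggestion {
  id: number;
  title: string;
}

// Hypothetical ASCII-only tokenizer: lowercase, split on non-alphanumerics.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Fraction of title tokens that also appear in the query (0 to 1).
function overlapScore(query: string, title: string): number {
  const queryTokens = new Set(tokenize(query));
  const titleTokens = tokenize(title);
  if (titleTokens.length === 0) {
    return 0;
  }
  const matches = titleTokens.filter((token) => queryTokens.has(token)).length;
  return matches / titleTokens.length;
}

// Sort by descending score, then keep the top `limit` results.
// Array.prototype.sort is stable (ES2019+), so ties keep their original order.
function sortByRelevancy(results: Suggestion[], query: string, limit = 20): Suggestion[] {
  return results
    .map((result) => ({ result, score: overlapScore(query, result.title) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ result }) => result);
}

const ranked = sortByRelevancy(
  [
    { id: 1, title: "Top Travel Apps You Need" },
    { id: 2, title: "Travel Safety Tips for Solo Travelers" },
    { id: 3, title: "Travel Tips" },
  ],
  "travel tips",
);
console.log(ranked.map((r) => r.title));
// ["Travel Tips", "Travel Safety Tips for Solo Travelers", "Top Travel Apps You Need"]
```

Sorting before truncating to 20 is what lets the "Travel Tips" category surface even when many posts also match.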

Alternative approaches

Ideally this would all be handled at the database level using full text search. I don't think we can assume that every WordPress installation's MySQL database has full text search enabled, however.

We could implement logic similar to what's in this PR on the server. I'm not sure we should, though. The REST API is arguably right not to hide this limitation, since not all clients will want to order and combine results the same way.

Doing this in the client doesn't prevent us from moving the logic to the server in the future, so it's a good place to start.

I tried simple keyword matching as an alternative ranking algorithm, where a result gets 1 point for each keyword that its title shares with the search term. It didn't work as well for me in testing, though, because cosine similarity gives short titles that closely match the query an edge over long titles that contain the query plus lots of irrelevant words.
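A minimal sketch of the keyword-matching alternative shows why it can't separate short and long titles. The tokenizer and the `keywordScore` name are hypothetical.

```typescript
// Hypothetical ASCII-only tokenizer: lowercase, split on non-alphanumerics.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// One point per distinct query keyword that also appears in the title.
// Extra, irrelevant words in the title are never penalised.
function keywordScore(query: string, title: string): number {
  const titleTokens = new Set(tokenize(title));
  let score = 0;
  new Set(tokenize(query)).forEach((keyword) => {
    if (titleTokens.has(keyword)) {
      score += 1;
    }
  });
  return score;
}

console.log(keywordScore("travel tips", "Travel Tips")); // 2
console.log(keywordScore("travel tips", "Travel Safety Tips for Solo Travelers")); // 2
```

Both titles tie at 2, so the short, close match gets no advantage. Dividing by the number of title tokens, as the current approach does, breaks the tie (1 versus 2/6).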

Housekeeping

This PR also contains some housekeeping while I'm in this part of the codebase:

  • Rename __experimentalFetchLinkSuggestions to fetchLinkSuggestions.
  • Rewrite fetchLinkSuggestions in TypeScript. (It was already partially JSDoc typed.)

Testing Instructions

You really don't notice this bug unless you're testing with real data and have more than 20 posts, tags, etc.

If you don't have any real data, you can use WP-CLI to create some realistic-enough posts, pages, tags, and categories. Here's some test data that I had ChatGPT spit out:

Test data

# If you're using wp-env, this alias is helpful.
alias wp='npx wp-env run cli wp'

wp post create --post_type=post --post_title="Exploring the Streets of Paris" --post_status=publish
wp post create --post_type=post --post_title="A Weekend in Rome: What to See and Do" --post_status=publish
wp post create --post_type=post --post_title="Hidden Beaches of Thailand" --post_status=publish
wp post create --post_type=post --post_title="Hiking Trails in the Swiss Alps" --post_status=publish
wp post create --post_type=post --post_title="Cultural Delights of Tokyo" --post_status=publish
wp post create --post_type=post --post_title="Road Trip Through the Australian Outback" --post_status=publish
wp post create --post_type=post --post_title="Discovering Ancient Ruins in Greece" --post_status=publish
wp post create --post_type=post --post_title="Foodie Adventures in Mexico City" --post_status=publish
wp post create --post_type=post --post_title="Safari Experiences in Kenya" --post_status=publish
wp post create --post_type=post --post_title="Island Hopping in the Philippines" --post_status=publish
wp post create --post_type=post --post_title="Exploring the Markets of Marrakech" --post_status=publish
wp post create --post_type=post --post_title="Cityscape Views from New York" --post_status=publish
wp post create --post_type=post --post_title="The Nightlife of Berlin" --post_status=publish
wp post create --post_type=post --post_title="Adventure Sports in New Zealand" --post_status=publish
wp post create --post_type=post --post_title="Luxury Travel in Dubai" --post_status=publish
wp post create --post_type=post --post_title="Historical Sites in Egypt" --post_status=publish
wp post create --post_type=post --post_title="Wine Tasting in Napa Valley" --post_status=publish
wp post create --post_type=post --post_title="Cruising the Caribbean" --post_status=publish
wp post create --post_type=post --post_title="Exploring National Parks in the USA" --post_status=publish
wp post create --post_type=post --post_title="City Guide to Barcelona" --post_status=publish

wp post create --post_type=page --post_title="Top Travel Destinations 2024" --post_status=publish
wp post create --post_type=page --post_title="Ultimate Packing Guide for Travelers" --post_status=publish
wp post create --post_type=page --post_title="How to Plan a Budget Trip" --post_status=publish
wp post create --post_type=page --post_title="Travel Safety Tips for Solo Travelers" --post_status=publish
wp post create --post_type=page --post_title="Family Travel: Best Destinations" --post_status=publish
wp post create --post_type=page --post_title="Romantic Getaways Around the World" --post_status=publish
wp post create --post_type=page --post_title="Top Cultural Festivals to Attend" --post_status=publish
wp post create --post_type=page --post_title="Best Road Trips in the USA" --post_status=publish
wp post create --post_type=page --post_title="Luxury Hotels and Resorts" --post_status=publish
wp post create --post_type=page --post_title="Backpacking Tips for Beginners" --post_status=publish
wp post create --post_type=page --post_title="How to Travel Sustainably" --post_status=publish
wp post create --post_type=page --post_title="Best Cruise Lines for 2024" --post_status=publish
wp post create --post_type=page --post_title="Guide to Adventure Travel" --post_status=publish
wp post create --post_type=page --post_title="Exploring Local Cuisines" --post_status=publish
wp post create --post_type=page --post_title="Top Travel Apps You Need" --post_status=publish
wp post create --post_type=page --post_title="Best Beaches in the World" --post_status=publish
wp post create --post_type=page --post_title="Traveling with Pets: What You Need to Know" --post_status=publish
wp post create --post_type=page --post_title="City Guides: Where to Go and What to See" --post_status=publish
wp post create --post_type=page --post_title="Travel Insurance: Do You Need It?" --post_status=publish
wp post create --post_type=page --post_title="How to Travel with Kids" --post_status=publish

wp term create category "European Adventures" --description="Explore the best of Europe."
wp term create category "Asian Escapades" --description="Discover the wonders of Asia."
wp term create category "African Safaris" --description="Experience the wildlife of Africa."
wp term create category "American Road Trips" --description="Best road trips in the USA."
wp term create category "Oceania Discoveries" --description="Explore Australia and New Zealand."
wp term create category "South American Journeys" --description="Adventure through South America."
wp term create category "City Guides" --description="Top cities to visit around the world."
wp term create category "Beach Holidays" --description="Best beach destinations."
wp term create category "Cultural Experiences" --description="Immerse in different cultures."
wp term create category "Historical Travels" --description="Visit historical sites."
wp term create category "Luxury Travels" --description="Travel in luxury."
wp term create category "Budget Travels" --description="Travel on a budget."
wp term create category "Family Vacations" --description="Best destinations for families."
wp term create category "Romantic Getaways" --description="Perfect destinations for couples."
wp term create category "Adventure Travel" --description="For the adventurous souls."
wp term create category "Food and Travel" --description="Explore culinary delights."
wp term create category "Sustainable Travel" --description="Eco-friendly travel tips."
wp term create category "Solo Travel" --description="Tips for traveling alone."
wp term create category "Travel Tips" --description="General travel advice."
wp term create category "Cruise Holidays" --description="Best cruises to take."

wp term create post_tag "Travel Tips" --description="Essential tips for travelers."
wp term create post_tag "Adventure" --description="For the thrill-seekers."
wp term create post_tag "Beaches" --description="Best beaches around the world."
wp term create post_tag "Cultural" --description="Cultural experiences and festivals."
wp term create post_tag "Historical" --description="Visit historical sites and monuments."
wp term create post_tag "Luxury" --description="Luxury travel experiences."
wp term create post_tag "Budget" --description="How to travel on a budget."
wp term create post_tag "Family" --description="Best destinations for families."
wp term create post_tag "Romantic" --description="Romantic getaways for couples."
wp term create post_tag "Solo" --description="Tips for solo travelers."
wp term create post_tag "Foodie" --description="Explore local cuisines."
wp term create post_tag "Nature" --description="Nature and wildlife experiences."
wp term create post_tag "Urban" --description="City travel guides."
wp term create post_tag "Road Trip" --description="Best road trips."
wp term create post_tag "Cruise" --description="Cruise holidays."
wp term create post_tag "Hiking" --description="Best hiking trails."
wp term create post_tag "Beach" --description="Top beach destinations."
wp term create post_tag "Mountain" --description="Mountain adventures."
wp term create post_tag "Island" --description="Island hopping experiences."
wp term create post_tag "Festival" --description="Top festivals to attend."

Now:

  1. Edit a template or post.
  2. Insert a link. A good place to test this is in the Navigation block.
  3. Search for something that's not a post, e.g. a tag or a category.

Screenshots or screencast

Before:

Kapture.2024-06-07.at.16.16.16.mp4

After:

Kapture.2024-06-07.at.16.17.18.mp4

@noisysocks noisysocks added [Type] Bug An existing feature does not function as intended [Block] Navigation Affects the Navigation Block [Feature] Link Editing Link components (LinkControl, URLInput) and integrations (RichText link formatting) labels Jun 7, 2024
@noisysocks noisysocks requested a review from nerrad as a code owner June 7, 2024 06:18

github-actions bot commented Jun 7, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Unlinked Accounts

The following contributors have not linked their GitHub and WordPress.org accounts: @scrobbleme.

Contributors, please read how to link your accounts to ensure your work is properly credited in WordPress releases.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Unlinked contributors: scrobbleme.

Co-authored-by: noisysocks <[email protected]>
Co-authored-by: ellatrix <[email protected]>
Co-authored-by: ntsekouras <[email protected]>
Co-authored-by: ramonjd <[email protected]>
Co-authored-by: andrewserong <[email protected]>
Co-authored-by: skorasaurus <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

ellatrix (Member) commented Jun 7, 2024

Is this really a bug? Should this be backported for WP 6.6? If so, please add the Backport to WP Beta/RC label. Thanks!

noisysocks (Member, Author) replied:

I think so. The existing experience is broken. I don't think it needs to be backported to 6.6.

@noisysocks

This comment was marked as outdated.

skorasaurus (Member) commented:

Thanks for this. As I understand it, this would also close #39964.

@noisysocks noisysocks self-assigned this Jun 11, 2024
@noisysocks noisysocks added the No Core Sync Required Indicates that any changes do not need to be synced to WordPress Core label Jun 12, 2024
- Rename __experimentalFetchLinkSuggestions to fetchLinkSuggestions.
- Rewrite fetchLinkSuggestions in TypeScript.
- Sort results by relevancy using cosine similarity between term frequency vectors.

github-actions bot commented Jun 12, 2024

Size Change: +922 B (+0.05%)

Total Size: 1.76 MB

Filename Size Change
build/block-editor/index.min.js 262 kB -242 B (-0.09%)
build/block-editor/style-rtl.css 15.6 kB -63 B (-0.4%)
build/block-editor/style.css 15.5 kB -62 B (-0.4%)
build/block-library/index.min.js 219 kB +224 B (+0.1%)
build/core-data/index.min.js 72.6 kB +68 B (+0.09%)
build/edit-site/index.min.js 207 kB -90 B (-0.04%)
build/edit-site/posts-rtl.css 6.35 kB +25 B (+0.4%)
build/edit-site/posts.css 6.35 kB +25 B (+0.4%)
build/edit-site/style-rtl.css 11.7 kB +32 B (+0.27%)
build/edit-site/style.css 11.7 kB +30 B (+0.26%)
build/editor/index.min.js 98 kB +35 B (+0.04%)
build/patterns/index.min.js 7.22 kB +755 B (+11.67%) ⚠️
build/patterns/style-rtl.css 687 B +93 B (+15.66%) ⚠️
build/patterns/style.css 685 B +92 B (+15.51%) ⚠️
Unchanged files:
Filename Size
build/a11y/index.min.js 951 B
build/annotations/index.min.js 2.26 kB
build/api-fetch/index.min.js 2.31 kB
build/autop/index.min.js 2.12 kB
build/blob/index.min.js 579 B
build/block-directory/index.min.js 7.31 kB
build/block-directory/style-rtl.css 1.02 kB
build/block-directory/style.css 1.02 kB
build/block-editor/content-rtl.css 4.57 kB
build/block-editor/content.css 4.57 kB
build/block-editor/default-editor-styles-rtl.css 394 B
build/block-editor/default-editor-styles.css 394 B
build/block-library/blocks/archives/editor-rtl.css 61 B
build/block-library/blocks/archives/editor.css 60 B
build/block-library/blocks/archives/style-rtl.css 90 B
build/block-library/blocks/archives/style.css 90 B
build/block-library/blocks/audio/editor-rtl.css 149 B
build/block-library/blocks/audio/editor.css 151 B
build/block-library/blocks/audio/style-rtl.css 125 B
build/block-library/blocks/audio/style.css 125 B
build/block-library/blocks/audio/theme-rtl.css 126 B
build/block-library/blocks/audio/theme.css 126 B
build/block-library/blocks/avatar/editor-rtl.css 115 B
build/block-library/blocks/avatar/editor.css 115 B
build/block-library/blocks/avatar/style-rtl.css 104 B
build/block-library/blocks/avatar/style.css 104 B
build/block-library/blocks/button/editor-rtl.css 310 B
build/block-library/blocks/button/editor.css 310 B
build/block-library/blocks/button/style-rtl.css 538 B
build/block-library/blocks/button/style.css 538 B
build/block-library/blocks/buttons/editor-rtl.css 336 B
build/block-library/blocks/buttons/editor.css 336 B
build/block-library/blocks/buttons/style-rtl.css 328 B
build/block-library/blocks/buttons/style.css 328 B
build/block-library/blocks/calendar/style-rtl.css 240 B
build/block-library/blocks/calendar/style.css 240 B
build/block-library/blocks/categories/editor-rtl.css 113 B
build/block-library/blocks/categories/editor.css 112 B
build/block-library/blocks/categories/style-rtl.css 124 B
build/block-library/blocks/categories/style.css 124 B
build/block-library/blocks/code/editor-rtl.css 53 B
build/block-library/blocks/code/editor.css 53 B
build/block-library/blocks/code/style-rtl.css 121 B
build/block-library/blocks/code/style.css 121 B
build/block-library/blocks/code/theme-rtl.css 122 B
build/block-library/blocks/code/theme.css 122 B
build/block-library/blocks/columns/editor-rtl.css 108 B
build/block-library/blocks/columns/editor.css 108 B
build/block-library/blocks/columns/style-rtl.css 420 B
build/block-library/blocks/columns/style.css 420 B
build/block-library/blocks/comment-author-avatar/editor-rtl.css 124 B
build/block-library/blocks/comment-author-avatar/editor.css 124 B
build/block-library/blocks/comment-content/style-rtl.css 90 B
build/block-library/blocks/comment-content/style.css 90 B
build/block-library/blocks/comment-template/style-rtl.css 200 B
build/block-library/blocks/comment-template/style.css 199 B
build/block-library/blocks/comments-pagination-numbers/editor-rtl.css 122 B
build/block-library/blocks/comments-pagination-numbers/editor.css 121 B
build/block-library/blocks/comments-pagination/editor-rtl.css 221 B
build/block-library/blocks/comments-pagination/editor.css 211 B
build/block-library/blocks/comments-pagination/style-rtl.css 234 B
build/block-library/blocks/comments-pagination/style.css 231 B
build/block-library/blocks/comments-title/editor-rtl.css 75 B
build/block-library/blocks/comments-title/editor.css 75 B
build/block-library/blocks/comments/editor-rtl.css 832 B
build/block-library/blocks/comments/editor.css 832 B
build/block-library/blocks/comments/style-rtl.css 632 B
build/block-library/blocks/comments/style.css 631 B
build/block-library/blocks/cover/editor-rtl.css 668 B
build/block-library/blocks/cover/editor.css 669 B
build/block-library/blocks/cover/style-rtl.css 1.62 kB
build/block-library/blocks/cover/style.css 1.6 kB
build/block-library/blocks/details/editor-rtl.css 65 B
build/block-library/blocks/details/editor.css 65 B
build/block-library/blocks/details/style-rtl.css 86 B
build/block-library/blocks/details/style.css 86 B
build/block-library/blocks/embed/editor-rtl.css 314 B
build/block-library/blocks/embed/editor.css 314 B
build/block-library/blocks/embed/style-rtl.css 411 B
build/block-library/blocks/embed/style.css 411 B
build/block-library/blocks/embed/theme-rtl.css 126 B
build/block-library/blocks/embed/theme.css 126 B
build/block-library/blocks/file/editor-rtl.css 326 B
build/block-library/blocks/file/editor.css 326 B
build/block-library/blocks/file/style-rtl.css 278 B
build/block-library/blocks/file/style.css 279 B
build/block-library/blocks/file/view.min.js 324 B
build/block-library/blocks/footnotes/style-rtl.css 198 B
build/block-library/blocks/footnotes/style.css 197 B
build/block-library/blocks/form-input/editor-rtl.css 229 B
build/block-library/blocks/form-input/editor.css 229 B
build/block-library/blocks/form-input/style-rtl.css 342 B
build/block-library/blocks/form-input/style.css 342 B
build/block-library/blocks/form-submission-notification/editor-rtl.css 344 B
build/block-library/blocks/form-submission-notification/editor.css 341 B
build/block-library/blocks/form-submit-button/style-rtl.css 69 B
build/block-library/blocks/form-submit-button/style.css 69 B
build/block-library/blocks/form/view.min.js 470 B
build/block-library/blocks/freeform/editor-rtl.css 2.6 kB
build/block-library/blocks/freeform/editor.css 2.6 kB
build/block-library/blocks/gallery/editor-rtl.css 958 B
build/block-library/blocks/gallery/editor.css 962 B
build/block-library/blocks/gallery/style-rtl.css 1.71 kB
build/block-library/blocks/gallery/style.css 1.71 kB
build/block-library/blocks/gallery/theme-rtl.css 108 B
build/block-library/blocks/gallery/theme.css 108 B
build/block-library/blocks/group/editor-rtl.css 402 B
build/block-library/blocks/group/editor.css 402 B
build/block-library/blocks/group/style-rtl.css 103 B
build/block-library/blocks/group/style.css 103 B
build/block-library/blocks/group/theme-rtl.css 79 B
build/block-library/blocks/group/theme.css 79 B
build/block-library/blocks/heading/style-rtl.css 188 B
build/block-library/blocks/heading/style.css 188 B
build/block-library/blocks/html/editor-rtl.css 346 B
build/block-library/blocks/html/editor.css 347 B
build/block-library/blocks/image/editor-rtl.css 890 B
build/block-library/blocks/image/editor.css 889 B
build/block-library/blocks/image/style-rtl.css 1.52 kB
build/block-library/blocks/image/style.css 1.51 kB
build/block-library/blocks/image/theme-rtl.css 137 B
build/block-library/blocks/image/theme.css 137 B
build/block-library/blocks/image/view.min.js 1.54 kB
build/block-library/blocks/latest-comments/style-rtl.css 355 B
build/block-library/blocks/latest-comments/style.css 354 B
build/block-library/blocks/latest-posts/editor-rtl.css 204 B
build/block-library/blocks/latest-posts/editor.css 204 B
build/block-library/blocks/latest-posts/style-rtl.css 509 B
build/block-library/blocks/latest-posts/style.css 510 B
build/block-library/blocks/list/style-rtl.css 104 B
build/block-library/blocks/list/style.css 104 B
build/block-library/blocks/media-text/editor-rtl.css 304 B
build/block-library/blocks/media-text/editor.css 303 B
build/block-library/blocks/media-text/style-rtl.css 506 B
build/block-library/blocks/media-text/style.css 504 B
build/block-library/blocks/more/editor-rtl.css 427 B
build/block-library/blocks/more/editor.css 427 B
build/block-library/blocks/navigation-link/editor-rtl.css 663 B
build/block-library/blocks/navigation-link/editor.css 664 B
build/block-library/blocks/navigation-link/style-rtl.css 192 B
build/block-library/blocks/navigation-link/style.css 191 B
build/block-library/blocks/navigation-submenu/editor-rtl.css 295 B
build/block-library/blocks/navigation-submenu/editor.css 294 B
build/block-library/blocks/navigation/editor-rtl.css 2.2 kB
build/block-library/blocks/navigation/editor.css 2.21 kB
build/block-library/blocks/navigation/style-rtl.css 2.25 kB
build/block-library/blocks/navigation/style.css 2.24 kB
build/block-library/blocks/navigation/view.min.js 1.03 kB
build/block-library/blocks/nextpage/editor-rtl.css 392 B
build/block-library/blocks/nextpage/editor.css 392 B
build/block-library/blocks/page-list/editor-rtl.css 378 B
build/block-library/blocks/page-list/editor.css 378 B
build/block-library/blocks/page-list/style-rtl.css 175 B
build/block-library/blocks/page-list/style.css 175 B
build/block-library/blocks/paragraph/editor-rtl.css 236 B
build/block-library/blocks/paragraph/editor.css 236 B
build/block-library/blocks/paragraph/style-rtl.css 341 B
build/block-library/blocks/paragraph/style.css 340 B
build/block-library/blocks/post-author/style-rtl.css 175 B
build/block-library/blocks/post-author/style.css 176 B
build/block-library/blocks/post-comments-form/editor-rtl.css 96 B
build/block-library/blocks/post-comments-form/editor.css 96 B
build/block-library/blocks/post-comments-form/style-rtl.css 506 B
build/block-library/blocks/post-comments-form/style.css 506 B
build/block-library/blocks/post-content/editor-rtl.css 74 B
build/block-library/blocks/post-content/editor.css 74 B
build/block-library/blocks/post-date/style-rtl.css 62 B
build/block-library/blocks/post-date/style.css 62 B
build/block-library/blocks/post-excerpt/editor-rtl.css 71 B
build/block-library/blocks/post-excerpt/editor.css 71 B
build/block-library/blocks/post-excerpt/style-rtl.css 141 B
build/block-library/blocks/post-excerpt/style.css 141 B
build/block-library/blocks/post-featured-image/editor-rtl.css 729 B
build/block-library/blocks/post-featured-image/editor.css 726 B
build/block-library/blocks/post-featured-image/style-rtl.css 341 B
build/block-library/blocks/post-featured-image/style.css 341 B
build/block-library/blocks/post-navigation-link/style-rtl.css 215 B
build/block-library/blocks/post-navigation-link/style.css 214 B
build/block-library/blocks/post-template/editor-rtl.css 99 B
build/block-library/blocks/post-template/editor.css 98 B
build/block-library/blocks/post-template/style-rtl.css 399 B
build/block-library/blocks/post-template/style.css 398 B
build/block-library/blocks/post-terms/style-rtl.css 96 B
build/block-library/blocks/post-terms/style.css 96 B
build/block-library/blocks/post-time-to-read/style-rtl.css 70 B
build/block-library/blocks/post-time-to-read/style.css 70 B
build/block-library/blocks/post-title/style-rtl.css 100 B
build/block-library/blocks/post-title/style.css 100 B
build/block-library/blocks/preformatted/style-rtl.css 125 B
build/block-library/blocks/preformatted/style.css 125 B
build/block-library/blocks/pullquote/editor-rtl.css 134 B
build/block-library/blocks/pullquote/editor.css 134 B
build/block-library/blocks/pullquote/style-rtl.css 342 B
build/block-library/blocks/pullquote/style.css 342 B
build/block-library/blocks/pullquote/theme-rtl.css 167 B
build/block-library/blocks/pullquote/theme.css 167 B
build/block-library/blocks/query-pagination-numbers/editor-rtl.css 121 B
build/block-library/blocks/query-pagination-numbers/editor.css 118 B
build/block-library/blocks/query-pagination/editor-rtl.css 220 B
build/block-library/blocks/query-pagination/editor.css 208 B
build/block-library/blocks/query-pagination/style-rtl.css 287 B
build/block-library/blocks/query-pagination/style.css 283 B
build/block-library/blocks/query-title/style-rtl.css 64 B
build/block-library/blocks/query-title/style.css 64 B
build/block-library/blocks/query/editor-rtl.css 502 B
build/block-library/blocks/query/editor.css 502 B
build/block-library/blocks/query/view.min.js 958 B
build/block-library/blocks/quote/style-rtl.css 238 B
build/block-library/blocks/quote/style.css 238 B
build/block-library/blocks/quote/theme-rtl.css 221 B
build/block-library/blocks/quote/theme.css 225 B
build/block-library/blocks/read-more/style-rtl.css 138 B
build/block-library/blocks/read-more/style.css 138 B
build/block-library/blocks/rss/editor-rtl.css 101 B
build/block-library/blocks/rss/editor.css 101 B
build/block-library/blocks/rss/style-rtl.css 288 B
build/block-library/blocks/rss/style.css 287 B
build/block-library/blocks/search/editor-rtl.css 183 B
build/block-library/blocks/search/editor.css 183 B
build/block-library/blocks/search/style-rtl.css 684 B
build/block-library/blocks/search/style.css 683 B
build/block-library/blocks/search/theme-rtl.css 113 B
build/block-library/blocks/search/theme.css 113 B
build/block-library/blocks/search/view.min.js 475 B
build/block-library/blocks/separator/editor-rtl.css 100 B
build/block-library/blocks/separator/editor.css 100 B
build/block-library/blocks/separator/style-rtl.css 248 B
build/block-library/blocks/separator/style.css 248 B
build/block-library/blocks/separator/theme-rtl.css 195 B
build/block-library/blocks/separator/theme.css 195 B
build/block-library/blocks/shortcode/editor-rtl.css 286 B
build/block-library/blocks/shortcode/editor.css 286 B
build/block-library/blocks/site-logo/editor-rtl.css 806 B
build/block-library/blocks/site-logo/editor.css 803 B
build/block-library/blocks/site-logo/style-rtl.css 218 B
build/block-library/blocks/site-logo/style.css 218 B
build/block-library/blocks/site-tagline/editor-rtl.css 87 B
build/block-library/blocks/site-tagline/editor.css 87 B
build/block-library/blocks/site-title/editor-rtl.css 123 B
build/block-library/blocks/site-title/editor.css 123 B
build/block-library/blocks/site-title/style-rtl.css 71 B
build/block-library/blocks/site-title/style.css 71 B
build/block-library/blocks/social-link/editor-rtl.css 338 B
build/block-library/blocks/social-link/editor.css 338 B
build/block-library/blocks/social-links/editor-rtl.css 676 B
build/block-library/blocks/social-links/editor.css 675 B
build/block-library/blocks/social-links/style-rtl.css 1.5 kB
build/block-library/blocks/social-links/style.css 1.5 kB
build/block-library/blocks/spacer/editor-rtl.css 346 B
build/block-library/blocks/spacer/editor.css 346 B
build/block-library/blocks/spacer/style-rtl.css 48 B
build/block-library/blocks/spacer/style.css 48 B
build/block-library/blocks/table/editor-rtl.css 394 B
build/block-library/blocks/table/editor.css 394 B
build/block-library/blocks/table/style-rtl.css 640 B
build/block-library/blocks/table/style.css 639 B
build/block-library/blocks/table/theme-rtl.css 145 B
build/block-library/blocks/table/theme.css 145 B
build/block-library/blocks/tag-cloud/style-rtl.css 266 B
build/block-library/blocks/tag-cloud/style.css 265 B
build/block-library/blocks/template-part/editor-rtl.css 393 B
build/block-library/blocks/template-part/editor.css 393 B
build/block-library/blocks/template-part/theme-rtl.css 113 B
build/block-library/blocks/template-part/theme.css 113 B
build/block-library/blocks/term-description/style-rtl.css 108 B
build/block-library/blocks/term-description/style.css 108 B
build/block-library/blocks/text-columns/editor-rtl.css 95 B
build/block-library/blocks/text-columns/editor.css 95 B
build/block-library/blocks/text-columns/style-rtl.css 165 B
build/block-library/blocks/text-columns/style.css 165 B
build/block-library/blocks/verse/style-rtl.css 98 B
build/block-library/blocks/verse/style.css 98 B
build/block-library/blocks/video/editor-rtl.css 553 B
build/block-library/blocks/video/editor.css 554 B
build/block-library/blocks/video/style-rtl.css 186 B
build/block-library/blocks/video/style.css 186 B
build/block-library/blocks/video/theme-rtl.css 126 B
build/block-library/blocks/video/theme.css 126 B
build/block-library/classic-rtl.css 179 B
build/block-library/classic.css 179 B
build/block-library/common-rtl.css 1.11 kB
build/block-library/common.css 1.11 kB
build/block-library/editor-elements-rtl.css 75 B
build/block-library/editor-elements.css 75 B
build/block-library/editor-rtl.css 12 kB
build/block-library/editor.css 11.9 kB
build/block-library/elements-rtl.css 54 B
build/block-library/elements.css 54 B
build/block-library/reset-rtl.css 470 B
build/block-library/reset.css 470 B
build/block-library/style-rtl.css 14.6 kB
build/block-library/style.css 14.6 kB
build/block-library/theme-rtl.css 698 B
build/block-library/theme.css 703 B
build/block-serialization-default-parser/index.min.js 1.12 kB
build/block-serialization-spec-parser/index.min.js 2.87 kB
build/blocks/index.min.js 52.2 kB
build/commands/index.min.js 15.2 kB
build/commands/style-rtl.css 955 B
build/commands/style.css 952 B
build/components/index.min.js 223 kB
build/components/style-rtl.css 12 kB
build/components/style.css 12 kB
build/compose/index.min.js 12.9 kB
build/core-commands/index.min.js 2.74 kB
build/customize-widgets/index.min.js 10.9 kB
build/customize-widgets/style-rtl.css 1.35 kB
build/customize-widgets/style.css 1.35 kB
build/data-controls/index.min.js 641 B
build/data/index.min.js 8.99 kB
build/date/index.min.js 18 kB
build/deprecated/index.min.js 458 B
build/dom-ready/index.min.js 325 B
build/dom/index.min.js 4.65 kB
build/edit-post/classic-rtl.css 578 B
build/edit-post/classic.css 580 B
build/edit-post/index.min.js 12.4 kB
build/edit-post/style-rtl.css 2.31 kB
build/edit-post/style.css 2.31 kB
build/edit-widgets/index.min.js 17.6 kB
build/edit-widgets/style-rtl.css 4.19 kB
build/edit-widgets/style.css 4.19 kB
build/editor/style-rtl.css 9.22 kB
build/editor/style.css 9.22 kB
build/element/index.min.js 4.83 kB
build/escape-html/index.min.js 537 B
build/format-library/index.min.js 8.1 kB
build/format-library/style-rtl.css 494 B
build/format-library/style.css 493 B
build/hooks/index.min.js 1.54 kB
build/html-entities/index.min.js 445 B
build/i18n/index.min.js 3.58 kB
build/interactivity/debug.min.js 16.5 kB
build/interactivity/file.min.js 447 B
build/interactivity/image.min.js 1.68 kB
build/interactivity/index.min.js 13.4 kB
build/interactivity/navigation.min.js 1.16 kB
build/interactivity/query.min.js 742 B
build/interactivity/router.min.js 2.8 kB
build/interactivity/search.min.js 615 B
build/is-shallow-equal/index.min.js 526 B
build/keyboard-shortcuts/index.min.js 1.31 kB
build/keycodes/index.min.js 1.46 kB
build/list-reusable-blocks/index.min.js 2.17 kB
build/list-reusable-blocks/style-rtl.css 846 B
build/list-reusable-blocks/style.css 846 B
build/media-utils/index.min.js 2.92 kB
build/modules/importmap-polyfill.min.js 12.3 kB
build/notices/index.min.js 946 B
build/nux/index.min.js 1.58 kB
build/nux/style-rtl.css 749 B
build/nux/style.css 745 B
build/plugins/index.min.js 1.81 kB
build/preferences-persistence/index.min.js 2.06 kB
build/preferences/index.min.js 2.89 kB
build/preferences/style-rtl.css 715 B
build/preferences/style.css 715 B
build/primitives/index.min.js 829 B
build/priority-queue/index.min.js 1.54 kB
build/private-apis/index.min.js 994 B
build/react-i18n/index.min.js 630 B
build/react-refresh-entry/index.min.js 9.47 kB
build/react-refresh-runtime/index.min.js 6.76 kB
build/redux-routine/index.min.js 2.69 kB
build/reusable-blocks/index.min.js 2.72 kB
build/reusable-blocks/style-rtl.css 256 B
build/reusable-blocks/style.css 256 B
build/rich-text/index.min.js 10.1 kB
build/router/index.min.js 1.95 kB
build/server-side-render/index.min.js 1.94 kB
build/shortcode/index.min.js 1.4 kB
build/style-engine/index.min.js 2.01 kB
build/token-list/index.min.js 579 B
build/url/index.min.js 3.85 kB
build/vendors/react-dom.min.js 42.8 kB
build/vendors/react-jsx-runtime.min.js 560 B
build/vendors/react.min.js 2.65 kB
build/viewport/index.min.js 965 B
build/warning/index.min.js 250 B
build/widgets/index.min.js 7.19 kB
build/widgets/style-rtl.css 1.16 kB
build/widgets/style.css 1.16 kB
build/wordcount/index.min.js 1.03 kB

compressed-size-action

@noisysocks
Member Author

Rebased this and fixed the tests. It's ready for review.

I tried simple keyword matching as an alternative algorithm for ranking results. This is where you award 1 point per keyword that the title and search term have in common. It didn't work as well for me in testing, though, because cosine similarity will give smaller titles that are similar to the query an edge over long titles that contain lots of irrelevant words in addition to the query.

The simpler approach achieves good enough results so I am happy to switch to it if cosine similarity is too difficult to understand or more than we need. Let me know, I won't be offended.
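To make the comparison concrete, here is a minimal TypeScript sketch of both ranking approaches. It is not the PR's code: `tokenize`, `keywordScore`, `termFrequencies`, `magnitude` and `cosineSimilarity` are hypothetical names, and the tokenizer (lower-case, then split on anything that isn't a Unicode letter or number) is an assumption. With the query "travel tips", keyword matching gives a short exact title and a long title the same score, while cosine similarity over term frequency vectors ranks the short title higher.

```typescript
// Assumed tokenizer: lower-case, then keep runs of Unicode letters (\p{L})
// and numbers (\p{N}).
function tokenize(text: string): string[] {
	return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Keyword matching: award 1 point per distinct query keyword found in the title.
function keywordScore(query: string, title: string): number {
	const titleTokens = new Set(tokenize(title));
	const queryTokens = Array.from(new Set(tokenize(query)));
	return queryTokens.filter((token) => titleTokens.has(token)).length;
}

// Build a term frequency map (term -> number of occurrences).
function termFrequencies(tokens: string[]): Map<string, number> {
	const frequencies = new Map<string, number>();
	tokens.forEach((token) => {
		frequencies.set(token, (frequencies.get(token) ?? 0) + 1);
	});
	return frequencies;
}

// Euclidean length of a term frequency vector.
function magnitude(vector: Map<string, number>): number {
	let sumOfSquares = 0;
	vector.forEach((count) => {
		sumOfSquares += count * count;
	});
	return Math.sqrt(sumOfSquares);
}

// Cosine similarity: dot product of the two vectors divided by the product of
// their lengths. 1 means the same relative term frequencies, 0 means no overlap.
function cosineSimilarity(query: string, title: string): number {
	const queryVector = termFrequencies(tokenize(query));
	const titleVector = termFrequencies(tokenize(title));
	let dotProduct = 0;
	queryVector.forEach((count, term) => {
		dotProduct += count * (titleVector.get(term) ?? 0);
	});
	const denominator = magnitude(queryVector) * magnitude(titleVector);
	return denominator === 0 ? 0 : dotProduct / denominator;
}

const query = 'travel tips';
const shortTitle = 'Travel Tips';
const longTitle = '10 travel tips for visiting Paris on a budget';

console.log(keywordScore(query, shortTitle), keywordScore(query, longTitle)); // 2 2
console.log(cosineSimilarity(query, shortTitle)); // ≈ 1
console.log(cosineSimilarity(query, longTitle)); // ≈ 0.47
```

The long title's extra words ("10", "for", "visiting", ...) lengthen its vector without adding to the dot product, which is what pushes its score down. The price is the vector maths, which is harder to follow than counting keywords.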

Member

@ramonjd ramonjd left a comment

I've been testing this with lots and lots of taxonomy terms, posts and pages in English, German and Japanese.

It's hard to see the benefit of these changes at first: trunk behaves mostly the same until you have hundreds of pages/posts with similar keywords.

Here's me looking for "Paris"

Kapture.2024-06-13.at.14.37.58.mp4

Great stuff - overall I think it's a big improvement to have tags/cats surfaced this way.

packages/core-data/src/fetch/fetch-link-suggestions.ts (outdated, resolved)
@ramonjd
Member

ramonjd commented Jun 13, 2024

Quoting @noisysocks: "The simpler approach achieves good enough results so I am happy to switch to it if cosine similarity is too difficult to understand or more than we need. Let me know, I won't be offended."

ChatGPT and "ELI5" set me straight 😄

But it is very clever, so I paused to consider whether this will be accessible to folks who want to iterate on the feature.

Is the "simpler" approach less code? Does it perform as well? Easier to read?

If the answer is "yes" to these questions I'd probably consider using the simpler approach, or at least stating a convincing reason to go with cosine similarity, e.g., it produces much better results more of the time.

What do you think?

Contributor

@andrewserong andrewserong left a comment

Great work here, and thanks for the detailed explanations of how this works, and for the WP-CLI commands for testing 👍

Quoting @noisysocks: "It didn't work as well for me in testing, though, because cosine similarity will give smaller titles that are similar to the query an edge over long titles that contain lots of irrelevant words in addition to the query."

IMO, the advantage of smaller titles that more directly match the query could be worth it, especially if you wind up having a simple category like Travel that you'd expect to be at the top. With this PR applied, it's working very nicely for me:

[screenshot]

Whereas on trunk I get pages first of all, and categories and tags are way down at the bottom of the list.

Trunk, top of list: [screenshot]
Trunk, bottom of list: [screenshot]

Just left a few questions, but overall I think this is a big improvement, and I also like the idea of stabilising the API for it. The function has been around for a long time and it seems general-purpose enough to me to be useful outside of Gutenberg (i.e. I could imagine a plugin in an admin area wanting a quick way of fetching link suggestions).

Would it be worth getting a second opinion / see if anyone objects to the more complex cosine similarity approach?

packages/core-data/src/fetch/fetch-link-suggestions.ts (outdated, resolved)
packages/core-data/src/fetch/fetch-link-suggestions.ts (outdated, resolved)
packages/core-data/src/fetch/fetch-link-suggestions.ts (outdated, resolved)
@noisysocks

This comment was marked as outdated.

@ramonjd

This comment was marked as outdated.

@noisysocks
Member Author

I tried a scoring approach where the score is simply the number of tokens in the title that also appear in the search query, divided by the total number of tokens in the title. This means shorter titles receive higher scores, fixing the problem I noted in #62397 (comment).

It seems to work well enough in my testing and is much simpler to understand. I'd appreciate if you can test it with various search terms, etc. again though 🙂
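A minimal sketch of that scoring approach, assuming a tokenizer that lower-cases the text and splits on anything that isn't a Unicode letter or number. `Suggestion`, `tokenize`, `relevanceScore` and `rankSuggestions` are hypothetical names with a simplified result shape, not the actual types or helpers in fetch-link-suggestions.ts. The combined results are sorted by score and then cut to the first 20, the limit mentioned in the PR description.

```typescript
// Simplified, hypothetical result shape (the real type may differ).
interface Suggestion {
	title: string;
	url: string;
	type: string;
}

// Assumed tokenizer: lower-case, then keep runs of Unicode letters/numbers.
function tokenize(text: string): string[] {
	return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Score = (title tokens that also appear in the query) / (all title tokens).
// Shorter titles that match the query therefore score higher.
function relevanceScore(query: string, title: string): number {
	const queryTokens = new Set(tokenize(query));
	const titleTokens = tokenize(title);
	if (titleTokens.length === 0) {
		return 0;
	}
	const matches = titleTokens.filter((token) => queryTokens.has(token)).length;
	return matches / titleTokens.length;
}

// Sort the combined results by score (highest first) and keep the top `limit`.
function rankSuggestions(
	query: string,
	results: Suggestion[],
	limit = 20
): Suggestion[] {
	return results
		.map((result) => ({ result, score: relevanceScore(query, result.title) }))
		.sort((a, b) => b.score - a.score)
		.slice(0, limit)
		.map(({ result }) => result);
}

// 20 posts that all mention the query, followed by one category.
const posts: Suggestion[] = Array.from({ length: 20 }, (_, i) => ({
	title: `Travel tips for trip ${i + 1}`,
	url: `https://example.com/?p=${i + 1}`,
	type: 'post',
}));
const category: Suggestion = {
	title: 'Travel Tips',
	url: 'https://example.com/category/travel-tips/',
	type: 'category',
};

const ranked = rankSuggestions('travel tips', [...posts, category]);
console.log(ranked[0].title); // "Travel Tips"
```

Without the sort, the category would be the 21st result and `slice(0, 20)` would drop it, which is the "Travel Tips" problem from the PR description. Since `Array.prototype.sort` is stable (guaranteed since ES2019), results with equal scores keep the order in which they were concatenated.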

@noisysocks
Member Author

The more I look at the API of fetchLinkSuggestions the less I like it 😅 so I'm going to keep it experimental in this PR and come back to stabilising it / cleaning it up in a follow-up PR.

@andrewserong
Contributor

This is still testing nicely for me after the latest change 👍

Re: "Keep experimental for now"

Is this since we're still iterating on the logic? Sounds reasonable to defer stabilising it for now.

@noisysocks
Member Author

Updated PR description. This is altogether a much simpler PR now 😅

Quoting @andrewserong: "Is this since we're still iterating on the logic? Sounds reasonable to defer stabilising it for now."

No, but that's not a bad reason either.

Contributor

@andrewserong andrewserong left a comment

This is still testing nicely for me, and I like that the logic is simpler, easier to read and to maintain. Not for now, but another potential future enhancement could be to look at partial matches with tokens / fuzzy search, too, as I need to type the whole word "travel" before the travel category gets to the top of the list:

"trave"

[screenshot]

"travel"

[screenshot]

This is already a big improvement, though, so just jotting this down as a thought, nothing to worry about for now.
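One possible direction for the partial-match idea, sketched under the assumption that a title token counts as matched when any query token is a prefix of it. `prefixRelevanceScore` is a hypothetical variant of the title-token score discussed earlier in this thread, not something this PR implements, and the tokenizer is the same assumed lower-case, letters-and-numbers split.

```typescript
// Assumed tokenizer: lower-case, then keep runs of Unicode letters/numbers.
function tokenize(text: string): string[] {
	return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Hypothetical variant: a title token counts as matched if any query token is
// a prefix of it, so "trave" already matches "travel" while the user types.
function prefixRelevanceScore(query: string, title: string): number {
	const queryTokens = tokenize(query);
	const titleTokens = tokenize(title);
	if (titleTokens.length === 0) {
		return 0;
	}
	const matches = titleTokens.filter((titleToken) =>
		queryTokens.some((queryToken) => titleToken.startsWith(queryToken))
	).length;
	return matches / titleTokens.length;
}

console.log(prefixRelevanceScore('trave', 'Travel')); // 1
console.log(prefixRelevanceScore('trave', 'Travel tips for trip 1')); // 0.2
```

The trade-off is that very short queries (a single letter, say) would match many tokens, so a real implementation might want a minimum prefix length or proper fuzzy matching instead.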

LGTM! 🚀

Member

@ramonjd ramonjd left a comment

Ran through similar tests as before, with 500+ database entities in different languages.
Tags and categories are surfaced as expected. Snappy results.

Kapture.2024-06-14.at.16.22.11.mp4

Very nice. 🚢

@noisysocks noisysocks enabled auto-merge (squash) June 14, 2024 06:28
@noisysocks noisysocks merged commit 18676a8 into trunk Jun 14, 2024
66 of 67 checks passed
@noisysocks noisysocks deleted the fix/link-suggestions branch June 14, 2024 06:30
@github-actions github-actions bot added this to the Gutenberg 18.6 milestone Jun 14, 2024
@noisysocks
Member Author

noisysocks commented Jun 14, 2024

Thanks for bearing 🐻 with me while I ventured unnecessarily deep into rabbit 🐰 holes!

patil-vipul pushed a commit to patil-vipul/gutenberg that referenced this pull request Jun 17, 2024
…2397)

* Update fetchLinkSuggestions to sort results by relevancy

- Rename __experimentalFetchLinkSuggestions to fetchLinkSuggestions.
- Rewrite fetchLinkSuggestions in TypeScript.
- Sort results by relevancy using cosine similarity between term frequency vectors.

* Fix tsc errors

* Update @wordpress/core-data imports

* Make tokenize unicode aware

* Remove unnecessary mutation

* Add tests for all the helper functions

* Simpler scoring function

* Keep experimental for now

Unlinked contributors: scrobbleme.

Co-authored-by: noisysocks <[email protected]>
Co-authored-by: ellatrix <[email protected]>
Co-authored-by: ntsekouras <[email protected]>
Co-authored-by: ramonjd <[email protected]>
Co-authored-by: andrewserong <[email protected]>
Co-authored-by: skorasaurus <[email protected]>
Labels
[Block] Navigation: Affects the Navigation Block
[Feature] Link Editing: Link components (LinkControl, URLInput) and integrations (RichText link formatting)
No Core Sync Required: Indicates that any changes do not need to be synced to WordPress Core
[Type] Bug: An existing feature does not function as intended
Development

Successfully merging this pull request may close these issues.

Link suggestions in editor “never” shows taxonomies
6 participants